The unified interface for LLMs. Find the best models & prices for your prompts
OpenRouter is highly praised for its robust open models and detailed statistical insights, particularly excelling in handling large volumes of programming tokens. Users appreciate its flexibility and wide integration capabilities, especially in AI agent applications. Complaints highlight issues with token costs and efficiency, with some users developing complementary tools to mitigate these concerns. Overall, pricing sentiment is generally positive due to its open-source nature, and OpenRouter maintains a strong reputation in the developer and AI community for its functionality and adaptability.
Mentions (30d)
26
Avg Rating
5.0
1 reviews
Platforms
5
Sentiment
19%
15 positive
OpenRouter is highly praised for its robust open models and detailed statistical insights, particularly excelling in handling large volumes of programming tokens. Users appreciate its flexibility and wide integration capabilities, especially in AI agent applications. Complaints highlight issues with token costs and efficiency, with some users developing complementary tools to mitigate these concerns. Overall, pricing sentiment is generally positive due to its open-source nature, and OpenRouter maintains a strong reputation in the developer and AI community for its functionality and adaptability.
Features
Use Cases
Industry
information technology & services
Employees
51
Funding Stage
Venture (Round not Specified)
Total Funding
$160.0M
openrouter rankings for programming tokens show sharp rise in open models and stagnation of US frontier models
Site has extremely detailed stats by day/week for every model. Programming is by far the largest consumer of tokens, and in fact entire token growth in 2025 was only from programming. Other categories very flat. It is also a category where you would pay for better performance. IMO, its relevant to this sub in that one of the top models, minimax, fits in under 256gb, but also that the trends are for cost effectiveness rather than "the absolute best". There is a tangent insight as to whether US datacenter frenzy is needed. kimi k2.5 being free on openclaw is a big reason for its total dominance. In week of Feb 2, minimax was only other top model to increase token usage. Opus 4.6 release seems to be extremely flat in reception. Agentic trend tends to make LLM models disposable, since better ones are released every week, and the agents/platforms that can switch on the fly while keeping context, is something you can invest in improving while not being obsolete next month.
View originalPricing found: $10
g2
What do you like best about OpenRouter?Unified API Access: The ability to call a multitude of LLMs from different providers (like OpenAI, Anthropic, Google, and various open-source models) through a single, consistent API endpoint is a game-changer. This drastically reduces the integration overhead and code maintenance associated with managing individual provider APIs and SDKs. Simplified Cost Management & Tracking: OpenRouter provides a clear, consolidated view of our LLM usage costs across all models. The pay-as-you-go pricing, with standardized per-token rates for many models, makes budget forecasting and expense tracking much more straightforward than juggling multiple billing dashboards. Rapid Prototyping and Model Benchmarking: The platform is excellent for quickly testing and comparing the performance of different models for specific tasks. Switching between, for instance, a Llama model and a GPT variant for a text generation task requires minimal code changes Developer-Focused Features: Tools like the model explorer, the ability to see real-time model rankings based on community usage or specific metrics, and features like request fallbacks or automatic retries demonstrate a clear understanding of developer workflows and pain points in LLM Operations (LLMOps). Review collected by and hosted on G2.com.What do you dislike about OpenRouter?While the benefits are substantial, one aspect that I've noted is the potential for slightly increased latency compared to direct API calls to the model providers. This is somewhat expected given the nature of an aggregation service acting as an intermediary. For extremely latency-sensitive applications, this might require careful benchmarking, though for most of our use cases, the difference has been marginal and outweighed by the convenience and flexibility offered. Review collected by and hosted on G2.com.
I built a live ranking of every AI agent and foundation model (open source)
I built AgentTape because none of the existing model leaderboards quite cover all the things that I was interested in: benchmark performance is one part, but so is who's actually using a model, who's talking about it, and how it compared on cost and speed. It pulls hourly data from GitHub, Hugging Face, OpenRouter, MCP registries, npm, PyPI, arXiv, Hacker News, and more - to score and compare each public AI agent and foundation model. I'm still tweaking the scoring methodology (it's early days), so I'd love to hear your thoughts, if it's helpful, or anything you think I've got wrong! submitted by /u/Celestialien [link] [comments]
View original$18 to $4 on the same agent run after i stopped asking opus to rename css variables
I've been running an agent loop that refactors my static site. CSS variable renames, YAML config updates, running a linter through MCP. Really glamorous stuff for a blog that gets 40 visitors a month, most of whom are me refreshing to check if Vercel actually deployed. Every single step was going to Opus 4.7 because setting up routing felt like work and I am, apparently, the kind of person who'd rather burn $18 than spend 20 minutes writing an if statement. So I finally wrote the if statement. Hard subtasks still go to Opus: component architecture, debugging code I wrote at 2am and have zero memory of writing, anything where the model needs to hold a complex plan across a long conversation. Opus is genuinely unmatched at that kind of sustained reasoning. I tried routing a tricky auth middleware bug to a cheaper model once and got back something that looked perfectly plausible but silently broke session handling in a way that cost me an hour to trace. Lesson learned permanently. The routine stuff (lint, rename, config edits, tool orchestration) goes to cheap models. I landed on DeepSeek V4 Pro for general coding chores and Tencent Hunyuan Hy3 preview for anything with heavy tool calling. As of late April it was ranked number one on OpenRouter by tool call volume, and in my MCP loops it almost never botches a function call when the schema is clean. The listed rate on Tencent Cloud is around $0.18 per million input tokens and $0.59 per million output, so roughly 28x cheaper than Opus 4.7 on input. Same 212 step refactor, now with routing: 178 steps to the cheap tier, 34 to Opus. $18 became roughly $4. I couldn't spot a difference on the routine changes. My 40 monthly visitors certainly can't. I've since started doing stuff I used to skip entirely, like having the agent write and run tests for every CSS change or regenerating all my Open Graph images, because at a fraction of a cent per tool call there's just no reason not to. They do mess up in specific and annoying ways though. The tool calling model hallucinates parameters when my schemas get sloppy (honestly fair, the schemas were bad). DeepSeek V4 Pro occasionally writes code that's syntactically perfect but does the precise opposite of what you asked, in a way that survives a quick skim. And neither can touch Opus when you need it to reason through three layers of why your auth flow is silently eating a cookie. My routing logic boils down to one question: how expensive is a wrong answer to catch? Bad lint fix costs a 2 second git revert. Bad architecture call costs the whole afternoon. submitted by /u/After-Condition4007 [link] [comments]
View originalcdesktop — open-source Claude Code Desktop alternative, runs locally via npx, supports any provider
I built cdesktop with Claude Code — it's an open-source alternative to Anthropic's Claude Code Desktop, running locally on your machine via npx cdesktop. Free, Apache 2.0. It mirrors the Code tab of Anthropic's desktop app — see the video — and supports 5 agents in one UI. Claude Code Desktop does not support third party models, cdesktop does. Features: 5 coding agents in one UI: Claude Code, Codex, Gemini CLI, OpenCode, Hermes. Switch per session. Full third-party support — OpenRouter, DeepSeek, Kimi, GLM, custom ANTHROPIC_BASE_URL — any provider, any model. 20+ presets baked in. Agent teams — spawn teammates that share your workspace; mix agents and models per teammate; lead delegates via npx cdesktop team spawn. Routines — scheduled recurring agent runs (hourly/daily/weekdays/weekly). Side-by-side sessions — split workspace into up to 4 cells, drag any session between them. Optional Git worktrees per session, or work in-place. Non-Git directories work too. Diff review with inline comments routed back to the agent. 7 UI languages: English, Simplified Chinese, Traditional Chinese, Spanish, French, Japanese, Korean. Responsive UI — usable from a phone. Repo: https://github.com/cdesktop-ai/cdesktop How Claude Code helped build it: started from a fork of vibe-kanban; Claude Code (opus) rewrote the UI around a Claude-Code-Desktop-style session model and drafted most of the new Rust + React code. It's beta — expect rough edges. Feedback welcome, especially on Claude Code workflows where it falls short of the official app. submitted by /u/DomLiu [link] [comments]
View originalTools: Is This a Technical Victory, or a Price War Victory?
If you only follow discussions on social media, you might think AI coding is still dominated by Claude, GPT, and Gemini. But Kilo Code’s usage data on OpenRouter paints a somewhat counterintuitive picture: over the past 30 days, the top three most-used models on Kilo Code were Step 3.5 Flash, MiniMax M2.5, and Ling-2.6-1T. Together, they accounted for roughly 3.15T tokens, or about 58% of Kilo Code’s total token usage over the same period. In other words, in this real-world AI coding agent usage scenario, Chinese models are no longer just backup options. They have become a major source of token consumption. Kilo Code’s OpenRouter data does not necessarily prove that Chinese models have fully surpassed Claude or GPT. But it does show at least one thing: in high-frequency, high-token, highly automated AI coding agent workflows, Chinese models have already entered the core of real production usage. Why is this happening? Is it because Chinese models are cheaper, offer longer context windows, and are better suited for workloads that consume large amounts of tokens? submitted by /u/babyb01 [link] [comments]
View originalLLM-Rosetta — format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools — Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week — each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result. It's a Python library that converts between LLM API formats — OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters. The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests — one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need. Use it as a library — pip install and call convert() directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep (https://github.com/Oaklight/zerodep), another project of mine — stdlib-only single-file modules, not someone else's library repackaged. The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar — admin panel, dashboard, request log, config management all included. Screenshots: https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/ We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform — needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized. The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion. There's a shim layer for provider-specific quirks — built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively. 24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation. GitHub: https://github.com/Oaklight/llm-rosetta Docs: https://llm-rosetta.readthedocs.io arXiv: https://arxiv.org/abs/2604.09360 Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378 submitted by /u/Oaklight_dp [link] [comments]
View originalI built a Claude Code plugin so Claude remembers what I shipped
https://preview.redd.it/jnwg9n3i1t1h1.png?width=1440&format=png&auto=webp&s=827236ef5ca2e1070c4abd8e06455d41672749bf Every time I started a new Claude chat, I had to re-explain what I'd been working on. The previous chat was gone with every refinement I'd made to my own context. So I built LockedIn. A Claude Code plugin that captures your experience and work as you do it, so Claude remembers it next session. 1 router skill + 6 sub skills, designed around harness engineering principles. You can say things in the Claude Code session like save this commit as a project highlight meeting just wrapped, log it absorb this writeup It stores everything as structured markdown under ~/Documents/LockedIn/. (editable!) The point is accumulation. Different sources, one place. Over time LockedIn notices overlaps and asks you one question at a time how to reconcile. The vault gets richer. The outputs get more specific. Claude already has 'Projects'. But a few things that are different. Markdown on your filesystem instead of Anthropic's database. It's more like Obsidian. Edit it, version with git, carry it to any tool. Typed ontology with 15 entity types like person, project, achievement, decision, instead of unstructured uploads. The skill grounds each claim in a specific entity. Reconciliation. When new input overlaps existing knowledge, LockedIn asks you to merge or keep separate. Projects just accumulates context. Free and open source on GitHub. github.com/daypunk/LockedIn Or install directly in Claude Code. /plugin marketplace add daypunk/LockedIn /plugin install lockedin@lockedin /lockedin:setup Enjoy! Feedback welcome 😉 submitted by /u/Firm-Path7092 [link] [comments]
View originalBootstrapped founders: how are you managing Claude Code costs?
I’m currently building an AI startup solo and Claude Code has genuinely improved my development speed compared to most other tools I’ve tried. The challenge is that subscription/API costs add up quickly while bootstrapping. I wanted to ask other founders and developers here: Are you mainly using Claude subscriptions or OpenRouter/API? Which models/workflows give the best cost vs productivity ratio? Are there any startup programs, credits, or affordable setups you’d recommend? Right now I’m experimenting with mixing Claude, DeepSeek, and cheaper routing providers to keep costs manageable. Would love to hear how others are handling this. submitted by /u/vishalvanam [link] [comments]
View originalI built a desktop app that routes Claude Code to any LLM: DeepSeek, Ollama, Copilot, OpenRouter, and 7 more
Claude Code is the best AI coding tool I've used. But being locked to one provider, one pricing model, and one model catalog always bothered me. So I built CCPG, a desktop app (Mac/Windows/Linux) that proxies Claude Code to whatever provider you want. Install it, configure in the UI, launch with ccpg --DeepSeek. No YAML. No pip install. No config files. It also shows you every prompt Claude Code sends in the background, including the silent housekeeping calls you never see, with token count and latency per request. MIT, local-only, forever free. https://github.com/danielalves96/claude-code-provider-gateway submitted by /u/Livid_Individual3656 [link] [comments]
View originalMax20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter?
I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft. The idea: Opus 4.7 in Claude Code as the orchestrator. It plans, breaks down tasks, reviews code quality, catches mistakes. The actual implementation, the bulk token spend, gets delegated to DeepSeek V4 Pro through OpenRouter. DeepSeek lands credibly close to Opus 4.7 on agentic coding benchmarks at a fraction of the output-token cost, so the bet is: keep Opus for the judgment-heavy parts, don't burn it on routine implementation. I'm not expecting huge savings. Realistically maybe an extra 30% (guessing here) effective Opus headroom if delegation works cleanly, and even less margin now that the limits situation has loosened a bit. So part of the question is genuinely whether 30% is worth the integration friction at all, or whether it's a fun idea that doesn't pay for itself. Pre-empting the obvious responses, because I've already thought about these: "Just use Sonnet for the cheap parts." The easy answer. But I'm specifically curious whether an external model's cost delta beats the friction, and whether anyone's actually measured it. "Max20 already gives generous Opus limits, why bother." Fair. But I'd rather use Opus where it earns its keep and not think about rationing for the rest. It's about allocation, not desperation. "The quality gap means Opus spends all its effort fixing DeepSeek's output." This is the actual question. DeepSeek reportedly drifts more than Opus on long agentic loops with many sequential tool calls. So does a tight review loop close that gap, or does it eat the 30%? That's what I want real data on. "This fights how Claude Code is built." Probably. Claude Code's subagents run on Claude models, so I assume this needs a different tool (Aider, Cline, Kilo) or a custom routing layer. If the real answer is "don't do this in Claude Code at all," tell me what you'd use instead. I know the single-model answer. I'm after whether the split specifically works in practice. submitted by /u/theargen [link] [comments]
View original5 secret Claude skills nobody is talking about
The File Reading Skill Claude can't always read your uploads intelligently by default. This skill acts as a smart router — PDF, DOCX, XLSX, CSV, JSON, images, archives — and tells Claude exactly how much to read and how to handle each format. Upload a 40-page contract. Get a precise, structured summary. Every time. No more Claude skimming past the important parts or misreading table data. The difference? Instead of guessing how to process your file, Claude follows a tested protocol built for that exact file type. The Frontend Design Skill Stop getting generic, boring UI from Claude. This skill loads it with design tokens, component patterns, layout rules, and production-grade aesthetics before it writes a single line of code. The output actually looks like something a senior designer shipped — not a ChatGPT tutorial from 2023. Use it for landing pages, dashboards, React components, or full web apps. The visual quality gap between Claude with and without this skill is not subtle. The Skill Creator Skill Yes. A skill that builds skills. You describe a workflow you keep repeating. Claude writes the full SKILL.md file with instructions, triggers, and edge case handling. You install it. Claude gets smarter. This is the compounding play. Every skill you build saves you prompting time forever. People running this in their workflow are essentially programming Claude to think like them — without writing a single line of actual code. The PPTX Skill Claude builds full PowerPoint decks — slides, layouts, speaker notes, branded structure — and exports actual .pptx files. Not HTML. Not markdown. Files you open directly in PowerPoint or present to a client. I used this to build a full client proposal deck in under 10 minutes. The skill handles things like slide hierarchy, content density, and formatting consistency that Claude normally fumbles without guidance. The Instagram Reader Skill Paste an Instagram link. Claude extracts the caption, carousel copy, slide text, and thread content. Repurpose competitor content, study what's working in your niche, or bulk-extract your own posts for a content audit — without screenshot gymnastics or manual transcription. For anyone running a content operation at scale, this one alone saves hours per week. submitted by /u/IAmAzharAhmed [link] [comments]
View originalI offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers.
Hey r/ClaudeAI — I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading. Big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunking. Useful context, but not always something I need Claude to consume raw. So I built a small open-source sidecar tool called Triss. The rule is simple: Cheap model reads the bulky stuff. Claude gets the summary and does the thinking/editing. This is not a Claude replacement. I still keep architecture, debugging, careful edits, and final judgment with Claude. Triss is for the boring high-token intake step. One week of actual usage This is my real DeepSeek usage from May 6–13, 2026: Pro Flash Total Requests 143 66 209 Input tokens 3.74M 2.10M 5.84M Output tokens 833K 156K 990K Cost (USD) $1.88 $0.34 $2.22 That came out to about 1 cent per request on real coding work, not a benchmark. The important part is not only the DeepSeek bill. It is that Claude never had to carry those raw 5.8M input tokens in its own context. A ticket or file bundle that might have eaten tens of thousands of Claude tokens becomes a short summary, and the main conversation stays lighter. What I delegate The pattern that stuck for me: A single file over ~400 lines. 3+ files where I only need a structured summary. Jira/Linear/GitHub issues with comments and metadata. Web pages or docs pages. First-pass diff review. Commit message generation from a staged diff. What I do not delegate: Architecture decisions. Hard debugging. Precise edits. Small questions where the delegation overhead is larger than the task. What the tool does Triss can run as a CLI or as an MCP server, so Claude Code / Claude Desktop / Codex can call it as a native tool. The commands I use most: bash triss ask --paths src/foo.ts src/bar.ts --question "Summarize the control flow and risks" triss fetch https://example.com/docs --question "Extract the setup steps" triss review triss commit-msg triss usage --by-project It also has tracker integrations for Jira, Confluence, Linear, GitHub, and GitLab, because ticket/API payloads were one of the biggest hidden context sinks in my workflow. The default setup is DeepSeek, but it works with OpenAI-compatible endpoints too: DeepSeek, Kimi, Ollama, OpenRouter, etc. Credit where it is due The original idea came from Kunal Bhardwaj's write-up: https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda and his proof of concept: https://github.com/imkunal007219/claude-coworker-model My version is basically that pattern made more specific to my own workflow: MCP tools, tracker integrations, review/commit helpers, usage logging, and path sandboxing for agent calls. Links GitHub: https://github.com/ayleen/triss-coworker Install: npm install -g triss-coworker Setup: triss config wizard Open-source, MIT, unaffiliated with Anthropic. I do not get paid if you install it. I mostly wanted to share the numbers because "use a cheap model for bulk reading" sounded obvious to me in theory, but it only became habit once it was wired into Claude as a low-friction tool. Happy to answer any questions. submitted by /u/Proper-Mousse7182 [link] [comments]
View originalopenai/gpt-5.5-pro API In=$30.00 Out=$180.00
Is this an openrouter bug? https://preview.redd.it/sz826138ul0h1.png?width=879&format=png&auto=webp&s=066f38f4a6d5a8eeee142e7a8a356d8bc511c6f1 submitted by /u/ArtdesignImagination [link] [comments]
View original**Built my own model-agnostic AI workstation because I was tired of platform lock-in — free, BYOAK, open source**
Tired of rebuilding context every time I switched models. Tired of my personas living inside OpenAI's walled garden. Built something to fix it. **Architect's Domain**, a workstation UI that sits on top of any provider. Core features: - **Workspace system**, persistent environments with pinned context, imported files, notes. Think Claude Projects but provider-agnostic - **Manual memory curation**, fragments surface during chat, you approve or reject what gets remembered. No silent auto-memory - **Character/persona system via file injection**, load .txt files as system context. Works with character cards, lorebooks, personality files, anything - **Provider switching**, OpenRouter, Venice.ai, DeepSeek. Swap models without losing your setup - **BYOAK**, your keys, your data, runs fully static No React, no framework bloat. Vanilla JS + CSS + HTML. Deployable anywhere. I use it daily for prompt engineering and RP character testing across different frontier models. The workspace + memory combo is what makes it actually useful vs just another chat wrapper. Open source: https://github.com/HactoriXD/architects-domainv1 Feedback welcome! especially from people who've tried similar setups. submitted by /u/EnricoFiora [link] [comments]
View originalshaved $40 off my claude code bill last month by sending planning steps to a cheaper model
got tired of hitting pro limits by day 18 of the cycle so i started splitting where the tokens go. the planning steps eat 80% of token budget on multi-file refactors, and most of that planning is fine on a cheaper model. now the upfront 'figure out what to change' work hits haiku 3.5 via a 30-line wrapper, only the actual edits and decision-making land on opus or sonnet. setup took about 2 hrs the first time including figuring out which steps were worth handing off. last cycle ended with budget left over for the first time in 4 months. saved roughly $40 in overage fees plus didnt lose the usual 2-day wait for the reset window. caveat: haiku's planning quality is noticeably worse on architecture decisions. for refactor-and-test workflows where opus picks up the real decision anyway it's fine. for greenfield 'what should this app even be' i still let opus plan from scratch. probably obvious to anyone who's looked at the openrouter model pricing tables but the claude code subagent docs are kinda thin on this exact pattern so figured worth dropping. submitted by /u/AccomplishedFix3476 [link] [comments]
View originalMahoraga - Stop paying Anthropic and OpenAI so much
Are you sick of paying a million credits per month?!?!? I'm joking, i aint that enthusiastic. But really, this saves me a ton of credits by routing simple tasks to local agents. Clone the repo, fork the repo, star the repo, whatever you want. github.com/pockanoodles/Mahoraga This is Mahoraga, an open-source orchestrator that routes tasks across local and cloud AI agents using a contextual bandit (LinUCB) that learns from every decision. Context (skip): I only started integrating AI into my workflows in late 2025, so I came on the scene broke with no credits. This left me with local models. However, many students and employees also receive credits from their institution to work with. (I got claude yippee) I wanted to be able to flawlessly route between models when credits ran out, which made me build an orchestrator. I used to use claude more as a chatbot/complete workflow engine, which made it difficult to use local models due to the context window, reasoning, etc. Opus 4.5 running open-source "superpowers" ate my usage every month. Now I realize that wasn't an effective way to use claude, or AI in general. I was using claude for both heavy planning/brainstorming and minor tasks. How about tasks specifically for code generation? Code generation is a relatively constrained task, with correct answers and short outputs. Surely local models can compete in tasks that don't need cloud? So I switched Mahoraga to an adaptable router. I ran 192 tasks across 8 agents (4 local Ollama models, 4 cloud CLIs) on a 16GB MacBook Pro, forcing round-robin so every agent got every prompt. Quality is scored by a 4-layer heuristic system (novelty ratio, structural checks, embedding similarity, length ratio). Zero API cost for evaluation, and no LLM-as-judge. Qwen3 4B in nothink mode dominates code and refactor at 33.8 t/s and 6.1s average latency. Cloud agents cluster around 0.650 on code. The local model isn't just cheaper; it's measurably better for this task class. Other findings: LFM2 hits 77.1 t/s but trades ~5 quality points vs Qwen3 4B DeepSeek-R1 averages 123.5s per task on 16GB. The reasoning overhead makes it unusable as a default Security scores are flat at 0.650 across all agents due to my human error—the scorer doesn't capture security-specific signals well. The bandit (LinUCB) is the only routing strategy with sublinear regret (β=0.659) across a 200-task simulation—it actually converges The routing works in two stages: the keyword classifier puts the task in a capability bucket (code, plan, research, etc.), and then the bandit picks the best agent within that bucket. 9-dimensional context vector, persistent state across sessions, warm-start from the compatibility matrix. All local inference, all free. Cloud escalation exists but only fires on retry. Why pay for cloud when a local model handles it better? Looking for any feedback, any input. Feel free to be critical: I appreciate everyone who interacts on this subreddit. I will continue to work on this in the future. Again, this is open source and free. (Mods, please. i'm not making any money off this. submitted by /u/Own-Professional3092 [link] [comments]
View originalYes, OpenRouter offers a free tier. Pricing found: $10
OpenRouter has an average rating of 5.0 out of 5 stars based on 1 reviews from G2, Capterra, and TrustRadius.
Key features include: Product, Company, Developer, Connect.
OpenRouter is commonly used for: AI model comparison, Cost management for AI services, Token consumption tracking, Model discovery for developers, Routing AI requests with fallbacks, Integration of AI agents.
OpenRouter integrates with: OpenAI, AWS Lambda, Google Cloud, Microsoft Azure, Slack, GitHub, Zapier, Twilio, Jira, Trello.
Based on user reviews and social mentions, the most common pain points are: token usage, token cost, API costs, claude code cost.
Guillermo Rauch
CEO at Vercel
2 mentions

The OpenRouter Show
Jan 28, 2026
Based on 78 social mentions analyzed, 19% of sentiment is positive, 81% neutral, and 0% negative.