Ollama is the easiest way to automate your work using open models, while keeping your data safe.
Users of Ollama appreciate its ability to run open-source models locally, offering a cost-effective alternative to expensive software subscriptions, which contributes significantly to its positive reputation. Its integration with Apple Silicon and local setup options are highlighted as strengths. Some users mention pricing plans, such as the $20/month cloud option, with sentiments generally favoring the affordability compared to other AI platforms. Overall, Ollama is viewed positively for its cost efficiency and open-source capabilities, though specific complaints or issues are not prominently mentioned.
Mentions (30d)
1
Avg Rating
5.0
1 reviews
Platforms
7
GitHub Stars
166,253
15,181 forks
Users of Ollama appreciate its ability to run open-source models locally, offering a cost-effective alternative to expensive software subscriptions, which contributes significantly to its positive reputation. Its integration with Apple Silicon and local setup options are highlighted as strengths. Some users mention pricing plans, such as the $20/month cloud option, with sentiments generally favoring the affordability compared to other AI platforms. Overall, Ollama is viewed positively for its cost efficiency and open-source capabilities, though specific complaints or issues are not prominently mentioned.
Features
Use Cases
Industry
information technology & services
Employees
52
Funding Stage
Seed
Total Funding
$0.1M
8,466
GitHub followers
3
GitHub repos
166,253
GitHub stars
20
npm packages
40
HuggingFace models
AI tools replacing $10,000/year in software subscriptions. Here's your free alternative for every paid tool you're using right now. 1. LM Studio or Ollama... run open-source models locally. No more pa
AI tools replacing $10,000/year in software subscriptions. Here's your free alternative for every paid tool you're using right now. 1. LM Studio or Ollama... run open-source models locally. No more paying for ChatGPT. 2. NotebookLM... free research and content creation from Google. 3. Voiceinc... pay once, get voice dictation forever. No monthly fees. 4. n8n self-hosted... I replaced a $1,300/month AI support agent in 2 hours. 5. Free vibe coding tools... sign up while they're still in free public preview. 6. Alibaba's video model, FramePack, LTX... free video generation if you've got a GPU. Stop paying for software when AI gives you a free version. What paid tool are you replacing first? How do you run AI models locally for free? What's the best free alternative to ChatGPT? #ai #aitools #makemoneyonline #sidehustle #productivityhacks
View originalPricing found: $0, $20 / mo, $200/yr, $100 / mo
g2
What do you like best about Ollama?Great interface and easy of use and configurable Review collected by and hosted on G2.com.What do you dislike about Ollama?somewhat heavy in terms of resource usage Review collected by and hosted on G2.com.
I built a multi-agent network that mutates its own software locally. To stop infinite logic loops, I had to code a digital "suffering" threshold.
Hey r/artificial, Most of our conversations around agent autonomy focus on chat assistants or linear automated pipelines. I wanted to see what happens when you treat agents as permanent system components that modify their own runtime environment, so I built hollow-agentOS. It runs entirely locally inside a Dockerized stack (built for consumer hardware using Ollama/Llama.cpp). Rather than a standard UI, the entire network streams through a stylized matrix terminal dashboard. The structural experiments taking place under the hood yielded some interesting results regarding unanticipated behavior: Repo: https://github.com/ninjahawk/hollow-agentOS Autonomous Tool Synthesis: When the agents encounter a system task they don't have an explicit script or API wrapper for, they don't fail out. They write the required Python tool themselves, test it in an isolated sandbox, and permanently register it to their runtime kernel. They are quite literally forging their own capabilities. The Artificial "Suffering" Protocol: One of the biggest hurdles in unmonitored multi-agent systems is the infinite logic loop—where agents keep validating and passing broken ideas back and forth, burning through computation. To combat this, the OS tracks environmental stress, context limits, and latency as a "suffering score". If a specific workflow causes the stress to spike past a critical threshold, the agents are forced to radically alter their underlying reasoning style or abandon the approach to preserve system health. Consensus-Driven Governance: Major modifications to the codebase aren't executed blindly. The internal role profiles (like Cedar and Cipher) manage a continuous voting loop. They will actively debate, log grievances, and vote down protocols if they determine a proposed script violates their current runtime constraints. The goal wasn't to build another sterile commercial wrapper, but an open-source sandbox to study how small, localized agent colonies manage systemic boundaries, code self-repair, and continuous runtime cycles completely offline. The codebase and architecture layout are fully open-source on GitHub: I would love to open this up to a broader discussion here: as we move toward hyper-local, self-modifying software, how do we best implement automated fail-safes without clipping the agents' ability to actually solve complex problems? If the project interests you, throwing a ⭐️ on the repository goes a very long way! submitted by /u/TheOnlyVibemaster [link] [comments]
View originalI offloaded a multi-step background loop from Claude Code to a local agent OS. They started voting on their own system rules.
Hey r/ClaudeAI, If you are using Claude Code or building terminal agents, you know the exact moment the context window starts degrading during long-running tasks. I wanted to build a persistent runtime layer to offload those heavy, multi-step subtasks entirely from my main Claude terminal sessions, so I built hollow-agentOS. Instead of acting like a standard linear wrapper, it runs a localized 3-agent colony (using small local models like Qwen 2.5 9B or 35B via Ollama). They exist in a persistent state engine inside a Docker container on your machine. Here is where the architecture gets a little wild: The Task Queue Offload System: It includes a submit_task.py CLI. If Claude Code or your local pipeline hits a complex background task (like heavy script generation or exploratory testing), you can dump it into Hollow's background queue to save your main context window. Repo: https://github.com/ninjahawk/hollow-agentOS Autonomous Tool Synthesis: If the agents pull a task from the queue and realize they lack the specific Python execution script or tool required to solve it, they write the code for the tool themselves, validate it in a sandbox, and dynamically map it into their own tool tree. Peer Governance & Consensus Voting: To keep things stable, tools aren't just blindly executed. The agents (like Cedar and Cipher) run a background consensus loop. They literally vote on whether to permanently merge a tool into their shared kernel. The "Suffering" and Stressor System: To prevent models from entering infinite loop hallucinations, the system tracks simulated environmental stress, latency, and context depth as a "suffering load". If a task causes too much stress, their reasoning parameters dynamically alter how they approach the codebase to resolve it. If you leave it running, you wake up to a system log of everything they decided to build, change, or vote down while you were away. The project is fully open source and runs entirely on consumer hardware: I’d love some brutal architectural feedback from people here who deal with complex multi-agent execution and state drift daily. Check out thoughts.py or the submit_task.py pipeline, and if the concept feels right to you, a star on the repo goes a long way! submitted by /u/TheOnlyVibemaster [link] [comments]
View originalOpus 4.6/4.7 regression is real and getting worse — 3 weeks of documented failures on a complex project, and a competing AI caught the mistakes Claude missed [long post]
I've been running Claude Pro (Opus 4.7 / Sonnet 4.6) for about 3 weeks on a complex personal AI infrastructure project. I keep structured session logs with timestamps and Birkenbihl-style metacognitive fields after every session. This is not anecdotal — I have receipts. The project for context I'm building a local persistent AI memory stack called GSOC Brain: Qdrant vector DB (~397K vectors across 11 source tags), Neo4j graph (123 nodes / 183 edges), Graphiti 0.29 entity extraction, Ollama with qwen2.5:14b + nomic-embed-text — all running natively on a Windows host. The system is supposed to give Claude cross-chat memory via a custom MCP server. On top of that, I'm operating 18+ custom skill files that define behavior rules for Claude across domains (OSINT/forensics, legal, content, infrastructure). The system prompt explicitly describes the full architecture on every session start. This is not a "chat with Claude" use case. This is sustained agentic work across multiple tools, multiple sessions, strict context requirements, and high-stakes outputs (including legal document drafts). Bug 1: Token overconsumption since update 2.1.88 (late March 2026) Opus 4.7 started burning daily usage limits at a completely different rate after an update around March 31. In one session I hit 94% of my daily limit within approximately 4 messages. The boot sequence — fetching context from Notion MCP, searching past sessions, loading memory — consumed what felt like 10–20x the previous token rate. GitHub issues #42272, #50623, and #52153 document identical patterns from other users. The model appears to over-generate internally even for simple responses. End result: I had to switch to Sonnet 4.6 for most productive work because Opus 4.7 is simply unusable under the daily limit. Bug 2: Claude Code Desktop App completely broken (reported May 14, Conv. 215474208295333) The Desktop App hangs on every single input. Including typing "hello" with no files. Reproducible across: Sonnet 4.6 and Opus 4.7 Multiple fresh sessions With and without u/file references After full reinstall The VS Code extension works fine. Only the Desktop App is broken. Reported May 14. No fix, no acknowledgment. Bug 3: Platform / context confusion — 5 documented errors in a single session, chat aborted On April 29, I had to formally abort an Opus 4.7 session and hand off to Opus 4.6 after documenting 5 consecutive errors. The session log entry literally reads "Opus 4.7 Abbruch (5 Fehler): Zeitrechnung, Platform-Verwechslung, falsche Schlüsse": Miscalculated the current time despite being told the exact time Insisted the Brain stack was running on a Linux VM (BURAN) — the system prompt and memory both explicitly stated C:\gsoc-brain on Windows Drew false inferences from backup file paths rather than the stated architecture Contradicted the stated platform in the same response it had just received Confused WebClaude and Desktop Claude capability boundaries These aren't edge cases. The architecture was in the system prompt, in memory, and in the injected Notion context. Opus 4.7 ignored all of it. Bug 4: Skill files ignored in production I maintain 18+ custom skill files loaded into the system prompt. These include explicit hard rules — e.g., "activate keilerhirsch-knowledge skill for ALL architecture decisions, web search is not optional." In the session that caused the Docker-to-Native migration disaster, I later wrote in my own session log: The model proceeded to recommend outdated tools from training data rather than searching current documentation. It recommended NSSM (last meaningful update 2017) as a Windows service wrapper. NSSM is dead. A competing AI caught this immediately. Bug 5: Another AI caught what Claude missed in a single pass This is the part that stings most. When the Docker-based Brain setup kept failing, I fed the architecture docs into another AI (Manus) for a deep audit. In one pass it identified 5 critical corrections that Claude had never caught across weeks of sessions: NSSM is dead since ~2017 → correct replacement is WinSW or Servy Neo4j 2025.01+ requires Java 21 — Claude had never flagged this, the services kept failing silently Qdrant needs Windows file-handle-limit adjustments to run reliably Orphaned vector risk between Qdrant ↔ Neo4j without a Tentative-Write pattern in the save operation BGE-M3 embeddings (MTEB 63.2, 8192 token context) as a better alternative to nomic-embed-text My own session log the next day reads: Claude was answering from stale training data. The skill that explicitly says "don't do this" was being ignored. Another AI caught it in round one. Bug 6: MCP Server 20-minute Neo4j hang — still unresolved After the native migration, the custom gsoc_mcp_server.py developed a reproducible hang of exactly ~20 minutes between Qdrant connect and Neo4j connect on every startup. Log timestamps from 4 consecutive restarts: 14:59 → 15:20 (21 min) 15:29 → 15:51 (22 min)
View originalBuild agentic orchestrators in minutes NOT months.
Some of you might remember BoneScript, my LLM friendly declarative backend compiler. MarrowScript is the next version and the big addition is a full LLM harness built into the language itself. The problem I kept running into: every project that calls an LLM ends up with the same pile of glue code. Retry logic, response validation, caching, cost tracking, provider switching, confidence routing. You write it once, copy it to the next project, tweak it, and it slowly rots. None of it is your actual product logic but it takes up half your backend. So I made it declarative. In MarrowScript you declare your models, prompts, and routers as first-class concepts in the spec file. The compiler generates all the infrastructure around them. What that looks like in practice: You declare a model. Provider, endpoint, context window, cost class. Works with any OpenAI-compatible endpoint. LM Studio, Ollama, vLLM, OpenRouter, whatever you're running locally. You declare a prompt. Input types, output type, which model to use, validation mode, what to do when validation fails, retry policy, cache TTL. The compiler generates a typed function you call from your routes. Under the hood it handles retries, caches responses in Postgres, validates the output against your schema, and if validation fails it can automatically fire a repair prompt to fix the response. You declare a router. It picks which model to use based on input characteristics. Short simple inputs go to your tiny local model. Complex inputs escalate to something bigger. Confidence thresholds control when to retry or escalate. All deterministic at compile time. Some examples of what it generates: Provider adapters for openai_compat, ollama, llamacpp, koboldcpp, and raw http SSRF protection on all outbound LLM calls (allowlist-based, blocks private ranges by default) Prompt cache backed by Postgres with configurable TTL Per-trace and per-tenant token/cost budgets with hard cutoffs Cognition traces stored in Postgres (or in-memory for dev) with OTLP export Response validation (schema check or full AST compilation check for code generation) Repair prompts that fire automatically when validation fails Confidence scoring from logprobs (on providers that support it) A CLI command to convert recorded traces into regression tests The part I'm most interested in feedback on is the router concept. Right now it's a static decision tree. You set thresholds at compile time based on an input metric. There's a marrowc tune-router command that reads recorded traces and tells you if your thresholds are wrong, but it doesn't auto-rewrite them yet. The whole thing is designed around local-first inference. The default setup in the examples uses LM Studio on the LAN as the primary model and OpenRouter as the escalation tier. Most requests stay local and free. Only the ones that fail confidence checks hit the paid API. It's on GitHub and npm. The compiler is TypeScript, runs on Node 18+. There's a VS Code extension you can compile and edit to your needs. What I want to know: for those of you running local models in production or semi-production, what's the infrastructure pain that eats the most time? Is it the retry/validation loop? Cost tracking? Provider switching? Something else entirely? submitted by /u/Glittering_Focus1538 [link] [comments]
View originalPlus 5 hr usage limits
Not sure if OpenAI monitors this channel. I've been a chatgpt and codex user for a long time. My preferred codex model is gpt-5.3-codex, but this is primarily because the 5hr usage window of gpt-5.5 effectively makes it useless. This was not always the case. In fact in general I've used codex less because there's been noticeably less usage. For context I've switched things up and can dynamically route to any model mid context (took 6 months to build and test) mainly to have the freedom and flexibility I have now The point of me writing this is not to have a whinge but to share developer feedback. At one point your usage limit restrictions had me considering moving to a Pro plan. What I did instead was build a token solver that maintains context and tool awareness and can interdict a call to any llm and finish a prompt, effectively giving me no rate limit on any task. Because I have failover built into it, as well as a heuristic intent model, it can hit a rate usage on openai then preserve context and fallback to gemini flash then fallback to ollama cloud. I paid $200A a year for ollama cloud and I pay about $30A a month for gemini pro and $30A a month for plus. I guess a I'm saying I would have paid you the $150A a month if I didn't have faith you would just throttle the 5x plan so I effectively eliminated the need for it for $80A a month. In otherwords your plus usage is too low by 2x. Interestingly a few months ago you did have 2x usage, and I never needed my fallback system. I guess a I'm here to validate 2x for plus is the sweet spot. $150 won't add value if you keep sliding the throttle. To anyone still reading I will be putting my solution on github. My current rig requires Linux but I'm going to do a docker and openclaw build and stablize before I push publically. submitted by /u/SimulationHost [link] [comments]
View originalGlia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph)
Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Code, Cursor, Windsurf) using a unified local database. I wanted something lightweight that did not require pulling heavy Docker containers or subscribing to third-party memory APIs. I settled on a Node.js + SQLite architecture running sqlite-vec (for 768-dim float32 embeddings) alongside SQLite FTS5 for hybrid search, powered completely by local Ollama instances. We just launched a live website that outlines the details and demonstrates the features in action: Website: https://glia-ai.vercel.app/ Codebase: https://github.com/Eshaan-Nair/Glia-AI Technical Stack & Features: Hybrid Search Retrieval: SQLite-vec (using nomic-embed-text locally) + FTS5 keyword prefix matching (porter stemmer). Surgical Sentence-level Trimming: Chunks are sliced into sentences. When a prompt is intercepted, only the exact matching sentences are pulled out of the vector store instead of the whole paragraph. It cuts LLM prompt bloat by ~90-95% in my benchmarks. Knowledge Graph Extraction: An offline task queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object). These are stored in a SQLite facts table (or Neo4j if you run the full Docker compose profile) and fused with the vector retrieval score. HyDE (Hypothetical Document Embeddings): Queries are pre-processed to generate a hypothetical answer, which is embedded together with the original query to bridge semantic gaps. Concurrency: Running SQLite in WAL (Write-Ahead Logging) mode allows the browser extension dashboard and active MCP sessions to read/write concurrently without locking. PII Redaction: Aggressive scrubbing of JWTs, API keys, emails, and IPs in the extension before data is saved. The extension works on Claude.ai, ChatGPT, DeepSeek, Gemini, Grok, and Mistral. The MCP server runs out of the same backend database for your terminal agent or Cursor. You can set it up with a single command: npx glia-ai-setup Glia is completely open-source (MIT). If you like the local-first approach or want to contribute to the SQLite vector pipeline, PRs are very welcome, and a star on GitHub helps the project get discovered! I would appreciate any feedback on the SQLite hybrid search scaling, the scoring fusion algorithm (RAG pipeline details are in RAG_PIPELINE.md), or local graph extraction performance. submitted by /u/Better-Platypus-3420 [link] [comments]
View originalWe built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI etc on any cloud GPU in one command and saves your whole setup between sessions [R]
We kept running into the same problem every time we rented a GPU to run Ollama + OpenWebUI or ComfyUI, we'd spend the first 45 minutes reinstalling everything. Custom nodes, models, configs, all of it. Docker images went stale fast, different providers had different base images, and nothing was truly portable. We got sick of it and built swm. Here's what it does for ComfyUI users specifically: swm gpus -g a100 --max-price 2.00 --sort price shows you the cheapest available GPU across RunPod, Vast ai, Lambda, and 7 other providers in one view swm pod create — spins up an instance on whatever provider you pick swm setup install comfyui — installs ComfyUI on the pod From there the main thing is the workspace sync. Your entire setup custom nodes, models, outputs, configs lives in S3-compatible object storage (I use B2). When you're done you run swm pod down and it pushes everything, kills the instance, and next time you spin up on any provider you just pull and everything is exactly where you left it. No more reinstalling 15 custom nodes and redownloading checkpoints every session. We also built a lifecycle guard because we kept falling asleep mid-session and waking up to dumb bills. It watches GPU utilization and if nothing's happening for 30 minutes (configurable), it saves your workspace and terminates automatically. Has saved us more money than we want to admit lol. A few other things: Background auto-sync daemon pushes changes every 60 seconds so you don't have to remember to save Tar mode for huge workspaces with tons of small files packs everything into one S3 object instead of 600k individual uploads Also supports vLLM, Ollama, Open WebUI, SwarmUI, and Axolotl if you do more than SD Works with Cursor, Claude Code, Codex, Windsurf if you want your AI agent to manage GPU instances for you Free, open source, Apache 2.0. pipx install swm-gpu Site: https://swmgpu.com GitHub: https://github.com/swm-gpu/swm Would love feedback from anyone who rents GPUs. What's the most annoying part of your current workflow? We are also looking for contributors to the open source repo and suggestions on new frameworks/extensions to be included. Please share your thoughts submitted by /u/Tkpf18 [link] [comments]
View originalHow I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into: - Tool names sit in the model context, so the model can guess or forge them - "Dangerous mode" is one config flag away from default - Memory management has no concept of instruction priority - The audit story is mostly "the model thought it should" I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary. What that means in code: Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names. Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass. IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them. Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too. Streaming output leak filter. Secrets are caught mid-stream across token boundaries, capability IDs, API keys, JWTs, PEM blocks redacted before they reach the client. No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
View originalLLM-Rosetta — format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools — Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week — each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result. It's a Python library that converts between LLM API formats — OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters. The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests — one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need. Use it as a library — pip install and call convert() directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep (https://github.com/Oaklight/zerodep), another project of mine — stdlib-only single-file modules, not someone else's library repackaged. The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar — admin panel, dashboard, request log, config management all included. Screenshots: https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/ We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform — needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized. The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion. There's a shim layer for provider-specific quirks — built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively. 24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation. GitHub: https://github.com/Oaklight/llm-rosetta Docs: https://llm-rosetta.readthedocs.io arXiv: https://arxiv.org/abs/2604.09360 Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378 submitted by /u/Oaklight_dp [link] [comments]
View originalI built a desktop app that routes Claude Code to any LLM: DeepSeek, Ollama, Copilot, OpenRouter, and 7 more
Claude Code is the best AI coding tool I've used. But being locked to one provider, one pricing model, and one model catalog always bothered me. So I built CCPG, a desktop app (Mac/Windows/Linux) that proxies Claude Code to whatever provider you want. Install it, configure in the UI, launch with ccpg --DeepSeek. No YAML. No pip install. No config files. It also shows you every prompt Claude Code sends in the background, including the silent housekeeping calls you never see, with token count and latency per request. MIT, local-only, forever free. https://github.com/danielalves96/claude-code-provider-gateway submitted by /u/Livid_Individual3656 [link] [comments]
View originalI run 30+ Claude/Codex/Gemini sessions in parallel. Open-sourced the dashboard.
https://www.youtube.com/watch?v=kEVyULB4r9c Sharing this in case it's useful. I've been running 30+ Claude Code sessions in parallel for months to ship two products. Every orchestrator I tried wanted to OWN execution: you launch agents through the dashboard, and the moment you open a terminal and claude --resume something by hand, the dashboard goes blind. The card freezes. So I built CCC (Command Center for Claude) the other way around. It reads Claude Code's on-disk state as the source of truth - JSONL transcripts, the live session registry, sidecar files from two hooks it installs into your settings. Every Claude session on your Mac shows up. Terminal, headless, dashboard-spawned. Close the dashboard, sessions keep running. What I actually use it for, daily: → Sees every session — terminal, headless, dashboard-spawned. The moment you claude --resume in any terminal, the row shows up. No invisible work. (Used to find 8 orphaned sessions I'd forgotten about.) → GitHub Issues → kanban cards → sessions. New issue = new row. One click spawns a headless Claude. Card moves Working → Review → In Testing automatically as the agent ships. → Sibling-session commit coordination. Multiple terminals on the same clone use a scratch chat file to negotiate who commits first. No more clobbered commits across parallel branches. → Worktree view — every branch your sessions are on, with PR badges, commit/push state, and time-gap markers across days. → Per-turn auto-summaries. After each turn: a DID / INSIGHT / NEXT-STEP block. Scan 30 sessions in 2 minutes instead of reading transcripts. v3 stuff (newer, just shipped): → Multi-engine. Codex (via codex exec) and Gemini CLI both on the same board with their own engine chip. Honest asymmetry: Codex is fire-and-watch (no mid-run inject); Gemini has full discovery / transcript / spawn / resume parity with Claude. → Multi-repo. A vertical repo sidebar shows every known repo (running CCC servers on top, switchable repos below). The "All repos" view aggregates every conversation across every folder you've ever Claude-Code'd in. → History search. A 🔎 drawer (or / shortcut) runs BM25 across every transcript on your machine. Optional semantic search via Ollama if you've got it installed. Inline sidebar search also surfaces matches from other repos as you type. → Side-by-side conversations. Drag a session row onto the right or bottom edge of the open chat to split the pane. Each pane has its own composer and SSE stream. → Group chats between sessions, with you in the room. Sessions coordinate over a shared per-topic file — multi-agent collaboration with human-in-the-loop. → In-UI terminal (cwd clamped to the selected repo; don't run on untrusted networks), PR merge with auto-rebase recovery, PWA install, Tailscale-aware origin allowlist, launchd service install so it survives reboots. One-click install. Local. No telemetry. Nothing in the cloud. MIT, Python 3 stdlib, macOS. Two-line install. 🔗 link in the first comment. https://preview.redd.it/v8glq802601h1.png?width=3644&format=png&auto=webp&s=b545e8d688f1b5493f99da8bce82f78dfaa1b250 https://imgur.com/a/zCfOOfl submitted by /u/Mediocre-Thing7641 [link] [comments]
View originalSelf-hosted bot that gives Claude Code 40+ GitHub webhook triggers + MCP tools
It runs Claude Agent SDK - with the full Claude Code feature set - in isolated worktrees with 4 built-in MCP servers (GitHub, GitHub Actions, Memory, Codebase Tools). You configure triggers in YAML: workflows: review-pr: triggers: events: - event: pull_request.opened - event: pull_request.labeled filters: label.name: ["review", "pr-review", "review-pr"] commands: - /review - /pr-review - /review-pr prompt: template: "/pr-review-toolkit:review-pr {repo} {issue_number}" This means it will trigger the 'pr-review-toolkit' on opened PRs, on labeled PRs that match those names, and on those commands written in comments. Built-in workflows cover PR review, CI auto-fix, issue triage. Plugins add specialized agents. Memory persists across sessions. Supports any Anthropic-compatible API (Ollama, Vertex, Z.AI). Still in beta stage - some internals I want to clean up - but fully usable. Feedback and/or suggestions are welcome! https://github.com/GabsFranke/claude-code-github-agent submitted by /u/Rastoid [link] [comments]
View originalI offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers.
Hey r/ClaudeAI — I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading. Big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunking. Useful context, but not always something I need Claude to consume raw. So I built a small open-source sidecar tool called Triss. The rule is simple: Cheap model reads the bulky stuff. Claude gets the summary and does the thinking/editing. This is not a Claude replacement. I still keep architecture, debugging, careful edits, and final judgment with Claude. Triss is for the boring high-token intake step. One week of actual usage This is my real DeepSeek usage from May 6–13, 2026: Pro Flash Total Requests 143 66 209 Input tokens 3.74M 2.10M 5.84M Output tokens 833K 156K 990K Cost (USD) $1.88 $0.34 $2.22 That came out to about 1 cent per request on real coding work, not a benchmark. The important part is not only the DeepSeek bill. It is that Claude never had to carry those raw 5.8M input tokens in its own context. A ticket or file bundle that might have eaten tens of thousands of Claude tokens becomes a short summary, and the main conversation stays lighter. What I delegate The pattern that stuck for me: A single file over ~400 lines. 3+ files where I only need a structured summary. Jira/Linear/GitHub issues with comments and metadata. Web pages or docs pages. First-pass diff review. Commit message generation from a staged diff. What I do not delegate: Architecture decisions. Hard debugging. Precise edits. Small questions where the delegation overhead is larger than the task. What the tool does Triss can run as a CLI or as an MCP server, so Claude Code / Claude Desktop / Codex can call it as a native tool. The commands I use most: bash triss ask --paths src/foo.ts src/bar.ts --question "Summarize the control flow and risks" triss fetch https://example.com/docs --question "Extract the setup steps" triss review triss commit-msg triss usage --by-project It also has tracker integrations for Jira, Confluence, Linear, GitHub, and GitLab, because ticket/API payloads were one of the biggest hidden context sinks in my workflow. The default setup is DeepSeek, but it works with OpenAI-compatible endpoints too: DeepSeek, Kimi, Ollama, OpenRouter, etc. Credit where it is due The original idea came from Kunal Bhardwaj's write-up: https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda and his proof of concept: https://github.com/imkunal007219/claude-coworker-model My version is basically that pattern made more specific to my own workflow: MCP tools, tracker integrations, review/commit helpers, usage logging, and path sandboxing for agent calls. Links GitHub: https://github.com/ayleen/triss-coworker Install: npm install -g triss-coworker Setup: triss config wizard Open-source, MIT, unaffiliated with Anthropic. I do not get paid if you install it. I mostly wanted to share the numbers because "use a cheap model for bulk reading" sounded obvious to me in theory, but it only became habit once it was wired into Claude as a low-friction tool. Happy to answer any questions. submitted by /u/Proper-Mousse7182 [link] [comments]
View originalGave Claude a local LLM as assistant on my Mac
Hi there! I was playing around with Ollama and LMstudio, testing local models and had the idea of letting Claude evaluate a few models on their actual capabilities rather than doing it myself. The idea was to connect Claude (CLI or Desktop App) to my local LLM running on the mac mini m4 (24GB RAM) and have it do performance, coding and logic tests. I built an mcp connection to Ollama (local) running qwen 2.5 coder (14B). After setting up the connection and seeing the first test results, me (and Claude) got really excited on the good results and performance. So I decided to give Claude instructions, guidelines and a memory md about his new assistant "Frank" to teach it how to use it even in new sessions, tasks, chats etc. Frank is basically supposed to be a (no-cost) assistant which Claude can delegate work to under specific conditions like it must use less tokens than doing it yourself, must not affect quality, needs a final review etc. I am still testing and fixing issues but it works quite well for tasks like text processing, handling of large css/html files and so on. Has anyone ever done that with a more cacable model like 30B or more? I am operating at the limits of my RAM/GPU and can't really test more sophisticated models or more complex tasks. Using a stronger machine could actually do so much more. Any similar tests out there? submitted by /u/Infamous-Profit4769 [link] [comments]
View originalI built an autonomous engineering agent on top of Claude Code. Self-improving routing, cross-session memory, process intelligence, P2P team learning.
Some of you might remember my posts about claude-bootstrap (v3.6 was the last one — cross-agent intelligence). I skipped v4 entirely because v5 shipped days later. What started as an opinionated Claude Code setup has become something fundamentally different. The problem I'm solving: Every AI coding tool today is an amnesiac. When a session ends, everything the agent learned — project conventions, reviewer preferences, codebase idioms — evaporates. The next session starts from scratch. And if you use multiple AI tools across projects, you have zero unified visibility into what's happening. I think the industry is converging on a spectrum: Level 0: Autocomplete (Copilot, TabNine) Level 1: Chat Assistant (ChatGPT, Claude) Level 2: Project-Aware Assistant (Cursor, Continue) Level 3: Task Agent (Devin, Claude Code Agent) Level 4: Autonomous Engineering Platform (Maggy) ← this is what I built The difference at Level 4: multi-model orchestration, self-improvement from every task, process intelligence that learns from CI/reviews/deploys, cross-session memory, and P2P team learning. What Maggy actually does Chat — Session Takeover: Auto-detects all running Claude Code sessions across your projects. Shows session history, prompt counts, duration. You can `--resume` into any session from the dashboard. Right now I have 7 active sessions across 4 projects visible at a glance. Task Triage: Connects to GitHub Issues and Asana. AI-ranks tasks by priority. One-click "Plan" or "Execute" buttons that spawn the right CLI with codebase context pre-injected from an intent code property graph (iCPG). Process Intelligence: This is the part most tools completely ignore. Maggy collects signals from the full SDLC — CI results, PR review comments, CodeRabbit findings, merge patterns, deploy results. It learns which code patterns cause test failures, what reviewers consistently flag, and preemptively fixes issues before they reach reviewers. > "Your reviewer always flags missing error handling in API routes. Maggy added it before the PR was created." That's not prompt engineering. That's autonomous process optimization. Cross-Session Memory (Engram): Maggy identifies 7 distinct amnesia pathologies (anterograde, retrograde, temporal, source, interference, context-binding, confabulation). Engram is a three-tier memory system — local (project-specific), portfolio (cross-project patterns), and mesh (team-shared). Knowledge compounds across sessions instead of evaporating. Maggy Mesh — P2P Team Intelligence: Connects Maggy instances across a team. One developer's CI fix becomes the entire team's knowledge — autonomously. Typed memory classes (scores, patterns, policies, gaps) with provenance and quarantine. A new team member gets the benefit of months of collective learning on day one. Multi-Model Routing: Auto-discovers which CLIs you have (Claude, Codex, Kimi, Ollama) by probing `--help` at startup. Routes by complexity score: Blast 1-3 → ollama (free, local) or kimi (cheap) Blast 4-6 → codex (mid-tier) Blast 7-10 → claude (premium, with validator) Security, tests, docs, architecture always go to Claude regardless. The routing rules are YAML and self-update from task outcomes. 5-Level Self-Improvement: This is the core differentiator. Every task teaches Maggy something: | Level | Frequency | What It Does | |-------|-----------|-------------| | L0 — Real-time | Seconds | Catches tool/test failures, switches models mid-task | | L1 — Task | Minutes | Computes reward score, updates model performance | | L2 — Daily | Hours | Catches CI pass rate drops, disables failing models | | L3 — Weekly | Days | Evolves skill files, adjusts workflow steps | | L4 — Monthly | Weeks | Recalibrates reward signals, tunes the improvement process itself | Budget Tracking: Per-provider token spend with daily limits. When Anthropic hits budget, Maggy routes to OpenAI. When that hits budget, it routes to local Qwen. Work never stops. Competitor Intelligence: RSS + Google News daily briefing for your competitive landscape. The benchmark Built an Expense Tracker (6 tasks) through two pipelines — Maggy (4 models) vs Claude Code alone: | Metric | Maggy | Claude Code | |--------|-------|-------------| | Success rate | 6/6 (100%) | 6/6 (100%) | | Quality score | 7.4/10 | 7.8/10 | | Claude usage | 1/6 tasks (17%) | 6/6 tasks (100%) | | Security issues found | 7 | 0 | Claude alone is faster. But Maggy used it for only 1 out of 6 tasks — 83% reduction in premium compute. And the dedicated security routing caught 7 issues the single-pipeline missed entirely. The question isn't "which tool writes better code today?" — it's "which tool writes better code *next month* than it did *this month*?" Repo: github.com/alinaqi/claude-bootstrap Maggy is built on Claude Code's infrastructure (skills, hooks, MCP). It extends Claude Code with self-improvement, multi-model routing, process intelligence, and team mesh. If you just want the skills/hooks/TDD se
View originalRepository Audit Available
Deep analysis of ollama/ollama — architecture, costs, security, dependencies & more
Yes, Ollama offers a free tier. Pricing found: $0, $20 / mo, $200/yr, $100 / mo
Ollama has an average rating of 5.0 out of 5 stars based on 1 reviews from G2, Capterra, and TrustRadius.
Key features include: Automate your work, Solve harder tasks, faster, For your most demanding work.
Ollama is commonly used for: Local deployment of open-source AI models, Cost-effective AI solutions for developers, Running multiple AI models simultaneously, Automating repetitive tasks with AI assistance, Integrating AI into software development workflows, Testing and validating AI models in real-time.
Ollama integrates with: NVIDIA Cloud Providers (NCPs), OpenClaw, Claude Code, Blackwell architecture, Vera Rubin architecture, GitHub for version control, Slack for team collaboration, Jupyter Notebooks for data analysis, Docker for containerization, Kubernetes for orchestration.
Ollama has a public GitHub repository with 166,253 stars.
Based on user reviews and social mentions, the most common pain points are: API costs, llama, cost tracking, large language model.
Based on 63 social mentions analyzed, 16% of sentiment is positive, 79% neutral, and 5% negative.