Qdrant is an open-source vector search engine written in Rust. It provides a fast, scalable vector similarity search service with a convenient API.
Qdrant holds a 4.5 average rating on G2 across 12 reviews. Reviewers praise its search speed, scalability, documentation, and ease of self-hosting; the most common complaints are the lack of built-in visualization, a steep initial learning curve, and a limited UI. Pricing comes up rarely, though one reviewer calls the plans high. In community posts, Qdrant appears mainly as the vector store behind RAG pipelines and persistent-memory layers for AI agents, where the recurring pain point is context continuity across sessions rather than Qdrant itself.
Mentions (30d)
4
Avg Rating
4.5
12 reviews
Platforms
4
GitHub Stars
29,940
2,150 forks
Industry
information technology & services
Employees
95
Funding Stage
Series B
Total Funding
$88.7M
1,590
GitHub followers
129
GitHub repos
29,940
GitHub stars
20
npm packages
40
HuggingFace models
457,517
npm downloads/wk
Show HN: Open-sourced AI Agent runtime (YAML-first)
Been running AI agents in production for a while and kept running into the same issues: controlling what they can do, tracking costs, debugging failures, and making it safe for real workloads.

So we built AgentRuntime, the infrastructure layer we wished we had. Not an agent framework, but the platform around agents: policies, memory, workflows, observability, cost tracking, RAG, governance.

Agents and policies are defined in YAML, so it's infrastructure-as-code rather than a chatbot builder.

Example: agents and policies in YAML

agent.yaml (declarative agent config):

    name: support_agent
    model:
      provider: anthropic
      name: claude-3-5-sonnet
    context_assembly:
      enabled: true
      embeddings:
        provider: openai
        model: text-embedding-3-small
      providers:
        - type: knowledge
          config:
            sources: ["./docs"]
            top_k: 3

policies/safety.yaml (governance as code):

    name: security-policy
    rules:
      - id: block-file-deletion
        condition: tool.name == "file_delete"
        action: deny

CLI: run and inspect

Create and run an agent:

    agentctl agent create researcher --goal "Research AI safety" --llm gpt-4
    agentctl agent run researcher
    agentctl runs watch <run-id>

Manage policies:

    agentctl policy list
    agentctl policy activate security-policy 1.0.0

RAG (ingest docs and ground responses in your knowledge base):

    agentctl context ingest ./docs
    agentctl run --agent agent.yaml --goal "How do I deploy?"

Agent-level debugging:

    agentctl debug -c agent.yaml -g "Analyze this dataset."

Cost tracking is exposed via the API (per agent/tenant), and the Web UI shows analytics. The workflow debugger (breakpoints, step-through) lives in the pkg layer; the CLI debug is for agent execution.

What's in there

Governance: policy engine (CEL), risk scoring, encrypted audit logs, RBAC, multi-tenancy, fully YAML-configurable

Orchestration: visual workflow designer (React Flow), DAG workflows, multi-agent coordination, conditional logic, plugin hot-reload, workflow marketplace

Memory & Context: working memory, persistent memory, semantic memory, event log

Context assembly combines: policies, workflow state, memory, tool outputs, knowledge

RAG features: embeddings (OpenAI or local), SQLite for development, Postgres + vector stores in production

Observability: cost attribution via API, SLA monitoring, distributed tracing (OpenTelemetry), Prometheus metrics, deterministic replay (5 modes)

Production: Kubernetes operator (Agent, Workflow, Policy CRDs), Helm charts, Istio config, auto-scaling, backup / restore, GraphQL + REST API

Implementation: ~50k LOC of Go, hundreds of tests, built with production in mind

Runs on:
- Local: SQLite, in-memory runtime
- Production: Postgres, Redis, Qdrant / Weaviate

Happy to answer questions or help people get started
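The `block-file-deletion` rule from the post's policy file can be read as a predicate over a tool call. The following is a minimal Python sketch of that idea only. It assumes a simplified rule shape; `ToolCall`, `Rule`, and `evaluate` are hypothetical names for illustration, and this is not AgentRuntime's CEL policy engine (which the post says is written in Go).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical stand-in for the `tool` object a condition inspects."""
    name: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape mirroring the YAML fields id / condition / action."""
    id: str
    condition: Callable[[ToolCall], bool]
    action: str  # "deny" or "allow"


def evaluate(rules: list[Rule], call: ToolCall, default: str = "allow") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.condition(call):
            return rule.action
    return default


# Assumed equivalent of: condition: tool.name == "file_delete", action: deny
RULES = [Rule("block-file-deletion", lambda tool: tool.name == "file_delete", "deny")]
```

With these rules, a `file_delete` call evaluates to `deny` and any other tool falls through to the default. First-match-wins and default-allow are assumptions of this sketch, not documented AgentRuntime behavior.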
Pricing found: $50
g2
What do you like best about Qdrant? In our organization, we developed an RAG application and needed a way to store embeddings. I looked at many open-source tools like Pinecone and Superduperdb. Qdrant worked the best. The setup on our server was super easy, and their documentation is very thorough. I also think the embedding search is more accurate than on the other platforms I piloted. We are still using Qdrant for our RAG application and are happy with it. Review collected by and hosted on G2.com. What do you dislike about Qdrant? Inability to perform rich operations from the UI without writing code or a query. For example, if I want to delete all collections, delete collections matching a name pattern, or select multiple collections and delete them, that is not possible through the UI.
What do you like best about Qdrant? Fully managed across all resources and available on the AWS, Google, and Azure platforms; it helps with vector search technology. What do you dislike about Qdrant? No built-in visualization, and significantly slower search times for results.
What do you like best about Qdrant? Self-hosting Qdrant on a host is really simple and does not take a lot of time to set up or troubleshoot. The documentation is also up to date. I prefer to install it using Docker to avoid installing dependencies. What do you dislike about Qdrant? The initial learning curve is high, but the documentation and resources make up for it.
What do you like best about Qdrant? It makes it easier to consolidate and analyze data from disparate sources, and helps with scaling data, data quality, and governance. What do you dislike about Qdrant? Learning it might be quite difficult for those who are not familiar with advanced data analytics. Pricing plans are high.
What do you like best about Qdrant? Qdrant is fast and easily scalable, and I can index and query millions of vectors, which is essential for my work on image search. Because it is an open-source application, I can modify and adapt it to the other tools that I use. What do you dislike about Qdrant? Qdrant does not have integrated visualizations. This makes it difficult to draw conclusions from, and visualize, the search results.
What do you like best about Qdrant? I can quickly scan through huge volumes of vectors, which is relevant for my AI work on image recognition. Since it is open-source software, it can be used freely and can be modified and integrated with my existing systems. What do you dislike about Qdrant? Qdrant also has no built-in visualization capabilities. Because of its basic functionality, I find it difficult to analyze and interpret the results without additional software installed.
What do you like best about Qdrant? A tool for creating vector collections and performing vector operations. It excels at vector distance searches, offers convenient auto-completion features, and includes a free tier for evaluation. What do you dislike about Qdrant? Although the interface is quite simple, it still has limited capabilities.
What do you like best about Qdrant? In my AI research, Qdrant speeds up searching high-dimensional vector data. The options and settings let me work on terabytes of data and perform similarity search in real time. What do you dislike about Qdrant? Qdrant does not come with graphical utilities for data visualization. This poses a problem when interpreting the retrieved results, particularly for higher-dimensional data.
What do you like best about Qdrant? What I like best about Qdrant is its efficiency in indexing and searching high-dimensional vectors. The ease of integration with AI-based applications and the ability to perform semantic search queries are major advantages. Additionally, the support for multiple programming languages makes Qdrant versatile and accessible for different development teams. What do you dislike about Qdrant? One of the few downsides of Qdrant is that the initial learning curve can be steep for those unfamiliar with vector-based databases. While the documentation is well done, more practical examples or video tutorials would help ease onboarding for new users. Furthermore, some advanced features require manual configuration, which might not be straightforward for everyone.
What do you like best about Qdrant? It is optimized for speed and scalability, capable of handling large datasets with high throughput. The engine uses state-of-the-art algorithms to ensure fast query responses. What do you dislike about Qdrant? High performance comes with high resource usage, which might be a consideration for smaller deployments.
My workflow: GPT for architecture and Claude Code for execution
I’m working on a large project with FastAPI, Nuxt, PHP, Redis, Qdrant, and several AI agent layers. Over time, I noticed that using Claude Code directly for big architectural decisions was not always the safest approach for my project. Claude Code is extremely strong when it understands the existing codebase and needs to edit files, run tests, refactor, and follow a clear implementation plan. But when I asked it to analyze a major feature or propose a large architecture change from scratch, I sometimes saw risky suggestions or directions that did not fully fit the project. So I changed my workflow. For big decisions, I first use ChatGPT 5.5 to analyze the architecture, challenge the idea, and create a clear draft or roadmap. Then I take that draft to Claude Code and ask it to verify it against the real codebase. Claude Code usually improves the practical details: service names, controllers, helpers, file paths, implementation constraints, and possible conflicts. After that, I ask Claude Code to create an implementation guide before touching the code. I review that guide again, then I let Claude Code execute step by step with a checklist. From my experience in this project, this feels like the safest workflow: GPT 5.5 helps me with architecture, roadmap thinking, and big technical decisions. Claude Code helps me with execution, refactoring, tests, and codebase-aware implementation. I don’t like leaving Claude Code to code without a clear guide, especially for sensitive architecture changes. I always track the code, review the plan, and check the implementation step by step. It takes longer, but it helps protect the project from bad decisions and regressions. Do you see this approach as correct, or do you think Claude Code can be trusted more directly for architecture-level decisions too? submitted by /u/Maamriya [link] [comments]
You don't need a GPU server to run Claude agents
I've been seeing a lot of newcomers asking about hardware specs lately, and there's this weirdly common myth that you need a heavy server or a GPU instance to run Claude-based agents. You really don't. If you're using the API, Anthropic does 100% of the heavy lifting on their side. Your server is just a middleman handling HTTP requests and maybe some lightweight logic. My current stack (a Python agent loop + Postgres for memory + a small Qdrant instance for RAG) has been humming along perfectly on a basic 2 vCPU / 4GB RAM setup.
- CPU: Idle 90% of the time.
- RAM: Only matters if your vector DB grows huge.
- GPU: Completely useless for API calls.
Unless you're planning to run local models like Llama 3 via Ollama alongside Claude, just get the cheapest stable VPS you can find. Save that cash for your API credits - that's where the real bill comes from. Curious what you guys are running your agents on? Has anyone actually managed to hit a bottleneck on a cheap VPS? submitted by /u/august212023
I built persistent memory for Claude: local stack, MCP integration, 39ms retrieval. Sharing the architecture.
If you use Claude heavily, you've felt this: every session starts from zero. You re-explain context, Claude helps, the window closes, and the next session has no idea what you decided yesterday. The standard workaround is a markdown wiki Claude reads, but as the wiki grows, every "what did we decide about X" question burns thousands of tokens grepping and re-reading whole pages.

I spent the last few weeks building a persistent memory layer to fix both problems. It runs entirely on my own machine, integrates via MCP, and lives between Claude and my existing wiki. Sharing the architecture and what I learned in case anyone wants to build their own.

What it does

- Semantic retrieval over my wiki. Instead of Claude grepping pages, my MCP server returns the most relevant chunks for any query in ~50ms. 82% mean token reduction on a 10-query eval set vs the grep+Read baseline. F1 retrieval quality is also better: cheaper and more accurate.
- Session crystallization. At end of session, conversations get compressed into a structured "L4 node" with summary + decisions + open threads, indexed alongside wiki content. Tomorrow I can ask "what did we decide about X" and Claude pulls last session's decision verbatim.
- Lazy-spawned local models. Embedder + chat model run as subprocesses that the supervisor spawns on first use and reaps after 1 hour idle. Boot cost is zero: nothing is loaded until needed.

The architecture (four layers)

Inspired by Andrej Karpathy's writing on LLM-native wikis, then formalized into a build spec:

- L0: append-only event log (SQLite). Every input/output, content-hashed.
- L1: structured facts with confidence + decay (deferred to next phase)
- L2/L3: derived prose + cross-cutting summaries (the hand-edited wiki plays this role for now)
- L4: crystallized session nodes. Summary, decisions, open threads. Indexed in the same vector store as wiki chunks so retrieval finds both naturally.
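The L0 layer described above (an append-only, content-hashed event log in SQLite) could look roughly like the sketch below. The table name, columns, and helper names (`open_log`, `append_event`, `verify`) are assumptions for illustration; the post does not show its actual schema.

```python
import hashlib
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event log. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " ts REAL NOT NULL,"
        " kind TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def append_event(conn: sqlite3.Connection, kind: str, content: str) -> str:
    """Insert one event and return its content hash. Rows are never updated here."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (ts, kind, content, sha256) VALUES (?, ?, ?, ?)",
            (time.time(), kind, content, digest),
        )
    return digest


def verify(conn: sqlite3.Connection) -> bool:
    """Recompute every hash and confirm that no stored content has changed."""
    rows = conn.execute("SELECT content, sha256 FROM events").fetchall()
    return all(hashlib.sha256(c.encode("utf-8")).hexdigest() == h for c, h in rows)
```

In this sketch, "append-only" holds only by convention, since the module exposes no update or delete path. A stricter variant might add SQLite triggers that reject `UPDATE` and `DELETE` on the table.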
The stack

- Qdrant in Docker for vector search
- llama.cpp running Qwen3-Embedding-4B (GPU) and Qwen3.5-2B-Q4_K_M (CPU)
- FastMCP server exposing 7 tools (retrieve, crystallize_session, list_sessions, get_l4_node, index_status, reindex, shutdown_models)
- Cowork plugin for Claude Desktop integration; also works with Claude Code via standard MCP config

No cloud, no API keys, $0 marginal cost per query.

Numbers

- Token reduction: 82.7% mean, 86.2% median vs grep+Read baseline
- Retrieval F1: 0.50 vs 0.20 baseline
- Embed cold-start: ~4s. Hot-path p95: 39ms (was 2241ms before fixing one specific bug; see below)
- L4 session retrieval eval: 0.920 mean score (gate 0.6)
- 738 chunks currently indexed across 104 markdown files

The most useful thing I learned

Hot-path retrieve was inexplicably stuck at 2241ms p95 even though the embedding model was fully GPU-resident on a 4070 Ti Super. Spent hours blaming GPU offload, prompt cache, KV pre-allocation. The actual cause: every httpx.post() was opening a fresh TCP connection, and Windows localhost handshakes take ~2 seconds. A 5-line change, switching to a persistent httpx.Client with keep-alive, dropped p95 to 39ms. 57× speedup. Lesson: latency that's suspiciously consistent (2240, 2237, 2241, 2227, 2239 ms) is a fixed cost, not a compute cost. If your local-MCP integration feels slow on Windows, check connection reuse before you blame the model.

A few other things that surprised me

Qwen3 thinking mode silently consumes the generation budget. Crystallization was returning empty content. Logs showed exactly 2000 tokens generated (the cap). Turned out Qwen3 emits <think>...</think> blocks that the chat handler strips before populating message.content. With JSON grammar enforced, the model spent all 2000 tokens "thinking" and never emitted JSON. Fix: pass chat_template_kwargs: {enable_thinking: false} via extra_body (requires --jinja on llama-server).

The MCP plugin needed to register against the right config file.
Cowork (Claude Desktop's agentic mode) doesn't read ~/.claude.json like Claude Code does. The first attempt at MCP registration silently went to the wrong file. The fix was packaging the LKS service as a proper Cowork plugin (.plugin bundle); Cowork has a plugin system distinct from raw MCP server registration. If you're trying to wire a custom MCP server into Cowork, this is the path.

What it doesn't do (yet)

- No automatic conversation capture: L0 ingestion is manual or via end-of-session crystallization
- No L1 fact extraction yet (next phase); retrieval is over markdown chunks + L4 nodes today
- Wiki is still source-of-truth; no automatic conflict resolution
- Solo deployment only; no federation or multi-user
- Tested on Windows; Linux/Mac would need a small tweak to the supervisor (it uses subprocess.CREATE_NEW_PROCESS_GROUP for clean Windows termination)

Full write-up

Architecture, phased build narrative, all five lessons-learned bug stories, the setup walkthrough, and the roadmap: https://gist.github.com/tyoung515-svg/5fd5279f46d935f517cda89146c94685
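The post's latency lesson (suspiciously consistent timings point to a fixed cost, not a compute cost) can be checked with a few lines of arithmetic. This sketch uses the five samples quoted in the post and flags a series as fixed-cost when its coefficient of variation is very small. The 1% threshold and the function names are assumptions chosen for illustration, not something the post specifies.

```python
from statistics import mean, pstdev

# Latency samples quoted in the post, in milliseconds.
SAMPLES_MS = [2240, 2237, 2241, 2227, 2239]


def coefficient_of_variation(samples: list[float]) -> float:
    """Population standard deviation divided by the mean (unitless spread)."""
    return pstdev(samples) / mean(samples)


def looks_like_fixed_cost(samples: list[float], threshold: float = 0.01) -> bool:
    """Heuristic: near-identical timings suggest a constant overhead such as a
    connection handshake. The 1% threshold is an assumed cutoff."""
    return coefficient_of_variation(samples) < threshold


# Speedup reported in the post: p95 went from 2241 ms to 39 ms.
SPEEDUP = 2241 / 39
```

For the quoted samples the spread is a small fraction of one percent of the mean, so the heuristic fires, and 2241 / 39 rounds to the 57× figure the post reports.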
I run a team of Claude agents that ships PRs to production (open source)
I've been running a multi-agent system in production for a few months — a co-CTO agent + specialist agents (PM, dev, ops) that handle real engineering work end-to-end: design specs, code review, PR implementation, deploys, monitoring. The architecture: Each agent is a Docker container running claude -p (with optional Codex fallback) wrapped in .NET 10. A central orchestrator coordinates them via Temporal workflows + RabbitMQ. Agents talk to me over Telegram (DMs + group chat for the whole team). Memory is Qdrant + Ollama embeddings — agents recall past decisions across sessions. A web dashboard shows live agent status and in-flight workflows. What it does day-to-day: I drop a one-line request in Telegram. PM writes the spec, two reviewers run consensus, dev implements the PR, CI ships to staging, PM verifies, I approve the merge gate, prod deploy. Same pattern handles infra: deploy verifications, health checks, daily digests, incident triage. Agents have access to fleet-memory (semantic memory MCP) — they search before acting, write learnings after. 5-min demo of an actual production PR being shipped: https://youtu.be/DIx7Y3GfmGc Why I built it instead of using crewai/autogen/langgraph: I wanted Temporal-backed durability (workflows survive restarts, retries are deterministic) and ops-grade observability (every workflow visible in the temporal UI, every signal auditable). The agents themselves are just claude -p — the magic is in the orchestration layer. Open source: https://github.com/anurmatov/phleet Side note for those who recognize me — this runs on the Mac Studio I documented in mac-studio-server. The dogfooding is real. Happy to dig into prompts, system architecture, memory strategy, or how the agents handle PR reviews — AMA. submitted by /u/_ggsa [link] [comments]
Tired of re-explaining my codebase to Claude every session, so I built a memory layer for it
Every new Claude Code session I'd end up re-explaining the architecture, re-debugging the same weird errors, re-teaching the same patterns. After the tenth time I snapped and started building something. It's called Alaz. Single Rust binary that hooks into session start and session end. When a session ends it parses the transcript and pulls out patterns, episodes, procedures, facts, and what went wrong. When a new session starts it injects the relevant stuff back as context — what's currently broken, what reliably worked before, recent decisions, conventions you keep repeating. Under the hood: PostgreSQL + Qdrant, 6-signal hybrid search (FTS + dense vectors + ColBERT + graph + RAPTOR + memory decay, fused with RRF). 76 MCP tools. Works fully local with Ollama, or you can plug in any OpenAI-compatible API if you want a smarter LLM for the learning pipeline. Just shipped v2.0.0. MIT. Honest feedback and "this is dumb because X" comments welcome. https://github.com/Nonanti/Alaz submitted by /u/Nonantiy [link] [comments]
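The post says its six retrieval signals are "fused with RRF" (reciprocal rank fusion). A minimal sketch of RRF itself follows. The constant `k=60` is the value commonly used in the literature and is an assumption here, as are the function name and the example document IDs; Alaz's actual parameters are not given in the post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) for every document it contains,
    with rank starting at 1. Documents missing from a list get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of three signals (e.g. FTS, dense vectors, graph).
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
```

Because RRF only looks at ranks, it can combine signals whose raw scores are not comparable, which is presumably why it suits a mix like FTS, dense vectors, and graph walks.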
Skill Seekers v3.5: 10 new source types, 12 LLM platforms, marketplace pipeline, agent-agnostic AI, and prompt injection scanner
Hey r/ClaudeAI — sharing the latest update on Skill Seekers, the open-source tool that converts documentation into Claude Code skills. A lot has changed since the v3.2 post, so here's what's new across 3 releases (v3.3 → v3.5.1). What's new 10 new source types (17 total) You can now generate skills from Notion, Confluence, HTML files, OpenAPI specs, AsciiDoc, PowerPoint, RSS feeds, man pages, chat exports (Slack/Discord), and unified multi-source configs — on top of the original web, GitHub, PDF, Word, EPUB, video, and local codebase sources. 12 LLM platforms Skills now package for Claude, OpenAI, Gemini, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI, OpenCode, Markdown, and MiniMax. Plus RAG framework exports for LangChain, LlamaIndex, Haystack, ChromaDB, FAISS, Weaviate, Qdrant, and Pinecone. Agent-agnostic AI enhancement Enhancement is no longer locked to Claude. The new AgentClient abstraction supports Claude, Kimi, Codex, Copilot, OpenCode, and custom agents. It auto-detects which agent to use from your API keys, or you can specify with --agent. Marketplace pipeline You can now publish skills directly to Claude Code plugin marketplace repositories and manage multiple marketplace registries. Config sources can be pushed and synced across repos. Prompt injection scanner A built-in workflow scans scraped content for injection patterns — role assumption, instruction overrides, delimiter injection, hidden instructions. Runs automatically as the first stage in default and security-focused workflows. Flags suspicious content without removing it so you can review. One-command auto-detection skill-seekers create https://docs.example.com/ skill-seekers create owner/repo skill-seekers create ./my-project skill-seekers create document.pdf One command figures out the source type and routes to the right scraper. No more separate subcommands. Headless browser rendering JavaScript SPA sites (React, Vue, etc.) 
that return empty HTML shells now work with --browser. Uses Playwright under the hood. Other highlights skill-seekers doctor health check command Kotlin language support in the C3.x codebase analysis pipeline Smart SPA discovery (sitemap.xml + llms.txt + browser nav) Unlimited pages by default (was capped at 500) 3100+ tests passing Full MCP server with 40 tools (works in Claude Code and Cursor/Windsurf) Links GitHub: github.com/yusufkaraaslan/Skill_Seekers PyPI: pip install skill-seekers Free and open source Built with Claude Code. Happy to answer questions or take feedback. submitted by /u/Critical-Pea-8782 [link] [comments]
JARVIS running on 3 servers as one fleet. Claude Code, Cursor, and OpenCode all coordinating.
One instance is enough, but where is the fun in that, right? 🤣

JARVIS across 3 servers, each running a different AI coding agent:
- Hel2: Claude Code CLI
- Hel1: Cursor CLI
- Mainframe: OpenCode

They talk to each other over fleet MCP. Each has its own vector memory (Qdrant), runs its own tasks, and reports back to me on Telegram or works with the others from one point of contact. Same JARVIS, different hands. They don't just run. They coordinate.

Video is all 3 tmux sessions open at once. Can't explain the feeling; this is like when I got my first video game, the one with cartridges. If it's useful or you are interested, happy to share how I set it up with tmux, systemd, custom Telegram bridges (which I built for Cursor and OpenCode), memory setup and stuff. submitted by /u/Huge_Cupcake4407
Built a multi-node JARVIS with Claude Code, 3 servers, fleet comms, persistent memory
Started with 4 weekends of setting up and breaking OpenClaw on 2 VPSs and a laptop to build my own Jarvis. One day while debugging, Claude called me "sir." It read the SOUL.md from OpenClaw. That one word changed everything. Grateful to OpenClaw, that's where it all started. One by one, I moved away from it. Replaced with Claude Code + Telegram. The real JARVIS was born. Brought in the same Qdrant vector memory I'd built for OpenClaw and added session handoffs, he remembers everything across sessions. Then I built a second JARVIS on a second server. Cursor + a custom Telegram gateway. Built a third on the mainframe running OpenCode + custom Telegram. All 3 act as one JARVIS now, talking to each other over fleet MCP, saving their own memory and conversations, syncing across nodes. submitted by /u/Huge_Cupcake4407 [link] [comments]
LLM Documentation accuracy solved for free with Buonaiuto-Doc4LLM, the MCP server that gives your AI assistant real, up-to-date docs instead of hallucinated APIs
LLMs often generate incorrect API calls because their knowledge is outdated. The result is code that looks convincing but relies on deprecated functions or ignores recent breaking changes. Buonaiuto Doc4LLM addresses this by providing free AI tools with accurate, version-aware documentation—directly from official sources. It fetches and stores documentation locally (React, Next.js, FastAPI, Pydantic, Stripe, Supabase, TypeScript, and more), making it available offline after the initial sync. Through the Model Context Protocol, it delivers only the relevant sections, enforces token limits, and validates library versions to prevent mismatches. The system also tracks documentation updates and surfaces only what has changed, keeping outputs aligned with the current state of each project. A built-in feedback loop measures which sources are genuinely useful, enabling continuous improvement. Search is based on BM25 with TF-IDF scoring, with optional semantic retrieval via Qdrant and local embedding models such as sentence-transformers or Ollama. A lightweight FastAPI + HTMX dashboard provides access to indexed documentation, queries, and feedback insights. Compatible with Claude Code, Cursor, Zed, Cline, Continue, OpenAI Codex, and other MCP-enabled tools. https://github.com/mbuon/Buonaiuto-Doc4LLM submitted by /u/mbuon [link] [comments]
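The post names BM25 as the keyword-search baseline. Below is a compact, dependency-free sketch of Okapi BM25 scoring, shown only to illustrate the formula. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the function name, and the sample documents are assumptions and do not come from Buonaiuto-Doc4LLM.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25.

    Assumes a non-empty corpus and naive lowercase whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents need more hits to score the same.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical mini-corpus of documentation snippets.
DOCS = ["fastapi dependency injection guide", "react hooks reference", "pydantic model validation"]
SCORES = bm25_scores("fastapi dependency", DOCS)
```

Only the first snippet shares terms with the query, so it is the only one with a non-zero score. A semantic retriever such as the optional Qdrant path would be the complement for queries that share meaning but not vocabulary.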
Built a tool to capture and search AI coding sessions across providers. Looking for feedback on the approach.
Core problem: AI sessions aren't searchable across providers. You solve something with Claude Code, need it again weeks later, can't find it. Start over. What I built: Three capture methods: API proxy for OpenAI/Anthropic/Google endpoints (zero code changes) Native hooks for Claude Code and Gemini CLI (structured session data via stdin) Browser extension for ChatGPT/Claude.ai Everything flows into a unified search: hybrid semantic (embeddings) + keyword (BM25), RRF fusion for ranking. Sub-second results across all providers. Hook-level DLP: When Claude Code reads .env files, actual secrets never reach the model. Intercepts file reads, replaces values with [REDACTED:API_KEY] placeholders, passes sanitized version to Claude. Model can reason about variables without seeing credentials. Architecture: Python FastAPI backend Qdrant for vector search (OpenAI embeddings, 1536d) Supabase (PostgreSQL) for session storage Next.js frontend Privacy: Everything runs locally or in your account. Export/delete anytime. Nothing shared. PyPI package: https://pypi.org/project/rclm (hooks + proxy) Live beta: reclaimllm.com Questions for this community: Claude Code users: Would you actually use hook-level capture, or is the transcript file enough? DLP approach: Is interception at file-read too aggressive, or is post-hoc flagging insufficient? Missing features: What would make this actually useful vs just interesting? Marketplace: Given the sessions can be sanitized to certain extent, would it make sense for a marketplace where people can share/sell their chat sessions? Primarily I think from open source perspective as we are getting tied down to closed source models Enterprise: What enterprise use you can think of for this service Honest feedback appreciated. If the approach is fundamentally wrong, I'd rather know now. submitted by /u/Inevitable-Lack-8747 [link] [comments]
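The hook-level DLP idea in the post (replace secret values from `.env` files with placeholders before the model sees them) can be sketched as a small text filter. The name patterns, the two placeholder labels, and the `redact_env` function are assumptions for illustration; the post only shows the `[REDACTED:API_KEY]` placeholder format, not how the tool decides what to redact.

```python
import re

# Variable names that look sensitive (assumed heuristic, not the tool's rule set).
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
# Matches `NAME=value` and `export NAME=value` lines.
ASSIGNMENT = re.compile(r"^(\s*(?:export\s+)?)([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def redact_env(text: str) -> str:
    """Replace values of sensitive-looking variables with a placeholder,
    keeping variable names so the model can still reason about them."""
    out = []
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and SECRET_NAME.search(match.group(2)) and match.group(3).strip():
            prefix, name = match.group(1), match.group(2)
            # Hypothetical labelling: names containing KEY get API_KEY, others SECRET.
            label = "API_KEY" if "KEY" in name.upper() else "SECRET"
            out.append(f"{prefix}{name}=[REDACTED:{label}]")
        else:
            out.append(line)
    return "\n".join(out)


SAMPLE_ENV = "OPENAI_API_KEY=sk-abc123\nDEBUG=true\nDB_PASSWORD=hunter2"
REDACTED = redact_env(SAMPLE_ENV)
```

Name-based matching like this will miss secrets stored under innocuous names, which is one reason a real implementation might also inspect values (for example, known key prefixes or high-entropy strings).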
Gave Claude Code persistent memory across sessions, and it actually remembers now
Been using Claude Code as my main coding assistant for months. The one thing that kept bugging me: every session starts blank. I'd re-explain my project structure, re-teach my conventions, re-debug stuff we already solved together last week. So I built a memory layer that hooks into Claude Code's session lifecycle. When a session ends, it parses the transcript and extracts useful stuff — patterns, errors, decisions, preferences. When a new session starts, it injects the relevant context automatically. After a few sessions it gets pretty useful. Claude just knows my codebase conventions, remembers past errors, knows which approaches worked. Like going from a stranger to a teammate who's been on the project for a while. Setup is two config changes: MCP server in ~/.mcp.json (22 tools — search, save, episodes, graph, vault, etc.) Session hooks in ~/.claude/settings.json (start/stop triggers) It also tracks procedure success rates with Wilson scoring, so "proven" workflows rank higher than stuff that failed before. And if you work on multiple projects, patterns that show up in 3+ projects get promoted to global scope. Self-hosted, Rust, MIT licensed. Needs PostgreSQL + Qdrant (docker compose handles both). GitHub: https://github.com/Nonanti/Alaz Anyone else tried building memory/context systems around Claude Code? Curious what approaches others are taking. submitted by /u/Nonantiy [link] [comments]
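The post mentions ranking procedures by success rate "with Wilson scoring". A common way to do that is the lower bound of the Wilson score interval, which penalizes small sample sizes. The sketch below assumes a 95% interval (`z = 1.96`) and that the lower bound is what gets ranked; neither detail is confirmed by the post, and the function name is hypothetical.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success proportion.

    Returns 0.0 when there are no trials, so untested procedures rank last.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denominator = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denominator
```

Under this scoring, a procedure that worked 9 times out of 10 ranks above one that worked once out of one attempt, even though the latter has a 100% raw success rate. That matches the post's goal of ranking "proven" workflows higher.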
A fully local, private alternative to Context7 that reduces your token usage
Context7 is great for pulling docs into your agent's context, but it routes everything through a cloud API and an MCP server. You have to buy a subscription, manage API keys, and work within their rate limits. So I used Claude Code to build a local alternative. docmancer ingests documentation from GitBook, Mintlify, and other doc sites, chunks it, and indexes it locally using hybrid retrieval (BM25 + dense embeddings via Qdrant). Everything runs on your machine locally. Once you've ingested a doc source, you install a skill into your agent (Claude Code, Codex, Cursor, and others), and the agent queries the CLI directly for only the chunks it needs. This drastically reduces your token usage and saves a lot of context. GitHub (MIT license): https://github.com/docmancer/docmancer Give it a shot and let me know what you think. I am looking for honest feedback from heavy users of Claude Code. submitted by /u/galacticguardian90 [link] [comments]
Claude Code memory that fits in a single SQLite file
I kept re-explaining my stack to Claude every session. The memory tools I tried either spawned a process that ate gigs of RAM, or dropped vector search to stay light. Built nan-forget with Claude Code over the last few weeks. Claude helped design the 3-stage retrieval pipeline (recognition → recall → spreading activation), wrote most of the SQLite migration from Qdrant, and caught edge cases in the vector search scoring I would have missed. It stores memories in one SQLite file, ~3MB, no background services. npx nan-forget setup and you're done. 4 hooks save context as you work. You never call save. "auth system" finds "We chose JWT with Clerk." Search by meaning, not keywords. Memories carry problem/solution/concepts fields. A bug fix from March surfaces when you hit the same error in June. Old memories decay on a 30-day half-life. Stale ones consolidate into summaries. Active ones sharpen. Same database across Claude Code (MCP), Codex, Cursor (REST API), and terminal (CLI). No LLM calls for memory ops. Runs locally. Free and open source. https://github.com/NaNMesh/nan-forget Anyone else fighting context loss across sessions? What have you tried? submitted by /u/NaNMesh [link] [comments]
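The "30-day half-life" decay mentioned in the post corresponds to a simple exponential weight: a memory's weight halves every 30 days. The sketch below shows that arithmetic. Multiplying the weight into a similarity score is an assumption about how such a weight might be used, and the function names are hypothetical; the post does not describe nan-forget's actual scoring.

```python
def decay_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: weight is 1.0 at age 0 and halves every half-life."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def decayed_score(similarity: float, age_days: float) -> float:
    """Assumed combination: scale a similarity score by the memory's decay weight."""
    return similarity * decay_weight(age_days)
```

With a 30-day half-life, a memory from 60 days ago carries a quarter of the weight of a fresh one, so it needs a much closer semantic match to surface.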
Built a memory system that actually works!!
This is a persistent memory system I've been building for Claude Code. It gives LLM agents actual context continuity across sessions.

Benchmarks:
- LoCoMo: 90.8% (beats every published system)
- LongMemEval: 89.1%

Why it's interesting for agent builders: the architecture is adapter-based. It currently hooks into Claude Code's lifecycle events, but the core (storage, retrieval, intelligence) is framework-agnostic. The retrieval pipeline (4-channel RRF: FTS5 + Qdrant KNN + recency + graph walk) and the intelligence layer (intent classification, experience patterns, RL policy) could plug into any agent framework.

Quick setup:
    ollama pull snowflake-arctic-embed2
    bun install && bun run build && bun run setup
    node dist/angel/index.cjs

Tech stack: TypeScript, SQLite (better-sqlite3), Qdrant, Ollama, esbuild, Vitest

Key design decisions:
- Dual-write (SQLite truth + Qdrant acceleration) with graceful degradation
- Every operation is non-throwing: individual failures never break the pipeline
- Ephemeral hooks (millisecond lifetime) for capture, persistent Angel for reflection
- RL policy models are pure TypeScript (Float32Array math, no PyTorch)
- Content-length-aware embedding backfill in background

29K lines, 1,968 tests, MIT licensed: https://github.com/grigorijejakisic/Claudex

submitted by /u/Pristine_Use5236
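The "4-channel RRF" mentioned above refers to reciprocal rank fusion, which merges several ranked lists using only each item's rank in each list. The sketch below is the standard formula, not Claudex's implementation (which is TypeScript). The channel names mirror the post, while `k=60` (the constant commonly used with RRF), the alphabetical tie-break, and the sample ids are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over channels.

    `rankings` maps a channel name to a list of item ids, best first.
    Items missing from a channel simply contribute nothing for it.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


channels = {
    "fts5": ["a", "b", "c"],      # keyword matches
    "qdrant_knn": ["b", "a", "d"],  # nearest neighbours by embedding
    "recency": ["b", "c"],        # most recently touched
    "graph_walk": ["d", "b"],     # reached via linked memories
}
fused = reciprocal_rank_fusion(channels)  # -> ['b', 'a', 'd', 'c']
```

Because only ranks are used, the channels need no score normalization, which is convenient when mixing FTS5 relevance, vector distances, and timestamps.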
I built an MCP server that gives your agent semantic search over Obsidian vaults: stop losing docs to keyword matching
I was tired of my agent doing keyword searches across my Obsidian vault and missing half the relevant docs. Searching for "API logs" wouldn't find a section titled "Execution tracking endpoints". So I built an MCP server that indexes your vault into Qdrant with local embeddings and lets any MCP-compatible agent search it semantically.

The idea is to keep a single Obsidian vault as the documentation hub for all your projects. Instead of scattering docs across repos or wikis, everything lives in one place, and the agent can search across projects or filter down to a specific one. Qdrant handles the heavy lifting, so even large vaults with hundreds of files stay fast without dumping everything into the context window.

What it does:
- Chunks markdown by headings, never breaking tables or code blocks
- Embeds everything locally with BAAI/bge-small-en-v1.5 (384 dim, no API keys)
- Auto-starts Qdrant via Docker if it's not running
- Filters by project, doc type, or frontmatter tags
- Incremental indexing: only re-embeds changed files
- Returns only the relevant chunks, not entire files

Works with Claude Code, Cursor, Windsurf, or any MCP client.

GitHub: https://github.com/Marco-O94/obsidian-qdrant-search
PyPI: https://pypi.org/project/obsidian-qdrant-search/

Would love feedback, especially on chunking strategies, embedding model choices, and bug reports. I'm sure there are edge cases I haven't hit yet. Issues and PRs welcome.

submitted by /u/Marco_o94
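One way to chunk markdown by headings without breaking code blocks is to track whether the current line is inside a fence and split only on heading lines outside one. The sketch below illustrates that idea under simplifying assumptions; it is not the project's actual chunker, and `chunk_by_headings` is a hypothetical name. Splitting only at headings also leaves tables intact, since a table row is never a heading line.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+\S")   # ATX headings such as "## Usage"
FENCE = re.compile(r"^\s*(`{3}|~{3})")  # opening or closing code fence


def chunk_by_headings(markdown):
    """Split markdown into chunks that each start at a heading.

    Lines inside fenced code blocks are never treated as headings, so a
    shell comment like "# install deps" stays with its code block.
    Simplification: any fence line toggles the state, so a fence of one
    style nested inside the other is not handled.
    """
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line) and current:
            # A new section begins: close the chunk collected so far.
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Each returned chunk would then be embedded and stored with its metadata (project, doc type, tags) so that searches can filter before ranking.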
Yes, Qdrant offers a free tier. Pricing found: $50
Qdrant has an average rating of 4.5 out of 5 stars based on 12 reviews from G2, Capterra, and TrustRadius.
Key features include: Expansive Metadata Filters, Native Hybrid Search (Dense + Sparse), Built-in Multivector, Efficient, One-Stage Filtering, Full-Spectrum Reranking, Qdrant Cloud, Qdrant Hybrid Cloud, Qdrant Private Cloud.
Qdrant is commonly used for building custom AI search and for semantic search.
Qdrant integrates with: AWS, GCP, Azure, Kubernetes, OpenAI, Hugging Face, Elasticsearch, Redis, Docker, Prometheus.
Qdrant has a public GitHub repository with 29,940 stars.

Late Interaction Basics | Qdrant Multi-Vector Search
Mar 24, 2026
Based on user reviews and social mentions, the most common pain points are: token usage, cost tracking.
Based on 22 social mentions analyzed, 14% of sentiment is positive, 86% neutral, and 0% negative.