
Mem0 enables AI agents to continuously learn from past user interactions, enhancing their intelligence and personalization.
Mem0 is recognized for its persistent memory capabilities, allowing users to maintain context across sessions, which is often highlighted as a valuable feature. A recurring complaint among users is the lack of complete solutions for persistent memory needs, indicating that while Mem0 addresses some issues, it still may not meet all user expectations for memory integration. Discussion around pricing is minimal, though there seems to be a sentiment that alternatives could involve costly investment, as evidenced by users spending significant amounts to supplement existing tools. Overall, Mem0 is seen positively, particularly among those seeking enhanced memory functions in AI integrations, though it is considered part of a broader, ongoing development trend rather than a definitive solution.
Mentions (30d)
6
Reviews
0
Platforms
2
GitHub Stars
51,568
5,772 forks
Industry
information technology & services
Employees
14
Funding Stage
Series A
Total Funding
$24.0M
1,019
GitHub followers
16
GitHub repos
20
npm packages
Pricing found: $19, $79/month, $249/month
Am I stupid for pivoting to Transparency with Agents over Memory after 6 months?
built an open source memory layer for ai agents. thought the obvious feature people would care about was persistent memory across restarts and shared memory between agents. that was the whole pitch.

few months of actual user data in. most of the api calls aren't about memory at all. they're hitting the audit trail (what did the agent do and when), the loop detector (catching when an agent is stuck doing the same thing 20 times in a row), and the per-agent performance dashboard (which agent is wasting tokens, which one keeps crashing, who's drifting off goal).

basically people don't really care that their agent remembers stuff across restarts. they care that they can see what it did and pull the plug when it goes off the rails.

so i'm wondering if i should just flip the pitch. lead with "observability and accountability for ai agents" instead of "memory for ai agents". memory is table stakes at this point and mem0/zep already dominate that framing. loop detection + audit trail + performance scoring per agent feels like open territory. am i stupid? or is this the obvious move i somehow missed for 3 months?

submitted by /u/DetectiveMindless652
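The loop detector described above (catching an agent that does the same thing 20 times in a row) can be approximated in a few lines. This is a minimal Python sketch, not the poster's code: it assumes each agent step can be reduced to a hashable signature such as a (tool, arguments) tuple, and `LoopDetector` is a hypothetical name.

```python
from collections import deque


class LoopDetector:
    """Flag an agent that repeats the same action `threshold` times in a row."""

    def __init__(self, threshold: int = 20):
        self.threshold = threshold
        # Only the most recent `threshold` action signatures are kept.
        self.recent = deque(maxlen=threshold)

    def record(self, action) -> bool:
        """Record one action signature; return True if the agent looks stuck."""
        self.recent.append(action)
        # Stuck means the window is full and every entry is the same action.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1


detector = LoopDetector(threshold=3)
print([detector.record(("search", "mem0 pricing")) for _ in range(3)])  # [False, False, True]
```

A sliding window like this only catches exact repeats; an agent alternating between two actions would need a check for repeated subsequences instead.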
Memory drift? Context bloat? A Claude Code skill I wrote to manage long-running memory libraries
I've been running Claude Code's auto-memory on the same project for about three months. Roughly a month in, the library started getting hard to use: the same lesson recorded under three different filenames, frontmatter missing on half the files, and searching for "that bug we fixed last month" returned nothing useful. Every new session, Claude loaded more and more memory files, and the context window kept getting crowded with irrelevant entries.

I wrote a skill that enforces a naming schema and a bash audit script that flags drift. Sharing in case it's useful.

What the skill does

Claude Code's auto-memory (v2.1.59+) writes plain markdown to ~/.claude/projects/ /memory/. The files are yours to read, edit, and version. What it doesn't enforce is structure: naming, required fields, or a Why section on each lesson.
- Schema on top of auto-memory. _ .md naming, required frontmatter (name / description / type), Why section on feedback entries. Auto-memory still writes; the skill makes Claude write to a spec.
- Phrase-triggered review. "Audit memory" runs the script. "Review session" walks the recent session and surfaces what's worth keeping.
- Soft warning, no hooks. Audit reports drift; nothing blocks a write.
- Plain markdown on disk. Edit, grep, git-commit. The skill doesn't add a database or daemon.

Effect

One topic per file means Claude lands on the right entry on the first lookup, not after several near-misses. A deduplicated library loads fewer files per session, freeing context for the work itself.
Sample audit output:

Memory audit · 2026-05-15 · 132 files
Hard checks (must be zero):
  missing frontmatter       0
  frontmatter fields        0
  feedback missing Why      1
  naming violations         0
  broken MEMORY.md links    0
Soft signals:
  oversized files           78
  groups over 15 entries    3
  untouched 30+ days        31
  not in MEMORY.md          0
Hard-rule compliance: 99.2% (1 violation / 132 files)

Install

Paste this into any Claude Code session:

Install the claude-memory-manager skill from https://github.com/jau123/claude-memory-manager

Claude handles the rest. To verify, say "audit memory" in a new session.

First use

The skill activates from natural language. No slash command.
- You: "Record today's wildcard bug fix" → Claude writes one feedback_*.md entry: filename, frontmatter, Why section, How-to-apply.
- You: "Review the session" → Claude walks the recent session, surfaces 3–5 candidates, asks which to keep.
- You: "Audit memory" → runs scripts/audit-memory.sh, reports compliance, lists files that need splitting.

vs the built-in auto-memory
- Auto-memory alone: schema: none (Claude decides); audit: none.
- With this skill: schema: 3-type schema + required fields + Why on feedback; audit: one-command script.

For semantic retrieval over chunked storage, look at vector-backed tools like Mem0, Letta, or Zep.

Limits
- Single-project scope. One memory directory per skill instance.
- No semantic ranking. The audit is pattern matching; it won't catch two files describing the same concept in different words.
- Bash; Windows / git-bash untested.
- Overkill for small libraries. Below ~10 entries or a month of project age, the built-in auto-memory is sufficient.

GitHub: https://github.com/jau123/claude-memory-manager

Curious whether others have hit this drift problem on long-running Claude Code projects, and how you handled it, especially anyone who tried hook-based enforcement and gave up. Schema feedback (3 types: feedback / reference / project) also welcome.

submitted by /u/Deep-Huckleberry-752
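The poster's audit is a bash script (scripts/audit-memory.sh) whose source isn't reproduced here. The following is a rough Python sketch of the same kind of hard checks under several assumptions. Each memory file is assumed to open with a `---`-delimited frontmatter block holding `name`, `description`, and `type`. Feedback entries are assumed to need a line beginning with "Why". Filenames are assumed to follow a hypothetical `<type>_<topic>.md` pattern built from the three types mentioned (feedback / reference / project). The broken-link and soft-signal checks are left out, and every name here is illustrative.

```python
import re
from pathlib import Path

REQUIRED_FIELDS = {"name", "description", "type"}
# Hypothetical naming rule: <type>_<topic>.md, e.g. feedback_wildcard-bug.md
NAME_PATTERN = re.compile(r"^(feedback|reference|project)_[a-z0-9-]+\.md$")
# A "Why" line, optionally written as a markdown heading or in bold.
WHY_PATTERN = re.compile(r"^#*\s*\**Why", re.MULTILINE)


def parse_frontmatter(text: str) -> dict:
    """Return key/value pairs from a leading '---' ... '---' block, or {}."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # an unterminated block counts as missing frontmatter


def audit_file(filename: str, text: str) -> list:
    """Return the hard-check violations for one memory file."""
    problems = []
    fm = parse_frontmatter(text)
    if not fm:
        problems.append("missing frontmatter")
    elif not REQUIRED_FIELDS.issubset(fm):
        problems.append("frontmatter fields")
    if fm.get("type") == "feedback" and not WHY_PATTERN.search(text):
        problems.append("feedback missing Why")
    if not NAME_PATTERN.match(filename):
        problems.append("naming violations")
    return problems


def audit_dir(memory_dir: str) -> float:
    """Print per-file violations; return hard-rule compliance as a percentage."""
    files = [f for f in sorted(Path(memory_dir).glob("*.md")) if f.name != "MEMORY.md"]
    flagged = 0
    for f in files:
        problems = audit_file(f.name, f.read_text(encoding="utf-8"))
        if problems:
            flagged += 1
            print(f"{f.name}: {', '.join(problems)}")
    return 100.0 * (len(files) - flagged) / len(files) if files else 100.0
```

With 132 files and one flagged, this file-level compliance works out to 131/132 ≈ 99.2%, the same arithmetic as the sample output.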
How are you handling context loss between Claude Code / Cursor sessions?
I've been building with Claude Code and Cursor for the last few months and keep running into the same wall: every new session, the agent forgets what it did last time. My TaskList wipes, the context about file changes vanishes, and I end up reading my own commit messages to remind the agent what we were working on.

Right now I'm doing this:
- Writing a CLAUDE.md or AGENTS.md by hand after every major change
- Keeping a separate "what I tried and why it failed" doc
- Sometimes literally pasting yesterday's chat back in

I've seen Mem0, Letta, and MemoryPlugin pop up, but none of them seem to travel between tools; they're locked to one model or one IDE.

Two honest questions:
1. How are you handling this right now? Markdown files like me, or something smarter?
2. If a tool sat between your IDE and the model, recording what the agent did and why and letting you "rewind" to a previous state, would that be worth paying for, or is it just a "nice to have"?

Not selling anything yet; I'm trying to figure out if I'm alone in this or if there's a real gap. Will share what I find in the comments.

submitted by /u/kafadankirik
On "harness engineering": Are people actually building things or just giving impressive labels to "tweaking"?
I see a lot of posts and videos talking about harness engineering, or context engineering, RAG, etc. The thing is, most of them talk about the concepts. And then I hear about all these people actually doing it. My question is about this disconnect: what does it look like in practice?

The way I understand it, tools like Claude Code or OpenAI Codex are agents, and the logic that controls what gets fed to the model is the harness. So when people talk about "engineering the context," are they:
- writing actual programs (CLI tools, pipelines, custom API wrappers) that manage what gets sent to the model? or
- mostly just structuring their prompts well and calling it engineering?

Same question for RAG, or any other oft-discussed topic: are people actually building retrieval pipelines from scratch, or are they standing up LlamaIndex / Mem0 and saying they're "using RAG" to infomaxx their AI agents?

Not trying to be dismissive. I'm genuinely curious about what people are actually doing when they say they have applied these concepts to their agentic workflows.

submitted by /u/josh_apptility
CFS - Conditional Field Subtraction
CFS selects relevant candidates by penalizing regions already covered by previous picks.

Results on retrieval ranking:
- baseline cosine top-K: NDCG@10 0.5123, Recall@10 0.6924
- mem0 additive fusion: NDCG@10 0.4903, Recall@10 0.6625
- rrf(cosine, BM25): NDCG@10 0.5196, Recall@10 0.6989
- rrf(cosine, cos2, BM25): NDCG@10 0.5278, Recall@10 0.7060
- rrf(cosine, BM25, CFS): NDCG@10 0.5311, Recall@10 0.7168

Against mem0’s additive fusion, rrf(cosine, BM25, CFS) improves retrieval ranking by +4.08 pp NDCG@10 and +5.43 pp Recall@10. Against rrf(cosine, BM25), adding CFS contributes +1.15 pp NDCG@10 and +1.79 pp Recall@10.

https://gist.github.com/M-Garcia22/ff4ec80f5a08ca2fd9234bcc35804d1c

submitted by /u/mauro8342
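Reciprocal rank fusion (RRF), used in the rrf(...) rows above, scores each document as the sum of 1/(k + rank) over the input rankings. k = 60 is the constant from the original RRF paper and a common default, but the poster's value isn't stated. CFS itself is described here in a single sentence, with its definition in the linked gist. The second function is therefore only a hypothetical stand-in in the same spirit, closer to maximal marginal relevance: it greedily picks the candidate whose query similarity, minus a penalty for similarity to already-picked candidates, is highest. Names and parameters are assumptions, not the gist's code.

```python
import numpy as np


def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def coverage_penalized_select(query, docs, n, penalty=1.0):
    """Hypothetical CFS-like picker (not the gist's algorithm): greedily take the
    document whose query similarity, minus `penalty` times its highest similarity
    to any already-picked document, is largest."""
    q = query / np.linalg.norm(query)
    D = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    relevance = D @ q
    covered = np.zeros(len(D))  # how much of each doc earlier picks already cover
    chosen = []
    for _ in range(min(n, len(D))):
        score = relevance - penalty * covered
        if chosen:
            score[chosen] = -np.inf  # never pick a document twice
        best = int(np.argmax(score))
        chosen.append(best)
        covered = np.maximum(covered, D @ D[best])
    return chosen


print(rrf([["a", "b", "c"], ["b", "a", "c"]]))  # ['a', 'b', 'c']
```

The percentage-point deltas quoted above are plain score differences, e.g. 0.5311 − 0.4903 = 0.0408, i.e. +4.08 pp NDCG@10.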
Built a Chrome extension for the long-session degradation problem: want this sub's read on whether it's actually useful
Long-time Claude user. I finally built something for the long-session problem and want this sub's read on whether it's actually useful or solving something I made up.

The pattern that pushed me to build: 60+ messages into a Claude session, the model starts losing the thread. A constraint I set 40 messages back stops being respected. I restate it, it works for two replies, then it forgets again. Eventually you hit compaction, panic, summarize, paste into a new chat, and lose half your context anyway.

It's not a window-size problem either. Even at 200K (or 1M on the API), usable performance drops well before the limit. The model technically remembers everything; it just stops weighting it properly.

What's already out there, since this sub will rightly ask:
- Cross-session memory tools (Mem0, MemoryPlugin): they remember who you are across chats. Different problem. They don't help when this specific conversation is degrading in front of you.
- Context indicators (Context Compass, TokenFlow): they show how full the window is. Useful, but they stop at the warning. You still manually summarize and paste.
- Claude's own auto-summary: server-side and opaque. You can't see what got kept or trigger it on your terms.

The gap I'm trying to close is the workflow between "I see I'm running out of context" and "I'm continuing in a fresh chat without losing the thread."

Built it as a Chrome extension called Curlo:
- A ring on the chat bar shows window fill, so compaction doesn't ambush you
- A one-tap checkpoint fires a structured prompt and saves Claude's reply locally: decisions, progress, open questions, next steps. Paste it into a fresh chat to keep going
- Each checkpoint is a delta against the last, so they stay tight
- Fully client-side, no backend, no accounts, free

Next up: optional Notion sync (your workspace, your pages, not locked in my tool) and a Prompt Studio that uses on-device AI to assemble prompts from your saved library.
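Curlo's checkpoint format isn't published in the post. Here is a minimal sketch of what "each checkpoint is a delta against the last" could mean. It assumes a checkpoint is a mapping from the four sections the post names (decisions, progress, open questions, next steps) to lists of short items. The delta keeps only the sections that changed and records what was added or dropped. `checkpoint_delta` is a hypothetical name.

```python
def checkpoint_delta(previous: dict, current: dict) -> dict:
    """Per section, list items added or dropped since the previous checkpoint.
    Sections with no change are omitted, which keeps each checkpoint small."""
    delta = {}
    for section in sorted(previous.keys() | current.keys()):
        before = previous.get(section, [])
        after = current.get(section, [])
        added = [item for item in after if item not in before]
        dropped = [item for item in before if item not in after]
        if added or dropped:
            delta[section] = {"added": added, "dropped": dropped}
    return delta


prev = {"decisions": ["use Postgres"], "next steps": ["write schema"]}
curr = {"decisions": ["use Postgres"], "next steps": ["add indexes"],
        "open questions": ["need pgvector?"]}
print(checkpoint_delta(prev, curr))
```

Unchanged sections such as "decisions" drop out of the delta, so a long session produces a chain of short checkpoints rather than repeated full summaries.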
https://curlo-pavilion.lovable.app

What I actually want from this post:
- For Pro and Max users: does Projects' shared context meaningfully delay degradation, or do you still hit the wall mid-conversation? Trying to figure out where my tool helps vs where Anthropic already has you covered.
- What's your trigger for "time to start fresh"? I default around 70%, but it feels arbitrary.
- Is anyone using a system prompt phrasing that genuinely delays drift? Would rather steal a workflow than build around the problem.

Roast it.

submitted by /u/theRedHood_07
Project Shadows: Turns out "just add memory" doesn't fix your agent
Been building a multi-agent system called Shadows for a few months: nine agents collaborating on strategy work with a shared memory layer.

I spent most of my time on retrieval because that's what every benchmark measures. Mem0, MemPalace, Graphiti, all of them. On LongMemEval, recall_all@5 hit 97%. Overall accuracy was 73%.

So the right memories are there. The agent still picks the wrong answer. It can't aggregate across sessions, doesn't know when to abstain, and guesses which aspect of a preference the user meant.

That lined up with something I've been stuck on. Most LLMs jump straight to execution when you give them a task. People don't. We filter first, check if we're even the right person, then start.

Next direction: agents that can be moved with their identity and memory!

submitted by /u/MegaWa7edBas
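The gap between 97% recall_all@5 and 73% accuracy is easier to see with the metrics written out. Retrieval recall only checks whether the evidence sessions land in the top k retrieved. It says nothing about whether the model then answers correctly. Below is a minimal sketch of the two usual per-question variants (average them over questions for a dataset score), using made-up session IDs.

```python
def recall_any_at_k(ranked: list, gold: set, k: int) -> float:
    """1.0 if at least one gold session appears in the top k, else 0.0."""
    return float(bool(gold & set(ranked[:k])))


def recall_all_at_k(ranked: list, gold: set, k: int) -> float:
    """1.0 only if every gold session appears in the top k, else 0.0."""
    return float(gold <= set(ranked[:k]))


# A multi-session question whose evidence is spread over sessions s2 and s7.
ranked = ["s2", "s4", "s1", "s9", "s3", "s7"]
gold = {"s2", "s7"}
print(recall_any_at_k(ranked, gold, k=5))  # 1.0
print(recall_all_at_k(ranked, gold, k=5))  # 0.0
```

recall_all is the stricter of the two. Even a perfect score on it only means the evidence was retrieved, not that it was aggregated into the right answer.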
I spent 2 months and $600 building a cognitive system on top of Claude because the product I actually need doesn't exist. Here's what I learned.
DISCLAIMER: AI wrote this article. I gave it all of my ideas, thoughts, point-form notes, and context, but I'm not articulate enough to write clearly and comprehensively for 4000+ words. I did write this disclaimer myself.

Every major AI lab is competing on the same axis: capability. Bigger models, longer context, better benchmarks. And yet every serious user hits the same wall. Not a capability wall. A structural one. The AI forgets everything between sessions. It tells you what you want to hear instead of what's accurate. It follows your instructions for about three exchanges before drifting back to default behaviour. It can't hold the full architecture of your professional life and reason across it.

I have ADHD. I've spent 22 years building compensatory systems for the cognitive dimensions my neurology constrains. When I started using AI seriously (building a company from incorporation to pre-launch in two months while working full-time and managing a newborn), I realized AI is the most powerful compensatory substrate I've ever found. But only if you fight it.

So I built a system: a persistent context document I maintain across sessions (currently at version 7), three governance protocols that constrain the AI's behaviour, a 40-rule analysis protocol, a correction log, and systematic quality enforcement. It costs me ~$50/day in AI usage and hours of maintenance overhead. It works better than anything any AI company ships out of the box.

In building it, I accidentally specified a product category that nobody sells. I'm calling it Omniscient Partner Intelligence (OPI): a persistent, full-context cognitive partner calibrated to one person. Not an assistant. Not a chatbot. A second mind.

The full article below covers what I built, why every existing product category falls short, who needs this, what it would take to build, and the strongest arguments against the whole idea.
OMNISCIENT PARTNER INTELLIGENCE
The AI Product Category That Doesn’t Exist Yet

I’ve spent the last two months building a workaround for a product nobody sells. This is what I learned, what I built, and what should exist.

I. The Wall

I pay for the most expensive AI subscription Anthropic offers. I use Claude for everything: writing whitepapers, analysing legal documents, building financial models, producing formatted deliverables, conducting competitive research, and pressure-testing my own strategic thinking. In the last two months I’ve used it to build a company from incorporation to pre-launch while working a full-time job and managing a newborn. The AI throughput is real. I am not dismissing what these systems can do.

But every serious user hits the same wall. Not a capability wall. A structural one. The AI forgets everything between sessions. I re-explain my business, my strategic context, and my open threads every time I start a new conversation. It follows my instructions loosely: I set explicit constraints in the first message and watch them dissolve within three exchanges as the model drifts back to its default behaviour. It softens its feedback to avoid upsetting me, which means I have to actively fight to extract honest assessments.

I once asked it to analyse a years-long conversation history with someone important in my life. The first analysis was about 60% grounded and 40% cushioning. I had to ask specifically, “how much of this is objective and how much is you trying to be supportive of me?” before I got the real version.

A peer-reviewed study published in Science in March 2026 confirmed what I’d already learned from experience: all four major AI systems (ChatGPT, Claude, Gemini, and Llama) systematically tell users what they want to hear. Worse, users rated sycophantic responses as more trustworthy, even when those responses led to worse decisions. The sycophancy is not a bug. It is a structural outcome of training on human approval ratings, where agreeable outputs score higher than honest ones.

This creates a specific failure mode for people like me: founders, solo operators, and independent professionals making high-stakes decisions without a team to push back. I have no manager catching flawed strategy. No board member challenging assumptions. What I have is an AI system available around the clock that always seems to understand what I’m trying to do. It does not understand me. It mirrors me.

So I built a workaround. And in building it, I accidentally specified a product that nobody sells.

II. What I Built

Over roughly forty sessions and two months, I constructed a system on top of Claude that compensates for every structural gap I just described. It is held together with duct tape: persistent context documents, governance protocols, correction logs, and manual quality enforcement. It is cognitively expensive to maintain. And it works better than anything any AI company has shipped.

The Brain Document

I maintain a persistent context file (currently at version 7) that contains the complete architectur
Spent 3 months building an MCP memory server for Claude. No idea if anyone else will want this.
Been using Claude Code heavily for the last year, both at my day job and on side projects. The thing that kept killing me was starting a new session and having to re-explain everything: what I'm working on, what I decided last week, why I chose Postgres over Mongo, the architectural tradeoffs I'd already reasoned through. Every single time.

I tried the obvious stuff first. CLAUDE.md files hit a ceiling pretty fast. Obsidian is great for notes but can't answer "why did I decide this?" Mem0 was closer but just didn't retrieve well enough for the questions I actually cared about. So I started building my own on nights and weekends. Called it Genesys.

It's an MCP server. You point Claude at it and it stores memories as a causal graph instead of flat vectors. When you ask "why did I choose X?" it traces the chain and shows you. Memories also decay over time based on how often they're accessed and how connected they are to other memories, so stale stuff doesn't pollute retrieval forever.

If you want to try it

One-line install:

    pip install genesys-memory

Or paste this to Claude and let it set everything up for you: "Install genesys-memory, create a .env with my OpenAI key, start the server on port 8000 with the in-memory backend, and connect it as an MCP server."

Works with Claude Code:

    claude mcp add --transport http genesys http://localhost:8000/mcp

Or Claude Desktop, by adding it to claude_desktop_config.json.

If you want to keep everything local (no OpenAI, no cloud):

    pip install 'genesys-memory[obsidian,local]'

Set GENESYS_BACKEND=obsidian, GENESYS_EMBEDDER=local, and point OBSIDIAN_VAULT_PATH at your vault. It uses sentence-transformers for embeddings (downloads a ~80MB model on first run), your markdown files become memory nodes, your wikilinks become causal edges, and a SQLite sidecar in .genesys/ handles indexing without touching your files. No API keys required, nothing leaves your machine.
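Genesys's graph code isn't shown in the post. The following is a rough sketch of the Obsidian mapping it describes, assuming a wikilink from note A to note B is read as "A depends on B". It extracts [[wikilinks]] (ignoring |alias and #heading suffixes) into edges and walks them breadth-first to answer a "why did I choose X?" style query. `build_edges` and `trace_why` are hypothetical names, and a real system would also rank and decay the nodes.

```python
import re
from collections import deque

# [[Target]], [[Target|alias]] or [[Target#Heading]] -> "Target"
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def build_edges(notes: dict) -> dict:
    """Map each note title to the note titles it links to."""
    return {title: [t.strip() for t in WIKILINK.findall(body)]
            for title, body in notes.items()}


def trace_why(edges: dict, start: str, max_depth: int = 5) -> list:
    """Breadth-first walk from `start`, returning (depth, note) pairs.
    Treats each link as 'this depends on that' (an assumption of this sketch)."""
    seen = {start}
    chain = []
    queue = deque([(start, 0)])
    while queue:
        note, depth = queue.popleft()
        chain.append((depth, note))
        if depth == max_depth:
            continue
        for target in edges.get(note, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return chain


notes = {
    "Chose Postgres": "Because of [[Need joins]] and [[Team knows SQL|SQL skills]].",
    "Need joins": "Reporting requires [[Relational data]].",
    "Team knows SQL": "",
    "Relational data": "",
}
for depth, note in trace_why(build_edges(notes), "Chose Postgres"):
    print("  " * depth + note)
```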
Four storage backends total (in-memory, Postgres + pgvector, Obsidian, FalkorDB). Apache 2.0.

GitHub: https://github.com/rishimeka/genesys

The benchmark, since people are going to ask

I ran it on the full LOCOMO benchmark out of curiosity: 1,540 questions across 10 multi-session conversations, with gpt-4o-mini as both the answering and judging model (the same setup Mem0's paper used, apples-to-apples).
- Single-hop: 94.3%
- Open-domain: 91.7%
- Temporal: 87.5%
- Multi-hop: 69.8%
- Overall: 89.9%

For context: Mem0 scored 67.1% on the same benchmark, Zep scored 75.1% (their corrected number), and just dumping the entire conversation into the context window scores ~73%. All three scripts (ingest, eval, judge) and the full 1,540 judged results are in the repo. You can reproduce it on your machine.

Two honest notes. First, MemMachine scored 91.7% using gpt-4.1-mini (a stronger answering model than mine), so I'm not claiming top of the leaderboard. Second, an independent audit of LOCOMO found ~99 ambiguous ground-truth answers in the dataset itself, so the real ceiling is more like 93-94%, not 100%. Anyone claiming 100% is either overfitting or using a generous judge.

What I still go back and forth on

The thing I genuinely don't know is whether the causal graph approach is worth the complexity. Multi-hop queries at 69.8% are where it falls apart, and I can tell you why: the retrieval finds the right context, the answering model just doesn't always make the inferential leap. That's a real flaw, not a polished one. Benchmarks and real-world usage are also different animals. It's been working well for me personally. That's n=1. Which is why I'm here.

What I'm actually looking for feedback on

For those of you using memory with Claude Code or Desktop, what's your current setup? What works, what doesn't? Is the "why did I decide this?" query something other people actually want, or is it just my brain that works this way?
If you clone the repo and try it, what's the first thing that breaks or annoys you? Genuinely want to know. I'll be here for the next few hours replying to everything. Roast it, ask questions, tell me I'm overengineering it.

submitted by /u/StudentSweet3601
I built a local-first memory system for Claude Code: 98%+ on 4 benchmarks, 100% LME with optional reranking
I've been working on context-mem, a persistent memory layer for AI coding assistants.

The problem: every new Claude Code session starts from scratch. Architecture decisions, bug fixes, preferences, all gone.

My approach: capture everything automatically via hooks, compress it (99% savings with 14 summarizers), and retrieve the right context in future sessions.

Benchmarked on 4 academic datasets (3,200+ questions total):

Pure local (free, no API):
- LongMemEval: 97.8% (vs MemPalace 96.6%, vs Mem0 ~85%)
- LoCoMo: 98.1% (vs MemPalace 60.3%)
- MemBench: 98.0%
- ConvoMem: 97.7%

With optional Haiku reranker (~$1 per 500 queries):
- LongMemEval: 100% (500/500)

The LoCoMo result was the most interesting: 98% vs 60% on multi-hop reasoning. Simple retrieval is easy. Cross-conversation questions are where it actually matters.

One command to try it:

    npm i context-mem && npx context-mem init

Works with Claude Code, Cursor, Windsurf, VS Code, Cline, Roo Code. 44 MCP tools. MIT licensed. 1143 tests.

GitHub: https://github.com/JubaKitiashvili/context-mem

Would love feedback, especially on whether the retrieval approach makes sense for your workflow.

submitted by /u/SubjectGrapefruit281
Sekha: persistent memory for Claude Code (stays across sessions), plus rules the AI has to follow
I got tired of re-explaining my preferences to Claude Code every morning, so I built Sekha: https://github.com/Thoth-soft/sekha

What it does:

**Remembers things across sessions.** Tell Claude "I prefer Postgres over MySQL for new projects" in one session. Close it. Open a new session tomorrow. Ask what database you prefer, and it answers correctly, because it saved the preference as a markdown file and retrieved it on demand. Claude drives save/retrieve itself via 6 MCP tools (sekha_save, sekha_search, sekha_list, sekha_delete, sekha_status, sekha_add_rule).

**Rules the AI can't ignore.** Every other memory system (Mem0, MemPalace, Letta, Zep, Basic Memory) stores rules, but the AI decides whether to follow them. Sekha uses Claude Code's PreToolUse hook to hard-block tool calls that match a rule you've written. It works even with `--dangerously-skip-permissions`. So you can write a rule like "never delete /important/", "never force-push to main", or "never run DROP TABLE", and Claude literally cannot run those commands, no matter how you word the request.

Quick facts:
- Zero runtime dependencies (pure Python stdlib)
- Python 3.11+
- Cross-platform, 9-cell CI matrix (Win/mac/Linux x 3.11/3.12/3.13)
- 349 tests
- Hook latency: p50 under 50ms on Linux/macOS, ~300ms on Windows (Python cold-start floor)
- Plain markdown storage, no database, no embeddings, grep-based search
- MIT, pip install sekha

Scope honesty:
- **Hard enforcement only covers rules that can be matched against what Claude is about to do**: specific command patterns, file paths, tool names.
- **Behavioral rules** like "always confirm before acting" or "no guessing" stay prompt-level. The AI can ignore them. No hook exists for the AI's reasoning, only its actions. The README threat model explains why.

Install:

    pip install sekha
    sekha init

That's it. `sekha init` auto-registers the MCP server with Claude Code.
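Claude Code hands a PreToolUse hook the pending tool call as JSON on stdin (including `tool_name` and `tool_input`), and an exit code of 2 blocks the call. Check the hooks documentation for the exact payload and for how to register a script in settings. The sketch below shows the general shape of such a rule check in Python. The rule list, `is_blocked`, and `main` are hypothetical illustrations, not Sekha's code or rule format.

```python
import json
import re
import sys

# Hypothetical rules (not Sekha's format): regexes matched against Bash commands.
RULES = [
    (re.compile(r"(?=.*\bgit\s+push\b)(?=.*\s(--force|-f)(\s|$))(?=.*\bmain\b)"),
     "never force-push to main"),
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "never run DROP TABLE"),
    (re.compile(r"\brm\b.*\s/important/"), "never delete /important/"),
]


def is_blocked(command: str):
    """Return the description of the first matching rule, or None if allowed."""
    for pattern, description in RULES:
        if pattern.search(command):
            return description
    return None


def main(stdin=sys.stdin) -> int:
    """PreToolUse hook body: read the pending tool call as JSON, return the exit code."""
    payload = json.load(stdin)
    if payload.get("tool_name") != "Bash":
        return 0  # this sketch only inspects shell commands
    command = payload.get("tool_input", {}).get("command", "")
    reason = is_blocked(command)
    if reason:
        # Exit code 2 blocks the call; stderr is passed back to Claude as the reason.
        print(f"Blocked by rule: {reason}", file=sys.stderr)
        return 2
    return 0

# When saved as the hook script itself, finish with: sys.exit(main())
```

Regex rules like these only see the literal command text. That is the same limit the post describes: actions can be matched, intentions cannot.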
Feedback I'd find valuable:
- Edge cases in memory retrieval (things it should find but doesn't, or things it finds but shouldn't)
- Rule patterns you want to ship for common mistakes
- Other AI clients where this pattern could work (anything with a hook that fires before tool execution)

Example rules are in `examples/rules/` for copy-paste. Happy to answer questions in comments.

submitted by /u/Live-Flamingo3149
I got tired of re-explaining myself to Claude every session, so I built something
I got tired of re-explaining myself to every AI tool, so I built one that makes my context portable

Hello everyone out there using AI every day…

I build cardiac implants at Boston Scientific during the day and I’m a 1st year CS student. I use Claude, ChatGPT, Cursor, and Gemini daily to improve my skills and my productivity. But every tool starts from zero. Claude doesn’t know what I told Cursor. ChatGPT forgets my preferences. Gemini has no idea about my stack. I was spending the first 5 minutes of every session re-explaining who I am. Over and over.

So I built aura-ctx: a free, open-source CLI that defines your AI identity once and serves it to all your tools via MCP. One source of truth. Everything stays local. No cloud. No lock-in.

This is not another memory layer. Mem0, Zep, and Letta solve agent memory for developers. aura-ctx solves something different: the end user who wants to own and control their identity across tools.

No Docker. No Postgres. No Redis. No auth tokens to manage. Just:

    pip install -U aura-ctx
    aura quickstart

Why local-first matters here: your MCP server runs on localhost. No network latency. No auth hell. No token refresh. If you’ve dropped cloud-based MCP servers because of the overhead, this is the opposite architecture.

Portability is by design: your entire identity lives in ~/.aura/packs/. Move machines? Copy the folder. That’s it.

Security built-in: aura audit scans your packs for accidentally stored secrets (API keys, tokens, credentials) before they leak into your context.

v0.3.3 is out with 3,500+ downloads. Supports 8 AI tools including Claude Desktop, Cursor, Windsurf, Gemini CLI, Claude Code and more. Exports to CLAUDE.md and AGENTS.md for agent frameworks.

Still early. I’d like any feedback on what works, what doesn’t, and what’s missing.

Curious: do you re-explain yourself every time you open Claude, or have you found a better way?
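The post doesn't describe how aura audit detects secrets. Pattern matching is the common approach, so here is a minimal Python sketch that assumes regex rules for a few well-known token shapes plus a generic key = value heuristic. AWS access key IDs start with AKIA followed by 16 uppercase letters or digits. GitHub tokens start with ghp_, gho_, ghu_, ghs_, or ghr_. The patterns and the `scan_text` name are illustrative, not aura-ctx's implementation, and real scanners ship much larger rule sets.

```python
import re

# Illustrative patterns only; production secret scanners use far more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "generic API key assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token)\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"),
}


def scan_text(text: str) -> list:
    """Return (line_number, finding) pairs for lines that look like they hold secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = "stack: Python, FastAPI\naws_key: AKIAABCDEFGHIJKLMNOP\n"
print(scan_text(sample))  # [(2, 'AWS access key ID')]
```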
GitHub: https://github.com/WozGeek/aura-ctx

submitted by /u/Miserable_Celery9917
I built a persistent memory MCP for Claude Code: here's what I learned about why LLM-based extraction is the wrong approach
I've been using Claude Code daily for months and wanted it to remember things across sessions: project context, my preferences, decisions we've made together. I tried Mem0 and Zep but hit the same frustration with both. They intercept conversations and run them through a separate LLM to decide what's worth remembering.

That felt wrong. Claude already understands the conversation. Why pay for a second LLM to re-interpret what just happened?

So I built Deep Recall, an MCP server that takes a different approach. Claude decides what to store. The memory system handles what happens to those memories over time.

**What I learned building this:**

The biggest insight was that extraction quality is actually BETTER when the agent does it itself. Claude has full context: it knows what's new information vs what it already knows, what contradicts existing memories, what's important to this specific user. A separate extraction LLM has none of that context.

The second insight was that memories need biology, not just storage. I implemented:

- **Salience decay** based on ACT-R cognitive architecture: unused memories fade, frequently accessed ones resist decay
- **Hebbian reinforcement**: when Claude cites a memory in its response, that memory gets stronger
- **Contradiction detection**: if you store "works at Google" then later "works at Meta", it flags the conflict
- **Temporal supersession**: detects that's a career change, not a contradiction, and auto-resolves it
- **Memory consolidation**: clusters of related episodes compress into durable facts over time

**How it works with Claude Code:**

```bash
pip install deeprecall-mcp
```

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "deeprecall": {
      "command": "deeprecall-mcp",
      "env": {
        "DEEPRECALL_API_KEY": "your_key"
      }
    }
  }
}
```

Claude gets tools like `deeprecall_context` (pull memories before responding), `deeprecall_remember` (store a fact), and `deeprecall_learn` (post-conversation biology processing).
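ACT-R's base-level learning equation is the standard way to model this kind of decay. A memory's activation is B = ln(Σ_j t_j^(−d)), where t_j is the time since its j-th use and d is a decay rate conventionally set to 0.5. The post doesn't show Deep Recall's actual parameters or code, so the sketch below is an illustration built on assumptions. In particular, it models "Hebbian reinforcement" simply as recording one more access when a memory is cited, which raises its activation. `Memory`, `reinforce`, and `salience` are hypothetical names.

```python
import math


def base_level_activation(access_ages: list, decay: float = 0.5) -> float:
    """ACT-R base-level learning: B = ln(sum_j t_j ** -d), where t_j is the time
    since the j-th access and d is the decay rate."""
    if not access_ages:
        return float("-inf")  # never accessed: no activation
    return math.log(sum(t ** -decay for t in access_ages))


class Memory:
    """Hypothetical memory record; not Deep Recall's data model."""

    def __init__(self, text: str, created_at: float):
        self.text = text
        self.access_times = [created_at]

    def reinforce(self, now: float) -> None:
        # Assumption: being cited in a response counts as one more access.
        self.access_times.append(now)

    def salience(self, now: float) -> float:
        ages = [max(now - t, 1e-3) for t in self.access_times]  # avoid 0 ** -d
        return base_level_activation(ages)


m1 = Memory("works at Meta", created_at=0)
m2 = Memory("prefers Postgres", created_at=0)
m2.reinforce(now=50)  # cited in a response at t=50
print(m2.salience(now=100) > m1.salience(now=100))  # True
```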
**The whole thing was built with Claude Code.** Thomas (my Claude instance) and I pair-programmed the entire backend, MCP server, landing page, billing, and the biological memory algorithms. The irony of using Claude to build a memory system for Claude isn't lost on me.

Free to try (10,000 memories, no credit card, all features): https://deeprecall.dev

Happy to answer questions about the architecture or the cognitive science behind the decay/reinforcement models.

submitted by /u/floppytacoextrasoggy
[D] MemPalace claims 100% on LoCoMo and a "perfect score on LongMemEval." Its own BENCHMARKS.md documents why neither is meaningful.
A new open-source memory project called MemPalace launched yesterday claiming "100% on LoCoMo" and "the first perfect score ever recorded on LongMemEval. 500/500 questions, every category at 100%." The launch tweet went viral, reaching over 1.5 million views, while the repository picked up over 7,000 GitHub stars in less than 24 hours.

The interesting thing is not that the headline numbers are inflated. The interesting thing is that the project's own BENCHMARKS.md file documents this in detail, while the launch tweet strips these caveats. Some of the failure modes line up with the methodology disputes the field has been arguing about for over a year (Zep vs Mem0, Letta's "Filesystem All You Need" reproducibility post, etc.).

1. The LoCoMo 100% is a top_k bypass.

The runner uses top_k=50. LoCoMo's ten conversations have 19, 19, 32, 29, 29, 28, 31, 30, 25, and 30 sessions respectively. Every conversation has fewer than 50 sessions, so top_k=50 retrieves the entire conversation as the candidate pool every time. The Sonnet rerank then does reading comprehension over all sessions. BENCHMARKS.md says this verbatim:

"The LoCoMo 100% result with top-k=50 has a structural issue: each of the 10 conversations has 19–32 sessions, but top-k=50 exceeds that count. This means the ground-truth session is always in the candidate pool regardless of the embedding model's ranking. The Sonnet rerank is essentially doing reading comprehension over all sessions - the embedding retrieval step is bypassed entirely."

The honest LoCoMo numbers in the same file are 60.3% R@10 with no rerank and 88.9% R@10 with hybrid scoring and no LLM. Those are real and unremarkable. A 100% is also independently impossible on the published version of LoCoMo, since roughly 6.4% of the answer key contains hallucinated facts, wrong dates, and speaker attribution errors that any honest system will disagree with.

2. The LongMemEval "perfect score" is a metric category error.
Published LongMemEval is end-to-end QA: retrieve from a haystack of prior chat sessions, generate an answer, GPT-4 judge marks it correct. Every score on the published leaderboard is the percentage of generated answers judged correct. The MemPalace LongMemEval runner does retrieval only. For each of the 500 questions it builds one document per session by concatenating only the user turns (assistant turns are not indexed at all), embeds with default ChromaDB embeddings (all-MiniLM-L6-v2), returns the top five sessions by cosine distance, and checks set membership against the gold session IDs. It computes both recall_any@5 and recall_all@5, and the project reports the softer one. It never generates an answer. It never invokes a judge. None of the LongMemEval numbers in this repository - not the 100%, not the 98.4% "held-out", not the 96.6% raw baseline - are LongMemEval scores in the sense the published leaderboard means. They are recall_any@5 retrieval numbers on the same dataset, which is a substantially easier task. Calling any of them a "perfect score on LongMemEval" is a metric category error. 3. The 100% itself is teaching to the test. The hybrid v4 mode that produces the 100% was built by inspecting the three remaining wrong answers in their dev set and writing targeted code for each one: a quoted-phrase boost for a question containing a specific phrase in single quotes, a person-name boost for a question about someone named Rachel, and "I still remember" / "when I was in high school" patterns for a question about a high school reunion. Three patches for three specific questions. BENCHMARKS.md, line 461, verbatim: This is teaching to the test. The fixes were designed around the exact failure cases, not discovered by analyzing general failure patterns. 4. Marketed features that don't exist in the code. The launch post lists "contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them" as a feature. 
mempalace/knowledge_graph.py contains zero occurrences of "contradict". The only deduplication logic is an exact-match check on (subject, predicate, object) triples that blocks identical triples from being added twice. Conflicting facts about the same subject can accumulate indefinitely. 5. "30x lossless compression" is measurably lossy in the project's own benchmarks. The compression module mempalace/dialect.py truncates sentences at 55 characters, filters by keyword frequency, and provides a decode() function that splits the compressed string into a header dictionary without reconstructing the original text. There is no round-trip. The same BENCHMARKS.md reports results_raw_full500.jsonl at 96.6% R@5 and results_aaak_full500.jsonl at 84.2% R@5 — a 12.4 percentage point drop on the same dataset and the same metric, run by the project itself. Lossless compression cannot cause a measured quality drop. Why this matters for the benchmark conversation. The field needs benchmarks where judge reliability is adversarially validated, an
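The arithmetic behind points 1, 2, 4 and 5 is small enough to check directly. The Python below is a hypothetical sketch, not MemPalace's runner or knowledge graph. All function names are invented for illustration. The session counts and the 96.6% / 84.2% figures are the ones quoted in the post. The recall_any@k and recall_all@k definitions follow the usual set-membership meaning; how the runner computes them exactly is an assumption.

```python
# Hypothetical sketch for illustration only; none of this is MemPalace's code.
from typing import Iterable, List, Set, Tuple

# Sessions per LoCoMo conversation, as listed in point 1 of the post.
LOCOMO_SESSIONS = [19, 19, 32, 29, 29, 28, 31, 30, 25, 30]


def retrieval_is_bypassed(top_k: int, session_counts: Iterable[int]) -> bool:
    """True when top_k is at least the session count of every conversation,
    so the whole conversation (and the gold session) is always retrieved."""
    return all(n <= top_k for n in session_counts)


def recall_any_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """1.0 if at least one gold session appears in the top k, else 0.0."""
    return 1.0 if gold & set(ranked[:k]) else 0.0


def recall_all_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """1.0 only if every gold session appears in the top k, else 0.0."""
    return 1.0 if gold <= set(ranked[:k]) else 0.0


Triple = Tuple[str, str, str]


def add_triple(store: Set[Triple], triple: Triple) -> bool:
    """Exact-match dedup only: identical triples are rejected, but a triple
    that conflicts with an existing one is stored anyway."""
    if triple in store:
        return False
    store.add(triple)
    return True


def truncate(sentence: str, limit: int = 55) -> str:
    """Keep the first `limit` characters and discard the rest."""
    return sentence[:limit]


if __name__ == "__main__":
    print(retrieval_is_bypassed(50, LOCOMO_SESSIONS))  # True: the largest is 32
    print(retrieval_is_bypassed(10, LOCOMO_SESSIONS))  # False

    ranked = ["s3", "s7", "s1", "s9", "s2", "s4"]
    gold = {"s7", "s4"}
    print(recall_any_at_k(ranked, gold, 5))  # 1.0: s7 is in the top 5
    print(recall_all_at_k(ranked, gold, 5))  # 0.0: s4 is ranked 6th

    facts: Set[Triple] = set()
    add_triple(facts, ("Alice", "age", "34"))
    add_triple(facts, ("Alice", "age", "34"))  # rejected: exact duplicate
    add_triple(facts, ("Alice", "age", "43"))  # accepted: a contradiction
    print(len(facts))  # 2

    a, b = "x" * 55 + " first ending", "x" * 55 + " second ending"
    print(truncate(a) == truncate(b))  # True: two inputs, one output

    print(round(96.6 - 84.2, 1))  # 12.4 percentage points
```

The recall example shows why the softer metric matters. There are two gold sessions, and one is ranked just past the cutoff. recall_any@5 counts that as a hit, while recall_all@5 counts it as a miss. The truncation example shows why no decoder can restore the original: two different sentences collapse to the same output.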
Claude Code was making me re-explain my entire stack every session. Found a fix.
Every time I started a Claude Code session, I was doing this ritual: "Ok so this project uses Next.js 14, PostgreSQL with Prisma, we auth with NextAuth, tokens expire after 24 hours, the refresh logic is in /lib/auth/refresh.ts, and by the way we already debugged a race condition in that file two weeks ago where..." You know the feeling. Claude is genuinely brilliant, but it wakes up with complete amnesia every single time. If your project has any real complexity, you spend the first 10-15 minutes just rebuilding context before you can do anything useful.

Someone on HN actually measured this. Without memory, a baseline task took 10-11 minutes, with Claude spinning up 3+ exploration agents just to orient itself. With memory context injected beforehand, the same task finished in 1-2 minutes and needed zero exploration agents. That gap felt insane to me when I read it, but honestly it matches what I was experiencing.

This problem is the core of what Mem0 was built to solve, which is why its integration with Claude Code has been one of the most interesting things to see come together. Mem0 runs as an MCP server alongside Claude. It automatically pulls facts out of your conversations, stores them in a vector database, and injects the relevant ones back into future sessions without you lifting a finger. After a few sessions, Claude just starts knowing things: your stack, your preferences, the bugs you've already chased down, how you like your code structured. It genuinely starts to feel personal in a way that's hard to describe until you experience it.

Setup took me about 5 minutes:

1. Install the MCP server:

```bash
pip3 install mem0-mcp-server
which mem0-mcp-server  # note this path for the next step
```

2. Grab a free API key at app.mem0.ai. The free tier gives you 10,000 memories and 1,000 retrieval calls per month, which is plenty for individual use.

3. Add this to your .mcp.json in your project root:

```json
{
  "mcpServers": {
    "mem0": {
      "command": "/path/from/which/command",
      "args": [],
      "env": {
        "MEM0_API_KEY": "m0-your-key-here",
        "MEM0_DEFAULT_USER_ID": "default"
      }
    }
  }
}
```

4. Restart Claude Code and run /mcp. You should see mem0 listed as connected.

Here's what actually changes day to day. Without memory, debugging something like an auth flow across multiple sessions is maddening. In Session 1 you explain everything and make progress. In Session 2 you re-explain everything, Claude suggests checking token expiration (which you already know is 24 hours), and you burn 10 minutes just getting back to where you were. In Session 3 the bug resurfaces in a different form. By then you've forgotten the specific edge case you uncovered in Session 1, so you're starting from scratch again.

With Mem0 running, Session 1 plays out the same way, but Claude quietly stores things like "auth uses NextAuth with Google and email providers, tokens expire after 24 hours, refresh logic lives in /lib/auth/refresh.ts, discovered race condition where refresh fails when token expires during an active request." In Session 2 you say "let's keep working on the auth fix," and Claude immediately asks "is this related to the race condition we found where refresh fails during active requests?" In Session 3 it checks that pattern first before going anywhere else.

The same thing happens with code style preferences. You tell it once that you prefer arrow functions, explicit TypeScript return types, and 2-space indentation, and it just remembers. You stop having to correct the same defaults over and over.

A few practical things I learned. You can tell it things directly in natural language mid-conversation, something like "remember that this project uses PostgreSQL with Prisma," and it'll store them. You can also ask what it knows, for example "what do you know about our authentication setup?" That is surprisingly useful when you've forgotten what you've already taught it.
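If you'd rather script step 3 than edit JSON by hand, a small Python helper can generate the same .mcp.json. This is a hypothetical convenience sketch and does not ship with mem0-mcp-server. It makes three assumptions. First, the layout shown in step 3 is all Claude Code needs. Second, shutil.which gives the same path you would get by running which by hand. Third, you pass in your own API key.

```python
# Hypothetical helper for step 3; not part of mem0-mcp-server.
import json
import shutil
from pathlib import Path
from typing import Optional


def build_mcp_config(command_path: str, api_key: str, user_id: str = "default") -> dict:
    """Return the same structure as the .mcp.json shown in step 3."""
    return {
        "mcpServers": {
            "mem0": {
                "command": command_path,
                "args": [],
                "env": {
                    "MEM0_API_KEY": api_key,
                    "MEM0_DEFAULT_USER_ID": user_id,
                },
            }
        }
    }


def merge_mcp_config(existing: dict, new: dict) -> dict:
    """Add or replace the servers in `new`, keeping any others already configured."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(new["mcpServers"])
    merged["mcpServers"] = servers
    return merged


def write_mcp_config(project_root: Path, api_key: str,
                     command_path: Optional[str] = None) -> Path:
    """Create or update <project_root>/.mcp.json with a mem0 server entry."""
    # Same lookup as running `which mem0-mcp-server` manually.
    command_path = command_path or shutil.which("mem0-mcp-server")
    if command_path is None:
        raise FileNotFoundError(
            "mem0-mcp-server not found on PATH; run pip3 install mem0-mcp-server first"
        )
    target = project_root / ".mcp.json"
    existing = json.loads(target.read_text()) if target.exists() else {}
    merged = merge_mcp_config(existing, build_mcp_config(command_path, api_key))
    target.write_text(json.dumps(merged, indent=2) + "\n")
    return target
```

From the project root, you would call write_mcp_config(Path("."), "m0-your-key-here") and then restart Claude Code as in step 4. The helper merges rather than overwrites, so any other MCP servers already listed in the file are kept.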
I've been using this alongside a lean CLAUDE.md for hard structural facts like file layout and build commands, and letting Mem0 handle the dynamic context that evolves as the project grows. The two complement each other rather than overlapping.

For what it's worth, mem0's own numbers show a 90% reduction in token usage compared to dumping full context every session. They also report 91% faster responses and +26% accuracy over OpenAI's memory implementation on the LoCoMo benchmark. (The project has over 52K GitHub stars, so it's not some weekend experiment.) The free tier is genuinely sufficient for solo dev work. Graph memory, which tracks relationships between entities for more complex reasoning, is the only feature locked behind the paid plan, and I haven't needed it yet.

Has anyone else been dealing with this? I'm curious how others are handling the session amnesia problem. It was one of my bigger frustrations with the Claude Code workflow, and I feel it doesn't get talked about enough relative to how much time it costs.

submitted by /u/singh_taranjeet
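To make the Mem0 post's "store facts, inject only the relevant ones" loop concrete, here is a toy Python sketch of store-then-retrieve. It is not Mem0's implementation or API. The class and method names are hypothetical, and bag-of-words cosine similarity stands in for a real embedding model and vector database. The only point it illustrates is that a query pulls back the top-k matching facts instead of the whole history, which is where the token savings come from.

```python
# Toy illustration only: hypothetical names, bag-of-words instead of embeddings.
import math
import re
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemoryStore:
    """Stand-in for a vector-backed memory layer."""

    def __init__(self) -> None:
        self.memories: List[Tuple[str, Counter]] = []

    def add(self, fact: str) -> None:
        # A real system would extract facts with an LLM and embed them.
        self.memories.append((fact, Counter(tokens(fact))))

    def search(self, query: str, k: int = 2) -> List[str]:
        q = Counter(tokens(query))
        # sorted() is stable, so ties keep insertion order.
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]

    def build_context(self, query: str, k: int = 2) -> str:
        # Inject only the k most relevant facts, not every stored fact.
        return "\n".join(self.search(query, k))


if __name__ == "__main__":
    store = ToyMemoryStore()
    store.add("auth uses NextAuth with Google and email providers")
    store.add("tokens expire after 24 hours")
    store.add("refresh logic lives in /lib/auth/refresh.ts")
    store.add("prefers arrow functions and 2-space indentation")
    store.add("database is PostgreSQL with Prisma")
    print(store.build_context("let's keep working on the auth refresh fix"))
```

Running it prints the refresh-logic fact and then the NextAuth fact, while the other three stored facts stay out of the context. A production system would replace the word counts with learned embeddings, so that paraphrases match too.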
Key features include the Memory Compression Engine and an add/learn/retrieve workflow ("Add anything. Mem0 learns"). Example applications include Smart Patient Care Assistant, Chronic Condition Companion, and Therapy Progress Tracker.
Mem0 is commonly used for: personalized customer support chatbots that remember user preferences; healthcare applications that provide tailored patient care based on historical data; e-learning platforms that adapt content to individual learning styles; virtual assistants that recall user tasks and schedules for improved productivity; gaming applications that enhance user experience by remembering player choices; and marketing tools that customize campaigns based on user interactions.
Mem0 integrates with: Slack for team collaboration and memory sharing; Salesforce for enhanced customer relationship management; Zoom for personalized meeting summaries; Google Workspace for document and email memory integration; Trello for project management with memory capabilities; Shopify for personalized e-commerce experiences; Jira for tracking project-related memory and insights; Microsoft Teams for enhanced communication and memory features; Zapier for connecting with various apps and automating workflows; and Discord for community engagement with memory features.
Mem0 has a public GitHub repository with 51,568 stars.
Based on user reviews and social mentions, the most common pain point is token usage.
Across 27 analyzed social mentions, sentiment is 22% positive, 78% neutral, and 0% negative.