The reviews and social mentions provided contain little direct feedback about InternLM; it appears to be relatively unknown or rarely discussed compared to other tools such as MemPalace or Claude Engram. This data includes no specific strengths, complaints, or pricing comments for InternLM, so its overall reputation cannot be determined accurately.
Mentions (30d): 2 (1 this week)
Reviews: 0
Platforms: 2
GitHub Stars: 7,173 (511 forks)
GitHub followers: 2,654
GitHub repos: 45
GitHub stars: 7,173
npm packages: 2
HuggingFace models: 40
LLM-Rosetta — format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools (Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week), each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result.

It's a Python library that converts between LLM API formats: OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters.

The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests, which is one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need.

Use it as a library: pip install and call convert() directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep (https://github.com/Oaklight/zerodep), another project of mine: stdlib-only single-file modules, not someone else's library repackaged.

The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar; admin panel, dashboard, request log, config management all included. Screenshots: https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/

We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform (needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized).

The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion.

There's a shim layer for provider-specific quirks, with built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively.

24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation.

GitHub: https://github.com/Oaklight/llm-rosetta
Docs: https://llm-rosetta.readthedocs.io
arXiv: https://arxiv.org/abs/2604.09360
Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378

submitted by /u/Oaklight_dp
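The hub-and-spoke idea in the post above can be illustrated with a small sketch. This is not llm-rosetta's actual API: the `IRMessage` type, the `CONVERTERS` registry, the `convert(payload, source, target)` signature, and the heavily simplified payload shapes (plain-string content, a top-level `system` field on the Anthropic side) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical shared intermediate representation (IR): one role/text pair.
@dataclass
class IRMessage:
    role: str
    text: str


# Assumed, simplified payload shapes; real provider formats carry many more fields.
def openai_chat_to_ir(payload: dict) -> List[IRMessage]:
    return [IRMessage(m["role"], m["content"]) for m in payload["messages"]]


def ir_to_openai_chat(messages: List[IRMessage]) -> dict:
    return {"messages": [{"role": m.role, "content": m.text} for m in messages]}


def anthropic_to_ir(payload: dict) -> List[IRMessage]:
    out: List[IRMessage] = []
    if payload.get("system"):
        # The system prompt sits outside the message list in this format.
        out.append(IRMessage("system", payload["system"]))
    out.extend(IRMessage(m["role"], m["content"]) for m in payload["messages"])
    return out


def ir_to_anthropic(messages: List[IRMessage]) -> dict:
    system = "\n".join(m.text for m in messages if m.role == "system")
    body: dict = {
        "messages": [
            {"role": m.role, "content": m.text}
            for m in messages
            if m.role != "system"
        ]
    }
    if system:
        body["system"] = system
    return body


# One (to_ir, from_ir) pair per format: N entries instead of N * (N - 1) adapters.
Converter = Tuple[Callable[[dict], List[IRMessage]], Callable[[List[IRMessage]], dict]]
CONVERTERS: Dict[str, Converter] = {
    "openai_chat": (openai_chat_to_ir, ir_to_openai_chat),
    "anthropic": (anthropic_to_ir, ir_to_anthropic),
}


def convert(payload: dict, source: str, target: str) -> dict:
    """Any-to-any conversion by going source -> IR -> target."""
    to_ir, _ = CONVERTERS[source]
    _, from_ir = CONVERTERS[target]
    return from_ir(to_ir(payload))
```

Adding a third format in this sketch means registering one more pair in `CONVERTERS`; every existing format can then convert to and from it without new pairwise code.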
I built a local AI companion with GWT, IIT proxy, ChromaDB hybrid retrieval, and Ollama fallback: here's every architectural decision I made and why
Been building this for a while. Sharing now because it's past the point where I'm embarrassed by the code.

**The stack:**

* Python 3.12, 18k+ lines, 470+ tests passing
* Gemini 2.5 Flash (primary) + Ollama qwen3:4b (local fallback via circuit breaker)
* ChromaDB for persistence, with hybrid retrieval weighted at 55% semantic / 25% importance / 20% recency
* `sentence-transformers all-MiniLM-L6-v2` (384-dim) for local embeddings: fully offline, no API call needed for retrieval
* SQLite for cognitive state
* FastAPI web UI at `localhost:8765` plus Rich TUI and CLI modes

**The part I want feedback on, the cognitive architecture:**

The processing pipeline runs in phases: Perception → Reflection → Integration → Aspiration → Expression. 22 self-registering plugins compete for attention through a Global Workspace Theory implementation (capacity limit 5, competitive scoring, spotlight mechanism).

There's also an IIT consciousness proxy (Φ approximation across a 7-dimension qualia space). I want to be upfront: this is a *proxy*, not a real Φ calculation. Full IIT computation is intractable at this scale. What it does is give the system a coherence signal it can actually respond to.

**Modules worth looking at:**

* `being.py`: live mood, energy, curiosity, attachment, agency state. Affects downstream processing, not just output text.
* `homeostasis.py`: 7 survival needs that create internal pressure. When "coherence" is low the system responds differently than when it's high.
* `self_modify.py`: assessment, lesson extraction, meta-learning loop. The model improves its own behavior patterns over time.
* `intuition.py`: 5 hunch types, felt-sense modeling, pattern validation history

**Resilience:** Per-module circuit breakers, health monitor, 120s watchdog. The Ollama fallback kicks in if Gemini goes down mid-session, and the user barely notices.
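The Gemini-to-Ollama fallback described under Resilience follows the general circuit-breaker pattern, which might look roughly like the sketch below. It is not taken from the infj-bot repo: the `CircuitBreaker` class, its thresholds, and the `primary`/`fallback` callables are hypothetical stand-ins for the real model clients.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Route calls to a fallback after repeated primary failures (illustrative only)."""

    def __init__(
        self,
        max_failures: int = 3,          # assumed threshold
        reset_after: float = 30.0,      # assumed cool-down in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the breaker so the primary is retried.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(
        self,
        primary: Callable[[str], str],
        fallback: Callable[[str], str],
        prompt: str,
    ) -> str:
        if self.is_open():
            # Breaker is open: skip the primary entirely.
            return fallback(prompt)
        try:
            result = primary(prompt)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback(prompt)
        self.failures = 0  # any success resets the failure count
        return result
```

While the breaker is open, requests go straight to the local model instead of waiting on a failing remote call, which is what would keep the switch from being noticeable mid-session.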
**Why I gave it an INFJ personality model:** Honest answer: the cognitive stack (Ni/Fe/Ti/Se) mapped cleanly to architectural decisions I was already making. Ni = long-horizon retrieval weighting. Fe = relational context weighting. Ti = the internal critic pass. Se = the embodiment layer grounding abstract processing in a live body schema. Personality typing gave me a coherent *constraint system* to design against. It's not aesthetic, it's functional.

Repo: [github.com/timeless-hayoka/infj-bot](https://github.com/timeless-hayoka/infj-bot)

Specific things I want feedback on: the GWT scoring implementation, whether the IIT proxy framing is defensible, and whether the hybrid retrieval weights make sense.

submitted by /u/Interesting_Time6301
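For readers weighing the 55% / 25% / 20% retrieval weights, a minimal scoring sketch is shown here. Only the three weights come from the post. The `Memory` fields, the clipping of similarity to [0, 1], and the exponential recency decay with a 72-hour half-life are assumptions, since the post does not say how each component is normalized.

```python
from dataclasses import dataclass
from typing import List

# Weights quoted in the post.
W_SEMANTIC, W_IMPORTANCE, W_RECENCY = 0.55, 0.25, 0.20


@dataclass
class Memory:
    text: str
    similarity: float  # assumed: cosine similarity to the query
    importance: float  # assumed: already scaled to [0, 1]
    age_hours: float


def recency(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Assumed decay: 1.0 for a brand-new memory, halving every half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def hybrid_score(m: Memory) -> float:
    sim = min(max(m.similarity, 0.0), 1.0)  # clip so all terms share one scale
    return (
        W_SEMANTIC * sim
        + W_IMPORTANCE * m.importance
        + W_RECENCY * recency(m.age_hours)
    )


def rank(memories: List[Memory], k: int = 5) -> List[Memory]:
    return sorted(memories, key=hybrid_score, reverse=True)[:k]
```

Under these assumptions the recency term can contribute at most 0.20, so at equal importance a brand-new memory can overcome a similarity deficit of at most 0.20 / 0.55 ≈ 0.36. That is one concrete way to judge whether the weights behave as intended.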
Using Claude for large firmware docs + testcase analysis: what’s the right setup?
I’m trying to use Claude to actually understand and work with a pretty heavy firmware validation setup, and I’m not sure what the most effective workflow looks like.

Context:

* ~10 technical documents (~200 pages each) explaining services, flows, and internal behavior
* ~300MB repo with testcases, automation scripts, build system, etc.
* Need to understand why testcases are written the way they are, not just what they do
* Also comparing two different frameworks testing the same services

Problems:

* Hard to connect: requirement → API → testcase → script
* Existing testcases feel like black boxes
* Comparing frameworks is confusing (same goal, different structure)
* Feeding large docs directly into Claude doesn’t work well

What I’m trying:

* Using Claude to explain individual testcases
* Thinking about structuring docs into smaller chunks
* Considering tools like NotebookLM alongside Claude

What I’m looking for:

* What’s the best way to set up Claude for this kind of workflow?
* Do you preprocess docs (markdown, chunking, etc.), or rely on external tools?
* How do you use Claude to reverse-engineer testcase intent?
* Any good approach to comparing two frameworks using Claude?

Would really appreciate inputs from anyone who has used Claude (or similar tools) for large-scale firmware or systems-level validation work.

submitted by /u/InevitableOk2066
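One way the "smaller chunks" idea from the post could be tried is sketched below, assuming the documents have already been converted to markdown. The `chunk_markdown` helper and the 4000-character limit are hypothetical choices for illustration, not a recommended setting or a feature of any tool named above.

```python
import re
from typing import Dict, List

# Matches markdown headings such as "# Title" or "### Sub-section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text: str, max_chars: int = 4000) -> List[Dict[str, str]]:
    """Split a markdown document at headings, then cap each chunk's size."""
    chunks: List[Dict[str, str]] = []
    title = "(preamble)"  # label for any text before the first heading
    buf: List[str] = []

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            # Oversized sections are cut into fixed-size pieces under one title.
            for start in range(0, len(body), max_chars):
                chunks.append({"title": title, "body": body[start:start + max_chars]})
        buf.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            title = match.group(2).strip()
        else:
            buf.append(line)
    flush()
    return chunks
```

Keeping the section title with each chunk makes it easier to ask about one service or flow at a time and to trace an answer back to its place in the source document.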
Claude Engram - persistent memory for Claude Code that auto-tracks mistakes and context
Some of you might remember my post a few months ago about Mini Claude. I had Claude build its own memory system after researching its own user complaints. That project worked, but the hook system was a pain. I shelved it.

Then Claude Code got "open-sourced", and I could actually see how hooks like PostToolUseFailure, PreCompact, and all the lifecycle events work internally. Rewrote the whole thing with proper hook integration. Renamed it Claude Engram.

What changed from the original: The old version required Claude to manually call everything. The new version automatically hooks into Claude Code's tool lifecycle. Claude doesn't have to invoke anything for the core features to work.

How it works:

* Hooks intercept every edit, bash command, error, and session event. Zero manual effort.
* Before you edit a file, it surfaces past mistakes and relevant context, scored by file match, tags, and recency.
* Survives context compaction. Auto-checkpoints before, re-injects rules and mistakes after.
* Tiered storage. Hot memories stay fast, old ones archive to cold storage. Searchable, restorable.
* Multi-project workspaces. Memories scoped per project, workspace-level rules cascade down.
* Hybrid search using AllMiniLM. Keyword + vector + reranking. No ChromaDB dependency.

Update, v0.4.0: Session Mining

Since the original post, engram now mines your Claude Code session logs automatically. This is the big addition. Claude Code stores your full conversation as JSONL files. After every session, engram parses them in the background and extracts what hooks can't capture:

* Decisions, mistakes, and approach changes extracted from conversation flow (not regex; structural analysis + AllMiniLM semantic scoring, naturally typo-tolerant)
* Searchable index across all past conversations: "what did we discuss about auth?" returns results in 112ms. Every user message and assistant response from every past session gets embedded and indexed (7310 messages across 11 sessions in testing)
* Detects recurring struggles, error patterns across sessions, and which files are always edited together
* Predictive context: before you edit a file, it surfaces related files and likely errors from your history
* Cross-project learning: finds patterns that hold across all your projects
* Retroactive bootstrap: install on an existing project and it mines all your past sessions automatically

| Benchmark | Result |
|---|---|
| Decision Capture (220 prompts) | 97.8% precision |
| Injection Relevance (50 memories) | 14/15, 100% isolation |
| Compaction Survival | 6/6 |
| Error Auto-Capture (53 payloads) | 100% recall, 97% precision |
| Multi-Project Scoping | 11/11 |
| Session Mining Foundation | 27/27 |
| Obsidian Vault Compatibility | 25/25 |
| Cross-session search | 112ms over 7310 indexed messages |

Not just Claude Code: The MCP server works with any MCP client (Cursor, Windsurf, Zed, Continue.dev). Claude Code gets the full auto-capture hooks + session mining on top. Also works with Obsidian vaults (PARA + CLAUDE.md structure). Tested and verified.

No cloud, no API costs, runs locally. MIT licensed.

https://github.com/20alexl/claude-engram

submitted by /u/Crunchy-Nut1
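The first step of the session mining described above, reading a JSONL log into indexable messages, could be sketched as follows. This is not claude-engram's code, and the record schema is an assumption: each line is taken to be a JSON object with string `role` and `content` fields, which may not match the actual Claude Code log layout.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_messages(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (role, content) pairs from JSONL lines, skipping anything unusable."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # blank line
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if not isinstance(record, dict):
            continue  # valid JSON, but not an object
        role = record.get("role")        # assumed field name
        content = record.get("content")  # assumed field name
        if role in ("user", "assistant") and isinstance(content, str):
            yield role, content
```

The resulting pairs are what a later stage would embed and index; tolerating malformed lines matters because a log may be read while a session is still writing to it.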
[D] MemPalace claims 100% on LoCoMo and a "perfect score on LongMemEval." Its own BENCHMARKS.md documents why neither is meaningful.
A new open-source memory project called MemPalace launched yesterday claiming "100% on LoCoMo" and "the first perfect score ever recorded on LongMemEval. 500/500 questions, every category at 100%." The launch tweet went viral reaching over 1.5 million views while the repository picked up over 7,000 GitHub stars in less than 24 hours.

The interesting thing is not that the headline numbers are inflated. The interesting thing is that the project's own BENCHMARKS.md file documents this in detail, while the launch tweet strips these caveats. Some of the failure modes line up with the methodology disputes the field has been arguing about for over a year (Zep vs Mem0, Letta's "Filesystem All You Need" reproducibility post, etc.).

1. The LoCoMo 100% is a top_k bypass.

The runner uses top_k=50. LoCoMo's ten conversations have 19, 19, 32, 29, 29, 28, 31, 30, 25, and 30 sessions respectively. Every conversation has fewer than 50 sessions, so top_k=50 retrieves the entire conversation as the candidate pool every time. The Sonnet rerank then does reading comprehension over all sessions. BENCHMARKS.md says this verbatim:

> The LoCoMo 100% result with top-k=50 has a structural issue: each of the 10 conversations has 19–32 sessions, but top-k=50 exceeds that count. This means the ground-truth session is always in the candidate pool regardless of the embedding model's ranking. The Sonnet rerank is essentially doing reading comprehension over all sessions - the embedding retrieval step is bypassed entirely.

The honest LoCoMo numbers in the same file are 60.3% R@10 with no rerank and 88.9% R@10 with hybrid scoring and no LLM. Those are real and unremarkable. A 100% is also independently impossible on the published version of LoCoMo, since roughly 6.4% of the answer key contains hallucinated facts, wrong dates, and speaker attribution errors that any honest system will disagree with.

2. The LongMemEval "perfect score" is a metric category error.

Published LongMemEval is end-to-end QA: retrieve from a haystack of prior chat sessions, generate an answer, GPT-4 judge marks it correct. Every score on the published leaderboard is the percentage of generated answers judged correct.

The MemPalace LongMemEval runner does retrieval only. For each of the 500 questions it builds one document per session by concatenating only the user turns (assistant turns are not indexed at all), embeds with default ChromaDB embeddings (all-MiniLM-L6-v2), returns the top five sessions by cosine distance, and checks set membership against the gold session IDs. It computes both recall_any@5 and recall_all@5, and the project reports the softer one. It never generates an answer. It never invokes a judge.

None of the LongMemEval numbers in this repository - not the 100%, not the 98.4% "held-out", not the 96.6% raw baseline - are LongMemEval scores in the sense the published leaderboard means. They are recall_any@5 retrieval numbers on the same dataset, which is a substantially easier task. Calling any of them a "perfect score on LongMemEval" is a metric category error.

3. The 100% itself is teaching to the test.

The hybrid v4 mode that produces the 100% was built by inspecting the three remaining wrong answers in their dev set and writing targeted code for each one: a quoted-phrase boost for a question containing a specific phrase in single quotes, a person-name boost for a question about someone named Rachel, and "I still remember" / "when I was in high school" patterns for a question about a high school reunion. Three patches for three specific questions. BENCHMARKS.md, line 461, verbatim:

> This is teaching to the test. The fixes were designed around the exact failure cases, not discovered by analyzing general failure patterns.

4. Marketed features that don't exist in the code.

The launch post lists "contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them" as a feature. mempalace/knowledge_graph.py contains zero occurrences of "contradict". The only deduplication logic is an exact-match check on (subject, predicate, object) triples that blocks identical triples from being added twice. Conflicting facts about the same subject can accumulate indefinitely.

5. "30x lossless compression" is measurably lossy in the project's own benchmarks.

The compression module mempalace/dialect.py truncates sentences at 55 characters, filters by keyword frequency, and provides a decode() function that splits the compressed string into a header dictionary without reconstructing the original text. There is no round-trip. The same BENCHMARKS.md reports results_raw_full500.jsonl at 96.6% R@5 and results_aaak_full500.jsonl at 84.2% R@5, a 12.4 percentage point drop on the same dataset and the same metric, run by the project itself. Lossless compression cannot cause a measured quality drop.

Why this matters for the benchmark conversation.

The field needs benchmarks where judge reliability is adversarially validated, an
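Two of the points in the post above, the top_k bypass and the gap between recall_any@5 and recall_all@5, can be checked with a few lines of arithmetic. The session counts and top_k=50 are taken from the post; the function names are made up for this sketch and are not MemPalace's runner code.

```python
from typing import List, Set

# Session counts per LoCoMo conversation and the runner's top_k, as quoted in the post.
LOCOMO_SESSION_COUNTS = [19, 19, 32, 29, 29, 28, 31, 30, 25, 30]
TOP_K = 50


def retrieval_is_bypassed(pool_size: int, top_k: int) -> bool:
    """If top_k covers the whole pool, ranking quality cannot affect recall."""
    return top_k >= pool_size


def recall_any_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """1.0 if at least one gold session appears in the top k."""
    return float(any(s in gold for s in ranked[:k]))


def recall_all_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """1.0 only if every gold session appears in the top k."""
    top = set(ranked[:k])
    return float(all(g in top for g in gold))


# Every conversation is smaller than top_k, so each retrieval returns everything.
bypassed = [retrieval_is_bypassed(n, TOP_K) for n in LOCOMO_SESSION_COUNTS]

# A question with two gold sessions where only one lands in the top 5.
ranked_sessions = ["s7", "s2", "s9", "s1", "s4", "s3"]
gold_sessions = {"s2", "s3"}
any_hit = recall_any_at_k(ranked_sessions, gold_sessions, 5)  # counted as a hit
all_hit = recall_all_at_k(ranked_sessions, gold_sessions, 5)  # counted as a miss
```

The same retrieval result scores as a hit under the "any" metric and a miss under the "all" metric, which is why it matters which of the two a project reports.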
Repository Audit Available
Deep analysis of InternLM/InternLM — architecture, costs, security, dependencies & more
Key features include: Open-source architecture, Support for multiple languages, Customizable model training, Integration with popular ML frameworks, User-friendly API for developers, Pre-trained models available for quick deployment, Community-driven updates and improvements, Comprehensive documentation and tutorials.
InternLM is commonly used for: Natural language processing tasks, Chatbot development, Content generation for blogs and articles, Sentiment analysis in social media, Text summarization for reports, Language translation services.
InternLM integrates with: TensorFlow, PyTorch, Hugging Face Transformers, Docker for containerization, Kubernetes for orchestration, Flask for web applications, FastAPI for building APIs, Jupyter Notebooks for interactive development, Slack for team collaboration, GitHub for version control and collaboration.
InternLM has a public GitHub repository with 7,173 stars.