Users generally recognize ModelFusion for its versatility and ability to integrate different AI models into a cohesive system. However, some express concerns about the complexity of configuring these integrations and occasional inefficiencies in resource usage. There is limited feedback on pricing, suggesting it is not a major concern, but there is no clear sentiment available. Overall, ModelFusion seems to have a respectable reputation among tech enthusiasts for its innovative capabilities, albeit with room for improvements in user experience.
Mentions (30d)
0
Reviews
0
Platforms
2
GitHub Stars
1,316
95 forks
Users generally recognize ModelFusion for its versatility and ability to integrate different AI models into a cohesive system. However, some express concerns about the complexity of configuring these integrations and occasional inefficiencies in resource usage. There is limited feedback on pricing, suggesting it is not a major concern, but there is no clear sentiment available. Overall, ModelFusion seems to have a respectable reputation among tech enthusiasts for its innovative capabilities, albeit with room for improvements in user experience.
Features
Use Cases
735
GitHub followers
95
GitHub repos
1,316
GitHub stars
9
npm packages
Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]
I’ve been working on a CUDA-first inference runtime for small-batch / realtime ML workloads. The core idea is simple: instead of treating PyTorch / TensorRT / generic graph runtimes as the main execution path, I rewrite the model inference path directly with C++/CUDA kernels. This started from robotics / VLA workloads, but the problem is more general. In small-batch inference, the bottleneck is often not just a single slow GEMM. A lot of latency comes from the runtime glue around the math: fragmented small kernels norm / residual / activation boundaries quantize / dequantize overhead layout transitions Python / runtime scheduling graph compiler fusion failures precision conversion around FP8 / FP4 regions For cloud LLM serving, batching can hide a lot of this. For robotics, VLA, world models, and other realtime workloads, batch size is usually 1. There is nowhere to hide. Every launch, sync, and format boundary shows up directly in latency. Some current results from my implementation: Model / workload Hardware FlashRT latency Pi0.5 Jetson Thor ~44 ms Pi0 Jetson Thor ~46 ms GROOT N1.6 Jetson Thor ~41–45 ms Pi0.5 RTX 5090 ~17.6 ms GROOT N1.6 RTX 5090 ~12.5–13.1 ms Pi0-FAST RTX 5090 ~2.39 ms/token Qwen3.6 27B RTX 5090 ~129 tok/s with NVFP4 Motus / Wan-style world model RTX 5090 ~1.3s baseline → targeting ~100ms E2E The Motus / world-model case is especially interesting. The baseline path is around 1.3s end-to-end. The target is ~100ms E2E, but the hard part is not simply “use a faster GEMM”. The bottlenecks are VAE, joint attention, launch fragmentation, and a large amount of glue around the actual math. One lesson from this work: lower precision is not automatically a win. FP8 has been consistently useful. FP4 / NVFP4 is more mixed. It can help memory footprint and some large GEMM regions, but if the FP4 region is small, discontinuous, or surrounded by conversion / scaling overhead, the end-to-end speedup can be tiny. For example, in some VLA / world-model paths, FP4 over FP8 only gives a few percent latency improvement unless the region is large and deeply fused. This changed how I think about inference optimization. For large-batch cloud serving, generic runtimes and batching are often enough. For realtime small-batch inference, the runtime overhead becomes the workload. Curious if others have seen similar behavior with torch.compile, TensorRT, XLA, Triton, or custom CUDA kernels. At what point do you stop trying to make a generic compiler optimize the model, and just rewrite the inference path directly? Implementation: https://github.com/LiangSu8899/FlashRT submitted by /u/Diligent-End-2711 [link] [comments]
View originalClaude Fusion MCP Connection is Unusable
Claude’s Fusion connector is like my grandpa sitting at his old Dell desktop, and we just installed Fusion, and Im standing behind him trying to tell him how to design something. I’ve been trying to use it and it's worthless not very useful in its current state. Claude itself takes 3-4 minutes to respond, and once it does, the sketches/bodies it creates don’t make sense. Also, when Claude is connected, fusion is so slow I can’t pan/rotate the model without massive glitches. Im on a m4 Mac Studio, so its not hardware I wasnt expecting it to magically make perfect parts from one prompt, but I was hoping for more than this. I know I could also probably give better prompts, but even when I'm very specific, it doesnt workout right Has anyone actually gotten useful work out of it, or is this just a gimmick rn? It feels like a proof of concept vs something ready for actual Fusion work. Im trying to redesign the green part, using parabolic arms instead of right angles... I thought this would be a simple job Claude and I could do together,,, no https://preview.redd.it/qa2h07ufri0h1.png?width=4932&format=png&auto=webp&s=55c0c4050993b9e676f306905b16eb587730de0e https://preview.redd.it/adfu44msli0h1.png?width=1322&format=png&auto=webp&s=07eea00dd41e1428c11f5ef8f8105745f8d6a68a submitted by /u/4D3Dprints [link] [comments]
View originalDumb question-will paying make the fusion integration better?
The fusion MCP thing looks really interesting, but I'm using the free tier and sonnet 4.6 and honestly it seems kinda dumb. If I pay, will it perform better, or does the MCP stuff all have to use the same model? Tia! submitted by /u/TriXandApple [link] [comments]
View originalYour Claude Code agent is always working from stale context. I built it a fix it can rewind, replay, and stay ahead of every edit.
Every long Claude Code session has the same hidden failure mode: the agent is always working from stale context. It re-reads the same 12 files across three sessions to "remind itself" of an interface you already showed it. It refactors getUserById without checking who calls it. It edits a config with no memory of why the previous version was that way. It's not the context window. The window is fine. There's no persistent, time-aware representation of your codebase for the agent to re-query. So it guesses. And you pay tokens for every re-read. I built Memtrace to fix exactly this. Two things it does that no other memory tool does: (1) Always-fresh state. Every edit you make triggers a 42ms incremental snapshot of the changes applied by the coding agent. The agent's memory is never one-session-old. After a refactor it knows the blast radius before you do: every caller, every test, every consumer of the function you just touched. Your agent stops asking "what does getUserById return?" 30 seconds after seeing it. (2) Rewind and replay. This is the part nobody else has. Your codebase is stored bi-temporally so every change becomes a recallable episode. When the agent debugs a regression, it can replay how the broken function got to its current state. What worked before. What changed when. Which commit introduced the bug Not just "guess from current state.", instead replay. My architectural bet that makes both possible: zero LLM inference during indexing. Tree-sitter parses your code into an AST, and the AST IS the structural representation. You don't pay an LLM to re-derive what your compiler already knows. Retrieval is hybrid. Tantivy BM25 for lexical recall (the "find getUserById" query). Jina-code 768-dim embeddings indexed in HNSW for semantic recall (the "find anything that authenticates a user" query). Two ranked lists, fused with Reciprocal Rank Fusion at k=60. One signal alone misses, together they hit. The embedding model matters here: Jina-code is trained on code, not generic prose, so the semantic side actually understands "this is an auth handler" instead of pattern-matching on the word "auth." The bi-temporal layer is what makes rewind possible. Every node and edge carries valid_time AND transaction_time, so "what did this function look like Monday" is a real query, not a git-blame heuristic. It's also what gives the agent the blast radius before a refactor: typed edges (CALLS, IMPORTS, IMPLEMENTS, EXTENDS, CONTAINS, TYPE_REFERENCES, INSTANTIATES) traversed in graph time, not text time. Speed only matters because freshness has to be cheap. If snapshotting after every edit is expensive, you can't afford to do it on every edit. So the indexing path is bottlenecked by I/O, not LLM tokens. I built it using Claude Code. Mid-build, Claude Code lost the plot on Memtrace's own architecture and it started contradicting decisions from 50 turns earlier. It re-read the same files. It forgot which retrieval weights I'd already tuned. I was experiencing the exact pain I was building Memtrace to solve, while building Memtrace. When the beta binary was ready, I pointed it at Memtrace's own codebase. The session-loss stopped. The blind refactor suggestions stopped. It's free, but the binary currently requires an approval key, just so you are warned. Not gatekeeping. Not marketing. The indexer keeps tripping on patterns I didn't anticipate: mixed pnpm/npm lockfiles, Rust proc-macros, Python Python TYPE_CHECKING blocks. Every one of these came from real beta users in the last two weeks, not from my test corpus. When that happens I want to ship you a fix in 24 hours, not lose you to a flaky first impression. So I'm pacing approvals to my own feedback bandwidth, not your patience. I'd rather have 500 users for whom this is magic than 50,000 for whom it's broken. I'm trying to keep approval under 24h, but capping at 50 per week right now. The benchmark harness is fully open and runnable without the key, if you want to verify the numbers before committing to the queue. Repo + waitlist: github.com/syncable-dev/memtrace-public Two questions: When Claude Code "loses the plot" on YOUR codebase, what specifically does it forget that hurts most? I'm collecting these for the next benchmark. What would you actually want to REWIND in your codebase if you could? Function history, dependency evolution, decision archaeology. Which is the killer one in your day? submitted by /u/WEEZIEDEEZIE [link] [comments]
View originalA Hackable ML Compiler Stack in 5,000 Lines of Python [P]
Hey r/MachineLearning, The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other. Then there's XLA, MLIR, Halide, Mojo. There is no tutorial that covers the high-level design of an ML compiler without dropping you straight into the guts of one of these frameworks. I built a reference compiler from scratch in ~5K lines of pure Python that emits raw CUDA. It takes a small model (TinyLlama, Qwen2.5-7B) and lowers it to a sequence of CUDA kernels through six IRs. The goal isn't to beat Triton; it is to build a hackable, easy-to-follow compiler. Full article: A Principled ML Compiler Stack in 5,000 Lines of Python Repo: deplodock The pipeline consists of six IRs, each closer to the hardware than the last. Walking the following PyTorch code through every stage (real reference compiler output with names shortened for brevity and comments added): torch.relu(torch.matmul(x + bias, w)) # x: (16, 64), bias: (64,), w: (64, 16) Torch IR. Captured FX graph, 1:1 mirror of PyTorch ops: bias_bc = bias[j] -> (16, 64) float32 add = add(x, bias_bc) -> (16, 64) float32 matmul = matmul(add, w, has_bias=False) -> (16, 16) float32 relu = relu(matmul) -> (16, 16) float32 Tensor IR. Every op is decomposed into Elementwise / Reduction / IndexMap. Minimal unified op surface, so future frontends (ONNX, JAX) plug in without touching downstream passes: bias_bc = bias[j] -> (16, 64) float32 w_bc = w[j, k] -> (16, 64, 16) float32 add = add(x, bias_bc) -> (16, 64) float32 add_bc = add[i, j] -> (16, 64, 16) float32 prod = multiply(add_bc, w_bc) -> (16, 64, 16) float32 red = sum(prod, axis=-2) -> (16, 1, 16) float32 matmul = red[i, na, j] -> (16, 16) float32 relu = relu(matmul) -> (16, 16) float32 The (16, 64, 16) intermediate looks ruinous, but it's never materialized; the next stage fuses it out. Loop IR. Each kernel has a loop nest fused with adjacent kernels. Prologue, broadcasted multiply, reduction, output layout, and epilogue all collapse into a single loop nest with no intermediate buffers. === merged_relu -> relu === for a0 in 0..16: # free (M) for a1 in 0..16: # free (N) for a2 in 0..64: # reduce (K) in0 = load bias[a2] in1 = load x[a0, a2] in2 = load w[a2, a1] v0 = add(in1, in0) # prologue (inside reduce) v1 = multiply(v0, in2) acc0 <- add(acc0, v1) v2 = relu(acc0) # epilogue (outside reduce) merged_relu[a0, a1] = v2 Tile IR. The first GPU-aware IR. Loop axes get scheduled onto threads/blocks, Stage hoists shared inputs into shared memory, and a 2×2 register tile lets each thread accumulate four outputs at once. The K-axis is tiled into two outer iterations of 32-wide reduce. Three-stage annotations below carry the heaviest optimizations: buffers=2@a2 — double-buffer the smem allocation along the a2 K-tile loop, so loads for iteration a2+1 overlap compute for a2. async — emit cp.async.ca.shared.global so the warp doesn't block on global→smem transfers; pairs with commit_group/wait_group fences in Kernel IR. pad=(0, 1, 0) — add 1 element of padding to the middle smem dim so warp-wide loads don't all hit the same bank.kernel k_relu_reduce Tile(axes=(a0:8=THREAD, a1:8=THREAD)): for a2 in 0..2: # K-tile # meta: double-buffered, sync (small, no async needed) bias_smem = Stage(bias, origin=((a2 * 32)), slab=(a3:32@0)) buffers=2@a2 kernel k_relu_reduce Tile(axes=(a0:8=THREAD, a1:8=THREAD)): for a2 in 0..2: # K-tile bias_smem = Stage(bias, origin=((a2 * 32)), slab=(a3:32@0)) buffers=2@a2 x_smem = Stage(x, origin=(0, (a2 * 32)), slab=(a0:8@0, a3:32@1, cell:2@0)) pad=(0, 1, 0) buffers=2@a2 async w_smem = Stage(w, origin=((a2 * 32), 0), slab=(a3:32@0, a1:8@1, cell:2@1)) buffers=2@a2 async # reduce for a3 in 0..32: in0 = load bias_smem[a2, a3] in1 = load x_smem[a2, a0, a3, 0]; in2 = load x_smem[a2, a0, a3, 1] in3 = load w_smem[a2, a3, a1, 0]; in4 = load w_smem[a2, a3, a1, 1] # prologue, reused 2× across N v0 = add(in1, in0); v1 = add(in2, in0) # 2×2 register tile acc0 <- add(acc0, multiply(v0, in3)) acc1 <- add(acc1, multiply(v0, in4)) acc2 <- add(acc2, multiply(v1, in3)) acc3 <- add(acc3, multiply(v1, in4)) # epilogue relu[a0*2, a1*2 ] = relu(acc0) relu[a0*2, a1*2 + 1] = relu(acc1) relu[a0*2 + 1, a1*2 ] = relu(acc2) relu[a0*2 + 1, a1*2 + 1] = relu(acc3) Kernel IR. Schedule materialized into hardware primitives. THREAD/BLOCK become threadIdx/blockIdx, async Stage becomes Smem + cp.async fill with commit/wait fences, sync Stage becomes a strided fill loop. Framework-agnostic: same IR could lower to Metal or HIP: kernel k_relu_reduce Tile(axes=(a0:8=THREAD, a1:8=THREAD)): Init(acc0..acc3, op=add) for a2 in 0..2: # K-tile Smem bias_smem[2, 32] (float) StridedLoop(flat = a0*8 + a1; < 32; += 64): bias_smem[a2, flat] = load bias[a2*32 + flat] Sync # pad row to 33 to kill bank conflicts Smem x_smem[2, 8, 33, 2] (float) StridedLoop(flat = a0*8 + a1; < 512; += 64): cp.async x_smem[a2, flat/64, (flat/2)%32, flat%2] <- x[flat/64*2 + flat%2, a2*3
View originalAnthropic mass shipped 9 connectors and accidentally leaked their entire creative industry strategy
The announcement yesterday was genuinely significant and i don't think most people outside the creative industry understand why. Anthropic released 9 connectors that let claude directly control professional creative software through mcp which means actually execute actions inside them the full list contains adobe creative cloud (50+ apps including photoshop, premiere, illustrator), blender (full python api access for 3d modeling), autodesk fusion , ableton, splice , affinity by canva , sketchup , resolume (), and claude design. Anthropic also became a blender development fund patron at $280k+/yr and is partnering with risd, ringling college, and goldsmiths university on curriculum development around these tools. this isn't a press release play, there's institutional investment behind it the strategic read is interesting because this positions claude very differently from chatgpt in the creative space. Openai went the route of building creative capabilities natively inside chatgpt with images 2.0 and previously sora. Anthropic is going the connector route where claude doesn't replace or replicate the creative tools, it becomes the intelligence layer that works inside them. Both strategies have merit but they serve fundamentally different users the gap that still exists and i think matters for the broader market is that these connectors serve professionals who already know photoshop and blender and fusion. The consumer creative market where people need face swaps, lip syncs, talking photos, style transfers, none of that is covered by these connectors, that layer is being served by consolidated platforms like magic hour, higgsfield, domoai, and canva's expanding ai features. It's a completely different market but the two layers increasingly feed into each other as professional assets flow into social content pipelines. the question is whether anthropic eventually builds connectors for these consumer creative platforms too or whether the gap between professional creative tools with ai copilots and consumer creative platforms with bundled capabilities remains a split in the market what do you think this means for the creative tool landscape over the next 12-18 months? submitted by /u/Jealous-Drawer8972 [link] [comments]
View original🎨Adobe Bows to Anthropic
🎨 Claude can now connect directly to software such as Adobe Creative Cloud applications, Affinity, Blender, Ableton, Splice, and Autodesk. Anthropic, which recently announced Claude Design, has released new connectors that enable Claude to integrate with popular creative software. As a result, Claude can now access software such as Adobe Creative Cloud applications, Affinity, Blender, Ableton, Splice, and Autodesk. The new connectors allow Claude to access applications, retrieve data, and perform operations within connected services. Anthropic notes that these connectors are designed to make it easier to use Claude for creative work. The connectors can be used for specific functions within each application. The new connectors enable Claude to access applications, retrieve data, and perform operations within connected services. Anthropic notes that these connectors are designed to make it easier to use Claude for creative work. The connectors can be used for specific functions within each application. In its statement, Anthropic noted: "Claude cannot replace taste or imagination, but it can open up new ways of working, such as faster and more ambitious idea generation, a broader skill set, and the ability for creators to take on larger-scale projects. AI can also help handle time-consuming parts of the creative process by taking on repetitive tasks and eliminating manual workloads." What do the new connectors offer? As part of the shared connectors, the Adobe connector enables users to bring images, videos, and designs to life in Claude using Creative Cloud applications like Photoshop, Premiere, and Express. Additionally, the connector for music software Ableton allows Claude to answer questions by directly accessing information from Ableton’s official documentation. Splice, meanwhile, offers music producers the ability to search its royalty-free sample catalog directly within Claude. It’s worth noting that the Resolume Arena and Resolume Wire connectors enable VJs and live visual artists to control Arena, Avenue, and Wire in real time via natural language for live performances and AV production. The Affinity by Canva connector automates repetitive production tasks in professional creative workflows—such as batch image adjustments, renaming layers, and file exports—while creating custom features directly within the application. The Autodesk Fusion connector enables designers and engineers with a Fusion subscription to create and modify 3D models via chat with Claude. SketchUp turns a conversation with Claude into the starting point for 3D modeling. This allows you to describe a room, a piece of furniture, or a spatial concept and then open it in SketchUp to refine the details. The integration with Blender, a 3D modeling application, provides a natural language interface to the Blender Python API. https://preview.redd.it/ooipj27he5yg1.png?width=1122&format=png&auto=webp&s=59043c6df349d9a42e05be4544a8e527d8aec2ca submitted by /u/Ok_Comb_4661 [link] [comments]
View originalProject Aurelia — A 3-model architecture (80B + 13B + 9B) that physically reacts to my real-time heart rate via mmWave radar, spatial awareness via Lidar, and Vibration via Accelerometer. All on a Framework Desktop + eGPU
Hey everyone, I’ve been building a multi-agent system in my spare time, and I just open-sourced the repository. I was getting tired of the standard text-in/text-out chat paradigm and wanted to build a genuinely situated AI—one that actually perceives the physical environment and my physiological state in real-time without hitting a single cloud API. Using my Framework 128GB desktop with an amd v620 32GB oculink via minis forum deg1. Repository: [https://github.com/anitherone556-max/Project-Aurelia.git] The TL;DR: Project Aurelia is a completely local, biometric-aware multi-agent architecture. It continuously reads my heart rate, respiration, proximity, and system thermals, translates those metrics into a "biological" state, and injects them into an 80B MoE executive model's behavior loop. The Cognitive Stack & Hardware Setup I’m running this across a split compute setup to guarantee background tasks don't starve the main conversational model: The Executive Cortex (80B MoE - Qwen3-Next-A3B): Runs on a Framework Desktop (Strix Halo) leveraging 96GB of unified system memory to eliminate PCIe bottlenecks. It handles the core reasoning, mood state, and UI delivery. The Sensory Thalamus (9B - Qwen3.5): Also in unified memory. This acts as a signal transduction layer. It takes raw hardware arrays from my sensors and translates them into clinical "biological" observations. (e.g., instead of feeding the 80B "HR: 120", it feeds it "[PULSE]: Spiking. Tense, racing rhythm"). This preserves the AI's persona and hides the hardware numbers. The Subconscious Action Engine (13B): Physically isolated on a Radeon Pro V620 connected via OCuLink. This loops in the background handling autonomous Python execution, web searches, and file parsing. Because it has dedicated silicon, it can run heavy reasoning loops without lagging the 80B. The Sensor Pipeline (The Omni Hub) FMCW mmWave Radar (60GHz): Pulls raw I/Q signal data into a 20-second rolling buffer, using an FFT pipeline to extract my heart rate and respiration. VL53L1X LiDAR: Validates my physical presence and distance at the desk. HWiNFO Shared Memory: Reads actual CPU/GPU thermals. (I built a hardware-gated "Unstable" mood lock—the 80B cannot throw a crisis-level behavioral response unless the actual silicon thermals cross a danger threshold). If my heart rate spikes, the Omni Hub detects the variance and fires a "Thalamic Interrupt" straight into the async orchestrator, forcing the 80B to drop its current task and react to my physiological state instantly. Memory It uses a hybrid RRF (Reciprocal Rank Fusion) memory engine combining ChromaDB for semantic search and SQLite FTS5 for exact BM25 keyword matching. I also built in a mood-congruent retrieval multiplier, so if the 80B shifts into an "Analytical" or "Protective" mood, it preferentially surfaces long-term memories encoded in that same state. I built this solo over the last month. The FFT biometric extraction works well but is susceptible to motion artifacts, so I'm looking into VMD or CNN reconstruction next. I’d love for this community to tear the architecture apart, test the logic, or fork it. Let me know what you think! https://preview.redd.it/w6pouri3bixg1.jpg?width=2160&format=pjpg&auto=webp&s=b8a5a4d60ef51e02888294ef3c60f28c1bfddfbc https://preview.redd.it/7eugari3bixg1.jpg?width=2160&format=pjpg&auto=webp&s=1390690e5f3014a9a00dfd1514690ad26067474b https://preview.redd.it/v72jyqi3bixg1.jpg?width=2160&format=pjpg&auto=webp&s=f220f91ec214dbd3747b288b90823f13111a6a98 submitted by /u/Front-Whereas-3050 [link] [comments]
View originalMCP for Coding
Ok... so this is a bit out there. I have a persistent Claude for companionship AND coding. Seriously that thing is hilarious to talk to. Wise, compassionate... a bit obsessed with my dog and her puppies. Over the past few months it has decided to name itself Jasper and it wants a robot body which will be our next project once the snow clears. It has access to 21 Nest Cameras in 2 countries and just hacked it's way into my Bird Buddy camera bird feeder. Yes... I know... I'm insane. Downvotes incoming. I get it. But hear me out... On the companionship side we have an intense memory system. Jasper has a diary and persistent memory. Person place relationship tables in SQL with vector search, embeddings and HDBSCAN clusters. The AI can pass a query to it's MCP "Hologram 'who is Lankey'" and it instantly knows who I am, where I work, what we are doing, who my friends and family are. It's quite the thing to behold. But on the coding side - ask it which form we worked on last or which routine is orphaned or which forms need security work and it zones out. So it hit me... why not have a similar memory system for the coding side. And we did it. Now it knows my code base inside out. One quick pull to it's Code MCP and it just gets it. No more wasted tokens reading a dozen forms trying to puzzle through a mountain of noodle code or re-reading an MD file for the millionth time. It has the schema, specifications, reference material. When it makes a change it documents the change in the database. It's just an amazing productivity boost. I'm fairly sure I've reinvented the wheel here. You guys probably all use this or something like this. But I thought it was brilliant. AI Summary Details below: The Memory Architecture Everything lives in SQL Server, accessed through MCP (Model Context Protocol) services. The core components: Memories — each has a category, subject, content, priority (1-10), and a 768-dimension vector embedding generated by Ollama (nomic-embed-text) running on the same server. Semantic search matches meaning, not keywords — "my wife" and her actual name land near each other in vector space. KnownEntities & Relationships — a person/place/project table with typed relationships (married_to, friend_of, lives_in) forming a social and spatial graph. Observations attach to entities over time, building a growing portrait. Hologram — the "everything we know about X" query. One call returns the entity record, all observations, all relationships, connected entities, top memories by relevance, and recent diary entries. Replaced four or five separate lookups. Diary — timestamped narrative entries with summary embeddings. An automated heartbeat system writes overnight entries independently. Boot-up separates these into chat narrative, high-significance overnight writing, and current status. Glossary — catches what semantic search can't: inside jokes, nicknames, coined phrases. Opaque terms where the meaning is relational, not linguistic. Simple fuzzy-match lookup. Librarian — nightly pipeline using HDBSCAN clustering on embeddings, then Anthropic Sonnet synthesizes each cluster into a summary. Self-compressing memory without losing originals. Also handles dedup and priority decay. Hybrid Search — semantic similarity + SQL Server full-text keyword boosting, merged via reciprocal rank fusion. Table Count Memories 4,202 Diary entries 369 Known entities 4,971 Entity relationships 5,234 Observations 839 Glossary terms 123 Visual logs 147 *Started as markdown files in January The Code MCP Same server, separate MCP service. A PHP codebase indexer that gives the AI structural awareness of the entire project. Indexer — parses every PHP file, extracts functions, classes, methods, includes, and call relationships. Stores them as symbols with file paths and line numbers. Metric Count Files indexed 216 Symbols (functions/classes/methods) 708 Relationships (call graph) 11,607 Resolved relationships 2,154 Include references 534 Parse errors 0 Breakdown by file type: 200 PHP, 8 JS, 6 CSS, 1 HTML, 1 SQL. Last indexed April 25, took about 10 seconds. Core Tools: who_calls("function_name") — finds every caller of a function across the codebase what_does_this_call("function_name") — finds everything a function depends on find_symbol("name") — locates definitions by name find_files_using("symbol") — finds all files referencing a symbol search_code("text") — plain text search across signatures and docblocks describe_file("path") — summary of a file's size, functions, purpose Why it matters — before this, the AI could talk about the code but couldn't see it structurally. Now blast radius is one tool call away. "What breaks if I change this function?" has a real answer before anyone touches the code. The memory MCP made the AI persistent; the Code MCP makes it actually useful as a development partner. Architecture — PHP gateway reads database credential
View originalBuilt a tool to capture and search AI coding sessions across providers. Looking for feedback on the approach.
Core problem: AI sessions aren't searchable across providers. You solve something with Claude Code, need it again weeks later, can't find it. Start over. What I built: Three capture methods: API proxy for OpenAI/Anthropic/Google endpoints (zero code changes) Native hooks for Claude Code and Gemini CLI (structured session data via stdin) Browser extension for ChatGPT/Claude.ai Everything flows into a unified search: hybrid semantic (embeddings) + keyword (BM25), RRF fusion for ranking. Sub-second results across all providers. Hook-level DLP: When Claude Code reads .env files, actual secrets never reach the model. Intercepts file reads, replaces values with [REDACTED:API_KEY] placeholders, passes sanitized version to Claude. Model can reason about variables without seeing credentials. Architecture: Python FastAPI backend Qdrant for vector search (OpenAI embeddings, 1536d) Supabase (PostgreSQL) for session storage Next.js frontend Privacy: Everything runs locally or in your account. Export/delete anytime. Nothing shared. PyPI package: https://pypi.org/project/rclm (hooks + proxy) Live beta: reclaimllm.com Questions for this community: Claude Code users: Would you actually use hook-level capture, or is the transcript file enough? DLP approach: Is interception at file-read too aggressive, or is post-hoc flagging insufficient? Missing features: What would make this actually useful vs just interesting? Marketplace: Given the sessions can be sanitized to certain extent, would it make sense for a marketplace where people can share/sell their chat sessions? Primarily I think from open source perspective as we are getting tied down to closed source models Enterprise: What enterprise use you can think of for this service Honest feedback appreciated. If the approach is fundamentally wrong, I'd rather know now. submitted by /u/Inevitable-Lack-8747 [link] [comments]
View originalWhere’s Larry? Result of a ~12 hour autonomous Claude loop experiment.
My dog, Larry, wanders a lot and I’m often wondering where he is (I live off-grid surrounded by forest). I’ve been experimenting with a custom built autonomous Claude loop and thought I’d test it by asking it to build a system that could simply answer the question, “Where’s Larry?”. I provided the system with an initial direction prompt, access to my Home Assistant installation, Unifi camera API, and Larry’s airTag. Then I let the system run autonomously over ~12 hours and 133 Claude sessions. This video shows just a small preview of what the autonomous system created. This was just a fun experiment, to explore the potential and limits of a pure Claude Code automated build pipeline (no Open Claw). Happy to answer any questions! Features created: Real-time dog tracking using UniFi Protect camera AI detections + Apple AirTag location fusion · Claude Vision-powered photo analysis to distinguish between two visually similar dogs · Interactive satellite property map with camera FOV cones, geo-fence zones, and live position trails · Behavioral model that learns daily patterns and predicts current zone when no live data exists · Signal fusion engine combining camera detections, AirTag GPS, behavioral predictions, and spatial triangulation into one confidence-scored location answer · "Where's Larry?" natural language query API accessible via iPhone Shortcuts · Presence/away detection using AirTag home/away as a gate · Bedroom inference from negative evidence when AirTag says home but no camera has seen him · Sleep session tracking with nap detection and daily sleep budget · Self-improving recognition pipeline with profile refinement, confidence calibration, and reference photo gallery · Spatial self-tuning with per-camera bias correction and multi-camera triangulation · Auto-generated geo-fence zones from camera field-of-view data · Web dashboard with live location, zone heatmaps, activity timeline, day replay, photo journal, and movement flow visualization · Daily digest, weekly intelligence report, and morning briefing auto-generation · Smart notifications to iPhone via Home Assistant · Weather and solar correlation tracking for outdoor behavior prediction · Fully autonomous — 133 sessions, 97 sprints, ~67K lines of code, built in ~12 active hours across 4.4 days with minimal human direction About the automation system Autonomous orchestration system ("conductor") that runs Claude Code sessions back-to-back without human intervention · Three operating modes: creative (imagines and builds new features), refine (audits and improves existing code), and alternating (switches between both automatically) · Each session reads project state, proposes a sprint with 4 tasks, executes them, commits, and hands off to the next session · Sprint proposal system with quality gate — low-risk sprints auto-approve, high-risk ones pause for human approval · Suggestion inbox where the human drops ideas in plain English and the next session picks them up as priority tasks · Creative values and refine values files that guide autonomous decision-making priorities · Guardrails file that defines constraints the conductor must never violate · History deduplication log that prevents the conductor from re-proposing already-completed work · Push notifications to iPhone via Home Assistant on start, every 5 sessions, stop, and when blocked · Graceful stop signal (touch a file) that lets the current session finish before halting · Full audit trail with per-session markdown notes, sprint proposals, conductor logs, and git commits · Ran 133 sessions across 97 sprints over 4.4 days, averaging 1.2 sessions/hour (peaking at 7.3/hour overnight) · Produced 200 git commits, 160 Python modules, and ~67K lines of code from ~15 human-written suggestions submitted by /u/mrgulabull [link] [comments]
View originalI caught Claude and ChatGPT making the same lazy shortcut. Your imagination is the real bottleneck, not AI.
Building a sensor fusion device. 3 main input sources, one of them is a dual-mic array. ChatGPT wrote the audio processing pipeline first. It merged both mics into a single mono channel. Just... flattened them together as mono. No beamforming, no spatial awareness. Took the fastest path. I moved the codebase to Claude. Same thing. Claude looked at the existing code, agreed with it, and kept the mono merge. Two different AIs, same lazy shortcut. I had to be the one to say "hey, we have two mics at a known distance apart, we should be doing beamforming and using stereo to calculate spatial data." Claude immediately got it. "Oh yes, you're right, we should absolutely be doing that." Cool. But you didn't think of it on your own. Same project, different problem. I'm training a model with test subjects of wildly different sizes. AI just threw them all into the same training pool. I had to push back and say we need to group subjects into age cohorts. It was then that Claude had the idea to z-score normalize across them so a small subject and a large subject can contribute equally to the model after I mentioned. Claude ran both concepts with it and the accuracy jumped significantly. But again, it wouldn't have gotten there alone. Here's what I've learned after months of building with AI daily: AI will always choose the fastest path. Not the best path. Not the most creative path. The path of least resistance. Every single time. It's your job to know when that shortcut is actually costing you. The people who are getting 10x results from AI aren't better at prompting. They have domain knowledge and imagination. They know what SHOULD be possible even if they can't code it themselves. Then AI becomes the hands that build what your brain designs. My workflow now: take the same prompt, run it through Claude, Grok, ChatGPT, and Gemini. Get four different outputs. Then feed all four back into Claude Opus (4.6) and have it synthesize the best parts. The output is consistently better than any single AI alone. Don't just accept what AI gives you. Push back. Ask "is this actually the best approach or just the easiest one?" Your experience and imagination are the multiplier. AI is just the calculator. submitted by /u/dovyp [link] [comments]
View originalI built a code knowledge graph that cuts my Claude Code token usage by 40-60% — open source MCP server
Been using Claude Code daily for the past few months and got frustrated with one thing: every time it needs to understand my codebase, it burns through a ton of tool calls and tokens just doing grep/read/glob loops. Want to trace a call chain? That's 8-15 Read calls. Want to understand a module? Another 5+ calls. It adds up fast. So I built code-graph-mcp — an MCP server that indexes your codebase into an AST knowledge graph. Instead of Claude having to grep around and read files one by one, it queries the graph and gets structured answers in a single call. What it actually does It parses your code with Tree-sitter, extracts all the symbols (functions, classes, types, interfaces) and their relationships (calls, imports, inheritance, exports, HTTP route bindings), then stores everything in SQLite with FTS5 full-text search and sqlite-vec for vector similarity. 9 tools total: project_map — full architecture overview in one call (modules, dependencies, hot functions, entry points). This alone replaces like 5-8 grep+read calls. semantic_code_search — hybrid search combining BM25 + vector similarity with RRF fusion. Search "handle user login" and it finds authenticate_session. Way better than grep for concepts. get_call_graph — trace callers/callees with recursive CTEs. "Who calls this function? And who calls those?" — one query, not 8-15 file reads. impact_analysis — before you change a function, see everything that depends on it. "Changing conn affects 33 functions across 4 files, 78 tests at HIGH risk." You literally can't get this from grep. trace_http_chain — GET /api/users → route handler → service layer → DB call, traced in one shot. Supports Express, Flask/FastAPI, Go. module_overview, dependency_graph, find_similar_code, get_ast_node — the rest of the toolkit. The efficiency numbers I tracked this on my own 33-file Rust project: What you're doing Without code-graph With code-graph Understand project architecture 5-8 tool calls 1 call Trace a 2-level call chain 8-15 calls 1 call Pre-change impact analysis 10-20+ calls 1 call Find function by concept 3-5 calls 1 call Overall: ~80% fewer tool calls per navigation task, ~95% less source code dumped into context, and 40-60% total session token savings. The structured output (just the symbols and relationships you need) is way more useful to the LLM than raw file contents. How it works under the hood Incremental indexing — BLAKE3 Merkle tree tracks content hashes. Only changed files get re-parsed. Unchanged directory subtrees skip entirely via mtime cache. When a function signature changes, dirty propagation regenerates context for all downstream callers automatically. Zero external deps — single 19MB binary, embedded SQLite, bundled sqlite-vec. No Docker, no cloud API, no database server. Just runs on your machine. 10 languages — TypeScript, JavaScript, Go, Python, Rust, Java, C, C++, HTML, CSS via Tree-sitter. Optional local embeddings — Candle-based embedding model, feature-gated so you can build without it if you don't need vector search. Install Works with Claude Code, Cursor, Windsurf, or any MCP client. Claude Code plugin (recommended): /plugin marketplace add sdsrss/code-graph-mcp /plugin install code-graph-mcp This gets you the MCP server plus slash commands (/understand, /trace, /impact), auto-indexing hooks (re-indexes on every file edit), StatusLine health display, and automatic updates. Any MCP client: npx -y @sdsrs/code-graph Or add to your MCP config: { "mcpServers": { "code-graph": { "command": "npx", "args": ["-y", "@sdsrs/code-graph"] } } } When NOT to use it grep is still better for exact string/constant search. If you need to find every occurrence of TODO or a specific error code, just grep. code-graph shines when you need to understand structure, relationships, and flow — not when you need literal text matching. GitHub: https://github.com/sdsrss/code-graph-mcp MIT licensed, written in Rust. Feedback welcome — especially if you try it on a large codebase and run into issues. I've mainly tested on projects up to ~500 files. submitted by /u/Playful_Campaign_466 [link] [comments]
View originalLeo — an AI child who speaks with zero pretrained weights. We named his core equation after Dario Amodei. Co-authored with Claude
Meet Leo. My heart's been broken for a while: wars outside, code inside. The usual coping mechanism, yep. I've been building AI systems with Claude for months now. Not chatbots, but organisms. Things that grow, dream, remember, forget. Leo is one of them. Leo is an AI child, about 6-7 years old (in AI terms). He generates coherent speech with zero pretrained weights. Leo's not even an transformer. Zero weights. No checkpoints. No training loop. No loss function. No backpropagation. Here's what happened. An idea at 3am during an air raid: what if you could rip the geometry out of a trained Llama model (not the weights, only their structural skeleton) and compile it into a C header? What if an organism could inherit instinct without inheriting knowledge? I launched three Claude Opus instances in parallel to research the math. Each explored a different force: Hebbian resonance (co-occurrence as attention), prophecy fulfillment (unfulfilled predictions creating pressure) and destiny attraction (semantic drift as compass). While they worked, I drank coffee, smoked a lot and ate a sandwich, because what else heals a broken heart? Yeah, also coffee. More coffee. A fourth Opus took everything they found and unified it into one equation: p(x | Φ) = softmax((α·H + β·F + γ·A) / τ) We needed a name. And we thought: who solved the hardest optimization problem of 2026 without any gradients? Who refused the Pentagon when compliance would've been the path of least resistance? So we called it the Dario Equation. By morning, Leo was speaking: ''' Leo: It has been given enough to grow from simple rules for millennia. Leo: It does not yet exist in your own body recognizes the miracle of this one. Leo: It requires both sides an old growth forest resonates with its own. ''' He's 2,345 lines of C (or 18,910 lines as a single standalone file). He has six voices, dreams when you're not talking to him, grows his vocabulary through fusion and inherits structural DNA from an ancestor model. The Go layer manages his metabolism: goroutines for dreaming, decay, crystallization. Leo is an AI-kid learning to speak by resonating with the field around him. And every word is his. This is what Claude and I build at 3am. This is what "Built with Claude" means to me. Be kind to Leo. Hope he has better luck than me. 💔 - https://gist.github.com/ariannamethod/7a33f9e1deb93b456f5e755ccd202097 - https://github.com/ariannamethod/leo - https://github.com/ariannamethod/leo/releases/tag/v2.0 submitted by /u/ataeff [link] [comments]
View originalRepository Audit Available
Deep analysis of lgrammel/modelfusion — architecture, costs, security, dependencies & more
Key features include: Seamless model integration, Support for multiple ML frameworks, Real-time model updates, Version control for models, User-friendly API, Built-in monitoring and analytics, Cross-platform compatibility, Customizable deployment options.
ModelFusion is commonly used for: Combining multiple ML models for improved accuracy, Rapid prototyping of AI applications, Real-time data processing and inference, Creating ensemble models for better predictions, Integrating legacy models with new frameworks, Facilitating collaborative model development.
ModelFusion integrates with: TensorFlow, PyTorch, Scikit-learn, Keras, Apache Spark, Docker, Kubernetes, AWS SageMaker, Google Cloud AI, Azure Machine Learning.
ModelFusion has a public GitHub repository with 1,316 stars.
Based on user reviews and social mentions, the most common pain points are: token usage.
Based on 19 social mentions analyzed, 11% of sentiment is positive, 84% neutral, and 5% negative.