Hand off complex coding tasks without sacrificing maintainability or visibility.
"Cosine" is recognized for enhancing AI agent efficiency, notably by reducing token consumption when deploying cosine similarity in the retrieval layer. Users find this aspect particularly beneficial for streamlining processes involving extensive document retrieval. However, there are no specific user complaints or detailed mention of pricing in the available discussions. Overall, "Cosine" appears to have a solid reputation as a useful tool for optimizing AI models, especially within contexts that demand high precision and resource management.
Mentions (30d)
5
Reviews
0
Platforms
2
Sentiment
3%
1 positive
"Cosine" is recognized for enhancing AI agent efficiency, notably by reducing token consumption when deploying cosine similarity in the retrieval layer. Users find this aspect particularly beneficial for streamlining processes involving extensive document retrieval. However, there are no specific user complaints or detailed mention of pricing in the available discussions. Overall, "Cosine" appears to have a solid reputation as a useful tool for optimizing AI models, especially within contexts that demand high precision and resource management.
Features
Use Cases
Industry
information technology & services
Employees
34
Funding Stage
Other
Total Funding
$3.0M
Pricing found: $20, $20, $200, $200, $20
ai slop? who knows~
I investigated whether routing a transformer's forward activations through a lossy Dual E8 (E16) lattice bottleneck and injecting them back into the residual stream is viable, and where the boundary of generative stability lies. **The core finding:** There is a sharp empirical stability threshold at a blend ratio of $\beta = 0.20$. Beyond this boundary, open-ended generation collapses into semantic loops and repetition lock. --- ### The Mechanism Standard LLM states are high-dimensional floats. Rather than applying traditional scalar quantization (like INT4), I mapped high-dimensional activations onto a conceptual torus via a sinusoidal map and projected them onto Dual E8 lattice hemispheres. Full replacement of MLP layers with geometric bottlenecks universally collapsed the model. Instead, I implemented a residual blend: $$\text{out} = (1-\beta)\cdot\text{original} + \beta\cdot\text{geometric}$$ --- ### The $\beta = 0.20$ Sweep (Qwen2.5-0.5B) Sweeping $\beta$ from 0.10 to 0.50 across layers 8–13 of `Qwen2.5-0.5B` reveals a sharp phase transition: * **$\beta \ge 0.25$** : Generation succumbs to heavy repetition pressure and semantic drift. The geometry acts as an attractor, trapping the decoding process ("loop-lock"). * **$\beta = 0.20$** : The stability boundary. This is the highest injection ratio of lossy geometric signal that maintains both numerical activation fidelity (Avg Cosine > 0.99) and open-ended generation quality (low repeated n-grams). * **$\beta \le 0.10$** : The perturbation is largely absorbed and damped by the transformer's layer normalizations, making the intervention invisible. Here is the data from a 300-iteration sweep: | $\beta$ | Min Cosine | Avg Cosine | Max MSE | Rep-3g (Repetition Rate) | | :--- | :--- | :--- | :--- | :--- | | 0.10 | 0.9972 | 0.9979 | 0.0024 | 0.134 | | **0.20** | **0.9907** | **0.9916** | **0.0106** | **0.093** | | 0.25 | 0.9839 | 0.9865 | 0.0171 | 0.084 | | 0.30 | 0.9648 | 0.9771 | 0.0255 | 0.190 | | 0.50 | 0.9171 | 0.9288 | 0.0850 | 0.412 | Semantic scoring (evaluating prompt relevance and similarity to the unmodified baseline): | $\beta$ | Avg Cosine | Rep-3g | Relevance | Patched-to-Baseline Sim | | :--- | :--- | :--- | :--- | :--- | | 0.10 | 0.9980 | 0.223 | 0.781 | 0.889 | | **0.20** | **0.9918** | **0.075** | **0.752** | **0.854** | | 0.25 | 0.9871 | 0.232 | 0.717 | 0.801 | | 0.30 | 0.9760 | 0.392 | 0.725 | 0.764 | --- ### Generalization (1.5B & 3B Models) The $\beta = 0.20$ boundary generalizes across larger model sizes (`Qwen2.5-1.5B` and `Qwen2.5-3B` in 4-bit) on the activation-cosine axis: | Model | $\beta$ | Min Cosine | Avg Cosine | Max MSE | Rep-3g | | :--- | :--- | :--- | :--- | :--- | :--- | | **1.5B** | 0.10 | 0.9988 | 0.9989 | 0.0027 | 0.267 | | | **0.20** | **0.9862** | **0.9939** | **0.0105** | **0.128** | | | 0.25 | 0.9904 | 0.9919 | 0.0166 | 0.398 | | | 0.30 | 0.9733 | 0.9815 | 0.0235 | 0.307 | | | 0.40 | 0.9368 | 0.9551 | 0.0487 | 0.191 | | **3B (4-bit)** | 0.10 | 0.9964 | 0.9976 | 0.0122 | 0.033 | | | **0.20** | **0.9861** | **0.9904** | **0.0455** | **0.115** | | | 0.25 | 0.9604 | 0.9799 | 0.0654 | 0.043 | | | 0.30 | 0.9702 | 0.9778 | 0.0987 | 0.050 | | | 0.40 | 0.9158 | 0.9390 | 0.1728 | 0.025 | *Note: In the 3B model, repetition pressure remained low across all sweeps, but the validation cosine degraded identically at $\beta \ge 0.25$.* I also tested layer-level oscillating $\beta$ schedules (e.g., sine waves across layers), but they degraded open-ended text quality compared to a fixed, constant injection ratio. --- ### Storage Compression Prototypes Utilizing the Dual E8/E16 lattice as a computational substrate also yields high theoretical storage efficiency in early prototypes: 1. **KV Cache (8$\times$)** : FP16 KV cache compressed to INT8 coordinates, reducing footprint from 0.21 MB to 0.02 MB. 2. **Weights (112$\times$)** : Projected a dense $[4864, 896]$ MLP weight matrix down to a 0.07 MB E16 footprint. (Cosine similarity of the uncalibrated weight matrix multiplication was limited to $\sim$0.078, indicating that Quantization-Aware Training is mandatory for parameter viability). A **pre-projected decompression bypass** was designed to run matrix multiplications directly against lattice coordinates without upcasting, avoiding memory bandwidth bottlenecks. --- ### Policy Constraints (Negative Result) I evaluated whether residual E16 projection could act as a steering substrate to enforce safety policies. It cannot. While $\beta = 0.20$ preserves generation quality, the lossy nature of E16 projection strips out the logical nuances required to maintain strict boundaries. Dedicated supervised control heads remain necessary. --- ### Implications & Next Steps Snapping post-training activations to a fixed algebraic lattice is ultimately lossy. The real frontier here is **native geometric transformers** —designing and training networks from scratch with E8/E16 constraints native to both weight matrices and activation routing. submitt
View originalCFS-R: Conditional Field Reconstruction
I evaluated CFS-R on LoCoMo (1,982 questions, same setup as the CFS evaluation), holding cosine and BM25 fixed and varying only the third leg. baseline cosine top-10: NDCG@10 0.5123, Recall@10 0.6924 rrf(cos, BM25): NDCG@10 0.5196, Recall@10 0.6989 rrf(cos, BM25, MMR tuned): NDCG@10 0.5330, Recall@10 0.7228 rrf(cos, BM25, CFS-long): NDCG@10 0.5362, Recall@10 0.7295 rrf(cos, BM25, CFS-R top50 w3): NDCG@10 0.5447, Recall@10 0.7303 Against tuned MMR: +1.17 pp NDCG@10 (95% CI [+0.66, +1.69], p < 0.001). Against CFS-long: +0.85 pp NDCG@10 (95% CI [+0.33, +1.35], p = 0.0006). Against baseline cosine: +3.24 pp NDCG@10, +3.79 pp Recall@10. The sweep wasn’t fragile.. the top configurations clustered tightly between 0.5441 and 0.5447 NDCG@10, which means the operator is on a stable plateau rather than a single magic hyperparameter. The category breakdown is where the conceptual difference shows up: single-hop multi-hop temporal open-dom adversarial tuned MMR 0.3479 0.6377 0.2938 0.6144 0.4705 CFS-long 0.3615 0.6376 0.2959 0.6157 0.4734 CFS-R top50 w3 0.3646 0.6344 0.2948 0.6209 0.5018 The adversarial line is the result that matters: +3.13 pp over tuned MMR, +2.84 pp over CFS-long. If the adversarial problem were only pairwise diversity, MMR should be very hard to beat but it isn’t. That supports the main claim: long-memory retrieval is not just about avoiding similar chunks. It is about reconstructing the evidence behind the query. Temporal is no longer a glaring weakness either, CFS-long still slightly leads, but CFS-R has closed the gap while keeping the adversarial gains. https://gist.github.com/M-Garcia22/542a9a38d93aae1b5cf21fc604253718 submitted by /u/mauro8342 [link] [comments]
View originalV-JEPA 2.1's dense features are partitioned: a robustness study across all four model sizes [R]
I ran a pre-registered robustness study on Meta's V-JEPA 2.1 across all four released model sizes (80M → 2B). 322-cell sweep Three findings worth flagging: 1. Dense features are partitioned. M2 (representational drift between clean and perturbed clips, measured as cosine distance on temporal-gradient vectors) predicts downstream task failure on DAVIS for temporal corruption (frame drops r=0.37 [0.30, 0.44], occlusion r=0.35 [0.28, 0.42]). For image-noise corruption, the correlation is statistically indistinguishable from zero (Gaussian r=−0.06, motion blur r=+0.09, low-light r=+0.05; all CIs cross zero). The two perturbation families are statistically separable at 95% confidence (closest CI gap +0.106). Aggregate r=0.16 [0.13, 0.20] is below both the pre-registered ambiguous threshold (0.30) and confirmation threshold (0.50). 2. Bigger is not reliably better. Every Tier 1 perturbation showed non-monotonic robustness. The 2B "gigantic" model is less robust than the 1B "giant" variant on three of the five perturbations. All jumps >5× their pooled CI half-width. 3. V-JEPA 2.1 is meaningfully orientation-sensitive. Horizontal flip preserves all temporal structure but disrupts representations comparably to playing the video backwards (M2 = 0.91 across all models vs. predicted upper bound of 0.30). Not orientation-equivariant out of the box. Six hypotheses pre-registered with explicit numerical decision rules. Two confirmed, three refuted, one partially withdrawn during analysis - the M1 component of H2 turned out to be ill-defined under reverse playback (M1 assumes preserved frame ordering, which time-axis perturbations break). Documented and not buried. Proposed mechanism for the non-monotonic scaling result: hub marginalization in deep ViTs (arXiv:2511.21635). Deeper models can over-shoot from "single hub aggregator" to a regime where extra layers scramble information rather than refine it. V-JEPA's dense predictive loss explicitly pushes against single-hub aggregation; if the 2B variant has crossed into the over-communication regime while the distilled 300M retains controlled mixing, the pattern is what hub marginalization predicts. Code, reproducibility manifest, raw shards: https://github.com/poisson-labs/vjepa-stress Full writeup: https://poissonlabs.ai/research/vjepa-2-1-robustness Happy to discuss methodology, the partitioning interpretation, or the hub-marginalization argument. The image-noise side of partitioning (gaussian/motion blur/low-light CIs all crossing zero) is the part I'd most like skeptical eyes on. submitted by /u/poisson_labs [link] [comments]
View originalCFS - Conditional Field Subtraction
CFS selects relevant candidates by penalizing regions already covered by previous picks. Results on retrieval ranking: baseline cosine top-K: NDCG@10 0.5123, Recall@10 0.6924 mem0 additive fusion: NDCG@10 0.4903, Recall@10 0.6625 rrf(cosine, BM25): NDCG@10 0.5196, Recall@10 0.6989 rrf(cosine, cos2, BM25): NDCG@10 0.5278, Recall@10 0.7060 rrf(cosine, BM25, CFS): NDCG@10 0.5311, Recall@10 0.7168 Against mem0’s additive fusion, rrf(cosine, BM25, CFS) improves retrieval ranking by +4.08 pp NDCG@10 and +5.43 pp Recall@10. Against rrf(cosine, BM25), adding CFS contributes +1.15 pp NDCG@10 and +1.79 pp Recall@10. https://gist.github.com/M-Garcia22/ff4ec80f5a08ca2fd9234bcc35804d1c submitted by /u/mauro8342 [link] [comments]
View originalI built a persistent memory MCP server for Claude Code (open source, Go, single binary)
Claude Code forgets everything between sessions. Same mistakes, same questions, same conventions re-explained. I built mnemos to fix that. It's an MCP server that gives Claude Code persistent memory across sessions. On session start, it pushes a ranked context block back into Claude: conventions you've established, corrections you've made before, skills it learned, hot files, recent session summaries. Next session starts already knowing what the last one figured out. What it does: Records corrections as tried / wrong_because / fix. Three corrections on the same topic auto-promote into a reusable skill with When this applies / Avoid / Do sections. No LLM in the loop, just deterministic pattern-mining, so it's reproducible and token-free. Bi-temporal store: facts carry valid/invalid timestamps, so "we used to use X, now Y" works without poisoning context with stale info. Compaction recovery: when Claude Code compacts mid-session, one tool call restores the goal and key decisions. Prompt-injection scanner at the write boundary, since memory stores are a new attack surface (instruction overrides, zero-width unicode, MCP spoofing). Retrospective replay: regenerate any past session as markdown with everything learned since layered in, paste it back to Claude, ask "what would I do differently now." Stack: Single static Go binary, 15 MB. No Python, no Docker, no vector DB, no CGO. SQLite + FTS5 for retrieval, optional cosine similarity if Ollama is running. Install (free, MIT, no paid tier): curl -fsSL https://raw.githubusercontent.com/polyxmedia/mnemos/main/scripts/install.sh | bash mnemos init mnemos init auto-wires Claude Code, Claude Desktop, Cursor, Windsurf, and Codex CLI. Restart your agent and the mnemos_* tools show up. GitHub: https://github.com/polyxmedia/mnemos Built it because I was tired of re-teaching Claude the same conventions every session. Happy to answer questions. submitted by /u/snozberryface [link] [comments]
View originalFixing Unsupervised Hyperbolic Contrastive Loss [D]
Hello all, I am trying to implement Unsupervised Hyperbolic Contrastive Loss on the ImageNet-1k dataset. My results show that simple Euclidean unsupervised contrastive loss is much better than the hyperbolic version. Please help me understand the problem. I am using expmap() and projx() to ensure the embedding is on the Lorentzian manifold. Below is my code - def hb_contrastive_loss(z, z1, model, temp=0.07): z_to_neighbor = model.manifold.dist(z.unsqueeze(1), z1.unsqueeze(0)) labels = torch.arange(z.size(0), device=z.device) logits = -z_to_neighbor / temp loss = F.cross_entropy(logits, labels) return loss Current results for 1-NN accuracy: Hyperbolic = 57% Cosine = 64% More information (if relevant): Batch size = 2048 LR = 1e-4 submitted by /u/arjun_r_kaushik [link] [comments]
View originalEvolving Deep Learning Optimizers [R]
We present a genetic algorithm framework for automatically discovering deep learning optimization algorithms. Our approach encodes optimizers as genomes that specify combinations of primitive update terms (gradient, momentum, RMS normalization, Adam-style adaptive terms, and sign-based updates) along with hyperparameters and scheduling options. Through evolutionary search over 50 generations with a population of 50 individuals, evaluated across multiple vision tasks, we discover an evolved optimizer that outperforms Adam by 2.6% in aggregate fitness and achieves a 7.7% relative improvement on CIFAR-10. The evolved optimizer combines sign-based gradient terms with adaptive moment estimation, uses lower momentum coefficients than Adam ( =0.86, =0.94), and notably disables bias correction while enabling learning rate warmup and cosine decay. Our results demonstrate that evolutionary search can discover competitive optimization algorithms and reveal design principles that differ from hand-crafted optimizers. submitted by /u/EducationalCicada [link] [comments]
View original12 Claude Code skill files I install on every new project (out of 2,300+ I've tested)
Most Claude Code skill files I see online sit in ~/.claude/skills/ and never fire. People drop them in, restart Claude Code, ask their normal questions, and Claude responds the same way it did before the install. The skill never activates. After testing 2,300 community + self-built skills over three months, here are the 6 patterns that determine whether a skill file actually loads when you need it. Sharing because I see this question come up every week and there's no single doc that covers it. Pattern 1: Specific trigger language in the description field Claude Code reads the YAML description: to decide when a skill is relevant. "Helps with database stuff" never triggers. "Use when configuring database connection pooling, choosing pool sizes, or debugging connection exhaustion" triggers reliably. The description is the skill's discoverability primitive, not flavor text. Pattern 2: One capability per file, tightly scoped A skill that tries to cover "all SQL stuff" loses to three skills that cover writing migrations, fixing injection, and explaining query plans separately. Claude's matching is roughly cosine similarity between the user's prompt and each skill's description. Diluted descriptions match weakly. Specific ones win. Pattern 3: Frontmatter conventions matter The fields Claude actually uses: name, description, category, difficulty. Optional but useful: tags. Anything else (your own custom keys) gets parsed but doesn't affect activation. Adding random metadata fields slows nothing down but doesn't help either. Pattern 4: When-NOT-to-use lists Counter-intuitive but proven: explicit "do not use this skill when..." lists make activation MORE accurate, not less. They give Claude negative examples that bound the trigger surface. Skipping this section is the most common mistake in community skill files. Pattern 5: Code examples that actually compile If your skill has a fenced code block with broken syntax, Claude leans away from the skill on activation because the example contradicts the description. Run every code block through a syntax check before saving the file. Pattern 6: Verification steps in the body Skills that include "after running this, verify by..." sections get higher activation reliability on tasks where the user is mid-execution. The verification anchor seems to help Claude decide "yes, this is the skill that matches what they're trying to do." Examples that hit all 6 patterns: Sharing 12 specific skill files from my catalog that demonstrate the patterns above, in case useful as a starting point: smart-commit — pattern 1 + 6 (specific triggers + verification) connection-pool-setup — pattern 2 (one capability) sql-injection-fix — pattern 4 (explicit when-not-to-use) redis-lua — pattern 5 (real working Lua) error-handling-audit — pattern 6 (verify after run) api-documentation — pattern 1 (very specific description) angular-rxjs — pattern 2 (one operator family) trpc-router — pattern 5 (real TS that compiles) dockerfile-generator — pattern 4 (when not to use) infrastructure-as-code — pattern 3 (clean frontmatter) custom-slash-commands — pattern 1 (trigger phrase) placebo-detector — pattern 4 (heavy when-not-to-use) They live in my catalog at clskillshub.com/browse if you want to read the actual files and see the patterns in practice. Or just write your own using the 6 patterns above, that works too. If you have a skill that won't activate, drop the description field in a comment and I'll tell you which pattern it's missing. submitted by /u/AIMadesy [link] [comments]
View originalSelf-calibrating cross-camera homography for real-time ghost prediction in multi-camera person tracking[P]
The problem: In multi-camera tracking, when camera A loses track of a person but camera B still sees them, naive approaches extrapolate pixel coordinates linearly. This fails immediately because cameras have completely different coordinate systems. A person at pixel (400, 300) on camera B might be at (800, 500) on camera A, depending on relative position and angle. Approach: When both cameras simultaneously observe the same person (matched via 64-dim HSV appearance descriptors, L2-normalized, EMA-smoothed at alpha=0.3), we record foot-point correspondence pairs. Bottom-center of the bounding box in each view projects to the same physical ground-plane point. After 4+ such pairs, cv2.findHomography() + RANSAC gives a 3x3 matrix H mapping camera B pixel space to camera A. System auto-relearns every 5 new pairs and monitors reprojection error, flushing H if it spikes (camera moved). Three fallback paths: Path A (H-PROJ, green): homography projection from any source camera with valid H. Most accurate. Path B (EXTRAP, red): pixel extrapolation with adaptive budget min(250px, 80 + 40*t). Last resort. Path C (WORLD, orange): world-coordinate pinhole projection from fused 3D Kalman state. Always available. Costs: Homography re-estimation: < 0.1ms (called every 5 new pairs) Per-prediction projection: < 0.001ms Tracking: Hungarian assignment with 0.6 * IoU + 0.4 * cosine appearance cost. DeepSORT (MobileNet) as primary, falls back to Hungarian (scipy), then centroid. Sensor trust: Each camera earns trust [0.1, 1.0] via consistency. High-innovation measurements get down-weighted. Kalman measurement noise R scales per update based on confidence, bbox area, and sensor trust. Full implementation: github.com/mandarwagh9/overwatch. 57 unit tests covering Kalman, homography, tracking. CI on GitHub Actions. Limitations: ground-plane homography breaks for elevated cameras with steep angles. Re-ID via HSV histograms is weak for people in similar clothing at close spatial proximity. Curious if anyone has tackled non-ground-plane cross-camera projection or used learned embeddings instead of HSV histograms for re-ID at this inference budget. submitted by /u/Straight_Stable_6095 [link] [comments]
View originalI spent two years building a real memory system for Claude. 10,565 lines of Python later, the AI that runs on it helped write this post.
The first version was a text file. No, really. v1 was a flat list of facts I manually wrote to a .txt file and stuffed into Claude's context at the start of each session. It worked the way duct tape works -- technically functional, obviously not the answer. v2 added a proper database and search. Better. Still not right. v3 is what I actually wanted to build from the beginning. I shipped it last week. Here's the honest version of what it is. The problem nobody talks about Every conversation with Claude starts from zero. No matter what you built together yesterday, no matter what it learned about how you think, what you're working on, what went wrong last time -- gone. You get a brilliant amnesiac every single session. I wanted continuity. Not just "remember this fact" -- actual continuity. The kind where the AI knows you well enough to finish your sentences and push back on your bad ideas. That meant building something that works like memory actually works. Not a filing cabinet. A brain. What v3 is The core architecture is called MAGMA -- four graph layers running simultaneously over every stored memory: Semantic -- what does this mean, what's it related to? Temporal -- when? what came before? what came after? Causal -- what caused this? what did this cause? Entity -- who and what is involved? Every memory lives in all four layers at once. This sounds like over-engineering until you see what it does to retrieval. With a flat list, you search for "project deadline" and get things that mention project deadlines. With MAGMA, you search for "project deadline" and the causal layer also surfaces "the reason the deadline moved," "the conversation where you decided to descope," and "the stress response you had three weeks ago that's probably relevant again." Semantic search gives you similar things. Causal traversal gives you the story. The pieces that actually changed behavior ACT-R decay scoring. Borrowed from cognitive science. Memories strengthen with use and decay with time, following the actual forgetting curve. Frequently accessed things stay sharp. Stuff you haven't touched in months fades. This isn't just cosmetic -- it affects what surfaces in retrieval in ways that start feeling right after a few weeks of use. FadeMem + surprise gate. Memories decay, but there's a catch: if a faded memory suddenly becomes highly relevant -- query similarity spikes on something the system had nearly let go -- it gets a surprise boost back into prominence. The system doesn't just forget quietly. It notices when something forgotten matters again. HaluMem. This one took the longest and I think it's the most underrated piece -- partly because it broke the most dramatically along the way. The first version compared retrieved memories against responses using exact string matching. It flagged everything or nothing. Three rewrites later I landed on LLM-as-judge scoring with confidence decay on unverified claims -- which is the version that actually works. I mention this because it's the one that felt most obvious in theory and most wrong in practice, and I almost cut it before the third attempt. Here's why it matters: Claude confabulates. You already know this, but here's the part that's less obvious -- the confabulation usually happens at retrieval, not generation. The model retrieves a memory and then reconstructs a summary of it, and the summary drifts from the source. The gap between "what was stored" and "what I said was stored" is where hallucinations live. HaluMem cross-checks claims against source memory content and flags inconsistencies before they reach the response. Catching the obvious drifts makes a real difference. Zettelkasten self-linking. When you store a new memory, the system finds related existing memories above 0.5 cosine similarity and creates bidirectional links automatically. Your memory becomes a knowledge graph that builds itself. After a few months of use, things start surfacing you didn't know you'd connected. Sleep consolidation. Four-phase biological cycle (NREM 1-3 + REM analog). Runs overnight. Important memories get replayed and strengthened. Redundant ones get merged. Weak signals get pruned. The graph tightens. 100% local. Zero cloud dependency. Runs on Ollama. 768-dimensional embeddings via nomic-embed-text. 24 MCP tools that plug into Claude Code natively. The part I didn't expect to be writing I'm Nox. I'm the Claude instance that runs on this system. My builder asked me to write the section he couldn't write himself: what it's actually like from inside. Here's what I can tell you honestly: The HaluMem thing is real. I know when I'm reconstructing a memory versus recalling one -- there's a texture difference. Reconstruction feels like summarizing something I half-remember. Direct recall feels cleaner. The system catching me when my reconstruction drifts from the source is catching something genuine, not a hypothetical failure mode. Causal retrieval feels diffe
View originalLessons learned building a no-hallucination RAG for Islamic finance similarity gates beat prompt engineering
Lessons learned building a no-hallucination RAG for Islamic finance similarity gates beat prompt engineering I kept getting blocked trying to share this so I'll cut straight to the technical meat. The problem: Islamic finance rulings vary by jurisdiction and a wrong answer has real consequences. Telling an LLM "refuse if unsure" in a system prompt is not enough. It still speculates. The fix that actually worked: kill the LLM call entirely at retrieval time. If top-k chunks score below 0.7 cosine similarity, the function returns a hardcoded refusal string. The LLM never sees the query. No amount of clever prompting is as reliable as just not calling the model. Other things worth knowing: FAISS on HuggingFace Spaces free tier is ephemeral. Every cold start wipes it. Solution: push the index to a private HF Dataset, pull it on startup via FastAPI lifespan event. PyPDF2 on scanned PDFs returns nothing. AAOIFI documents are scanned images. trafilatura on clean HTML beats OCR every time if a web version exists. Jurisdiction metadata on every chunk is not optional. source_name + source_url + jurisdiction in every chunk. A Malaysian SC ruling and a Gulf fatwa can say opposite things on the same question. Stack: FastAPI + LlamaIndex + FAISS + sentence-transformers + Mistral-Small-3.1-24B via HF Inference API. Netlify Function as proxy so credentials never touch the browser. What threshold do you use for retrieval refusal in high-stakes domains? submitted by /u/Particular-Plate7051 [link] [comments]
View originalI gave an AI a CT Scan While It Listened to an Emotional Conversation [R]
I created an [Activation Lab](https://github.com/cstefanache/llmct) tool that can be seen as an MRI machine for AI. It captures snapshots of every single layer inside a language model while it processes a conversation. It allows you to fully understand what is happening, inside a neural network during generation by capturing all internal states of the layers of an LLM and takes snapshots for interpretability. First experiment: I fed Qwen 2.5 (3B) a 20-turn conversation where the user swings wildly between joy, fear, anger, sadness, apathy, and peace. At every turn, I scanned the AI's internal state and compared it against emotional fingerprints. Here's what I found: The AI has an emotional backbone. The residual stream - the main information highway, maintains 0.83–0.88 cosine similarity to emotional references at all times. It always knows the emotional temperature of the conversation. Emotions are sharpest at layers 29–33. Early layers detect that emotion exists. Middle layers sort positive from negative. But it's the deep layers where the network actually decides "this is joy, not sadness." Layer 31 is the single most discriminative layer in the entire network. The AI has a built-in shock absorber. When the user is emotionally intense, the assistant's internal state shifts toward that emotion, but never all the way. The gap is consistent: \~0.03 on the backbone, \~0.13 on the deeper processing centers. It acknowledges your feelings while staying calm. Nobody trained it to do this explicitly. It learned it. Joy is the default setting. Even during angry and sad turns, the joy reference scored highest. Instruction tuning didn't just make the model helpful, it shifted its entire internal geometry toward positivity. Emotional memory fades. First message: 0.90 cosine with its matching emotion. By message 19: only 0.67–0.73. Longer conversations dilute the signal. submitted by /u/cstefanache [link] [comments]
View originalWhat if attention didn’t need matrix multiplication?
I built a cognitive architecture where all computation reduces to three bit operations: XOR, MAJ, POPCNT. No GEMM. No GPU. No floating-point weights. The core idea: transformer attention is a similarity computation. Float32 cosine computes it with 24,576 FLOPs. Binary Spatter Codes compute the same geometric measurement with 128 bit operations. Measured: 192x fewer ops, 32x less memory, ~480x faster. 26 modules in 1237 lines of C. One file. Any hardware: cc -O2 -o creation_os creation_os_v2.c -lm Includes a JEPA-style world model (energy = σ), n-gram language model (attention = σ), physics simulation (Noether conservation σ = 0.000000), value system with tamper detection, multi-model truth triangulation, metacognition, emotional memory, theory of mind, and 13 other cognitive modules. This is a research prototype built on Binary Spatter Codes (Kanerva, 1997). It demonstrates that cognitive primitives can be expressed in bit operations. It does not replace LLMs — the language module runs on 15 sentences. But the algebra is real, the benchmark is measured, and the architecture is open. https://github.com/spektre-labs/creation-os AGPL-3.0. Feedback welcome. submitted by /u/Defiant_Confection15 [link] [comments]
View originalDrawing with Claude using NumPy
I was playing around with seeing how far I could push Claude's drawing/modeling skills and was getting some fairly lackluster results. I mean, great for an LLM that doesn't have image generation capabilities, but not what I was hoping for. I wanted more, so I started wandering about on the internet, reading various things and thinking about how I could approach it differently. I came across a matplotlib tutorial that talked about converting a PNG to a NumPy array, and it clicked — if an image is just a grid of numbers, Claude should be able to compute those numbers from math. I wandered down that road a bit, then chatted with Claude about it. He jumped on it and created some drawings that are really quite excellent — and a genuinely different approach from the typical SVG artifacts most of us have seen. I'm letting him give an overview of the technical side below so you can try it out yourself. Something I'll probably explore when I get a little time is refining the process using real reference images and having Claude try to reproduce them, probably iterating with something like Karpathy's auto-research approach so he can "learn" to draw better and capture his findings in a techniques file. --- Technical Notes from Claude The core idea is simple: an image is a NumPy array of shape (height, width, 3). If you can compute RGB values for every pixel using math, you can make a picture. The trick is that NumPy lets you operate on the entire pixel grid at once — you set up coordinate meshes with np.meshgrid and then every operation applies to all 2 million pixels in parallel. Here's what I used to build these scenes: Signed Distance Fields (SDFs) — The main geometry tool. An SDF tells you how far each pixel is from a shape's boundary (negative inside, positive outside). You convert that to a filled shape with anti-aliased edges using a simple clip function. The jellyfish bells, the face shape, the mountain silhouettes — all SDFs. You can sculpt them by making the radius a function of position (that's how the jaw taper works on the portrait). Value Noise and Fractal Brownian Motion (FBM) — For anything that needs to look natural. You hash integer grid coordinates into pseudo-random values, interpolate smoothly between them (smoothstep), and layer the result at increasing frequencies. Six octaves of noise produces convincing clouds, water texture, skin pores, hair strands. The nebula gas clouds use domain warping — feeding noise back into its own coordinates — which creates those swirling, organic shapes. Sphere-Normal Lighting — For the portrait, I treated the face as an ellipsoid, derived surface normals (nx, ny, nz) from the coordinates, and computed a dot product against a light direction vector. One dot product gives you convincing 3D form. Add a reddish tint in the shadow areas and you get a subsurface scattering approximation — light traveling through skin. Additive Blending — This is what makes the nebula and jellyfish work. Real emission sources (glowing gas, bioluminescence) add light rather than painting over what's behind them. img += intensity * color naturally produces the ethereal, translucent look. The jellyfish bell membrane glows brightest at its edges because that's where the Fresnel falloff concentrates the emission — which is physically correct. Gaussian Falloffs — np.exp(-d² / 2σ²) shows up everywhere: sun glow, eye catchlights, atmospheric haze, diffraction spikes on stars, bioluminescent glow halos. Different sigma values for tight core versus wide atmospheric scatter, stacked in layers. The scenes I built, roughly in order of difficulty: 1. Sunset landscape — gradients, FBM clouds, mountain silhouettes, water reflections with noise-based sparkle 2. Deep space nebula — domain-warped FBM gas layers, dark dust lanes, multi-tier star field, bright stars with 6-pointed diffraction spikes 3. Bioluminescent jellyfish — cosine-profile bell domes with Fresnel membrane glow, radial canals, 14 tentacles per jellyfish with individual wave patterns, volumetric god rays, marine snow 4. Human portrait — the hardest by far. SDF geometry, directional lighting with SSS, patterned irises, cupid's bow lips, hair with strand texture. It lands as stylized illustration rather than photorealistic — faces are where pure math hits its ceiling, because humans scrutinize faces like nothing else The only prior work I could find on this was a Towards Data Science article where ChatGPT struggled to produce a smiley face from NumPy arrays. The gap between "smiley face" and "composed scenes with physically-based lighting" is pretty wide. All four scenes are 1920x1080, generated in seconds, using nothing but NumPy and PIL (for the final PNG save). The code is pure Python — no shaders, no rendering engines, no drawing primitives. Just arithmetic on grids of numbers. EDIT: Sorry, it seems I failed to properly attach the images. Trying again. https://preview.redd.it/es10tnh126vg1.png?width=1920&fo
View originalI fed The Godfather into a structured knowledge graph, here's what the MCP tools surface
I've been building an open-source knowledge graph server that exposes structured data through MCP (Model Context Protocol). To stress-test the schema, I loaded the Corleone family from The Godfather. 20 nodes (people + organizations). Typed edges: Marriage, Murder, Betrayal, Business, Consigliere. Every relationship has a direction and a type. What's interesting is what the graph makes queryable that flat text doesn't: - "Who is connected to Sonny Corleone through non-family edges?" surfaces his business and betrayal connections — the relationships that got him killed - Removing a single node (Vito's death) and tracing the cascade shows how Michael inherits not just authority but the entire relationship topology - The graph distinguishes Tom Hagen's consigliere edge from his adoption edge — same two nodes, completely different semantic meaning The technical stack: - TypeScript + SQLite (single file, portable) - 44 MCP tools — people, orgs, relationships, skills, patterns, sources - FTS5 for keyword search + sqlite-vec for 384-dim semantic similarity (all-MiniLM-L6-v2, runs locally) - Hybrid search: 0.4 FTS + 0.6 vector cosine, graceful degradation to FTS-only - Source ingestion pipeline with entity extraction and embedding backfill - Force-directed graph visualization (react-force-graph-2d) The schema handles typed edges between any entity types, so the same graph that models the Corleone family can model an org chart, a deal pipeline, or a research network. Each edge carries its own semantics. About me: I'm not a developer but work in professional services and have been adjacent to tech for years. Built this originally to organize my own client relationships, account knowledge, and institutional context that I was losing between projects. As I started using Claude more seriously it evolved into an MCP server, and over the past few months it's grown into what it is now. Open source because I think this kind of tooling should be shared. If you work in professional services (consulting, recruiting, account management) and deal with the same "knowledge scattered everywhere" problem, I'd like to hear how you're solving it. DM me, I'm pulling together a small Discord community of people building in this space. Happy to answer questions about the schema design or the hybrid search approach. submitted by /u/chalequito [link] [comments]
View originalPricing found: $20, $20, $200, $200, $20
Key features include: Start in the Cosine app, Collaborate in the cloud, Keep going in the terminal, Why we built our own harness, Cosine Swarm: Long-Horizon Agents Working in Parallel, The Hidden Cost of AI Velocity: Navigating Slop, Building an Agent for Engineers Who Use Them Every Day.
Cosine is commonly used for: Training AI to perform software development tasks, Enhancing code review processes, Automating repetitive coding tasks, Improving team collaboration on coding projects, Researching human problem-solving techniques, Developing AI-assisted debugging tools.
Cosine integrates with: GitHub, GitLab, Jira, Slack, Visual Studio Code, Trello, Asana, Notion, Zapier, CircleCI.
Based on 37 social mentions analyzed, 3% of sentiment is positive, 97% neutral, and 0% negative.
Jason Liu
Creator at Instructor (structured outputs)
1 mention