
With Speed, Quality, and Agility
SeekOut is often praised for its advanced talent analytics and comprehensive candidate sourcing capabilities, which make it a strong tool for recruiting and talent acquisition. However, users occasionally express frustrations with its interface, noting that it can be complex or challenging to navigate. Pricing seems to be viewed as a bit high compared to competitive offerings, yet many users find the features justify the cost. Overall, SeekOut maintains a positive reputation for being an effective solution in the recruiting technology market.
Mentions (30d)
24
2 this week
Reviews
0
Platforms
2
Sentiment
0%
0 positive
SeekOut is often praised for its advanced talent analytics and comprehensive candidate sourcing capabilities, which make it a strong tool for recruiting and talent acquisition. However, users occasionally express frustrations with its interface, noting that it can be complex or challenging to navigate. Pricing seems to be viewed as a bit high compared to competitive offerings, yet many users find the features justify the cost. Overall, SeekOut maintains a positive reputation for being an effective solution in the recruiting technology market.
Features
Use Cases
Industry
information technology & services
Employees
190
Funding Stage
Series C
Total Funding
$188.0M
Pricing found: $179 /mo, $2,150/yr, $833
Could AI be indirectly addressing the imbalance in equality of opportunity due to our differences in IQ?
I had been thinking about how schools work when I realised it seems as though you're first taught how to work then why to do the work. I think that was a perfectly reasonable mode of operation at the time formal education was being introduced because it wasn't at a time when we were exactly as skeptical as we are now about the corrupt foundations of our systems of authority. This is to say that, back then, because of how high stakes survival was, people weren't so comfortable existing without order. This also isn't to say that established order is perfect, and nothing of value can be found through exploration, but in fact to say that this is how innovations come to be, and that there was a lot more respect for keeping things in order because the other option was effectively desperation. Nowadays, with the justification upon which western and westernised civilisations developed being shaken, as in the belief in Judeo-Christian values, the established order seems archaic, which is usually the first step towards a sweeping change, which could be revolutionary improvement or a flood. Why does that matter? While I believe getting entirely rid of the influence that our foundational belief has on our culture would be catastrophic, i don't think there are no improvements to be made and in fact can't conceptualise the point where there exists no improvement). Think of the foundational belief/philosophy of 'Loving the Lord your God (which I understand as having the utmost respect for pure truth which leads to true love) and then loving your neighbour as you love yourself' as a current that carries us through time. Some currents are full of rocks while some provide safe passage. This current has led to the greatest civilisation man has recorded thus far. So to get rid of surfaces you can do without to further avoid collisions is what we're supposed to do. We're now at a point where 'switching streams' seems to be a central focal point of cultural, political and philosophical conversations, meaning the respect for the old mode is quickly disappearing and so, for example, few really think about the reasoning behind being educated in the first place. We effectively now aim for careers with shining titles rather than those whose effect we first identified as positively impacting a community, or end up aiming in other directions which is more often than not a very good idea. The reasoning behind the greatness of a doctor is now reflected by their paycheck, when in fact the paycheck is actually effectively determined by the value the community sees in their effort, or at least that comes as an afterthought. If schools increase focus on expressing why and what effect the subject is important they can peak the interest of students in their subjects. The fundamental things we seek as humans are quite constant, they're just 'flavoured' by the culture you're in. From this perspective, a teacher can understand how to frame lessons to specific students. Of course, even in the things we want fundamentally there exist those we ought not to give into, as in, exactly what would constitute falsehood and not loving your neighbour as you do yourself. This is the true basis of what we have now thats any good, that is, look into yourself to find out what people appreciate, look for the resource to build it and bring it to the community in hopes that they appreciate it, then the community reciprocates through a token of appreciation, which they themselves think is a 'fair compensation for your troubles in bringing them the convenience'. What we have a lot of nowadays are people selling the illusion of convenience, and people convinced that this is the method. We actively look inside ourselves for ways to successfully deceive, and use this to guide other into their own loss at our profit, which is practically flipping our foundational belief on its head. I think a lot of this is caused by the hopelessness some may feel struggling to understand something they can't and are constantly berated without even knowing what they're working for, or others simply driven by a spotlight. With AI which can understood to be a heightened IQ for all, ignoring all the controversy that can't be concluded on, with such an approach we can have a lot more people working toward identifying problems and easily finding technical solutions to them, which would definitely create more job opportunities even temporarily, as AI develops to complete even more complicated tasks, with the ease with which these conveniences are produced increasing, lowering costs and therefore prices. We may end up with a culture more focused on understanding oneself in order to benefit others and thrive yourself. Ai will know how to do complex tasks, but expecting it to understand what people will appreciate to the point of being profitable requires us to make it perfectly in tune with the nature of human experience, which we ourselves aren't, but are definitely closer to, and ap
View originalGlia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph)
Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Code, Cursor, Windsurf) using a unified local database. I wanted something lightweight that did not require pulling heavy Docker containers or subscribing to third-party memory APIs. I settled on a Node.js + SQLite architecture running sqlite-vec (for 768-dim float32 embeddings) alongside SQLite FTS5 for hybrid search, powered completely by local Ollama instances. We just launched a live website that outlines the details and demonstrates the features in action: Website: https://glia-ai.vercel.app/ Codebase: https://github.com/Eshaan-Nair/Glia-AI Technical Stack & Features: Hybrid Search Retrieval: SQLite-vec (using nomic-embed-text locally) + FTS5 keyword prefix matching (porter stemmer). Surgical Sentence-level Trimming: Chunks are sliced into sentences. When a prompt is intercepted, only the exact matching sentences are pulled out of the vector store instead of the whole paragraph. It cuts LLM prompt bloat by ~90-95% in my benchmarks. Knowledge Graph Extraction: An offline task queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object). These are stored in a SQLite facts table (or Neo4j if you run the full Docker compose profile) and fused with the vector retrieval score. HyDE (Hypothetical Document Embeddings): Queries are pre-processed to generate a hypothetical answer, which is embedded together with the original query to bridge semantic gaps. Concurrency: Running SQLite in WAL (Write-Ahead Logging) mode allows the browser extension dashboard and active MCP sessions to read/write concurrently without locking. PII Redaction: Aggressive scrubbing of JWTs, API keys, emails, and IPs in the extension before data is saved. The extension works on Claude.ai, ChatGPT, DeepSeek, Gemini, Grok, and Mistral. The MCP server runs out of the same backend database for your terminal agent or Cursor. You can set it up with a single command: npx glia-ai-setup Glia is completely open-source (MIT). If you like the local-first approach or want to contribute to the SQLite vector pipeline, PRs are very welcome, and a star on GitHub helps the project get discovered! I would appreciate any feedback on the SQLite hybrid search scaling, the scoring fusion algorithm (RAG pipeline details are in RAG_PIPELINE.md), or local graph extraction performance. submitted by /u/Better-Platypus-3420 [link] [comments]
View originalHow I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into: - Tool names sit in the model context, so the model can guess or forge them - "Dangerous mode" is one config flag away from default - Memory management has no concept of instruction priority - The audit story is mostly "the model thought it should" I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary. What that means in code: Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names. Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass. IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them. Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too. Streaming output leak filter. Secrets are caught mid-stream across token boundaries, capability IDs, API keys, JWTs, PEM blocks redacted before they reach the client. No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
View originalLLM-Rosetta — format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools — Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week — each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result. It's a Python library that converts between LLM API formats — OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters. The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests — one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need. Use it as a library — pip install and call convert() directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep (https://github.com/Oaklight/zerodep), another project of mine — stdlib-only single-file modules, not someone else's library repackaged. The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar — admin panel, dashboard, request log, config management all included. Screenshots: https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/ We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform — needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized. The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion. There's a shim layer for provider-specific quirks — built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively. 24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation. GitHub: https://github.com/Oaklight/llm-rosetta Docs: https://llm-rosetta.readthedocs.io arXiv: https://arxiv.org/abs/2604.09360 Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378 submitted by /u/Oaklight_dp [link] [comments]
View originalThe Frontier-Only Narrative Is a Financing Story, Not an Architecture Story
The frontier-only narrative is an artifact of how AI infrastructure is being financed, not how production systems are being built. The setup. Q1 2026 disclosed $112B in hyperscaler capex in a single quarter, $650–725B in 2026 guidance, and Alphabet's first 100-year bond by a tech company since Motorola 1997 (see a0109). The story that underwrites that paper is: every query needs a bigger model. The architecture says the opposite. Microsoft's Phi-4 (14B parameters) exceeds its teacher GPT-4o on graduate STEM and competition math. Phi-4-reasoning is competitive with DeepSeek-R1 at roughly one-forty-eighth the parameter count. Claude Haiku 4.5 is positioned by Anthropic and AWS for "economically viable agent experiences." None of this is a benchmark teaser — it is the production toolkit, available today. Routing is the missing component. RouteLLM (UC Berkeley, Anyscale) demonstrated over 2x cost reduction without sacrificing response quality. AWS Bedrock Intelligent Prompt Routing — generally available, official, supported — claims up to 30% cost reduction within a single model family without compromising accuracy. The Flagship Tax (see a0085) didn't just die; it left a vacancy at the architecture layer. The bookkeeping nobody wants to do. Operator audits suggest 40–60% of token budgets in production LLM applications are waste, dominated by default-to-frontier routing. Roughly 37% of enterprises with production AI workloads run five or more models in their stack. The rest are still defaulting to one. Why the story isn't being told. Hundred-year bonds don't pencil out on "use less compute per query." They pencil out on "every query needs a bigger model." The opacity in the harness (see a0107) is the symptom; the underwriting is the disease. What you do Monday morning. Treat model selection as a dependency-graph decision, not a vendor decision. Add a complexity classifier. Default to small. Cascade up when verification fails. Instrument model-mix as a first-class production metric. Bottom line. You are not behind because you have not bought the biggest model. You are behind because you have not built the router. submitted by /u/gastao_s_s [link] [comments]
View originalGPT-5.5 feels like it got discernment, not just better reasoning — did anyone else notice?
I think GPT-5.5 got noticeably better at something I’d describe as discernment. For context, I’m a heavy long-form ChatGPT user. I use it as an iterative thinking partner for career strategy, self-evaluation, meta-analysis, language refinement, and pressure-testing ideas over long conversations. And yes, I used AI to help organize this because my raw thoughts would otherwise come out as ADHD slop. That is, ironically, part of my point. So I’m probably more sensitive than average to subtle changes in tone, context tracking, and conversational judgment. And 5.5 felt different almost immediately. Not just better reasoning. Not just better accuracy. Not just “better answers.” I mean conversational judgment: when to be serious, when to push back, when to make a joke, when to drop the joke, and when to not turn everything into sterile corporate therapy voice. The easiest place to see it is humor. Previous versions were stuck in “goblin”, “gremlin”, and “unhinged” in a low effort cosplay of humor. One example: “Micro-Conversion Optimizing Quarter-Seeking Man” Context: The man at the gas station asking people for two quarters with a rehearsed, polite, high-conversion script The bigger thing I’m noticing is restraint. It seems better at knowing: - when to be funny - when to stay serious - when to push back - when to drop the bit - when not to overexplain the joke I’m also noticing this outside of humor: smoother tone switching -less sterile phrasing - better context tracking - better personalization without getting weird - stronger ability to stay in the actual frame of the conversation - better pushback without turning everything into a debate - fewer generic “AI voice” responses In general, I’ve been noticeably more engaged, because on top of that I’m just extracting way more useful information out of it than I normally would with past versions. I’m curious if other heavy users noticed this too. Did GPT-5.5 feel meaningfully different to you? If so, what changed? submitted by /u/spicylilbitch [link] [comments]
View originalI offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers.
Hey r/ClaudeAI — I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading. Big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunking. Useful context, but not always something I need Claude to consume raw. So I built a small open-source sidecar tool called Triss. The rule is simple: Cheap model reads the bulky stuff. Claude gets the summary and does the thinking/editing. This is not a Claude replacement. I still keep architecture, debugging, careful edits, and final judgment with Claude. Triss is for the boring high-token intake step. One week of actual usage This is my real DeepSeek usage from May 6–13, 2026: Pro Flash Total Requests 143 66 209 Input tokens 3.74M 2.10M 5.84M Output tokens 833K 156K 990K Cost (USD) $1.88 $0.34 $2.22 That came out to about 1 cent per request on real coding work, not a benchmark. The important part is not only the DeepSeek bill. It is that Claude never had to carry those raw 5.8M input tokens in its own context. A ticket or file bundle that might have eaten tens of thousands of Claude tokens becomes a short summary, and the main conversation stays lighter. What I delegate The pattern that stuck for me: A single file over ~400 lines. 3+ files where I only need a structured summary. Jira/Linear/GitHub issues with comments and metadata. Web pages or docs pages. First-pass diff review. Commit message generation from a staged diff. What I do not delegate: Architecture decisions. Hard debugging. Precise edits. Small questions where the delegation overhead is larger than the task. What the tool does Triss can run as a CLI or as an MCP server, so Claude Code / Claude Desktop / Codex can call it as a native tool. The commands I use most: bash triss ask --paths src/foo.ts src/bar.ts --question "Summarize the control flow and risks" triss fetch https://example.com/docs --question "Extract the setup steps" triss review triss commit-msg triss usage --by-project It also has tracker integrations for Jira, Confluence, Linear, GitHub, and GitLab, because ticket/API payloads were one of the biggest hidden context sinks in my workflow. The default setup is DeepSeek, but it works with OpenAI-compatible endpoints too: DeepSeek, Kimi, Ollama, OpenRouter, etc. Credit where it is due The original idea came from Kunal Bhardwaj's write-up: https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda and his proof of concept: https://github.com/imkunal007219/claude-coworker-model My version is basically that pattern made more specific to my own workflow: MCP tools, tracker integrations, review/commit helpers, usage logging, and path sandboxing for agent calls. Links GitHub: https://github.com/ayleen/triss-coworker Install: npm install -g triss-coworker Setup: triss config wizard Open-source, MIT, unaffiliated with Anthropic. I do not get paid if you install it. I mostly wanted to share the numbers because "use a cheap model for bulk reading" sounded obvious to me in theory, but it only became habit once it was wired into Claude as a low-friction tool. Happy to answer any questions. submitted by /u/Proper-Mousse7182 [link] [comments]
View originalSonos quit supporting their Mac app and my wife wanted a prettier iOS one. So I made both in a weekend with Claude/Claude Code. (I'm an IP lawyer, not a developer.)
Writing this top portion without Claude. Claude's hot takes below it. I am not selling anything. I'm not distributing this. In fact, I'm not in software at all and work full time as an intellectual property attorney. I work with tech companies but maintaining software like this for years isn't really feasible for me beyond my personal use. I was able to spin up the iOS app in a single weekend. It's not perfect but I feel like that's pretty far along considering the hours and I think it looks pretty. I am someone that hasn't taken a coding class since I graduated from Georgia Tech in 2008 and has no coding experience beyond some tiny projects to solve very small problems. I used claude code and codex to make this. Initially, I was irritated that Sonos quit supporting its macOS app and wanted to fix that. And I did. And it worked really well. It lives in the menu bar and does what i want it to do. I only use Spotify as a music service so it hooks into that and voilà. Now I can control where music is playing in my house and group/ungroup speakers. I asked my wife if she wanted it on her computer. She doesn't want that but wants an app. I told her the Sonos app works fine but "that's not very pretty like your app." So I did something unhinged and made an app that didn't need making. But I learned a lot. It also strips out a lot of the things I don't use on either Sonos or Spotify and I learned a lot about how the speaker works and that making everything go fast is much easier said than done. I also added a pin functionality so playlists or albums I'm really into or listening to a lot can get pinned to the music screen. Starting points I took for building this: I told Claude chat what I wanted to build and why. Asked Claude what the best way to go about accomplishing it is with options and their pros and cons and what my budget was. I went and got the API info I needed from the services I planned to use, looked at their rules for coding agents, fed it to Claude Code. Told Claude Code what I wanted it to do and nailed down functionality as best I could before doing design work. Started with macOS then moved to iOS. Process for building: The macOS side was pretty straightforward. Getting the grouping to work was pretty easy because I had a clear idea of how I wanted it to behave. Testing was pretty easy and iterating was quick. The iOS side was kind of nightmarish. Keep in mind I've never done this before so I was doing a lot of iterative changes with claude and the simulator and burst calling the Spotify API every time I launched. This made Spotify pretty crabby and they blocked my token for hammering for like 12 hours. Whoops. Lesson learned. I also learned that Spotify's API limits are pretty tight. If I weren't already in their system the way I am as a user I probably would have built this around something else that's more forgiving with the rate limit. I had to think about how to limit the calls but still get functionality without breaking caching rules. This is an app for 2 people to use. I get that it's their API but woof. Using the simulator: I used the simulator to do a lot of bug chasing. I don't think that was correct. It worked for some of the obvious issues but I learned that simulators are not phones so when I deployed it to my phone it had a whole host of bugs and issues that weren't able to be caught in the simulator. Also some things I thought were issues ended up resolved in the phone they were just slower in the simulator. Tracking down bugs and things that didn't work quite right: I told claude cowork that it's a project manager for finding bugs and to write prompts or briefs to help claude code solve the problems. I pointed it to the code base folder and told it to review. I did a lot of button pushing just to see what works and what didn't and fed the results back to claude cowork. It worked to get through things but is a little tedious. At one point I did catch hallucinated code on my own with imaginary endpoints claude wistfully put in there. _that wasn't easy to find._ Things that aren't bugs that require some human thought: My Sonos speakers do have limitations. Sonos answers when you ask it to do stuff. The issue is the app asks too much, too fast. (And Sonos app even goofs on this but their actual engineers seem to have smoothed it out better than me) Each tap fans out into a bunch of UPnP SOAP calls and Sonos's AVTransport coalesces overlapping ones, so 3 rapid Previous taps turn into 1 actual hop on the speaker. The work I've been doing today is mostly about asking less and asking smarter to make sure that as a user I don't accidentally make it do a metric ton of stuff when it can only really handle a few things quickly. Thing that was most fun that I didn't expect: I had a lot of fun picking out a color palate and doing the design work. I'm not artistic at all but I know what I like to look at and I'm decent at describing it. Not captu
View originalDeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]
DeepSeek dropped the full V4 paper this week. preview from april was 58 pages, this version adds a lot of technical depth. What stood out for me. FP4 quantization aware training. theyre running FP4 QAT directly in late stage training. MoE expert weights quantized to FP4 (the main gpu memory consumer). QK path in the CSA indexer uses FP4 activations. 2x speedup on QK selector with 99.7% recall preserved. inference runs directly on the FP4 weights. Efficiency table is striking: Model 1M context FLOPs KV cache V3.2 baseline baseline V4-Pro 27% of baseline 10% of baseline V4-Flash 10% of baseline 7% of baseline Training stability, two mechanisms. Trillion parameter MoE has the loss spike problem, divergence, unpredictable failures. they documented two fixes. Anticipatory routing. they deliberately desync main model and router updates. current step uses latest params for features, but routing uses cached older params. breaks the feedback loop that amplifies anomalies. 20% overhead but only kicks in during loss spikes. SwiGLU clamping. hard limits on the SwiGLU linear path (-10 to 10) and gate path (max 10). suppresses extreme values that would cascade. Generative reward model. instead of separate reward models for RLHF, they use the same model to generate and evaluate. trained on scored data, model learns to judge its own outputs with reasoning attached. minimal human labeling, reasoning grounded eval, unified training. Human eval results. chinese writing, V4-Pro 62.7% win rate vs gemini 3.1 pro, 77.5% on writing quality specifically. white collar tasks (30 advanced tasks across 13 industries), V4-Pro-Max gets 63% non loss rate vs opus 4.6 max. coding agent eval, 52% of users said V4-Pro is ready as their default coding model, 39% leaned yes, less than 9% said no. tracks my own use, swapped V4-Pro into my verdent runs last week and havent noticed a quality hit on day to day work. The headline for me is FP4 QAT with minimal quality degradation. if this generalizes the cost structure of training and inference shifts a lot, especially noticeable on multi agent setups where one task can spawn 5-10 model calls. Paper link in comments. submitted by /u/Dramatic_Spirit_8436 [link] [comments]
View originalCompiled every national AI strategy in Asia — Vietnam has the most comprehensive standalone law, Japan has no penalties, Korea just eliminated Naver from sovereign LLM competition for using Qwen weights
Compiled a tracker of every national AI strategy in Asia. Headline is that ten major Asian economies now have dedicated AI legislation or comprehensive national strategies, and they're all quite distinct from Western legislation like the EU AI Act or US executive orders. Clear that Asian governments treat AI as infrastructure, not a sector to regulate from a distance. Most national approaches lean promotional (incentives, sandboxes, sovereign LLM funding) rather than punitive (bans, heavy compliance). The exceptions are Vietnam (first standalone AI law in Asia, Dec 2025) and South Korea (Framework AI Act with high-risk-system rules). The major markets that stood out to me: China's open-source-as-industrial-policy framework. ~$98B committed to AI development. Premier Li Qiang declared at WEF 2025 that China's innovation is "open and open-source" and the country is "willing to share indigenous technologies with the world." Derivatives of Alibaba's Qwen are now the largest open-weight model ecosystem on Hugging Face — over 100,000 derivatives (USCC 2026). This is industrial policy through model release, not regulation. Two-tier system: research labs (DeepSeek-style) operate with light governance, consumer-facing apps face stricter rules. Japan's AI Promotion Act (May 2025). No penalties. It's a promotional framework — establishes the AI Strategic Headquarters as a cabinet-level body, mandates a National AI Basic Plan, aligns deployment with "Human-Centred AI Society Principles." Japan's structural problem: only 9% of individuals and 47% of companies were using gen AI as of 2024. The legislation is trying to close adoption gaps via incentives rather than gate behaviour. December 2025 commitment of ¥1 trillion (~$7B) over five years to AI + semiconductors backs it up. Vietnam's AI Law (effective March 2026). Most comprehensive standalone AI law anywhere — 36 articles, three-tier risk classification (low/medium/high), foreign AI providers must appoint a legal representative in Vietnam, max admin fines reach VNĐ 2 billion (~$76K) for orgs with serious violations capped at 2% of preceding year revenue. Plus a National AI Development Fund offering grants/loans/preferential financing, plus regulatory sandboxes for startups. Combined with the Law on Digital Technology Industry covering semiconductors and digital assets, Vietnam now has the most legible AI legal architecture in SEA. What I'm not sure about: how sustainable the "promotional, not punitive" approach is when the next major AI safety incident happens. Japan's framework explicitly has no penalties, and I think that only holds up until something goes wrong. Vietnam's law has teeth but limited enforcement bandwidth. Korea's is the only framework that has both tools and resources to enforce. For people closer to AI policy work — does the Asia approach seem more or less likely to scale globally than EU-style ex-ante rule-making? My read: Asia's bet on incentives + sandboxes + sovereign capability is more aligned with how AI is actually deploying in 2026 than EU rules-based approaches, but the governance gap shows up in the next 24 months. Fuller tracker with country-by-country breakdown: https://digitalinasia.com/2026/04/08/asia-ai-policy-tracker/ submitted by /u/tomsimps0n [link] [comments]
View originalMahoraga - Stop paying Anthropic and OpenAI so much
Are you sick of paying a million credits per month?!?!? I'm joking, i aint that enthusiastic. But really, this saves me a ton of credits by routing simple tasks to local agents. Clone the repo, fork the repo, star the repo, whatever you want. github.com/pockanoodles/Mahoraga This is Mahoraga, an open-source orchestrator that routes tasks across local and cloud AI agents using a contextual bandit (LinUCB) that learns from every decision. Context (skip): I only started integrating AI into my workflows in late 2025, so I came on the scene broke with no credits. This left me with local models. However, many students and employees also receive credits from their institution to work with. (I got claude yippee) I wanted to be able to flawlessly route between models when credits ran out, which made me build an orchestrator. I used to use claude more as a chatbot/complete workflow engine, which made it difficult to use local models due to the context window, reasoning, etc. Opus 4.5 running open-source "superpowers" ate my usage every month. Now I realize that wasn't an effective way to use claude, or AI in general. I was using claude for both heavy planning/brainstorming and minor tasks. How about tasks specifically for code generation? Code generation is a relatively constrained task, with correct answers and short outputs. Surely local models can compete in tasks that don't need cloud? So I switched Mahoraga to an adaptable router. I ran 192 tasks across 8 agents (4 local Ollama models, 4 cloud CLIs) on a 16GB MacBook Pro, forcing round-robin so every agent got every prompt. Quality is scored by a 4-layer heuristic system (novelty ratio, structural checks, embedding similarity, length ratio). Zero API cost for evaluation, and no LLM-as-judge. Qwen3 4B in nothink mode dominates code and refactor at 33.8 t/s and 6.1s average latency. Cloud agents cluster around 0.650 on code. The local model isn't just cheaper; it's measurably better for this task class. Other findings: LFM2 hits 77.1 t/s but trades ~5 quality points vs Qwen3 4B DeepSeek-R1 averages 123.5s per task on 16GB. The reasoning overhead makes it unusable as a default Security scores are flat at 0.650 across all agents due to my human error—the scorer doesn't capture security-specific signals well. The bandit (LinUCB) is the only routing strategy with sublinear regret (β=0.659) across a 200-task simulation—it actually converges The routing works in two stages: the keyword classifier puts the task in a capability bucket (code, plan, research, etc.), and then the bandit picks the best agent within that bucket. 9-dimensional context vector, persistent state across sessions, warm-start from the compatibility matrix. All local inference, all free. Cloud escalation exists but only fires on retry. Why pay for cloud when a local model handles it better? Looking for any feedback, any input. Feel free to be critical: I appreciate everyone who interacts on this subreddit. I will continue to work on this in the future. Again, this is open source and free. (Mods, please. i'm not making any money off this. submitted by /u/Own-Professional3092 [link] [comments]
View originalGoogle’s AI search summaries will now quote Reddit
Google says this update aims to address that “people are increasingly seeking out advice from others” when searching for information online. This will be relatable for anyone who’s added “Reddit” to the end of Google Search terms to find experiences from real humans instead of SEO-optimized web results. It also backs up claims made by Reddit CEO Steve Huffman last year that “just about anybody using Google at this point will end up on Reddit.” submitted by /u/tekz [link] [comments]
View originalAGENTS.md trick that stopped Codex from doing dumb work at premium rates
Spent a Sunday auditing where my Codex tokens were actually going. Half the calls were stuff like "rename these 12 fields", "format this csv as markdown table", "extract the dates from this changelog". gpt-5.5 doing janitor work at architect rates. The fix that actually held: pair Codex with a cheap side model and write the routing rule as a deny list. The deny-list framing matters. Saying "use the cheap model for X" gets ignored a chunk of the time. Saying "do NOT use Codex for: bulk reformatting, single-field extraction, classification you'll review anyway" sticks. Codex obeys negative rules better than positive suggestions, at least for me. Setup is an MCP server with one tool. Codex calls it via the standard MCP config in `~/.codex/config.toml`. Default worker is DeepSeek V4 Flash because of the 1M context window and the price, but the base_url is one line and any openai-compatible endpoint works (ollama, vllm, lm studio, whatever you already run). A week of real numbers from one project: - 184 calls offloaded out of ~520 total - worker side: $0.34 - estimated avoided Codex spend: somewhere between $5 and $9 depending on token mix The shape of what gets routed: anything bounded, anything you'll skim before trusting, anything where the "thinking" is really just template-following. What stays on Codex: planning, code that ships, anything touching unfamiliar parts of the repo, anything where wrong output would slip through review. Caveats. It's a worker, not an agent. No tool calls inside it. Latency runs 3-25s on the worker side which adds up if you chain a lot of small calls. And you do need to actually review the output. Repo with the AGENTS.md template and the `config.toml` snippet: https://github.com/arizen-dev/deepseek-mcp Curious if anyone's tried the same routing approach with a local model worker. The economics get even better if you've got the GPU sitting there anyway. submitted by /u/petburiraja [link] [comments]
View originalMost of my Claude usage was on work that didn't need Claude. Cut my bill 60x on bulk tasks with a tiny side model.
I looked at what was actually eating my Claude usage and it was embarrassing. Classifying files. Reformatting json. Pulling fields out of text. Summarizing docs I was going to skim anyway. None of that needed Sonnet. All of it cost the same as the work that did. Tried the obvious fixes first. Switching to Haiku for simple stuff (still wasteful at volume). Tighter prompts (helps a little). /compact (delays the problem). None of it changed the shape of the spend. What actually worked: a small cheap model running as a side worker, with one rule in CLAUDE.md telling Claude not to do the mechanical stuff itself. The setup is one tool. Send it text, get text back. Claude calls it for the bounded mechanical work I'd review anyway. Default model is DeepSeek V4 Flash because it's cheap and has 1M context, but the endpoint is one config line and works with anything openai-compatible (local ollama, vllm, lm studio). 3 weeks of real usage: 217 mechanical calls offloaded DeepSeek total spend: $0.41 Same workload on Sonnet would have been roughly $7 The CLAUDE.md rule that actually works is negative framing. Not "use deepseek for X" but "do NOT use Claude for: json formatting, field extraction, file classification, summarization you will review anyway." Positive framing got ignored maybe 30% of the time. Deny list catches it. It's a supervised worker, not an agent. No tool calls, no file access, no chains. Latency 3-25s. You review the output. That's the whole shape. Repo with setup steps: https://github.com/arizen-dev/deepseek-mcp (MIT, Python 3.10+) Happy to answer questions about the routing rules or the model choice. submitted by /u/petburiraja [link] [comments]
View originalSet up multi-agent orchestration with Claude Code as the boss... am I overcomplicating this?
Pretty new to AI but been deep on a side project for a while now. Got tired of one Claude session running out of context halfway through anything serious, so I rigged up an orchestration thing. Working well enough but I have no idea if I'm just reinventing the wheel. Setup looks like this: ( Please note it's work paying for all these , I wouldn't be spending my own money having this many agents etc ) Main orchestrator: Claude Code running Opus 4.7 (1M context, high effort) Premium team seat. This one talks to me, plans the work, reviews everything that comes back, decides what to fan out. Anything sensitive (auth, payments, db migrations, anything where conversation history matters) it does itself. Subagents : all called from bash via wrapper scripts in ./agents/: claude-sub : another Claude Code (Opus 4.7 High) premium team seat on a worker account so my main quota isn't drained. Fresh context. Used for "review your own diff with fresh eyes" or well-specified subtasks. codex: GPT-5.5 via Codex CLI. Team plan . Mostly the per-task reviewer with mocks attached via --image. codex-sub: GPT-5.5 via Codex CLI. Team plan. Because with work I have the two accounts ... why not two ? gemini: Gemini 3.1 Pro. 1M context via gemini-cli . Ultra AI plan. For scanning a lot of files at once or extracting structure from a doc/diagram. deepseek: DeepSeek V4 Pro via opencode. Mid-difficulty coding when the spec is tight. Each one has its own config dir so the agent calls don't compete with my interactive terminal for credits. Workflow per task: I describe what I want, sometimes paste a screenshot. Orchestrator restates back to me, asks clarifying questions, only then writes code. Before commit, diff goes to a different-family agent (usually codex with the mock attached) for a four-bucket review: block / fix / nit / question. Fixes applied, commit, push. Backend commits get deployed the same turn ...agents source a scoped AWS dev account creds from a file. Memory system persists across sessions. "user prefers X / always do Y" ... so I don't have to retrain it every chat. I perform some sort of user validation and we move on to the next item. What I'm unsure about: The routing logic right now is basically "if statement in a markdown file". Claude reads it and decides who to fan a task out to. It works but it's hand-rolled. Is there already a tool out there where you just register your agents (Claude, Codex, Gemini, DeepSeek, whatever) and it figures out the delegation for you ? picks the right one for the task, handles the cost / context-size / quality tradeoff, manages parallel work? Like an "agent router" Is there a layer that sits above the individual CLIs? If that exists I'd rather use it than keep building my own. If it doesn't, fair enough, but I'd rather know now before I sink more time into the duct-tape version. Also: anyone using something better than a folder of markdown files for cross-session memory? submitted by /u/segap [link] [comments]
View originalPricing found: $179 /mo, $2,150/yr, $833
Key features include: Accelerate Time-to-Hire, Improve Quality of Hire, Scale Best Practices, Solutions, Capabilities, Resources, Company, Plans.
SeekOut is commonly used for: Talent sourcing for tech roles, Diversity hiring initiatives, Employee reskilling programs, Recruitment for remote positions, Building talent pipelines for future hiring needs, Enhancing candidate engagement through personalized outreach.
SeekOut integrates with: LinkedIn, Greenhouse, Lever, Workday, BambooHR, Slack, Zapier, Microsoft Teams, Google Workspace, Salesforce.
Based on user reviews and social mentions, the most common pain points are: cost per token.

How AI Helps Recruiters Day to Day | TA Draft Mode Clip
Mar 4, 2026
Based on 52 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.