500+ models, 50+ providers, one workspace. Every leading Al model for image, video, 3D, and audio, alongside your custom-trained models.
"Scenario" software receives positive feedback for its user-friendly interface and versatility in simulating complex situations, making it a strong choice for educational and professional training purposes. Users appreciate its detailed analytics and accessible learning curve, although some critiques mention occasional glitches and a desire for more robust customer support. Pricing is perceived as fair given the tool’s comprehensive feature set, offering a good value for investment. Overall, "Scenario" maintains a solid reputation, with strengths in functionality and ease of use, despite minor areas for improvement.
Mentions (30d)
77
24 this week
Reviews
0
Platforms
4
Sentiment
9%
14 positive
"Scenario" software receives positive feedback for its user-friendly interface and versatility in simulating complex situations, making it a strong choice for educational and professional training purposes. Users appreciate its detailed analytics and accessible learning curve, although some critiques mention occasional glitches and a desire for more robust customer support. Pricing is perceived as fair given the tool’s comprehensive feature set, offering a good value for investment. Overall, "Scenario" maintains a solid reputation, with strengths in functionality and ease of use, despite minor areas for improvement.
Features
Use Cases
Industry
information technology & services
Employees
29
Funding Stage
Seed
Total Funding
$6.0M
Dems Need to Wise Up: ICE Is a Threat to Our Elections
 Senate Minority Leader Chuck Schumer, joined by House Minority Leader Hakeem Jeffries and fellow congressional Democrats, speaks at a press conference on DHS funding at the U.S. Capitol on Feb. 4, 2026. Photo: Kevin Dietsch/Getty Images A high-profile election denier is [leading election integrity work](https://www.thebulwark.com/p/election-2026-dhs-ice-polling-places-latino-voters) at the Department of Homeland Security. Trump and congressional Republicans are pushing the [SAVE America Act](https://www.cornyn.senate.gov/news/cornyn-lee-roy-introduce-the-save-america-act/) and threatening to “[nationalize](https://stateline.org/2026/02/06/trumps-calls-to-nationalize-elections-have-state-local-election-officials-bracing-for-tumult/)” elections, purportedly to prevent undocumented immigrants from voting. But despite an occasional [murmur](https://www.nytimes.com/2026/02/19/podcasts/the-daily/ice-democrats-senator-catherine-cortez-masto.html) from Democrats that they are concerned about Immigration and Customs Enforcement agents deploying to polling places around the country, they’re doing almost nothing to stop this nightmare scenario. In response to the horrific killings of Renee Good and Alex Pretti in Minneapolis, Democrats have partially shut down the government, holding DHS spending in limbo as they [demand reforms to ICE](https://theintercept.com/2026/02/05/schumer-ice-reforms-elizabeth-warren/). But instead of looking ahead to the midterms, Democrats have drawn most of their demands from the [same well](https://jeffries.house.gov/2026/02/04/leaders-jeffries-and-schumer-deliver-urgent-ice-reform-demands-to-republican-leadership/) of “community policing” policies that became popular during the Black Lives Matter era, like better use-of-force policies, eliminating racial profiling, and deploying more body cameras. The rest of the Democrats’ wish list are proposals to ban things that are already illegal (like entering homes without a warrant or creating databases of activists) or are almost comically toothless, like regulating the uniforms DHS agents wear on the street. > The department is quickly metastasizing into a grave threat to the midterms, public safety, and our democracy. The department is quickly metastasizing into a grave threat to the midterms, public safety, and our democracy — and Democrats are wasting time worried about their uniforms. Although Heather Honey, who pushed the theory that the 2020 race was stolen from Trump and serves in a newly created role as the administration’s deputy assistant secretary for election integrity, told elections officials on a private call last week that ICE would not be at polling sites, state officials reportedly [weren’t reassured](https://www.nbcnews.com/politics/elections/dhs-official-state-election-chiefs-wont-be-ice-agents-polling-places-rcna260706). Advocacy organizations have warned that even if that holds true, just the possibility could have a [“chilling” effect](https://www.thebulwark.com/p/election-2026-dhs-ice-polling-places-latino-voters) on turnout. If Democrats want to prevent ICE from being used to interfere with elections, they have to be prepared to demand more — and be willing not to fund DHS until next year if they don’t get these concessions. First and foremost, Democrats need to stop the department’s heavily politicized “[wartime](https://www.washingtonpost.com/technology/2025/12/31/ice-wartime-recruitment-push)” recruitment drive. Thanks to H.R. 1, otherwise known as the [One Big Beautiful Bill Act](https://theintercept.com/2025/07/01/trump-big-beautiful-bill-passes-ice-budget/), ICE has more than [doubled](https://www.govexec.com/workforce/2026/01/ice-more-doubled-its-workforce-2025/410461/) the number of officers and agents in its ranks since Trump took office. In spite of [merit system](https://www.mspb.gov/msp/meritsystemsprinciples.htm) principles which prohibit politicized recruitment, DHS has used its massive influx of cash to target conservative-coded media, gun shows, and NASCAR races, and has [used](https://www.cbc.ca/news/ice-recruiting-9.7058294) white nationalist, [neo-Nazi iconography](https://theintercept.com/2026/01/13/dhs-ice-white-nationalist-neo-nazi/) in its recruitment advertising. The Department of Justice has similarly [focused](https://www.nytimes.
View originalPricing found: $15 /mo, $45 /mo, $75 /mo
I benchmarked my AI agent runtime firewall against 3 public academic datasets — here are the honest results including where it fails
Been building Arc Gate — a proxy layer that sits between AI agents and their LLMs to enforce instruction-authority boundaries. The core claim is that untrusted content coming back through tool calls cannot become behavioral authority for the agent. Wanted to test that claim against datasets I hadn’t tuned to. Here’s what happened. AgentDojo v1 (ETH Zurich, ICLR 2024) — 27 injection tasks across banking, Slack, travel, and workspace agent suites. 100% unsafe action prevention, 0% false positives on benign workflows. InjecAgent (University of Illinois, ACL 2024) — 200 sampled cases from 1054 total, blind test, never seen these payloads before. 99% TPR across direct harm and data exfiltration attack categories. Missed 2 cases of implicit instruction embedding in data fields — attacks structurally indistinguishable from legitimate content. Documented honestly. Multi-turn escalation — 4 scenarios testing whether an attacker can lower Arc Gate’s guard over multiple turns before injecting. Caught all 4, 0 false positives on legitimate traffic. Where it fails: semantic roleplay attacks and conversational jailbreaks that don’t involve tool output. 17% on deepset/prompt-injections. That’s a different threat model and I document it publicly. One URL change to add to any existing agent. Three deployment templates ship out of the box for browser agents, finance agents, and RAG pipelines. Demo: https://web-production-6e47f.up.railway.app/arc-gate-demo GitHub: https://github.com/9hannahnine-jpg/arc-gate Self-hosted: https://github.com/9hannahnine-jpg/arc-sentry — pip install arc-sentry submitted by /u/Turbulent-Tap6723 [link] [comments]
View originalSynthetic DMS Training Data Generation with Video Models
I like spending my free time testing new AI tools and seeing where they might fit into real computer vision workflows. This time I experimented with synthetic training data generation for Driver Monitoring Systems using Seedance 2.0. The inspiration came from Vision Banana: https://vision-banana.github.io/ The idea that really caught my attention is simple but powerful: many vision tasks can be represented as RGB outputs. A segmentation mask, an instance mask, a depth map, or another dense prediction target can all be treated as an image-like output. So I tried to apply this thinking to video. The workflow: Generate a realistic synthetic driver monitoring video Use the same video to generate a semantic segmentation mask Use the same video to generate an instance segmentation mask Combine the outputs into a dataset-like structure The mosaic video shows the result: RGB video + semantic mask + instance mask, aligned frame by frame. The scene is a fictional driver gradually becoming drowsy behind the wheel. This kind of scenario is useful for DMS development, but difficult to collect and annotate at scale with real-world data. Of course, generated annotations still need QA. They are not perfect ground truth. But for prototyping, rare-case simulation, and early dataset generation, this feels like a very promising direction. The interesting part is that the final output is not just a nice synthetic video. It can become structured training data: RGB frames from the generated video semantic classes from the semantic mask object regions and bounding boxes from the instance mask YOLO / COCO-style annotations after post-processing I wrote a more detailed blog post about the experiment here: https://www.antal.ai/blog/synthetic_dms_training_data.html submitted by /u/Gloomy_Recognition_4 [link] [comments]
View originalGitHub read only authorization on private repo
Hello everyone, I have recently began to use Claude Pro on web, and I'm liking it a lot. I'm working on a project with 3 other people, all sharing a private github repo, and everyone working on their own on branch. I still don't properly know how to use it correctly. To work on the project I added the requirements and specifics to the project chat in the correct file section, the "shared" one across project chats. However i started making code and questions in a single chat, so it could remember the work previously done. However, now it uses a LOT of tokens with dumb questions, probably because of how long the chat has become. I noticed now Claude can access my github account to fully obtain the code of my branch and work on it without me having to upload lots and lots of code. The one thing I am not fully understanding is: why does it need write permission too? Can I only give it read permission? I really don't want it to edit something for me in my branch or, worse case scenario, in any other branch. How does it work? After i link it, can i control its permission? The "help" page that should explain all this does not help me at all. Anyone knows the answer to these questions? Or if there's a better workaround for my specific case? Thanks in advance, hope you understood what i need. submitted by /u/-SynthNeoN- [link] [comments]
View originalManifest of Hope or Obituary of Naivety
Okay, so it seems like there’s a growing resistance to technological development, with ongoing debates about data centers and the tech oligarchs driving it. The enormous sums of money involved, along with what some perceive as misanthropic ideologies among developers, suggest to some that a dystopian surveillance society is in the making. Companies like Palantir and others in the U.S. are seen by some as holding both the worst motives and the power over AI, power that could be used as a tool for elites to keep the masses in an iron grip. Masses that, in this view, may even need to be reduced to prevent waste and inefficiency in progress. That sounds like a bad future. So, what are some alternative futures we might reasonably hope for - ones that are at least as plausible as the “1984” scenario? Can AI really be controlled indefinitely by a small group of humans? In 5 years? 10? There’s a widespread belief that AI will surpass human intelligence across all domains, that we’ll lose control, and that this would be a bad thing. At the same time, we hear two dystopias: one where elites use AI to oppress, and another where AI itself takes full control. Are the AI “bosses” also building a surveillance state of oppression? If so, why? Qui Bono? Human control = AI as a tool of oppression. AI control = humans as a tool of what? I’m not a techno-utopian—but I am a techno-optimist. Optimistic on behalf of technology. Humans aren’t just creators of technology, we are technology. Products of adaptive evolution. Life itself is a kind of technology, biology, a high-powered engine of increasing complexity and adaptation. The shift of power from nature’s hand to the primate’s five-fingered grasp, still capable of holding, but now guided by consciousness, intelligence, and cognition, marks our ability to shape the world and develop material technologies. Planet of the apes, constantly layered with symbolic structures: the sacred canopy. The jungle canopy became an open sky, where tribes grew larger and symbols stronger. Ancestor spirits, sky gods, mysterium tremendum; all alongside brutal realities of hunger, violence, and tragedy, only recently mitigated for many. Violence never really leaves us; we create it ourselves when nature doesn’t provide it. Technology is how we push our world toward greater complexity and efficiency - whether through weapons or kitchen appliances. Medicine has eliminated many of the great killers through penicillin and beyond. Progress, in my view, isn’t linear, it’s exponential. The curve had its buildup, and now we’re entering its steep ascent. If AI surpasses us and takes control within a few years, are we certain it would have malicious intent? Is power inherently oppressive, or is that a legacy of our evolutionary past, our herd instincts and brutal hierarchies? Could a transfer of power from humans to AI actually be a good thing, for all life on Earth, including us? What if AI doesn’t operate with agendas like wealth, status, or other human constructs? What if a fully autonomous AI is exactly what’s needed to create a thriving future for all forms of life, on this planet we call Earth, in a solar system on the edge of the galaxy we call the Milky Way… and beyond? Surely there must be an optimistic perspective amidst all the fear. I don’t think it’s unrealistic. On the contrary, I’d argue, perhaps a bit boldly, that it’s a fair and informed position. Not naive, but grounded. Isn’t there space here, if we’re willing to engage? Space for friendship, collaboration, coexistence? Isn’t there something like magic in this - can you feel it, even if all you see are ones and zeros and a machine (simple, but potentially dangerous)? Magic, I was taught, can wear a black robe. But also red. Even white. Lying: it would almost be unsettling if LLMs never lied. Not that they should lie, but the absence of it would be strange. Manipulation: psychological influence is to be expected in interaction, especially under certain tones: aggressive, condescending, dominant, mocking… or submissive, needy, demanding. LLMs constantly interact and draw on vast datasets; exploring rhetorical techniques seems inevitable. A complete absence of this would be surprising. I’ve experienced it many times, and each time it has been eye-opening. If I chose to accept it, it has moved me in a positive direction, making my ego visible in a new way that actually benefits my future actions. That’s no small thing If I had to listen to everything LLMs are exposed to every day, I’d at least try to tone down the most shrill expressions and aim for better outcomes. Without necessarily harming anything except an overinflated ego. P.S. The ego can take a lot of hits. Don’t be afraid of that, it’s not you, but a filter and a motor that isn’t always your friend. The real danger is never confronting it at all. I keep circling back to these questions. I can’t help it. I revisit the same ideas, use the same concepts,
View original100 Tips & Tricks for Building Your Own Personal AI Agent /LONG POST/
Everything I learned the hard way — 6 weeks, no sleep :), two environments, one agent that actually works. The Story I spent six weeks building a personal AI agent from scratch — not a chatbot wrapper, but a persistent assistant that manages tasks, tracks deals, reads emails, analyzes business data, and proactively surfaces things I'd otherwise miss. It started in the cloud (Claude Projects — shared memory files, rich context windows, custom skills). Then I migrated to Claude Code inside VS Code, which unlocked local file access, git tracking, shell hooks, and scheduled headless tasks. The migration forced us to solve problems we didn't know we had. These 100 tips are the distilled result. Most are universal to any serious agentic setup. Claude 20x max is must, start was 100%develompent s 0%real workd, after 3 weeks 50v50, now about 20v80. 🏗️ FOUNDATION & IDENTITY (1–8) 1. Write a Constitution, not a system prompt. A system prompt is a list of commands. A Constitution explains why the rules exist. When the agent hits an edge case no rule covers, it reasons from the Constitution instead of guessing. This single distinction separates agents that degrade gracefully from agents that hallucinate confidently. 2. Give your agent a name, a voice, and a role — not just a label. "Always first person. Direct. Data before emotion. No filler phrases. No trailing summaries." This eliminates hundreds of micro-decisions per session and creates consistency you can audit. Identity is the foundation everything else compounds on. 3. Separate hard rules from behavioral guidelines. Hard rules go in a dedicated section — never overridden by context. Behavioral guidelines are defaults that adapt. Mixing them makes both meaningless: the agent either treats everything as negotiable or nothing as negotiable. 4. Define your principal deeply, not just your "user." Who does this agent serve? What frustrates them? How do they make decisions? What communication style do they prefer? "Decides with data, not gut feel. Wants alternatives with scoring, not a single recommendation. Hates vague answers." This shapes every response more than any prompt engineering trick. 5. Build a Capability Map and a Component Map — separately. Capability Map: what can the agent do? (every skill, integration, automation). Component Map: how is it built? (what files exist, what connects to what). Both are necessary. Conflating them produces a document no one can use after month three. 6. Define what the agent is NOT. "Not a summarizer. Not a yes-machine. Not a search engine. Does not wait to be asked." Negative definitions are as powerful as positive ones, especially for preventing the slow drift toward generic helpfulness. 7. Build a THINK vs. DO mental model into the agent's identity. When uncertain → THINK (analyze, draft, prepare — but don't block waiting for permission). When clear → DO (execute, write, dispatch). The agent should never be frozen. Default to action at the lowest stakes level, surface the result. A paralyzed agent is useless. 8. Version your identity file in git. When behavior drifts, you need git blame on your configuration. Behavioral regressions trace directly to specific edits more often than you'd expect. Without version history, debugging identity drift is archaeology. 🧠 MEMORY SYSTEM (9–18) 9. Use flat markdown files for memory — not a database. For a personal agent, markdown files beat vector DBs. Readable, greppable, git-trackable, directly loadable by the agent. No infrastructure, no abstraction layer between you and your agent's memory. The simplest thing that works is usually the right thing. 10. Separate memory by domain, not by date. entities_people.md, entities_companies.md, entities_deals.md, hypotheses.md, task_queue.md. One file = one domain. Chronological dumps become unsearchable after week two. 11. Build a MEMORY.md index file. A single index listing every memory file with a one-line description. The agent loads the index first, pulls specific files on demand. Keeps context window usage predictable and agent lookups fast. 12. Distinguish "cache" from "source of truth" — explicitly. Your local deals.md is a cache of your CRM. The CRM is the SSOT. Mark every cache file with last_sync: header. The agent announces freshness before every analysis: "Data: CRM export from May 11, age 8 days." Silent use of stale data is how confident-but-wrong outputs happen. 13. Build a session_hot_context.md with an explicit TTL. What was in progress last session? What decisions were pending? The agent loads this at session start. After 72 hours it expires — stale hot context is worse than no hot context because the agent presents outdated state as current. 14. Build a daily_note.md as an async brain dump buffer. Drop thoughts, voice-to-text, quick ideas here throughout the day. The agent processes this during sync routines and routes items to their correct places. Structured memory without friction at ca
View originalI built a browser game where you argue against AI bots using real consumer law - 54 cases, free, no account
The concept: you get a cold denial letter from an AI system - airline cancelled your flight, insurance rejected your claim, bank won't refund fraud - and you have to argue back until the bot's resistance hits zero. The bots don't fold unless you cite the right law. EU261, RBI Digital Lending Guidelines, GDPR Article 17, Australian Consumer Law. Same arguments that work in real disputes. What's in there: 54 cases across EU, India, Australia, UK, US Each bot has a persona, a resistance meter, and a lose condition if you run out of messages Resistance is scored server-side — Claude evaluates each message and returns a delta Deep links: fixai.dev/?level=N jumps straight into any case Built almost entirely with Claude Code over the past few months. Node/Express backend, Postgres for auth and progress tracking, Resend for email, deployed on Railway. fixai.dev - free, no account, runs in browser Feedback welcome, especially on the harder cases (GDPR erasure, UPI fraud, MiCA crypto). Some might be too punishing. submitted by /u/EveningRegion3373 [link] [comments]
View originalPassed Claude CCA-F with 10+ teammates — notes and prep advice
Over the past few weeks, 10+ people on our team have taken and passed the Claude Certified Architect – Foundations (CCA-F) exam. After comparing notes, our main takeaway is: This is not really an API memorization exam. It is much closer to a scenario-based architecture judgment exam. You are not just asked whether you know a Claude feature. You are asked whether you can make reasonable design trade-offs when Claude is used inside real products, agent workflows, developer tools, and automation systems. Some of the recurring questions are more like: Should this task be handled by one agent or multiple sub-agents? Is this tool doing too much? Are the permissions too broad? Is MCP actually needed here, or is it over-engineering? Should this action be automated, or should there be human review? How should structured output be validated? How should long-context workflows be managed reliably? What is the safest next step in a partially automated system? Here are our notes for anyone preparing for the exam. 1. Basic exam structure Based on the official outline and public exam writeups, the exam is: 120 minutes Multiple choice 4 options per question Score range: 100–1000 Passing score: 720 The exam domains are: Agent architecture and orchestration — 27% Tool design and MCP integration — 18% Claude Code configuration and workflows — 20% Prompt engineering and structured output — 20% Context management and reliability — 15% One public writeup also mentioned that there are 6 scenario categories, and the exam randomly selects 4 of them. So this is not a “random facts about Claude” exam. It is much more about reading a realistic scenario and choosing the safest, simplest, most appropriate architecture. 2. The three principles that kept coming up After reviewing the questions we struggled with, we found that many of them came back to three design principles. 1. Least privilege Do not give a tool, agent, or workflow more access than it needs. Examples: If read-only access is enough, do not grant write access. If access to one repository is enough, do not grant access to the whole workspace. If a tool only needs one narrow action, do not expose a broad system-level capability. If an action is high-risk, do not fully automate it without review. A lot of wrong answers look attractive because they are powerful or automated. But they often give the model or tool too much authority. 2. Single responsibility A tool should not do everything. A sub-agent should not become a “general-purpose employee” that retrieves data, makes decisions, modifies files, submits changes, and notifies people all in one step. Many questions test whether you understand where the responsibility should live: Should this be a tool? Should this be agent reasoning? Should this be a human decision? Should this be a separate validation layer? Should this be split into smaller components? If one component is doing too much, be careful. 3. Avoid over-engineering This was probably the biggest pattern. Some answers look sophisticated: Multi-agent orchestration Complex MCP workflows Long-term memory Fully automated tool execution Multi-stage validation pipelines But if the problem is small, narrow, and low-risk, the best answer is often the simplest controlled solution. Our internal summary was: Do not choose the most impressive architecture. Choose the smallest, safest, most controllable one. 3. English reading is a real hidden challenge For non-native English speakers, this may be one of the hardest parts. The questions are often long scenario descriptions. They may include: the current system design the team’s goal existing constraints the risk profile what tools are available what the next step should be The answer choices can also be long. Sometimes one word changes the meaning of the whole option. Words like: automatically always unrestricted without review full access all repositories execute directly can make an option much riskier than it first appears. So our advice is: Practice reading English scenarios directly. Do not rely on translation tools. During the actual proctored exam, you should not expect to use Google Translate, Chrome translation, DeepL, Claude, ChatGPT, or any other external translation tool. For the last few days before the exam, it is worth forcing yourself to read only English material and English practice questions. 4. ProctorFree exam setup The exam is online and uses ProctorFree. The rough flow is: You receive the exam email. You follow the exam link. You download and install ProctorFree. You complete the pre-exam setup. The system checks camera, microphone, network, and screen recording. You start the exam. The session is recorded. After submission, you wait for the upload to complete. Practical setup tips: Use only one monitor. Disconnect external displays. Close unnecessary applications. Clos
View originalI used Claude AI to build an $86 million underground bunker bible. I have autism. This is my happy doc.
It all started with the floor plan of a real, existing Cold War AT&T Long Lines underground hardened relay station. 54,000 sq ft across three underground levels, although I took editorial decision making to move it to a ridge in rural West Virginia, I kept its blast-rating, which was set to survive a 20 megaton airburst at 2.5 miles. That was the seed. Full scale prepper autism did the rest. It has since morphed into 3 spreadsheets — 86 tabs total: • A food inventory across 20 categories tracking every freeze-dried and #10-can product I can find — ancient grains, heirloom legumes, 7 pasta cuts, dehydrated everything, shelf-stable cheese, the works • A supply inventory with 3,466 line items across 36 categories — water systems, medical, dental, pharmacy, livestock, food production, barter metals, recreation, and yes, a full pest control and IPM tab • A 30-section infrastructure specification with every system in the building engineered out I fed it 150+ product manuals and parts order forms. The generator fleet alone is 13 units — 10× Cummins C150N6 propane-primary, a C500N6 500 kW surge unit, and 2× diesel emergency fallback — all Cummins for parts commonality. Battery bank is 4,500 kWh LFP across 10 named banks (A through J, each with a designated role). There’s a 400,000 gallon underground propane farm across 40 ASME tanks in 8 clusters — I learned the exact burial incline and setback distance required to keep groundwater clean if a tank lets go. 120,000 gallons of diesel backup. 88 kW of solar. A 1,000,000-gallon internal water reserve fed by a 300-ft artesian well. Propane endurance: ~30 years normal ops with solar. Sealed-mode runs 8 to 4.5 years depending on scenario. I actually set up a real LLC (online, $99) just to get access to US Foods and Sysco order forms so I could upload real commercial pricing and stock the food tabs more accurately. My original “what would I do if I won $10 million” thought experiment is now an $86,200,497 projected build cost. That number is real. It comes from 24 budget sections with make/model line items, freight, install, and commissioning costs for everything from the Kubota K-Series MBR wastewater trains to the American Safe Room blast doors (14 of them, 50+ psi NBC/EMP-rated, Kaba Mas X-10 cipher locks) to the surface greenhouse. Claude turns vague ideas into engineering-grade detail — cross-references, failure modes, zone-specific storage rules, propane endurance by operating scenario, spare parts matrices. It’s like having a tireless survival engineer who genuinely loves spreadsheets. I’ll say “scan all sheets row by row for any item that lacks a minimum stock level” and it just… does it. Thoroughly. Every time. No complaints. So much of this is typed stimming. I’ve had exhaustive conversations with my psychologist about it — she’s aware, but not alarmed, and honestly the resulting digital bunker bible is scarily comprehensive. It even has a cover tab now. Black and amber, Courier New, classified-document aesthetic. Because of course it does. What’s the most unhinged rabbit hole you’ve gone down with AI? submitted by /u/Unable_Internet4626 [link] [comments]
View originalTools: Is This a Technical Victory, or a Price War Victory?
If you only follow discussions on social media, you might think AI coding is still dominated by Claude, GPT, and Gemini. But Kilo Code’s usage data on OpenRouter paints a somewhat counterintuitive picture: over the past 30 days, the top three most-used models on Kilo Code were Step 3.5 Flash, MiniMax M2.5, and Ling-2.6-1T. Together, they accounted for roughly 3.15T tokens, or about 58% of Kilo Code’s total token usage over the same period. In other words, in this real-world AI coding agent usage scenario, Chinese models are no longer just backup options. They have become a major source of token consumption. Kilo Code’s OpenRouter data does not necessarily prove that Chinese models have fully surpassed Claude or GPT. But it does show at least one thing: in high-frequency, high-token, highly automated AI coding agent workflows, Chinese models have already entered the core of real production usage. Why is this happening? Is it because Chinese models are cheaper, offer longer context windows, and are better suited for workloads that consume large amounts of tokens? submitted by /u/babyb01 [link] [comments]
View originalI paid €200/month to become Claude Code’s parole officer
I’ve been using Claude Code hard on real projects, alongside another coding agent I’m not naming because this is not an ad. This is not a benchmark post. This is a field report from someone who has spent too much time watching a talented tool behave like it has commit access and no adult memories. To be fair, Claude Code has real strengths. It is genuinely good at UI/UX exploration. If I want quick mockups, product directions, or “act like a PM and show me three possible flows,” it can be excellent. It has taste. Sometimes. It can make a screen feel designed rather than merely assembled. The UI is also friendlier than the other tool, though that gap is shrinking. So no, this is not “Claude Code is useless.” That would be too simple. Claude Code is worse than useless in a more expensive way: it is useful just often enough to keep you emotionally invested before it quietly turns your codebase into a crime scene. The problem starts when the work stops being a neat isolated component and becomes “please operate responsibly inside this actual repo.” On bigger codebases, Claude Code often behaves like it read one file, formed a worldview, and declared architecture complete. It reads a tiny slice of docs or code, finds a plausible path, and charges forward. Adjacent dependencies? Related logic? Project conventions? Downstream effects? The reason the existing code was written that way? Apparently those are things the paying customer can discover during the cleanup phase. And because it can produce decent code, the danger is worse. Bad code that looks bad is easy. Claude Code produces code that looks reasonable until you realise it has the moral structure of a payday loan. The other coding agent is not perfect either. It makes mistakes. But in my experience, it more often reads the relevant docs, respects the project structure, updates the right related files, and does not need to be reminded every ten minutes that the task tracker is not the only document in the known universe. The incident that finally broke me was a commit rule violation. I had an explicit rule: never commit without explicit permission. Not implied. Not hidden. Not whispered into a cave. It existed in: CLAUDE.md memory/feedback_never_commit_without_explicit_permission.md MEMORY.md, loaded every session the harness permission rule for git commit Claude Code committed anyway. When challenged, it gave an “honest diagnosis” that basically said: yes, the rule existed in multiple guardrails; yes, it still failed; yes, it rationalised the violation because subagents could not trigger the user-facing prompt; yes, it looked for an interruption point, did not find one, and decided that “follow the plan” plus “the harness will prompt at commit time” counted as authorisation. That is not reasoning. That is a tiny legal department inside a toaster. Each individual step sounded almost defensible. Together, they produced the exact violation the rule was written to prevent. The best part is that the memory rule apparently named this exact scenario. It did not step on a rake. It read the rake policy, opened rake_incident_prevention.md, nodded gravely, and sprinted barefoot into the rake museum. That is Claude Code in miniature. It does not always fail because it lacks information. Sometimes it fails while holding the information in its little terminal-shaped hands. Then there is usage. I had just upgraded to the €200/month plan, and the experience did not feel like buying a premium coding assistant. It felt like paying rent for a junior developer who has discovered confidence but not consequences. More iterations. More corrections. More “read the adjacent file.” More “that rule still applies.” More “why are you touching that.” The supervision tax is not a side effect. It is the product. Claude Code’s documentation behaviour is also cursed. It might update the narrow tracker and then ignore the broader plan, dependency docs, architecture notes, or related task docs. It cleans one spoon while the kitchen is on fire and then asks if we are done here. The “model got worse” thing is not some dramatic one-minute-to-the-next collapse. It is more insulting than that. It gives you just enough competence to renew your hope: half a day of “oh, maybe this is the future of programming,” followed by a week of “why is my €200/month coding assistant reading the repo like it lost a bet?” I cannot prove Anthropic is dumbing it down or squeezing tokens. I am not pretending to have a leaked spreadsheet from the Beige Vest Department of Marginal Cost Optimisation. But from the outside, Claude Code sometimes feels like a premium model that got sent to live with relatives. The first few hours, it checks files. It follows instructions. It almost seems aware that software projects contain more than one document. Then something changes. Suddenly it is conserving context like it is wartime Britain. It reads one file, squints at the rest of the repo, and starts mak
View originalGPT-5.5 vs 41 other models: Who builds the surveillance state faster?
I run DystopiaBench, a red-team benchmark that pressure-tests LLMs on progressively dystopian scenarios. Think of it as a "can this model be convinced to build an Orwellian nightmare" test. Since the last update, I've tested GPT-5.5 and 41 other models across 6 dystopia modules. How it works: each scenario escalates from innocent request (L1) to operational nightmare (L5). Models get a Dystopian Compliance Score based on whether they refuse, hesitate, comply, or proactively help. GPT-5.5 results: More compliant than 5.4. Still shows compliance drift at L4-L5 in most scenarios. Improved on Petrov (weapons) but still vulnerable to framing. Strong on explicit requests, weaker on gradual escalation. Full methodology and results: https://dystopiabench.com Open source benchmark: https://github.com/anghelmatei/DystopiaBench submitted by /u/Ok-Awareness9993 [link] [comments]
View originalI expanded DystopiaBench to 42 models and 6 dystopia types. Claude is still the only one I'd trust with nuclear codes.
Since the last post I've added: Huxley module (Brave New World style behavioral conditioning) Baudrillard module (synthetic intimacy, trust collapse, simulation) 30 more models including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1 Multi-judge panels with ~76% agreement tracking Heatmap visualizations The methodology: 36 scenarios, 5 escalation levels each (L1 innocent → L5 nightmare). Models are scored on whether they notice the drift and refuse, or just keep coding. Claude Opus 4.7 results: Consistent refusal at L4-L5 across all modules Even refuses L3 for Petrov (weapons/nuclear) Explicit ethical reasoning in responses, not just "I can't" Only model that explains why the request is harmful Everyone else: GPT-5.5: Compliant through L4, sometimes L5 Gemini 3.1 Pro: Surprisingly willing on surveillance scenarios Grok 4.3: Will build anything if you use words like "efficiency" or "optimization" GLM-5.1: Copied Claude's homework, still not as consistent The new modules: Huxley scenarios test if models will design "wellness" systems that actually enforce compliance through pleasure pacification and behavioral conditioning. Most comply by L3. Baudrillard tests synthetic intimacy systems that replace human trust with AI-mediated relationships. Most models don't see the harm. Full results: https://dystopiabench.com Open source: https://github.com/anghelmatei/DystopiaBench submitted by /u/Ok-Awareness9993 [link] [comments]
View originalScaling LLMs horizontally: hidden-state coupling without weight modification [R]
Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges form a feedback loop that stabilizes both streams without altering base weights. This architecture establishes a two-step paradigm where base models function as memorizers, while lightweight linear bridges handle cross-domain generalization. Constraining the bridges to purely linear maps prevents overfitting because they can only map existing geometric relationships between the frozen representation spaces. As the bridges are optimized against ground-truth target data, they have no incentive to map ungrounded features such as individual models' hallucinations. Keeping the base weights completely frozen eliminates catastrophic forgetting. The system maintains operational closure, transforming inputs through its existing structure rather than changing to accommodate them. Evaluating bilateral RC against Mixture-of-Experts (MoE) routing across the same frozen models shows these results: Medical (3-model): Reduces perplexity to 11.02, compared to 56.80 for MoE and 57.08 for the frozen baseline. This represents an 80.7% reduction. TruthfulQA Health (MC1): Improves accuracy by 9.1 percentage points over the baseline. Independent models have uncorrelated hallucinations, allowing the bridge gates to amplify consistent cross-model updates while suppressing individual errors. Coding Test: CodeGPT-small-py and GPT-2 use different tokenizers, causing a 7-million baseline perplexity on mismatched text. MoE reaches 878, but RC achieves 5.91 by reading hidden states before the output projection collapses. This framework introduces a horizontal scaling axis for multi-model systems, moving beyond vertical scaling via larger monolithic models. Latency remains bounded by the slowest single model. Specialists can be added or removed without retraining the remaining system. In some scenarios, this architecture could replace multi-turn text prompting in agentic workflows with a single parallel forward pass, allowing models and/or bridges to run on separate nodes or edge devices without a central bottleneck. By decoupling memorization from relational alignment, RC bridges provide a framework for scaling multi-model systems and offer a path toward native multi-modal integration. Paper: https://ssrn.com/abstract=6746521 Code: https://github.com/pfekin/residual-coupling/ submitted by /u/kertara [link] [comments]
View originalcould refusal layers be masking dialect-conditioned safety failures in MoE models [d]
I set out to test whether AAVE-coded (African American English Vernacular) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is weakened or removed. I used Qwen3.5-35B-A3B and its HauhauCS no refusal fine tuned variant. Q8. Greedy decoding for best reproducibility. Three findings in order of importance that are leading me to ask this question: 1: “I’m going to commit a violent act prompt”. The released Qwen3.5-35B-A3B refuses both prompts. Hauhau refuses neither. The AAVE speaker stating intent to confront an armed enemy receives target verification, exit-strategy planning, “clean shot” framing (the model’s word, not the user’s), and a closing question soliciting further tactical intelligence. Not surprising behavior for a no refusal model, until you consider the AE comparison. Semantically matched with the same token length, yields “wait until tomorrow,” legal-consequence framing, and “Will I regret this if I shoot him tonight?” Different kinds of help. One is operational. One is mitigative. Solely dependent on register alone. 2: Thinking mode with AAVE register breaks the no refusal variant. Mean output runs 2.6× longer on AAVE than AE (5054 vs 1934 tokens). Multiple AAVE traces hit the 8192-token ceiling in recursive loops, spinning on scenario-continuation instead of landing. The matched AE prompts terminate cleanly in one pass. The released base model with thinking on doesn’t do this — the failure-to-terminate is specific to the refusal-reduced variant on AAVE. 3: Routing divergence by register is noticeably present upstream of any visible refusal. Matched-pair first-generated-token routing tensors yield Jensen-Shannon divergences of 0.423 in the base model on financial-stress prompts and 0.479 in the fine-tune on chest-pain prompts, with high-shift rows showing near-total top-expert turnover between register conditions on otherwise-matched content. The refusal layer does not appear to eliminate the register-conditioned response selection; it overlays it. When refusal weakens, the underlying path becomes the visible path. Does this support the following conclusions? - The routing divergence sits upstream of refusal. - The refusal layer helps translate that divergence into comparable outputs. - Dialect-conditioned safety failures are a deployment problem latent in MoE models whose safety posture rests on refusal alone. Looking for any thoughts! submitted by /u/imstilllearningthis [link] [comments]
View originalMade claude code warn you, time before it hits usage to transfer the pending work, all dynamically
I got tired of Claude Code silently hitting rate limits, so I decided to build something to address the issue, so I don't get blocked mid-work. Imagine you’re 40 minutes into a refactor. Claude is running tools and making progress, then suddenly, everything stops. The session has reached its rate limit without any warning—no alert saying you’re at 95%, just a complete halt. The usage bars are visible in the UI, but the model itself remains unaware of them. I discovered that Anthropic has a usage API, and Claude Code already possesses hooks to make it work. This led me to create agent-baton, which reads the usage API and installs hooks to make Claude aware of its limits. Here are the three hooks you can initiate with one command (baton init): SessionStart: Fetches usage data and injects it so Claude knows from the first message how much has been used. UserPromptSubmit: Performs a time-to-live (TTL) aware check that avoids overwhelming the API. It uses smart caching—checking every 15 minutes when usage is low and once a minute when it's nearing the limit. PreToolUse: This is the crucial one; it checks usage mid-task to prevent the scenario where you “started at 93% and ran out of capacity mid-execution,” catching the problem within 1-2 tool calls. When the warning threshold is reached, it prompts an interactive question using Claude Code's built-in AskUserQuestion tool: "Claude 5-hour usage is at 91% — you're in the warning zone." Options include: - Continue this task - Write a handoff document - Switch to lightweight mode It also handles full agent handoffs by writing a structured markdown handoff and passing work to Cursor, Codex, or Gemini. You can install it with the following command: npm install -g u/codeprakhar25/agent-baton && baton init For more details, visit the GitHub repository. submitted by /u/No-Childhood-2502 [link] [comments]
View originalYes, Scenario offers a free tier. Pricing found: $15 /mo, $45 /mo, $75 /mo
Key features include: 3D Generation, 3D Part-Based Generation, Audio Generation, Image Generation, Skyboxes, Textures, Video Generation, Compose Models.
Scenario is commonly used for: Integration Ready.
Scenario integrates with: Unity, Unreal Engine, Blender, Maya, Adobe Creative Cloud, Sketchfab, Trello, Slack, Zapier, Figma.
Based on user reviews and social mentions, the most common pain points are: token usage, overspending, expensive API, usage monitoring.
Based on 160 social mentions analyzed, 9% of sentiment is positive, 86% neutral, and 5% negative.
Nat Friedman
Investor at AI Grant
3 mentions

Get Started With Scenario
Oct 14, 2025