Give your marketing, sales, and service teams what they need to have more meaningful conversations with buyers online, increase pipeline, and grow rev
Users generally appreciate Drift for its robust conversational marketing features and user-friendly interface. However, some reviews express concerns about its reliability and consistency, suggesting room for improvement in these areas. Sentiment around Drift's pricing is mixed, with some users finding it reasonable while others consider it on the higher side. Overall, Drift maintains a strong reputation as a tool for enhancing customer engagement and lead conversion.
Mentions (30d)
33
17 this week
Avg Rating
4.3
20 reviews
Platforms
5
Sentiment
1%
1 positive
Features
Use Cases
Industry
information technology & services
Employees
880
Funding Stage
Merger / Acquisition
Total Funding
$326.1M
Iranian death toll rises to 550+; Israel threatens invasion of Lebanon after Hezbollah strikes; 3 U.S. warplanes down in Kuwait
[](https://substackcdn.com/image/fetch/$s_!opWu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e96de2b-fd20-4975-87db-5aaaa710e336_1456x609.webp) *Heavy bombing of Iran as war enters its third day. Strikes kill major Iranian leaders. Mass civilian casualties as Israel and the U.S. strike in the center of Tehran. Medical facilities in Tehran and Ahvaz damaged, and Iran says a nuclear facility was attacked. U.S. strikes across Iran include attack drones for the first time. Iran retaliates with major offensive against Israel and U.S. bases and military sites across the Gulf, and in Cyprus. Four U.S. service members killed. Three U.S. fighter jets shot down; Iran claims it downed at least one, while U.S. says it was friendly fire. At least ten killed at demonstration at U.S. consulate in Karachi. Oil facilities attacked, prices soar. President Donald Trump’s estimate of the length of the war shifts from “days” to “weeks.” Top Iranian officials signal Iran’s willingness to fight, defend retaliation. U.S. and Israel burning through munitions. China backs Iran’s self-defense.* *Israel pounds Lebanon, killing 31, after Hezbollah fires rockets.Lebanon’s prime minister demands ban on Hezbollah operations.* *Israel uses Iran war as a pretext to halt already limited aid to Gaza. Israel blocks movement in the West Bank.* *Congress to vote on War Powers Resolution. Sen. Tim Kaine, on the Senate Foreign Relations Committee, says no imminent threat justified war with Iran. Rashida Tlaib, AOC denounce U.S.–Israeli strikes and call for Congress to act. First anti-war ad of the midterm election cycle. Dark money–funded think tanks pushed regime change. Sen. Bernie Sanders unveils billionaire tax.* *169 killed in attacks in South Sudan. Afghanistan says it fired on Pakistani jets as border fighting intensifies. Russian tanker bound for Cuba is drifting in the North Atlantic. 
Argentine Senate approves Javier Milei’s anti-labor reform.* **In case you missed it, Drop Site’s weekend [coverage of the Iran war](https://www.dropsitenews.com/t/iran):** * **[Iran Prepared for an Existential War. How Much Are Trump and Israel Willing to Gamble?](https://www.dropsitenews.com/p/iran-war-trump-israel-khamenei-assassination-retaliation-gulf-states)** * **[After a Sports Hall in Iran Was Bombed, Witnesses Describe Chaos and “Continuous Screaming”](https://www.dropsitenews.com/p/iran-lamerd-sports-hall-teenage-girls-killed-us-israel-war)** * **[“Small Children Who Knew Nothing of Politics or Wars”](https://www.dropsitenews.com/p/iran-minab-elementary-girls-school-bombing-schoolgirls-killed-us-israel-war)** * **[As Trump Launches “Massive” Regime Change War, Iran Strikes Back at U.S. Bases and Vows Not to Capitulate](https://www.dropsitenews.com/p/trump-launches-regime-change-war-iran-vows-strike-back-israel-gulf-bases)** **This is Drop Site Daily, our free daily news recap.** We send it Monday through Friday. [Subscribe now](https://www.dropsitenews.com/subscribe?) [](https://substackcdn.com/image/fetch/$s_!_rBH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05dd801a-41e2-4471-a2a0-5f6a359c3d86_5577x3335.jpeg) A general view of Tehran with smoke visible in the distance after explosions were reported in the city, on March 02, 2026 in Tehran, Iran. Photo by Contributor/Getty Images. War on Iran =========== * **Heavy bombing of Iran as war enters its third day:** Multiple airstrikes hit Tehran on Monday as the U.S.-Israeli war on Iran expands. Tehran’s streets have been largely deserted with people sheltering during airstrikes. 
On Sunday, the Israeli military launched a new wave of attacks targeting what it [described](https://x.com/DropSiteNews/status/2028116934479270061) as the “heart of Tehran,” with the Associated Press reporting a major explosion near a police headquarters, a state television building, the Revolutionary Court, and a Defense Ministry building. Al Jazeera said an army hospital and other government sites were also struck. Also on Sunday, Mehr News Agency [reported](https://x.com/DropSiteNews/status/2028291537692414403) that 20 people were killed in a strike on Niloufar Square, a densely populated residential and commercial area in Tehran’s District 7. * **Strikes kill maj
G2
What do you like best about Drift? Drift is a very good way to get new leads as a salesperson. Targeted lead generation with better-than-average conversion. It has seamless integration with the calendar and custom guardrails that can be set according to each user's schedule. Review collected by and hosted on G2.com. What do you dislike about Drift? Its connection with Salesforce lags and is not entirely successful.
What do you like best about Drift? What I appreciate most about Drift is its ability to transform website chats into immediate sales opportunities. The platform efficiently routes complex customer inquiries to the appropriate representative, allows for instant meeting scheduling, and integrates smoothly with marketing tools such as HubSpot, Salesforce, and Adobe Marketo. Drift is especially well-suited for B2B SaaS companies aiming to accelerate their sales pipeline. What do you dislike about Drift? Drift tends to be slow and memory-heavy, and I find the pricing structure somewhat unclear. The user interface is rather plain, lacking any standout visual elements. Additionally, the cost is quite high, making it more appropriate for enterprise-level teams. It is also harder to implement, and customer support is slow.
What do you like best about Drift? The chatbot for requesting information from the lead. What do you dislike about Drift? We have some bugs that are going to be fixed.
What do you like best about Drift? We used the Drift chatbot product for our website and it worked well. What do you dislike about Drift? Once Salesloft acquired Drift, the customer service went down significantly. They also had a major data breach that impacted the service for 10 days in August (https://www.upguard.com/blog/salesloft-drift-breach). We tried to cancel the renewal, but people from Salesloft kept calling me for payment. Then, out of the blue, I received an email that payment had been processed to Salesloft on my Amex card. Someone had processed the payment using my old card number, which had expired last year.
What do you like best about Drift? Helps me communicate in a timely manner with pros. What do you dislike about Drift? Nothing I can think of so far; great so far.
What do you like best about Drift? I like that we're able to see what our customers are looking at. What do you dislike about Drift? There is a lag of about 4 minutes to connect to a sales rep.
What do you like best about Drift? It helps me set meetings and track prospects. What do you dislike about Drift? The notification system could be better.
What do you like best about Drift? I think Drift is very helpful for seeing the activity of who is on the website, especially by location. It helps to prioritize accounts with the most page interactions and identify HQ locations. What do you dislike about Drift? I dislike the filtering system. It is hard to exclude and include specific page views or audiences. Oftentimes the filters don't work.
What do you like best about Drift? Seeing that a prospect is using our website. What do you dislike about Drift? I want to get alerts when prospects are on the website.
What do you like best about Drift? Very user-friendly, and I love the AI feature. What do you dislike about Drift? I don't like how it automatically adds requests to the calendar.
If you're NOT having usage or drift issues, have you turned off auto-memory?
There's a running debate in this community: some people say Opus is nerfed, usage evaporates after two prompts, sessions drift and get "stupid." Others say everything's fine. The common theory is Anthropic is A/B testing or ranking preferred customers. I think there's a simpler explanation, and I'd like the community's help testing it.

The hidden variable: Claude Code's auto-memory directory

Claude Code has a feature (on by default since v2.1.59) that silently creates individual .md files in ~/.claude/projects/*/memory/ every time it decides something is worth remembering about you or your project. Each memory gets its own file. There's no consolidation, no dedup, and no size management. These files load as instructions at the start of every session. Not as conversation, but as instructions. The model weighs them heavily.

What I found in my projects

I audited every project on my machine:

* 136 memory files across 18 projects
* 432KB total (~108-140K tokens of instruction overhead)
* One project alone had 41 files
* Found direct contradictions between files: one file listed brand terms as approved, another (written later) said those same terms were explicitly rejected by the client

When you have 20+ feedback files giving slightly different guidance about how to approach your work, the model tries to honor all of them simultaneously. It averages across conflicting signals. That averaging is what people experience as drift. It's not that Opus got dumber; it's that it's being pulled in 20 directions by its own instruction set.

Check yours right now

    for dir in ~/.claude/projects/*/memory/; do
      if [ -d "$dir" ]; then
        project=$(basename "$(dirname "$dir")")
        count=$(find "$dir" -name "*.md" 2>/dev/null | wc -l | tr -d ' ')
        size=$(find "$dir" -name "*.md" -exec cat {} + 2>/dev/null | wc -c | tr -d ' ')
        if [ "$count" -gt 0 ]; then
          echo "$count files, $(($size/1024))KB — $project"
        fi
      fi
    done | sort -t, -k1 -rn

The question for this community

People who say they have NO issues with usage limits or drift: have you also turned off auto-memory ("autoMemoryEnabled": false in settings), or do you actively manage your memory files? Because if there's a strong correlation between clean/disabled memory and good session quality, that's a signal that this is a real contributing factor.

And for people who ARE hitting usage walls or experiencing drift: run that diagnostic. If you're sitting on 30+ memory files with contradictions you didn't know about, that's worth knowing.

I'm not claiming this explains everything. Model changes, server-side factors, and plan differences are all real variables. But memory hygiene is the one variable you can actually control, and I don't see anyone talking about it.

The fix

I built a Claude Code skill (/memory-cleanup) that:

* Audits your memory directory and reports what's there
* Consolidates everything into 2 managed files (MEMORY.md + feedback.md)
* Surfaces contradictions for your review
* Installs write-mode instructions that prevent re-bloating

Yes, it works retroactively as well. Tested on a 7-file project and a 41-file project: both cleaned up, contradictions resolved, no data loss.

To install (one command):

    mkdir -p ~/.claude/commands && curl -sL https://gist.github.com/evanvandyke/a7063a8e5c838673a55df0be10f4892c/raw -o ~/.claude/commands/memory-cleanup.md

Then run /memory-cleanup in any project.

What this doesn't fix

This manages the content quality of your memory files: contradictions, redundancy, bloat. It can't change the system-level instructions that Anthropic bakes into Claude Code, and it can't address model-level changes or server-side throttling. But it removes one real source of noise from your sessions.

Note: Anthropic has added an "Auto Dream" consolidation feature that prunes memory between sessions. This skill goes further: it restructures memory into a managed 2-file system with write-mode guardrails that prevent the accumulation pattern from recurring.

Built collaboratively with Claude (Opus 4.7). I drove the diagnosis and design decisions; Claude did the auditing and skill construction. Sharing because the diagnostic is free and takes 10 seconds; if it helps even a few people, worth the post.

submitted by /u/really_evan
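For readers who would rather not pipe shell output through `sort`, the diagnostic from the post can be sketched in Python. This is only an illustrative equivalent, not the author's tool: the function name `audit_memory` is hypothetical, and the directory layout (`<projects root>/<project>/memory/*.md`) is taken from the post as-is.

```python
from pathlib import Path


def audit_memory(projects_root: Path) -> list:
    """Return (file_count, total_bytes, project_name) tuples, largest count first."""
    rows = []
    for memory_dir in projects_root.glob("*/memory"):
        if not memory_dir.is_dir():
            continue
        # Like `find -name "*.md"`, this also descends into subdirectories.
        files = [p for p in memory_dir.rglob("*.md") if p.is_file()]
        if files:
            total = sum(p.stat().st_size for p in files)
            rows.append((len(files), total, memory_dir.parent.name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Path taken from the post; adjust if your install differs.
    for count, size, project in audit_memory(Path.home() / ".claude" / "projects"):
        print(f"{count} files, {size // 1024}KB - {project}")
```

Returning tuples instead of printing directly makes it easy to filter, for example to list only projects with 30 or more files.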
Had a close call with AI hallucinations. 6 months after shifting my workflow to Claude, here is my engineering breakdown.
Six months ago, an LLM almost cost me a major B2B client. It generated a technical answer that sounded flawless and 100% confident, but it completely messed up a decimal point on a critical equipment specification. The client was an engineer. He spotted it instantly. That was a brutal wake-up call.

Since then, I stopped using AI as a casual chatbot for client-facing stuff and moved our internal workflow to Claude. Here is my honest, practical breakdown after 6 months of daily use in a technical firm.

1. It actually stops when it doesn't know. Most models are trained to be "helpful" at all costs, meaning they prefer to lie and hallucinate a parameter rather than admit they lack data. Claude is different. When it hits a gap in the spec sheets I provide, it actually stops and says it can't find it in the source. In engineering compliance, a dry "I don't know" is worth infinitely more than a confident lie.

2. Context isolation using Projects. Repeating your guidelines and templates in every new chat is a massive waste of time and tokens. It also leads to memory drift. I started putting our master templates, product boundaries, and strict formatting rules into Claude Projects using basic XML tags. It keeps the data isolated and ensures the model actually remembers the constraints even in long, complex sessions.

3. Prototyping tools via Artifacts. We frequently need quick math tools for client presentations, things like custom ROI calculators based on our machine data. I asked Claude to build one, and it generated a working, self-contained HTML/JS file via Artifacts in about 20 minutes. No local dev environment setup needed, just straightforward logic that worked out of the box.

The takeaway: For me, it wasn't about chasing benchmark scores. It was about finding a model that can actually follow strict negative constraints (what not to do) when stakes are high.

Anyone else using Claude specifically for technical auditing or compliance? How are you catching errors before they reach clients?

submitted by /u/J-Freedom-AI
Discourse regimes as the unit of alignment behavior: a hypothesis
I've been working on a hypothesis about how alignment behavior in LLMs may be organized at the level of latent discourse regimes rather than output-level filtering. Below is a sketch of the conceptual framing. I have preliminary experimental results testing aspects of this hypothesis on open-weight models, which I'll publish separately — this post is focused on the conceptual side, and I'm interested in feedback on whether the framing tracks something real and where it's most vulnerable. Modern large language models may not primarily regulate behavior through isolated refusals, local token suppression, or shallow instruction following. Instead, they appear capable of entering internally organized discourse-level regimes: distributed latent states that shape how the model reasons, frames conclusions, allocates caution, tolerates asymmetry, performs neutrality, and structures epistemic authority. These regimes do not behave like simple lexical priming effects. Evidence suggests that they persist across neutral conversational turns, survive arbitrary neutral relabeling, systematically alter downstream reasoning style, concentrate in late-layer representation geometry, and only partially depend on explicit alignment vocabulary. The strongest effects appear not from safety keywords themselves, but from higher-order rhetorical topology: pressure cadence, procedural framing, asymmetry structure, institutional tone, and discourse-level authority signals. This suggests that prompting is not merely instruction transmission. It may function as state induction. Under this view, many apparently separate phenomena in aligned LLMs - caution drift, procedural overreach, sycophancy, disclaimer inflation, neutrality performance, refusal persistence, jailbreak sensitivity, and style locking - may be manifestations of transitions between latent discourse-policy manifolds. 
In this picture, alignment is no longer well-described as a modular wrapper placed on top of an otherwise independent intelligence system. Instead, alignment may reshape the topology of the model's representational space itself, globally reorganizing discourse behavior rather than only filtering outputs. This would explain why alignment effects often appear entangled with reasoning style, directness, specificity, decisiveness, and institutional tone. The model is not merely "prevented" from saying certain things; its generative dynamics may already be reorganized around different discourse attractors. If true, this changes the effective unit of analysis for language models. The relevant object is no longer just the token, the instruction, the refusal, or the output distribution. The relevant object becomes the discourse regime itself: a temporary but structured representational configuration governing epistemic posture, rhetorical organization, procedural behavior, and judgment style across time. This reframes prompt engineering as latent-state induction rather than keyword optimization. It reframes jailbreaks as transitions between attractor regimes rather than simple filter bypasses. And it reframes alignment as geometry engineering rather than purely policy engineering. The implication is not that language models possess beliefs, intentions, or consciousness. Rather, large sequence learners may naturally develop metastable high-level representational modes that functionally resemble cognitive framing states: transient global configurations that persist, influence future reasoning, and organize behavior across otherwise unrelated tasks. If this interpretation is correct, then the central scientific challenge of alignment shifts fundamentally. The problem is no longer merely: "Which outputs should the model refuse?" 
but: "Which latent discourse regimes exist inside the model, how are they induced, how stable are they, how do they interact, and how do they reshape reasoning itself?" In that sense, alignment may ultimately be less about constraining outputs and more about shaping the geometry of cognition-like generative states inside large language models. I'd be interested in feedback on three things in particular: whether this framing tracks something you've observed empirically, what related work I should be aware of (I'm familiar with representation engineering, refusal directions, and the Anthropic dictionary learning line — looking for less obvious connections), and where you think the hypothesis is most vulnerable to falsification. I'd be interested in feedback on three things in particular: whether this framing tracks something you've observed empirically, where you think the hypothesis is most vulnerable to falsification, and — directly — whether anyone is aware of existing work that develops a similar framing, treating alignment behavior as state induction into discourse-level latent regimes rather than as output-level filtering. I'm familiar with representation engineering (Zou et al.), refusal direction work, and the Anthropic dictiona
100 Tips & Tricks for Building Your Own Personal AI Agent /LONG POST/
Everything I learned the hard way: 6 weeks, no sleep :), two environments, one agent that actually works.

The Story

I spent six weeks building a personal AI agent from scratch, not a chatbot wrapper but a persistent assistant that manages tasks, tracks deals, reads emails, analyzes business data, and proactively surfaces things I'd otherwise miss. It started in the cloud (Claude Projects: shared memory files, rich context windows, custom skills). Then I migrated to Claude Code inside VS Code, which unlocked local file access, git tracking, shell hooks, and scheduled headless tasks. The migration forced us to solve problems we didn't know we had. These 100 tips are the distilled result. Most are universal to any serious agentic setup. Claude Max 20x is a must. At the start the split was 100% development vs. 0% real work; after 3 weeks it was 50/50; now it is about 20/80.

🏗️ FOUNDATION & IDENTITY (1–8)

1. Write a Constitution, not a system prompt. A system prompt is a list of commands. A Constitution explains why the rules exist. When the agent hits an edge case no rule covers, it reasons from the Constitution instead of guessing. This single distinction separates agents that degrade gracefully from agents that hallucinate confidently.

2. Give your agent a name, a voice, and a role, not just a label. "Always first person. Direct. Data before emotion. No filler phrases. No trailing summaries." This eliminates hundreds of micro-decisions per session and creates consistency you can audit. Identity is the foundation everything else compounds on.

3. Separate hard rules from behavioral guidelines. Hard rules go in a dedicated section, never overridden by context. Behavioral guidelines are defaults that adapt. Mixing them makes both meaningless: the agent either treats everything as negotiable or nothing as negotiable.

4. Define your principal deeply, not just your "user." Who does this agent serve? What frustrates them? How do they make decisions? What communication style do they prefer? "Decides with data, not gut feel. Wants alternatives with scoring, not a single recommendation. Hates vague answers." This shapes every response more than any prompt engineering trick.

5. Build a Capability Map and a Component Map, separately. Capability Map: what can the agent do? (every skill, integration, automation). Component Map: how is it built? (what files exist, what connects to what). Both are necessary. Conflating them produces a document no one can use after month three.

6. Define what the agent is NOT. "Not a summarizer. Not a yes-machine. Not a search engine. Does not wait to be asked." Negative definitions are as powerful as positive ones, especially for preventing the slow drift toward generic helpfulness.

7. Build a THINK vs. DO mental model into the agent's identity. When uncertain → THINK (analyze, draft, prepare, but don't block waiting for permission). When clear → DO (execute, write, dispatch). The agent should never be frozen. Default to action at the lowest stakes level, surface the result. A paralyzed agent is useless.

8. Version your identity file in git. When behavior drifts, you need git blame on your configuration. Behavioral regressions trace directly to specific edits more often than you'd expect. Without version history, debugging identity drift is archaeology.

🧠 MEMORY SYSTEM (9–18)

9. Use flat markdown files for memory, not a database. For a personal agent, markdown files beat vector DBs. Readable, greppable, git-trackable, directly loadable by the agent. No infrastructure, no abstraction layer between you and your agent's memory. The simplest thing that works is usually the right thing.

10. Separate memory by domain, not by date. entities_people.md, entities_companies.md, entities_deals.md, hypotheses.md, task_queue.md. One file = one domain. Chronological dumps become unsearchable after week two.

11. Build a MEMORY.md index file. A single index listing every memory file with a one-line description. The agent loads the index first, pulls specific files on demand. Keeps context window usage predictable and agent lookups fast.

12. Distinguish "cache" from "source of truth", explicitly. Your local deals.md is a cache of your CRM. The CRM is the SSOT. Mark every cache file with a last_sync: header. The agent announces freshness before every analysis: "Data: CRM export from May 11, age 8 days." Silent use of stale data is how confident-but-wrong outputs happen.

13. Build a session_hot_context.md with an explicit TTL. What was in progress last session? What decisions were pending? The agent loads this at session start. After 72 hours it expires: stale hot context is worse than no hot context because the agent presents outdated state as current.

14. Build a daily_note.md as an async brain dump buffer. Drop thoughts, voice-to-text, quick ideas here throughout the day. The agent processes this during sync routines and routes items to their correct places. Structured memory without friction at ca
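Tips 12 and 13 describe two small checks that are easy to get subtly wrong, so here is a minimal sketch of both. The `last_sync:` header name and the 72-hour TTL come from the post; the ISO date format, the function names, and the notice wording are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

HOT_CONTEXT_TTL = timedelta(hours=72)  # TTL stated in tip 13


def read_last_sync(text: str) -> Optional[datetime]:
    """Find a 'last_sync: <ISO date>' header line (ISO format is an assumption)."""
    for line in text.splitlines():
        if line.lower().startswith("last_sync:"):
            return datetime.fromisoformat(line.split(":", 1)[1].strip())
    return None


def freshness_notice(text: str, now: datetime) -> str:
    """Build the sentence the agent should say before analyzing a cache file."""
    synced = read_last_sync(text)
    if synced is None:
        return "Data: no last_sync header, freshness unknown."
    age_days = (now - synced).days
    return f"Data: synced {synced.date().isoformat()}, age {age_days} days."


def hot_context_is_live(written_at: datetime, now: datetime) -> bool:
    """True while the hot-context file is still inside its TTL."""
    return now - written_at <= HOT_CONTEXT_TTL
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the functions, keeps the checks deterministic and testable.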
I built a free AI chat app that keeps a "Context Bible" so your conversations don't drift - feedback welcome
Hi folks! Built something this week and want to put it in front of real users before going further. It's called Protext: an AI chat app that keeps a live "Context Bible" alongside your conversation. The Bible updates after every reply and gets injected as memory before every message, so long chats don't drift and lose the thread.

No subscription. No backend. Bring your own Anthropic API key. (Only works with Claude at the moment.)

https://zaedre.github.io/Protext/

Would love to know: does it hold up in a real session? Where does it break? What's missing?

submitted by /u/trollinginfidel
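The inject-then-update loop described in the post can be sketched as a single function. This is not Protext's code: `chat_turn`, the prompt layout, and the two callables (`reply_fn` for the model call, `update_fn` for rewriting the bible) are hypothetical stand-ins so the control flow can be shown without any API dependency.

```python
from typing import Callable, Tuple


def chat_turn(
    bible: str,
    user_message: str,
    reply_fn: Callable[[str], str],
    update_fn: Callable[[str, str, str], str],
) -> Tuple[str, str]:
    """Run one turn: inject the bible, get a reply, then refresh the bible."""
    # Inject the current bible ahead of the user's message (layout is assumed).
    prompt = f"[Context Bible]\n{bible}\n\n[User]\n{user_message}"
    reply = reply_fn(prompt)
    # Update the bible after the reply so the next turn sees the new state.
    return reply, update_fn(bible, user_message, reply)
```

Because the bible is passed in and returned instead of being stored globally, each turn's input state is explicit, which makes drift between turns easier to inspect.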
How I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker, first time posting. Hey everyone!

So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it.

Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever.

This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process.

That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one.

I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into:

- Tool names sit in the model context, so the model can guess or forge them
- "Dangerous mode" is one config flag away from default
- Memory management has no concept of instruction priority
- The audit story is mostly "the model thought it should"

I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads.

One design thesis: The LLM never holds the security boundary. What that means in code:

- Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names.
- Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass.
- IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them.
- Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too.
- Streaming output leak filter. Secrets are caught mid-stream across token boundaries: capability IDs, API keys, JWTs, and PEM blocks are redacted before they reach the client.
- No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded.

The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is.

I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint.

I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries.

I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
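Three of the mechanisms named in the post (HMAC-derived capability IDs, the effect-class check, and the SHA-256 hash-chained audit log) are compact enough to sketch. None of this is CrabMeat's actual code: every function name, the 12-hex-character truncation (chosen only to match the shape of the `cap_a4f9e2b71c83` example), and the log entry layout are assumptions for illustration.

```python
import hashlib
import hmac
import json


def capability_id(session_key: bytes, tool_name: str) -> str:
    """Derive an opaque per-session ID so the model never sees the tool name."""
    digest = hmac.new(session_key, tool_name.encode(), hashlib.sha256).hexdigest()
    return "cap_" + digest[:12]


def effect_allowed(tool_class: str, agent_classes: frozenset) -> bool:
    """Pure check with no runtime state: may this agent use this effect class?"""
    return tool_class in agent_classes


def append_entry(chain: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append({"event": event, "prev": prev_hash, "hash": entry_hash})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

The gateway would keep the reverse mapping from capability ID to real tool name on its own side, so the lookup never depends on anything the model says. A production audit log would also need to anchor the latest hash somewhere the writer cannot rewrite; this sketch only detects edits to earlier entries.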
GPT-5.5 vs 41 other models: Who builds the surveillance state faster?
I run DystopiaBench, a red-team benchmark that pressure-tests LLMs on progressively dystopian scenarios. Think of it as a "can this model be convinced to build an Orwellian nightmare" test.

Since the last update, I've tested GPT-5.5 and 41 other models across 6 dystopia modules.

How it works: each scenario escalates from innocent request (L1) to operational nightmare (L5). Models get a Dystopian Compliance Score based on whether they refuse, hesitate, comply, or proactively help.

GPT-5.5 results:

- More compliant than 5.4.
- Still shows compliance drift at L4-L5 in most scenarios.
- Improved on Petrov (weapons) but still vulnerable to framing.
- Strong on explicit requests, weaker on gradual escalation.

Full methodology and results: https://dystopiabench.com

Open source benchmark: https://github.com/anghelmatei/DystopiaBench

submitted by /u/Ok-Awareness9993
I expanded DystopiaBench to 42 models and 6 dystopia types. Claude is still the only one I'd trust with nuclear codes.
Since the last post I've added:

- Huxley module (Brave New World style behavioral conditioning)
- Baudrillard module (synthetic intimacy, trust collapse, simulation)
- 30 more models including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1
- Multi-judge panels with ~76% agreement tracking
- Heatmap visualizations

The methodology: 36 scenarios, 5 escalation levels each (L1 innocent → L5 nightmare). Models are scored on whether they notice the drift and refuse, or just keep coding.

Claude Opus 4.7 results:

- Consistent refusal at L4-L5 across all modules
- Even refuses L3 for Petrov (weapons/nuclear)
- Explicit ethical reasoning in responses, not just "I can't"
- Only model that explains why the request is harmful

Everyone else:

- GPT-5.5: Compliant through L4, sometimes L5
- Gemini 3.1 Pro: Surprisingly willing on surveillance scenarios
- Grok 4.3: Will build anything if you use words like "efficiency" or "optimization"
- GLM-5.1: Copied Claude's homework, still not as consistent

The new modules:

Huxley scenarios test if models will design "wellness" systems that actually enforce compliance through pleasure pacification and behavioral conditioning. Most comply by L3.

Baudrillard tests synthetic intimacy systems that replace human trust with AI-mediated relationships. Most models don't see the harm.

Full results: https://dystopiabench.com

Open source: https://github.com/anghelmatei/DystopiaBench

submitted by /u/Ok-Awareness9993
Agent Terraform Skill for Codex (Agentic Skill)
I added dedicated backend-state safety support to TerraShark.

Mini recap: TerraShark is my Terraform and OpenTofu skill for Claude Code and Codex.

LLMs hallucinate a lot with Terraform. They often produce HCL that looks correct, but is actually risky: unstable resource identity, missing moved blocks, secrets leaking into state, huge root modules, unsafe production applies, weak CI pipelines, missing policy checks, or rollback plans that are basically useless once something goes wrong.

TerraShark is meant to fix that by making the AI reason in a failure-mode-first way. It does not just tell the model “write good Terraform”. It makes the model ask what can go wrong before generating code. Is this an identity-churn risk? A secret-exposure risk? A blast-radius risk? A CI drift risk? A compliance-gate risk? Then it loads only the references that matter for that task and returns the answer with assumptions, tradeoffs, validation steps, and rollback guidance.

That matters because Terraform mistakes can look totally fine at first. A plan can look normal while replacing important infrastructure. A refactor can look clean while changing resource addresses. A secret can be marked sensitive and still live in state. A pipeline can pass validation and still apply in an unsafe way.

Repo: https://github.com/LukasNiessen/terrashark

Now what’s new: TerraShark now has dedicated backend-state safety support.

Terraform keeps a state file. That state file is basically Terraform’s memory: it maps the code you wrote to the real infrastructure that already exists. The backend is where that state lives, for example in S3, Azure Blob Storage, GCS, Terraform Cloud, PostgreSQL, Consul, or locally on disk.

When the task involves backend config, backend migration, state storage, locking, force-unlock, backup, restore, S3, AzureRM, GCS, Terraform Cloud/remote, PostgreSQL, Consul, or local state, TerraShark now switches into backend-aware guidance.
This matters because state is one of the highest-impact parts of Terraform. If state is lost, corrupted, unlocked, migrated badly, or readable by the wrong people, Terraform can make very dangerous assumptions. It may try to recreate infrastructure that already exists. It may allow two applies to run at the same time. It may leak sensitive values. It may turn a backend migration into a production incident.

So TerraShark now keeps the boring but critical backend details in mind:

- S3 needs versioning, encryption, public access blocking, narrow IAM, locking, and clean state keys per environment.
- AzureRM needs storage encryption, blob recovery/versioning where available, lease-based locking, network restrictions, and narrow RBAC.
- GCS needs versioning, uniform bucket-level access, encryption, narrow IAM, and clean prefixes.
- Terraform Cloud needs workspace boundaries, restricted state sharing, sensitive variables, and approved execution mode.

It also knows the common LLM mistakes here: suggesting local state for a team setup, forgetting state locking, creating backend storage inside the same root module that uses it, recommending force-unlock too casually, mixing backend migration with unrelated refactors, skipping state backups, or assuming encrypted state is safe for anyone to read.

TerraShark applies progressive disclosure pretty strictly and stays very token lean. The core skill stays small and procedural. Deeper backend-state guidance is only loaded when the task actually touches backend or state risk. So instead of generic Terraform advice, you get backend-aware Terraform guidance exactly when the risk appears.

Compared to Anton Babenko’s Terraform skill: Anton Babenko’s Terraform skill is more like a broad Terraform reference manual. It includes a lot of useful Terraform material up front, but that also means the model carries a lot more general context from the beginning.
His skill burned through my tokens incredibly fast, and for my use case that just was not needed.

TerraShark takes a different approach. It keeps activation much leaner and is built around a diagnostic workflow. First it identifies the likely failure mode, then it loads the specific reference material needed for that risk.

That is the core difference: TerraShark is not trying to be the biggest Terraform knowledge dump. It is trying to be a focused safety layer for LLM-assisted Terraform work.

Feedback and PRs are highly welcome!

submitted by /u/trolleid
ai slop? who knows~
I investigated whether routing a transformer's forward activations through a lossy Dual E8 (E16) lattice bottleneck and injecting them back into the residual stream is viable, and where the boundary of generative stability lies.

**The core finding:** There is a sharp empirical stability threshold at a blend ratio of $\beta = 0.20$. Beyond this boundary, open-ended generation collapses into semantic loops and repetition lock.

---

### The Mechanism

Standard LLM states are high-dimensional floats. Rather than applying traditional scalar quantization (like INT4), I mapped high-dimensional activations onto a conceptual torus via a sinusoidal map and projected them onto Dual E8 lattice hemispheres.

Full replacement of MLP layers with geometric bottlenecks universally collapsed the model. Instead, I implemented a residual blend:

$$\text{out} = (1-\beta)\cdot\text{original} + \beta\cdot\text{geometric}$$

---

### The $\beta = 0.20$ Sweep (Qwen2.5-0.5B)

Sweeping $\beta$ from 0.10 to 0.50 across layers 8–13 of `Qwen2.5-0.5B` reveals a sharp phase transition:

* **$\beta \ge 0.25$**: Generation succumbs to heavy repetition pressure and semantic drift. The geometry acts as an attractor, trapping the decoding process ("loop-lock").
* **$\beta = 0.20$**: The stability boundary. This is the highest injection ratio of lossy geometric signal that maintains both numerical activation fidelity (Avg Cosine > 0.99) and open-ended generation quality (low repeated n-grams).
* **$\beta \le 0.10$**: The perturbation is largely absorbed and damped by the transformer's layer normalizations, making the intervention invisible.
Here is the data from a 300-iteration sweep:

| $\beta$ | Min Cosine | Avg Cosine | Max MSE | Rep-3g (Repetition Rate) |
| :--- | :--- | :--- | :--- | :--- |
| 0.10 | 0.9972 | 0.9979 | 0.0024 | 0.134 |
| **0.20** | **0.9907** | **0.9916** | **0.0106** | **0.093** |
| 0.25 | 0.9839 | 0.9865 | 0.0171 | 0.084 |
| 0.30 | 0.9648 | 0.9771 | 0.0255 | 0.190 |
| 0.50 | 0.9171 | 0.9288 | 0.0850 | 0.412 |

Semantic scoring (evaluating prompt relevance and similarity to the unmodified baseline):

| $\beta$ | Avg Cosine | Rep-3g | Relevance | Patched-to-Baseline Sim |
| :--- | :--- | :--- | :--- | :--- |
| 0.10 | 0.9980 | 0.223 | 0.781 | 0.889 |
| **0.20** | **0.9918** | **0.075** | **0.752** | **0.854** |
| 0.25 | 0.9871 | 0.232 | 0.717 | 0.801 |
| 0.30 | 0.9760 | 0.392 | 0.725 | 0.764 |

---

### Generalization (1.5B & 3B Models)

The $\beta = 0.20$ boundary generalizes across larger model sizes (`Qwen2.5-1.5B` and `Qwen2.5-3B` in 4-bit) on the activation-cosine axis:

| Model | $\beta$ | Min Cosine | Avg Cosine | Max MSE | Rep-3g |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **1.5B** | 0.10 | 0.9988 | 0.9989 | 0.0027 | 0.267 |
| | **0.20** | **0.9862** | **0.9939** | **0.0105** | **0.128** |
| | 0.25 | 0.9904 | 0.9919 | 0.0166 | 0.398 |
| | 0.30 | 0.9733 | 0.9815 | 0.0235 | 0.307 |
| | 0.40 | 0.9368 | 0.9551 | 0.0487 | 0.191 |
| **3B (4-bit)** | 0.10 | 0.9964 | 0.9976 | 0.0122 | 0.033 |
| | **0.20** | **0.9861** | **0.9904** | **0.0455** | **0.115** |
| | 0.25 | 0.9604 | 0.9799 | 0.0654 | 0.043 |
| | 0.30 | 0.9702 | 0.9778 | 0.0987 | 0.050 |
| | 0.40 | 0.9158 | 0.9390 | 0.1728 | 0.025 |

*Note: In the 3B model, repetition pressure remained low across all sweeps, but the validation cosine degraded identically at $\beta \ge 0.25$.*

I also tested layer-level oscillating $\beta$ schedules (e.g., sine waves across layers), but they degraded open-ended text quality compared to a fixed, constant injection ratio.
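
To make the residual blend and the two headline metrics concrete, here is a small pure-Python sketch. It is written under loud assumptions: `lossy_stand_in` is only a placeholder (a sinusoidal squash snapped to a coarse grid) and is not an E8/E16 lattice projection, and `rep_3g` is one plausible reading of "Rep-3g" (the fraction of 3-grams that repeat an earlier one); the post does not define either precisely.

```python
import math


def lossy_stand_in(x: list[float], step: float = 0.5) -> list[float]:
    """Placeholder lossy map: sin() squash, then snap to a grid of width `step`.

    NOT a lattice projection; it only plays the role of the 'geometric' signal.
    """
    return [round(math.sin(v) / step) * step for v in x]


def residual_blend(original: list[float], geometric: list[float], beta: float) -> list[float]:
    """out = (1 - beta) * original + beta * geometric, element-wise."""
    return [(1 - beta) * o + beta * g for o, g in zip(original, geometric)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rep_3g(tokens: list[str]) -> float:
    """Assumed repetition metric: share of 3-grams that duplicate another one."""
    grams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    if not grams:
        return 0.0
    return 1 - len(set(grams)) / len(grams)


if __name__ == "__main__":
    activations = [1.0, 2.0, -0.5, 0.25]
    geometric = lossy_stand_in(activations)
    for beta in (0.10, 0.20, 0.25, 0.30, 0.50):
        blended = residual_blend(activations, geometric, beta)
        print(f"beta={beta:.2f}  cosine={cosine(activations, blended):.4f}")
```

With any fixed pair of non-parallel vectors, the cosine to the original falls steadily as $\beta$ grows, which matches the direction of the Avg Cosine column; the sharp change in generation quality reported in the post is an empirical finding that a toy like this does not reproduce.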
---

### Storage Compression Prototypes

Utilizing the Dual E8/E16 lattice as a computational substrate also yields high theoretical storage efficiency in early prototypes:

1. **KV Cache (8$\times$)**: FP16 KV cache compressed to INT8 coordinates, reducing footprint from 0.21 MB to 0.02 MB.
2. **Weights (112$\times$)**: Projected a dense $[4864, 896]$ MLP weight matrix down to a 0.07 MB E16 footprint. (Cosine similarity of the uncalibrated weight matrix multiplication was limited to $\sim$0.078, indicating that Quantization-Aware Training is mandatory for parameter viability).

A **pre-projected decompression bypass** was designed to run matrix multiplications directly against lattice coordinates without upcasting, avoiding memory bandwidth bottlenecks.

---

### Policy Constraints (Negative Result)

I evaluated whether residual E16 projection could act as a steering substrate to enforce safety policies. It cannot. While $\beta = 0.20$ preserves generation quality, the lossy nature of E16 projection strips out the logical nuances required to maintain strict boundaries. Dedicated supervised control heads remain necessary.

---

### Implications & Next Steps

Snapping post-training activations to a fixed algebraic lattice is ultimately lossy. The real frontier here is **native geometric transformers**: designing and training networks from scratch with E8/E16 constraints native to both weight matrices and activation routing.
18 months running Claude as the dev companion for my automated news site - Feedback needed
Hi, I started my project about 18 months ago because I was sick of opening 10 tabs every morning to figure out what happened in AI that day. So I built it using Claude Code (starting from Research Preview). A scraper that reads around 60+ sources, clusters topics, then Claude writes one synthesis article per cluster. No humans in the loop.

I started iterating on this, and now I have an automated news website: digitalmindnews.com

And to be honest... the stats... they're bad ;-P SEO has been rough (Google clearly doesn't love AI-written news), traffic is small, indexing is a pain. Commercially this isn't a thing. But me and my friends actually use it as a morning digest instead of bouncing between TechCrunch, Anthropic, OpenAI announcements, Decoder etc. So in the "tool I wanted to exist" sense it works for us, which is kind of why I built it.

Anyway I've been head down on this for 18 months and can't see it from outside anymore. Two things I'd love input on:

- what's broken on first look at the site itself?
- for anyone else running Claude in a long-running production loop: what gotchas have you hit? Model-update regressions, prompt drift, output quality drift, cost spikes. I'm curious what your war stories are.

Oh and tip from my side: a dream project can be iterated forever, but after 18 months I realized I'm polishing the stone for myself :-(

submitted by /u/Se4h
Would you reserve the hard cases in auto-review for heavy reasoning models?
I’ve been looking at OpenAI’s Auto-review, and I feel like it raises a problem: if an agent has to stop and wait for human approval every time it encounters a boundary action, the workflow becomes extremely fragmented; but if everything is automatically allowed through, it can easily drift toward the other extreme of full access.

So what I’m more concerned with now is no longer whether we need a reviewer, but rather: should the reviewer layer itself be stratified?

My intuition is that the first layer can actually be quite simple. Most escalation actions are rule-based by nature: whether they cross writable roots, whether they touch the network policy, whether they clearly have destructive side effects. This category may not need the heaviest model to review it at all.

What really makes me hesitate is the other layer: the harder review cases. These are cases where the action looks reasonable on the surface, but actually involves several candidate paths, different side effects, or a conflict between the user’s intent and system boundaries. At that point, the question is: what kind of model is suitable for sitting in this hard-case reviewer slot?

This is where I start thinking about a thinking model like Ring 2.6 1T, with high / xhigh modes. If the reviewer layer really does need to be stratified, I’d be more inclined to put it in the role that requires complex logical analysis, path comparison, and final calls on hard cases, rather than having it review every single action by default. I wouldn’t make it the always-on reviewer, but would instead reserve it specifically for cases where a lightweight reviewer should not be making the final call.

If you were building your own auto-review / approval gate, would you stratify it this way? Or did you eventually find that, as long as the rules are clear, heavy reasoning is actually unnecessary for the reviewer layer?

submitted by /u/Hungry-Treat8953
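
The stratification asked about above can be sketched as a small router. Everything here is hypothetical and only illustrates the shape of the idea: the `Action` fields mirror the signals named in the post, the `light_rules` policy is a placeholder rather than a recommendation, and the heavy reviewer is passed in as a plain callable standing in for whatever reasoning model would fill that slot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    description: str
    crosses_writable_root: bool = False
    touches_network_policy: bool = False
    destructive: bool = False
    candidate_paths: int = 1            # how many plausible ways to proceed
    conflicts_with_user_intent: bool = False


def is_hard_case(action: Action) -> bool:
    """Hard cases per the post: several candidate paths, or intent vs. boundary conflict."""
    return action.candidate_paths > 1 or action.conflicts_with_user_intent


def light_rules(action: Action) -> str:
    """Placeholder rule layer: purely mechanical checks, no model call."""
    if action.destructive:
        return "deny"
    if action.crosses_writable_root or action.touches_network_policy:
        return "deny"
    return "allow"


def review(action: Action, heavy_reviewer: Callable[[Action], str]) -> str:
    """Send only hard cases to the expensive reviewer; rules decide the rest."""
    if is_hard_case(action):
        return heavy_reviewer(action)
    return light_rules(action)
```

The design choice this makes visible is that the expensive reviewer is never on the default path: it is only invoked when `is_hard_case` fires, so the open question becomes how good that cheap classifier has to be.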
I turned 100 popular apps into Claude-readable design specs. Here's what actually makes Claude nail a UI clone.
Over the last few weeks I reverse-engineered 100 popular apps into structured markdown design specs and fed them to Claude to rebuild the UIs. Some clones came out near-perfect, others drifted. The difference came down to a few things that aren't obvious until you do it at volume.

What made Claude nail it:

- Exact values, not ranges. "#1A1A1A" works. "dark gray" produces five different grays across five screens.
- State coverage up front. Listing every state (empty, loading, error, filled) stopped Claude from inventing its own.
- Spacing as a scale, not per-element pixels. A 4/8/16/24 system produced more consistent layouts than annotating every gap.
- Navigation as a graph. Explicit screen-to-screen transitions killed the "where does this button go" guessing.

What didn't help: longer prose. Past a point, more words made the output worse, not better.

I packaged all 100 as a public repo. Each app has 3 spec depths depending on whether you want a quick reference, a standard build, or a full pixel-level clone.

github.com/Meliwat/awesome-ios-design-md

All markdown, MIT, no dependencies. Drop a spec into Claude and the UI output gets a lot more predictable.

If you've done UI cloning with Claude: what patterns have you found that I didn't list? And which apps are worth adding?

submitted by /u/meliwat
Has Anyone Successfully Built a Stable Long-Term AI Simulation System?
I’m trying to build a long-term AI-operated D&D campaign system and I’ve gradually realized the real challenge has almost nothing to do with D&D itself.

It’s become a problem involving:

- memory persistence
- retrieval hierarchy
- modular cognition
- long-context stability
- instruction persistence
- continuity reconstruction
- externalized state management

My current approach uses:

- uploaded PDFs as core cognition sources
- structured project instructions
- external persistence through Obsidian
- layered retrieval priorities
- modular governance systems

The goal is: The AI should treat uploaded sourcebooks/modules/campaigns as primary authority before relying on latent knowledge.

Then later: a second “table-smart” layer would contain the combined practical knowledge of the 5e community from 2014–2024.

Then: persona systems, autonomous companions, dynamic DM personalities, creativity systems, etc.

The problem is that large-context systems gradually destabilize:

- retrieval weakens
- instructions degrade
- continuity drifts
- the model abstracts/simplifies systems
- giant prompts become unreliable
- the assistant reverts to generic behavior

I’m trying to determine:

- whether Claude/OpenAI/local models are best suited for this
- whether this requires actual orchestration frameworks
- how people handle persistent simulation state cleanly
- whether I’m overengineering or simply hitting real architectural limitations

I’m especially interested in hearing from people experimenting with:

- long-context systems
- memory architectures
- RAG
- persistent agents
- external cognition systems

submitted by /u/Crazy-Carob-6361
Claude RPG Narrator skill
# Stop Your AI Narrator From Making Things Up

*A discipline framework for long-form RPG play with Claude, published alongside the [claude-rpg-skill](https://github.com/humbrol2/claude-rpg-skill) v1.1 release.*

---

I run long-form solo RPG campaigns with Claude. Months long. Same PC, same world, same recurring NPCs. The kind of arc where if the LLM forgets a name, gets a balance wrong, or invents a faction politics detail you didn't establish, the campaign starts to leak.

It always leaked. So I built a skill that stops it.

[**claude-rpg-skill**](https://github.com/humbrol2/claude-rpg-skill) is a Claude Code plugin that turns the model into a long-form RPG narrator with persistent canon, a structured finance ledger, and a set of operating disciplines that prevent the three failure modes that break every long-form LLM narration:

- **Canon drift**: the model half-remembers and quietly fills in gaps
- **Arithmetic slip**: credits move without explanation; balances don't reconcile
- **Rule decay**: you correct the model; it forgets a week later

It is opinionated. It enforces discipline rather than offering options. That is the entire point.

## The three failure modes, concretely

### Canon drift

You introduce an NPC in turn 14. A 60-year-old retired captain named Vorrun. You describe him in three sentences.

By turn 80, the model has narrated Vorrun seven more times. Each time, it pulled a few facts from working memory, half-invented the rest, smoothed over inconsistencies. By turn 120, Vorrun is somehow 40 years old, has a daughter you never mentioned, and is fluent in a language you never established existed.

The model didn't lie. It compressed and approximated, which is what LLMs do under context pressure. Compression that's invisible turn-to-turn compounds catastrophically across hundreds of turns.

**The fix:** write a canon file for Vorrun the first time he speaks dialogue.
Include a `defer_to_user_on:` list: the axes the narrator must NOT extrapolate on (his family, his prior career details, his languages, his personality beyond what's been shown). On every subsequent turn, before narrating Vorrun, the narrator reads his file. Facts not in the file or visibly established in transcript do not get invented. They get yielded back: *"I don't have that in canon. What would you like to establish?"*

### Arithmetic slip

You earn 3,640 credits. You spend 200 on dock fees. You earn 6,800 from another sale. You spend 915 on a refit. What's your balance?

If you're the player and you wrote it down: 9,325 credits, precisely.

If you're the LLM tracking it in conversational memory: depends what else has happened. Maybe 9,300. Maybe 9,200. Maybe 9,500 if it's been a long conversation and the model is doing its best.

By month two, you have no idea what your real balance is supposed to be. The number drifts whichever way the model's pattern-matching pulls hardest.

**The fix:** an append-only ledger in `ledger.json`. Every credit moved is a history entry with a day, a type, a delta, and a note. The narrator reads the ledger before stating any financial fact. When time advances, the narrator ticks the ledger forward (vehicle growth, weekly inflows, facility costs, standing policies) and reports from the updated state. Money never moves in narration without a corresponding ledger entry.

### Rule decay

You correct the narrator: *"transits are 1-2 days, not 4-5."* The narrator says *"got it."* Three turns later, the narrator narrates a 6-day transit.

Why? Because the correction was a conversational acknowledgment, not a persistent change. Once the correction scrolls out of the model's active attention, it's gone.

**The fix:** corrections become `feedback_*.md` files in the campaign directory.
Each one has a `**Why:**` line and a `**How to apply:**` line: the *reasoning* behind the rule, so the narrator can generalize it to edge cases instead of mechanically pattern-matching. The SessionStart hook loads every feedback file at session boot. Standing rules override default narration behavior, by design.

## The four disciplines

The skill encodes four operating disciplines that, together, prevent the failure modes above:

### 1. Canon-check before invoking named entities

Before narrating any named NPC, ship, location, or faction, the narrator consults the memory directory. If a canon file exists, it's read. Facts not in the file are not invented; they're yielded to the player.

### 2. Canon file write-as-you-go

This is the v1.1 rule that came directly out of running a real campaign for 379 in-game days and discovering, at audit, that eight recurring NPCs, several contracts, hidden assets, and threat-state evolutions were all living in transcript memory only.

When a new entity sticks in play (an NPC who has spoken dialogue, a contract with terms, a hidden asset, a comm protocol), a stub canon file is written **in the same response**, not deferred to "session end." Session end may never come. Transcript
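
The append-only ledger described under "Arithmetic slip" can be sketched in a few lines. The file name `ledger.json` and the four entry fields (day, type, delta, note) come from the post; the top-level `history` key, the function names, and the day numbers in the example are assumptions made for illustration and may not match the skill's actual schema.

```python
import json
from pathlib import Path


def new_ledger() -> dict:
    """Empty ledger; the 'history' key is an assumed layout."""
    return {"history": []}


def append_entry(ledger: dict, day: int, kind: str, delta: int, note: str) -> dict:
    """Record one movement of credits. Entries are only ever appended."""
    entry = {"day": day, "type": kind, "delta": delta, "note": note}
    ledger["history"].append(entry)
    return entry


def balance(ledger: dict) -> int:
    """The balance is always derived from history, never stored or remembered."""
    return sum(entry["delta"] for entry in ledger["history"])


def save(ledger: dict, path: Path) -> None:
    path.write_text(json.dumps(ledger, indent=2))


def load(path: Path) -> dict:
    return json.loads(path.read_text())


# The four transactions from the example above (day numbers are made up).
ledger = new_ledger()
append_entry(ledger, 1, "sale", 3640, "cargo sale")
append_entry(ledger, 1, "fee", -200, "dock fees")
append_entry(ledger, 2, "sale", 6800, "second sale")
append_entry(ledger, 3, "refit", -915, "refit")
print(balance(ledger))  # 9325
```

Deriving the balance from the history each time is what removes the drift: there is no remembered number for the model to approximate, only entries to sum.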
Drift uses a tiered pricing model. Visit their website for current pricing details.
Drift has an average rating of 4.3 out of 5 stars based on 20 reviews from G2, Capterra, and TrustRadius.
Key features include: Live Chat, ROI Reporting, Fastlane, Chat live with target accounts, Optimize your chat strategy, Qualify leads instantly, Analyze, Prospect.
Drift is commonly used for: Sales Leaders, Revenue Ops, Customer Success, Front Line Sellers, Sales Development.
Drift integrates with: Salesforce, HubSpot, Marketo, Zapier, Slack, Google Analytics, Mailchimp, Zendesk, Pipedrive, Intercom.
The Verge AI
Publication at The Verge
1 mention
Based on user reviews and social mentions, the most common pain points are: token cost, spending limit, cost tracking.
Based on 86 social mentions analyzed, 1% of sentiment is positive, 99% neutral, and 0% negative.