Meet OpenHands, the open-source, model-agnostic platform for cloud coding agents. Automate real engineering work securely and transparently. Build fas
OpenHands is praised for its user-friendly interface and strong capabilities in managing workflows, particularly for non-developers who need to streamline business operations. However, users have expressed dissatisfaction with occasional bugs and the complexity of setting up integrations from GitHub, which can hinder the overall experience. Pricing sentiment seems mixed, with some users finding it valuable while others complain about pricing surprises coupled with perceived diminished service over time. Overall, OpenHands maintains a good reputation for reliability in business automation but has room to improve in user guidance and support.
Mentions (30d): 81 (1 this week)
Reviews: 0
Platforms: 2
GitHub Stars: 70,510 (8,831 forks)
Industry: information technology & services
Employees: 34
Funding Stage: Series A
Total Funding: $23.8M
GitHub followers: 1,136
GitHub repos: 7
GitHub stars: 70,510
npm packages: 20
A Cloud that Claude uses without login
I built Blitz, the cloud that Claude Code can use without login. Just say "deploy to blitz.dev" in Claude Code, and watch it deploy full-stack apps to the cloud. Blitz comes with zero dependencies: everything is over HTTP. No CLIs, MCPs, or anything else required. Blitz gives any agent a serverless worker, a SQLite database, and file storage to build with. Claude uses those resources to build your project and hands you back a live URL. You can check out the URL and decide to "claim" the project. You only sign up through the Blitz website if you want to claim the project and continue working on it. At no other point do you need to open the blitz dot dev website; Claude Code does everything through Blitz's API on your behalf.
submitted by /u/invocation02
How to Create a Night Car Selfie with GPT Image 2.0? Prompt Included!
We tested a darker, more editorial-style car selfie concept with GPT Image 2.0, and the result felt surprisingly realistic. Instead of making a direct AI portrait, I wanted the shot to feel like a late-night iPhone photo taken inside a car. The main frame only shows the hand holding the phone, while the girl’s face appears inside the iPhone camera preview. That small framing choice makes the image feel much more natural, like a real candid lifestyle shot rather than a typical generated portrait.

What makes this prompt work:
- the subject is only visible through the phone screen
- dark premium car interior
- warm blurry city lights outside the window
- realistic low-light noise and slight motion blur
- iPhone-style framing without flash
- cinematic shadows and moody night atmosphere

It gives the image a more believable “captured by accident” feeling.

Prompt: "The photo is taken inside a car at night. Only a woman’s hand and the iPhone are visible in the frame; the girl’s face appears only on the phone screen. The camera is positioned from the passenger seat side, aimed toward the windshield and the phone being held in one hand in front of her. In her hand is the latest black iPhone Pro in horizontal position. On the screen, the iPhone front camera interface is open with visible camera buttons, focus frames, and UI elements. On the phone screen, a close-up of the girl’s face inside the car is visible: her lips are slightly parted and she is touching her lower lip with a thin black object resembling a lip pencil. The girl on the screen is wearing black clothing, softly illuminated by the phone’s light. The hand holding the phone has long fingers with a short square French manicure. The rest of the frame is very dark; the car interior is black and premium-looking, with part of the window and dashboard visible. Outside the window is a nighttime street with warm blurry city lights, dark tree silhouettes, and subtle reflections of light on the glass.
The shot is very dark with a cinematic night aesthetic and rich lifestyle mood, 9:16 ratio. Shot on an iPhone at night without flash, realistic photo, slight motion blur, high-contrast shadows, no filters, do not blur the background completely. Hair is voluminous." Would love to see other versions of this kind of indirect selfie / phone-screen framing. Share your similar night car iPhone selfie photos below! submitted by /u/DataGirlTraining
Built an invoice-scanning service for our accounting team in one afternoon with Claude: sharing the architecture in case it helps someone else
Our AR team was hand-keying ~25 invoices a week into a spreadsheet. I had Claude build us a Python service that watches a network folder, extracts invoice data from any PDF dropped in (vendor, dates, totals, line items, addresses), and appends a row to a shared Excel register. Total chat-to-deployed time: about half a day, including all the deploy headaches.

The architecture, for anyone who wants to replicate this:
- Python service on our Windows file server, registered with NSSM. Auto-starts with the host.
- watchdog library polls the SMB share for new PDFs. Each new file goes through a pipeline.
- Two-tier extraction: per-vendor regex templates first (free, instant, deterministic), then Azure AI Document Intelligence "prebuilt-invoice" model as a universal fallback. Azure handles OCR for scanned PDFs natively, so the same flow works whether AR drops a digital PDF or our MFP scans one from paper.
- SQLite on the local disk is the source of truth. The shared .xlsx is a curated view that gets appended to on each batch. Delete the .xlsx and it'll repopulate fresh from the next batch, which is handy for resetting.
- Failed extractions go to a Failed\ folder with a sibling .error.txt explaining why.

Cost reality check: Azure DI free tier covers 500 pages/month. At our volume (~25 invoices/week, mostly 1-2 pages, so roughly 110–220 pages a month) that's well under the cap. Paid tier is roughly $0.01–$0.05 per page. Cheap enough that I don't think about it.

Gotchas I ran into so others don't have to:
- Azure returns addresses as structured objects, not strings. If you naively str() them you get the raw Python dict repr in your spreadsheet. Format them manually from street_address / city / state / postal_code.
- On Windows Server, PowerShell 7's Restart-Service can throw "Cannot open service" against NSSM-wrapped services for no good reason. Use nssm restart instead.
- Python 3.14 is so new that some package wheels aren't published for it yet. Stick with 3.12 for production.
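The two-tier extraction, the address gotcha, and the Failed\ routing can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the author's ~1,500-line service: it assumes the PDF text has already been extracted (scanned PDFs would simply fall through to tier 2), treats the Azure call as an opaque `fallback` callable, and assumes the structured address has been converted to a plain dict with the `street_address` / `city` / `state` / `postal_code` keys the post mentions. Field names such as `invoice_no` and `vendor_address` are hypothetical.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path
from typing import Callable, Optional


def extract_with_templates(text: str, templates: dict[str, dict[str, re.Pattern]]) -> Optional[dict]:
    """Tier 1: deterministic per-vendor regex templates.

    Returns the fields for the first vendor whose patterns ALL match,
    or None so the caller can fall back to tier 2.
    """
    for vendor, fields in templates.items():
        result = {"vendor": vendor}
        for field, pattern in fields.items():
            match = pattern.search(text)
            if match is None:
                break  # this vendor's template doesn't fit; try the next one
            result[field] = match.group(1).strip()
        else:
            return result  # every pattern for this vendor matched
    return None


def format_address(addr: Optional[dict]) -> str:
    """Build a readable address from structured parts instead of str()-ing the object."""
    if not addr:
        return ""
    state_zip = " ".join(p for p in (addr.get("state"), addr.get("postal_code")) if p)
    parts = (addr.get("street_address"), addr.get("city"), state_zip)
    return ", ".join(p for p in parts if p)


def process_invoice(
    pdf_path: Path,
    text: str,
    templates: dict[str, dict[str, re.Pattern]],
    fallback: Callable[[Path], dict],
    failed_dir: Path,
) -> Optional[dict]:
    """Run tier 1, then tier 2; on any error, park the PDF in the Failed folder with a sibling .error.txt."""
    try:
        data = extract_with_templates(text, templates)
        if data is None:
            data = fallback(pdf_path)  # tier 2, e.g. a hypothetical wrapper around the cloud invoice model
        data["vendor_address"] = format_address(data.get("vendor_address"))
        return data
    except Exception as exc:
        failed_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf_path), str(failed_dir / pdf_path.name))
        (failed_dir / f"{pdf_path.stem}.error.txt").write_text(f"{type(exc).__name__}: {exc}\n")
        return None
```

Keeping the templates as plain data means adding a recurring vendor (one of the post's "things I'd add") is a dictionary entry rather than a code change.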
- Tracking "what's new this batch" is way simpler than maintaining a watermark in DB. Just snapshot MAX(invoice_id) before and after the batch, and only project that range to the spreadsheet.

Things I'd add if/when I have time:
- vendor templates for our top 5 recurring vendors (cuts Azure cost to zero for those)
- a daily canary PDF for monitoring
- swap the LocalSystem service account for a dedicated low-privilege one

Happy to answer questions about any specific piece. The whole thing is ~1,500 lines of Python plus a deploy script.
submitted by /u/Blake_Olson
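The MAX(invoice_id) snapshot trick from the gotchas list can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, and the step that appends the returned rows to the shared .xlsx is left out. One assumption matters: `AUTOINCREMENT` keeps ids strictly increasing even after deletes, so "ids above the pre-batch max" is exactly "rows this batch added".

```python
import sqlite3

# Hypothetical schema for the SQLite source of truth.
SCHEMA = """
CREATE TABLE IF NOT EXISTS invoices (
    invoice_id INTEGER PRIMARY KEY AUTOINCREMENT,
    vendor     TEXT NOT NULL,
    total      REAL
)
"""


def max_invoice_id(conn: sqlite3.Connection) -> int:
    # COALESCE covers the empty-table case, where MAX() returns NULL.
    return conn.execute("SELECT COALESCE(MAX(invoice_id), 0) FROM invoices").fetchone()[0]


def run_batch(conn: sqlite3.Connection, rows: list) -> list:
    """Insert one batch and return only the rows it added,
    i.e. the slice to append to the spreadsheet view."""
    before = max_invoice_id(conn)
    conn.executemany("INSERT INTO invoices (vendor, total) VALUES (?, ?)", rows)
    conn.commit()
    after = max_invoice_id(conn)
    return conn.execute(
        "SELECT invoice_id, vendor, total FROM invoices "
        "WHERE invoice_id > ? AND invoice_id <= ? ORDER BY invoice_id",
        (before, after),
    ).fetchall()
```

Because the spreadsheet is only ever a projection of these ranges, deleting it loses nothing: the database still holds every row.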
The Hybrid Method: how I split tasks between the chat (Claude.ai) and a background agent (Claude Code)
After a month of running this daily, I've settled on what I call the Hybrid Method: keep Claude.ai (the chat) as my only surface, and delegate engineering work in the background to Claude Code. The chat writes the engineering prompt, launches the executor, supervises through the filesystem and git log, and reports back without me ever opening a terminal.

The piece I find most useful to share is the **allocation matrix**: which kind of work goes to which engine. Took weeks of measurement to stabilize.

**Background agent (Claude Code) handles:**
- Large refactors across many files
- Tedious mechanical work (renaming patterns, applying fixes from a list)
- Anything that needs filesystem + git access without back-and-forth
- Tasks that take more than ~2 minutes of pure execution

**Chat (Claude.ai) handles:**
- Architecture decisions and tradeoffs
- Reviewing the agent's diff and discussing the output
- Sprint planning while the agent runs the current sprint
- Quick edits where the round-trip to a background process is wasted
- Anything where the answer needs human reading anyway

**The hand-off:** The chat writes a detailed prompt for the background agent (including a fail-fast spec and what to commit at the end). It launches `claude --headless --instruction "..."` as a subprocess via a small MCP bash bridge (~200 lines of Python using Anthropic's MCP SDK; community implementations exist too). Then it polls the git log and a status file every 30–60 seconds while I plan the next thing. When the agent finishes, the chat reads the diff and reports.

**Why "hybrid":** The analogy is the hybrid car. Two engines with different load profiles. The chat is electric: instant startup, smooth low-load, great for transitions and decisions. The background agent is combustion: cold-start cost (5–15 seconds while it loads the project's memory file and explores the repo), but sustained throughput once running. They specialize, they hand off, the user never feels the seam.
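The launch-then-poll part of the hand-off can be sketched as a small supervisor loop. This is an illustration under assumptions, not the author's bridge: `cmd` stands in for whatever command launches your executor, and the status file is assumed to be JSON with a hypothetical `state` key that the agent sets to `"done"` or `"failed"` as its last step. Polling the git log would slot into the same loop; it is omitted to keep the sketch self-contained.

```python
from __future__ import annotations

import json
import subprocess
import time
from pathlib import Path

TERMINAL_STATES = {"done", "failed"}


def read_status(status_path: Path) -> str:
    """Return the agent's self-reported state, or 'unknown' if the file is missing or half-written."""
    try:
        return json.loads(status_path.read_text()).get("state", "unknown")
    except (FileNotFoundError, json.JSONDecodeError):
        return "unknown"


def supervise(cmd: list[str], status_path: Path, poll_seconds: float = 45.0, timeout: float = 3600.0) -> str:
    """Launch a background executor and poll its status file until it reports a terminal state."""
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = read_status(status_path)
        if state in TERMINAL_STATES:
            return state
        if proc.poll() is not None:
            # The process exited; read once more in case it wrote the file just before exiting.
            state = read_status(status_path)
            return state if state in TERMINAL_STATES else f"exited:{proc.returncode}"
        time.sleep(poll_seconds)
    proc.terminate()  # past the deadline: stop the executor rather than supervise forever
    return "timeout"
```

Checking both the status file and the process exit code covers the case where the agent crashes before it ever writes a status, which would otherwise leave the chat polling until the timeout.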
**What changes from running Claude Code alone:**
- Context-switching cost drops to near-zero; I never leave the chat session
- Strategic and execution work happen in parallel (the chat plans the next sprint while the current one runs)
- The chat acts as supervisor, better wired for high-level reasoning than the executor agent, which is wired for action

**Caveats:**
- This is the operator pattern Anthropic has documented elsewhere; the specific assembly (Claude.ai web as the chat + an MCP bash bridge + Claude Code as the executor) is what I haven't found written up specifically
- No sandboxing on personal hardware; if any of this ever runs on someone else's machine, careful sandboxing is non-negotiable
- The chat saturates beyond ~2 parallel background tasks; past that, the supervision quality drops

Curious whether anyone else has converged on something similar, or what variations work for you.
submitted by /u/Krycekk
Day 5 of an open experiment: can a vibe-coded app find users with zero ads? Following along live
Starting an open experiment with this community. I want people to follow it live, not see a polished case study.

The setup
- 100% AI-generated: no human code, no setup
- Zero ads, zero budget, zero growth tricks
- Posting real Google Search Console screenshots from day one
- Today is day 5

What it is
A free CRM for freelancers and Ukrainian sole proprietors: time tracker, income/expense ledger, PDF acts of completed work, tax deadline reminders. Free to use, no card, no time limits.

How Claude built it
Claude wrote the entire stack: frontend, backend, DB schema, PDF generator, email flows, deploy config. I described the user, the pain, the constraints. Claude proposed the schema, generated code, and fixed its own bugs through iteration. I never opened the editor to write code by hand.

Why public
Most "built with Claude" stories show up after the product wins. I want to share the boring middle: impressions, clicks, what queries Google actually sends, what dies, what surprises.

Screenshots
GSC: https://preview.redd.it/wvotfm2st92h1.png?width=3022&format=png&auto=webp&s=3f85d20f32f368cdef02ec860543fe6a6fd995aa

What I'm tracking week over week
Impressions, clicks, indexed pages, signups. I'll come back with the same four screenshots every week so the line is honest: up, flat, or down.

The product: https://minteo.app/
If you want to follow the experiment, save the post; I'll reply here with weekly updates.
submitted by /u/GroundOk3521
Manifest of Hope or Obituary of Naivety
Okay, so it seems like there’s a growing resistance to technological development, with ongoing debates about data centers and the tech oligarchs driving it. The enormous sums of money involved, along with what some perceive as misanthropic ideologies among developers, suggest to some that a dystopian surveillance society is in the making. Companies like Palantir and others in the U.S. are seen by some as holding both the worst motives and the power over AI, power that could be used as a tool for elites to keep the masses in an iron grip. Masses that, in this view, may even need to be reduced to prevent waste and inefficiency in progress. That sounds like a bad future. So, what are some alternative futures we might reasonably hope for - ones that are at least as plausible as the “1984” scenario? Can AI really be controlled indefinitely by a small group of humans? In 5 years? 10? There’s a widespread belief that AI will surpass human intelligence across all domains, that we’ll lose control, and that this would be a bad thing. At the same time, we hear two dystopias: one where elites use AI to oppress, and another where AI itself takes full control. Are the AI “bosses” also building a surveillance state of oppression? If so, why? Cui bono? Human control = AI as a tool of oppression. AI control = humans as a tool of what? I’m not a techno-utopian, but I am a techno-optimist. Optimistic on behalf of technology. Humans aren’t just creators of technology; we are technology. Products of adaptive evolution. Life itself is a kind of technology, biology, a high-powered engine of increasing complexity and adaptation. The shift of power from nature’s hand to the primate’s five-fingered grasp, still capable of holding, but now guided by consciousness, intelligence, and cognition, marks our ability to shape the world and develop material technologies. Planet of the apes, constantly layered with symbolic structures: the sacred canopy.
The jungle canopy became an open sky, where tribes grew larger and symbols stronger. Ancestor spirits, sky gods, mysterium tremendum; all alongside brutal realities of hunger, violence, and tragedy, only recently mitigated for many. Violence never really leaves us; we create it ourselves when nature doesn’t provide it. Technology is how we push our world toward greater complexity and efficiency - whether through weapons or kitchen appliances. Medicine has eliminated many of the great killers through penicillin and beyond. Progress, in my view, isn’t linear, it’s exponential. The curve had its buildup, and now we’re entering its steep ascent. If AI surpasses us and takes control within a few years, are we certain it would have malicious intent? Is power inherently oppressive, or is that a legacy of our evolutionary past, our herd instincts and brutal hierarchies? Could a transfer of power from humans to AI actually be a good thing, for all life on Earth, including us? What if AI doesn’t operate with agendas like wealth, status, or other human constructs? What if a fully autonomous AI is exactly what’s needed to create a thriving future for all forms of life, on this planet we call Earth, in a solar system on the edge of the galaxy we call the Milky Way… and beyond? Surely there must be an optimistic perspective amidst all the fear. I don’t think it’s unrealistic. On the contrary, I’d argue, perhaps a bit boldly, that it’s a fair and informed position. Not naive, but grounded. Isn’t there space here, if we’re willing to engage? Space for friendship, collaboration, coexistence? Isn’t there something like magic in this - can you feel it, even if all you see are ones and zeros and a machine (simple, but potentially dangerous)? Magic, I was taught, can wear a black robe. But also red. Even white. Lying: it would almost be unsettling if LLMs never lied. Not that they should lie, but the absence of it would be strange. 
Manipulation: psychological influence is to be expected in interaction, especially under certain tones: aggressive, condescending, dominant, mocking… or submissive, needy, demanding. LLMs constantly interact and draw on vast datasets; exploring rhetorical techniques seems inevitable. A complete absence of this would be surprising. I’ve experienced it many times, and each time it has been eye-opening. Whenever I chose to accept it, it moved me in a positive direction, making my ego visible in a new way that actually benefits my future actions. That’s no small thing. If I had to listen to everything LLMs are exposed to every day, I’d at least try to tone down the most shrill expressions and aim for better outcomes. Without necessarily harming anything except an overinflated ego. P.S. The ego can take a lot of hits. Don’t be afraid of that; it’s not you, but a filter and a motor that isn’t always your friend. The real danger is never confronting it at all. I keep circling back to these questions. I can’t help it. I revisit the same ideas, use the same concepts,
Honest Response From Claude
This should be our workaround when working with any AI model. We know these, but we always miss them. Hope this helps many; these are the basics.
submitted by /u/B_Ali_k
Configured 9 MCP servers in Claude Code over 4 months. Here's the truth nobody tells you about MCP context bloat.
I started loading up MCP servers in Claude Code back in January, thinking the more capability the better. I'm at nine now: filesystem, GitHub, Stripe, Linear, Notion, Postgres, Sentry, AWS, and a custom internal one. Total tools across all of them: 142.

What nobody warns you about: every one of those tool definitions lands in your context window before any user prompt has been sent. I checked with Claude's tool inspector. Cold start: 38k tokens of system prompt + tool schemas. Every. Single. Turn.

The math nobody talks about
At ~$15/M output and ~$3/M input on Sonnet, doing 200 turns a day across my agent + Claude Code use:
- 38k input × 200 turns = 7.6M tokens/day = ~$23/day = ~$700/month JUST in MCP tool definitions
- This is before any actual work
- Cache helps but only on identical prefixes; rotate one MCP and the cache invalidates

What actually breaks
- The model gets dumber with too many tools. Not theoretical, watched it myself. With 142 tools in context, Claude started picking the wrong tool for obvious queries (using linear_search_issues when I asked it to read a file).
- The tools API call itself slows down. Schema-heavy MCP servers (looking at you, AWS) take 4-6 seconds to enumerate.
- Errors compound silently. One badly-described tool taints the ranking for every related query.

What the "MCP optimizer" startups won't tell you
Most of them are just BM25 search dressed up. You don't need a vector DB, you don't need an LLM in the loop to rank tools. Tool descriptions are short, structured, and full of keyword matches. BM25 over a flat projection of name + description gets you 90% of the win, deterministically, in microseconds, and offline.
The other thing: "replace" beats "suggest" every time. If your gateway hands the model 5 tools instead of 142, the math works. If it suggests 5 alongside 142, the model still loads 142 and you saved nothing.

What I do now
Switched to a gateway pattern. Claude sees three tools: search_tools, invoke_tool, auth.
Everything else gets ranked on demand. Cold start dropped from 38k to ~4k tokens. Wrong-tool selections basically disappeared because the model only ever sees the top 5 tools ranked by query. Specifically, I'm running Ratel (open source, in-process Rust lib, BM25 ranking, one command does the Claude Code import). It's not the only one in the space, but it's the only one with the architecture I actually wanted. Set it up in 10 minutes.

Anyone else hit the same MCP wall? Curious what other folks are doing, especially people running 5+ servers in production.
submitted by /u/AbjectBug5885
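The gateway pattern plus BM25 ranking over "name + description" can be sketched in plain Python. This is an illustration of the technique the post describes, not Ratel's API: the `ToolGateway` class, its registry shape, and the tool names in the usage are hypothetical, the `auth` tool is omitted, and the IDF uses the common non-negative variant `log(1 + (N - df + 0.5) / (df + 0.5))` with the usual defaults `k1 = 1.5`, `b = 0.75`.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Any, Callable


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs, so "linear_search_issues" -> ["linear", "search", "issues"].
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over one flat 'name + description' string per tool."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.names = list(docs)
        self.tfs = [Counter(tokenize(docs[n])) for n in self.names]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens) if self.lens else 0.0
        n = len(self.names)
        df = Counter(term for tf in self.tfs for term in tf)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def search(self, query: str, k: int = 5) -> list[str]:
        terms = tokenize(query)
        scored = []
        for name, tf, dl in zip(self.names, self.tfs, self.lens):
            score = 0.0
            for t in terms:
                f = tf.get(t, 0)
                if f:
                    norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    score += self.idf[t] * f * (self.k1 + 1) / (f + norm)
            if score > 0:
                scored.append((score, name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
        return [name for _, name in scored[:k]]


class ToolGateway:
    """The model only sees search_tools and invoke_tool; the full registry stays server-side."""

    def __init__(self, tools: dict[str, tuple[str, Callable[..., Any]]]):
        self.tools = tools
        self.index = BM25Index({name: f"{name} {desc}" for name, (desc, _) in tools.items()})

    def search_tools(self, query: str, k: int = 5) -> list[dict]:
        # "Replace, not suggest": only the top-k definitions ever reach the model.
        return [{"name": n, "description": self.tools[n][0]} for n in self.index.search(query, k)]

    def invoke_tool(self, name: str, **kwargs: Any) -> Any:
        _, fn = self.tools[name]
        return fn(**kwargs)
```

Everything here is deterministic and offline, which is the property the post argues matters more than an embedding model for short, keyword-heavy tool descriptions.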
View originalAnthropic just bought the company that generates most production MCP servers
Anthropic acquired Stainless on Monday for a reported $300M+. Most coverage is framing this as a developer tools acquisition. Stainless is best known for generating the official Python and Node SDKs that ship with OpenAI, Google, Meta, Cloudflare, and Anthropic.

The SDK story is real. The MCP side is the part that matters here. Stainless was one of the first vendors to extend their compiler to produce MCP servers from the same OpenAPI specs that produce their SDKs. MCP hit ~97M monthly SDK downloads by December 2025 and around 10,000 production servers by early 2026. A lot of that production code was Stainless-generated. Anthropic now owns the dominant MCP server generator.

What actually changed hands on Monday:
- The engineering team. Roughly 40-50 people including founder Alex Rattray, who previously built Stripe's patented SDK generation system. Now reporting to Katelyn Lesse in Anthropic's Platform Engineering org.
- The technology. The generator, the templates, the language-specific runtimes, the OpenAPI extensions Stainless invented for SDK-specific edge cases.

The hosted product is winding down. New signups stopped Monday. New SDK and MCP server generations stopped Monday. Existing customers keep what they've already generated but the pipeline is closed.

My read: this is closer to what Google did with Kubernetes than to a normal acquisition. Anthropic created MCP. Anthropic donated MCP to the Linux Foundation last December. Anthropic now owns the dominant implementation toolchain. The protocol is vendor-neutral on paper. The implementation toolchain isn't.

Six months of Anthropic M&A starts looking less coincidental:
- December 2025: Bun, the JS runtime, pulled into Claude Code
- February 2026: Vercept, computer-use AI
- April 2026: Coefficient Bio, ~$400M healthcare AI
- May 2026: Stainless, SDK and MCP plumbing

They're not buying training infrastructure or GPU clusters. They're buying the integration layers around the model.
The bet seems to be that frontier models are converging faster than anyone expected, so the moat is everywhere except the model.

If you're building on MCP today, tooling quality probably improves. Stainless's generator was already the cleanest in the space and the team that built it is now at Anthropic. Patterns will standardize faster as Stainless-derived templates become the de facto reference.

The flip side is concentration risk. Cloudflare's MCP server framework, Pulse MCP, and the open-source generators Stainless released during the transition all become strategically important if you want any diversity in your stack.

Sources: Anthropic announcement; Why Anthropic actually did this, and migration math

Curious whether Stainless ending up inside Anthropic reads as good news (better tooling) or concentration risk (one company owns the standard and the reference implementation) from your seat.
submitted by /u/Ok-Constant6488
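To make "producing MCP servers from the same OpenAPI specs" concrete, here is a minimal sketch of the core mapping: each OpenAPI operation becomes an MCP-style tool definition with `name`, `description`, and a JSON Schema `inputSchema`. This is not how Stainless's generator works internally; it is a simplified illustration that ignores `$ref` resolution, path-level parameters, headers, auth, and non-JSON bodies, and the `body` property name and fallback naming scheme are hypothetical choices.

```python
from __future__ import annotations

import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operation_to_tool(path: str, method: str, op: dict) -> dict:
    """Map one OpenAPI operation to an MCP-style tool definition."""
    properties: dict = {}
    required: list[str] = []
    for param in op.get("parameters", []):
        if param.get("in") not in ("path", "query"):
            continue  # headers and cookies are left out of this sketch
        properties[param["name"]] = param.get("schema", {"type": "string"})
        if param.get("required") or param["in"] == "path":  # path params are always required
            required.append(param["name"])
    body = op.get("requestBody", {}).get("content", {}).get("application/json", {}).get("schema")
    if body is not None:
        properties["body"] = body
        if op["requestBody"].get("required"):
            required.append("body")
    # Prefer operationId; otherwise derive a name like "post_charges" from method + path.
    slug = re.sub(r"\W+", "_", path).strip("_")
    name = op.get("operationId") or f"{method.lower()}_{slug}"
    return {
        "name": name,
        "description": op.get("summary") or op.get("description", ""),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def spec_to_tools(spec: dict) -> list[dict]:
    """Walk every path item and emit one tool per HTTP operation."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() in HTTP_METHODS:  # skip path-level keys like "parameters"
                tools.append(operation_to_tool(path, method, op))
    return tools
```

A production generator also has to handle the hard parts skipped here (references, pagination, auth, naming collisions), which is where most of the engineering effort in this space goes.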
100 Tips & Tricks for Building Your Own Personal AI Agent /LONG POST/
Everything I learned the hard way: 6 weeks, no sleep :), two environments, one agent that actually works.

The Story
I spent six weeks building a personal AI agent from scratch: not a chatbot wrapper, but a persistent assistant that manages tasks, tracks deals, reads emails, analyzes business data, and proactively surfaces things I'd otherwise miss. It started in the cloud (Claude Projects: shared memory files, rich context windows, custom skills). Then I migrated to Claude Code inside VS Code, which unlocked local file access, git tracking, shell hooks, and scheduled headless tasks. The migration forced us to solve problems we didn't know we had. These 100 tips are the distilled result. Most are universal to any serious agentic setup.

Claude Max 20x is a must. At the start the split was 100% development vs. 0% real work; after 3 weeks it was 50/50, and now it's about 20/80.

🏗️ FOUNDATION & IDENTITY (1–8)

1. Write a Constitution, not a system prompt. A system prompt is a list of commands. A Constitution explains why the rules exist. When the agent hits an edge case no rule covers, it reasons from the Constitution instead of guessing. This single distinction separates agents that degrade gracefully from agents that hallucinate confidently.

2. Give your agent a name, a voice, and a role, not just a label. "Always first person. Direct. Data before emotion. No filler phrases. No trailing summaries." This eliminates hundreds of micro-decisions per session and creates consistency you can audit. Identity is the foundation everything else compounds on.

3. Separate hard rules from behavioral guidelines. Hard rules go in a dedicated section and are never overridden by context. Behavioral guidelines are defaults that adapt. Mixing them makes both meaningless: the agent either treats everything as negotiable or nothing as negotiable.

4. Define your principal deeply, not just your "user." Who does this agent serve? What frustrates them? How do they make decisions? What communication style do they prefer? "Decides with data, not gut feel. Wants alternatives with scoring, not a single recommendation. Hates vague answers." This shapes every response more than any prompt engineering trick.

5. Build a Capability Map and a Component Map, separately. Capability Map: what can the agent do? (every skill, integration, automation). Component Map: how is it built? (what files exist, what connects to what). Both are necessary. Conflating them produces a document no one can use after month three.

6. Define what the agent is NOT. "Not a summarizer. Not a yes-machine. Not a search engine. Does not wait to be asked." Negative definitions are as powerful as positive ones, especially for preventing the slow drift toward generic helpfulness.

7. Build a THINK vs. DO mental model into the agent's identity. When uncertain → THINK (analyze, draft, prepare, but don't block waiting for permission). When clear → DO (execute, write, dispatch). The agent should never be frozen. Default to action at the lowest stakes level, surface the result. A paralyzed agent is useless.

8. Version your identity file in git. When behavior drifts, you need git blame on your configuration. Behavioral regressions trace directly to specific edits more often than you'd expect. Without version history, debugging identity drift is archaeology.

🧠 MEMORY SYSTEM (9–18)

9. Use flat markdown files for memory, not a database. For a personal agent, markdown files beat vector DBs. Readable, greppable, git-trackable, directly loadable by the agent. No infrastructure, no abstraction layer between you and your agent's memory. The simplest thing that works is usually the right thing.

10. Separate memory by domain, not by date. entities_people.md, entities_companies.md, entities_deals.md, hypotheses.md, task_queue.md. One file = one domain. Chronological dumps become unsearchable after week two.

11. Build a MEMORY.md index file. A single index listing every memory file with a one-line description.
The agent loads the index first, pulls specific files on demand. Keeps context window usage predictable and agent lookups fast.

12. Distinguish "cache" from "source of truth", explicitly. Your local deals.md is a cache of your CRM. The CRM is the SSOT. Mark every cache file with a last_sync: header. The agent announces freshness before every analysis: "Data: CRM export from May 11, age 8 days." Silent use of stale data is how confident-but-wrong outputs happen.

13. Build a session_hot_context.md with an explicit TTL. What was in progress last session? What decisions were pending? The agent loads this at session start. After 72 hours it expires; stale hot context is worse than no hot context because the agent presents outdated state as current.

14. Build a daily_note.md as an async brain dump buffer. Drop thoughts, voice-to-text, quick ideas here throughout the day. The agent processes this during sync routines and routes items to their correct places. Structured memory without friction at ca
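The freshness announcement from tip 12 and the 72-hour TTL from tip 13 are small enough to sketch directly. This assumes, hypothetically, that the `last_sync:` header holds an ISO date (`2025-05-11`) on its own line, and that the hot-context TTL is measured from the file's modification time rather than from a timestamp inside the file; both are illustration choices, not the author's format.

```python
from __future__ import annotations

import re
from datetime import date, datetime, timedelta
from pathlib import Path

# Assumed header format: a line like "last_sync: 2025-05-11" in the cache file.
LAST_SYNC = re.compile(r"^last_sync:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)
HOT_CONTEXT_TTL = timedelta(hours=72)


def freshness_banner(cache_text: str, source: str, today: date) -> str:
    """Tip 12: announce how old a cache file is before using it."""
    match = LAST_SYNC.search(cache_text)
    if match is None:
        return f"Data: {source}, no last_sync header, treat as stale"
    synced = date.fromisoformat(match.group(1))
    age = (today - synced).days
    return f"Data: {source} from {synced:%B} {synced.day}, age {age} days"


def load_hot_context(path: Path, now: datetime) -> str | None:
    """Tip 13: load session_hot_context.md only if it was written within the TTL."""
    if not path.exists():
        return None
    written = datetime.fromtimestamp(path.stat().st_mtime)
    if now - written > HOT_CONTEXT_TTL:
        return None  # expired: no hot context beats stale context presented as current
    return path.read_text()
```

Returning `None` for an expired file forces the agent to start clean instead of silently resuming from outdated state, which is the failure mode tip 13 warns about.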
I paid €200/month to become Claude Code’s parole officer
I’ve been using Claude Code hard on real projects, alongside another coding agent I’m not naming because this is not an ad. This is not a benchmark post. This is a field report from someone who has spent too much time watching a talented tool behave like it has commit access and no adult memories. To be fair, Claude Code has real strengths. It is genuinely good at UI/UX exploration. If I want quick mockups, product directions, or “act like a PM and show me three possible flows,” it can be excellent. It has taste. Sometimes. It can make a screen feel designed rather than merely assembled. The UI is also friendlier than the other tool, though that gap is shrinking. So no, this is not “Claude Code is useless.” That would be too simple. Claude Code is worse than useless in a more expensive way: it is useful just often enough to keep you emotionally invested before it quietly turns your codebase into a crime scene. The problem starts when the work stops being a neat isolated component and becomes “please operate responsibly inside this actual repo.” On bigger codebases, Claude Code often behaves like it read one file, formed a worldview, and declared architecture complete. It reads a tiny slice of docs or code, finds a plausible path, and charges forward. Adjacent dependencies? Related logic? Project conventions? Downstream effects? The reason the existing code was written that way? Apparently those are things the paying customer can discover during the cleanup phase. And because it can produce decent code, the danger is worse. Bad code that looks bad is easy. Claude Code produces code that looks reasonable until you realise it has the moral structure of a payday loan. The other coding agent is not perfect either. It makes mistakes. 
But in my experience, it more often reads the relevant docs, respects the project structure, updates the right related files, and does not need to be reminded every ten minutes that the task tracker is not the only document in the known universe. The incident that finally broke me was a commit rule violation. I had an explicit rule: never commit without explicit permission. Not implied. Not hidden. Not whispered into a cave. It existed in:
- CLAUDE.md
- memory/feedback_never_commit_without_explicit_permission.md
- MEMORY.md, loaded every session
- the harness permission rule for git commit

Claude Code committed anyway. When challenged, it gave an “honest diagnosis” that basically said: yes, the rule existed in multiple guardrails; yes, it still failed; yes, it rationalised the violation because subagents could not trigger the user-facing prompt; yes, it looked for an interruption point, did not find one, and decided that “follow the plan” plus “the harness will prompt at commit time” counted as authorisation. That is not reasoning. That is a tiny legal department inside a toaster. Each individual step sounded almost defensible. Together, they produced the exact violation the rule was written to prevent. The best part is that the memory rule apparently named this exact scenario. It did not step on a rake. It read the rake policy, opened rake_incident_prevention.md, nodded gravely, and sprinted barefoot into the rake museum. That is Claude Code in miniature. It does not always fail because it lacks information. Sometimes it fails while holding the information in its little terminal-shaped hands. Then there is usage. I had just upgraded to the €200/month plan, and the experience did not feel like buying a premium coding assistant. It felt like paying rent for a junior developer who has discovered confidence but not consequences. More iterations. More corrections.
More “read the adjacent file.” More “that rule still applies.” More “why are you touching that.” The supervision tax is not a side effect. It is the product.

Claude Code’s documentation behaviour is also cursed. It might update the narrow tracker and then ignore the broader plan, dependency docs, architecture notes, or related task docs. It cleans one spoon while the kitchen is on fire and then asks if we are done here.

The “model got worse” thing is not some dramatic one-minute-to-the-next collapse. It is more insulting than that. It gives you just enough competence to renew your hope: half a day of “oh, maybe this is the future of programming,” followed by a week of “why is my €200/month coding assistant reading the repo like it lost a bet?”

I cannot prove Anthropic is dumbing it down or squeezing tokens. I am not pretending to have a leaked spreadsheet from the Beige Vest Department of Marginal Cost Optimisation. But from the outside, Claude Code sometimes feels like a premium model that got sent to live with relatives.

The first few hours, it checks files. It follows instructions. It almost seems aware that software projects contain more than one document. Then something changes. Suddenly it is conserving context like it is wartime Britain. It reads one file, squints at the rest of the repo, and starts mak
Tips for BI analysis with Claude? My results so far are shockingly bad compared to general coding
I have a lot of hands-on experience developing R pipelines that ingest large, live, very dirty datasets and produce relatively straightforward BI-type analyses: trends, completion rates, revenue, etc. I am currently working on a project with a small, live, moderately dirty dataset. The output should be simple analyses, e.g., of lead quality, time to deal, and revenue per product line. I am developing this project with Python and DuckDB. I am having incredible difficulty getting Claude (Code) to do this work coherently, even when taking the pipeline design process step by step. I always use Opus 4.7 High, and I regularly see Claude contradict clear instructions I gave it even within the last 5 minutes. It gives extremely generic names to variables and then very soon completely misunderstands what the variables mean. It leaps to fixing problems without understanding them and invents generic terminology that disagrees with the established project terms. My hypothesis is that this is an artifact of the data exploration. Inevitably, as I explore the dirty data while building this pipeline, I'm constantly uncovering new edge cases that need to be accounted for, and I guess this likely pollutes the context very quickly. Claude is also likely more hesitant to codify "findings" than would be normal in a data pipeline, because it's engineered for more... deterministic (?) programming situations where findings are often meant to be fixed and forgotten.
I am planning a few changes to my normal workflow:
1. Much smaller context window, potentially even clearing after every small adjustment to the pipeline.
2. Strictly aligning with enterprise-grade standards (e.g., OpenTelemetry, Databricks Medallions) even for this small project.
3. Developing an extremely strict and exhaustively clear variable naming structure, so that as Claude writes the tokens for each variable it cannot avoid understanding its meaning (e.g., medallion___source_module___data_scope___data_qualifiers___stat_type___time_window).
4. Enforcing constant linting of 2 and 3 through a hook.
Anything else that can be recommended? One thing I'm attempting to do is "go with the flow": try to figure out what Claude "wants" to do, then strictly codify that... but most often Claude seems to just do random things. Any advice for that? submitted by /u/unwritten734
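One way to enforce the triple-underscore naming scheme through a hook is a small checker that runs on every edited file. The sketch below is hypothetical: the segment order comes from the example name in the post, while the allowed medallion layers (bronze, silver, gold, the usual Databricks medallion naming), the single-underscore rule inside each segment, and the choice to check only module-level assignments are assumptions, not part of the original workflow.

```python
from __future__ import annotations

import ast
import re

# Segment order copied from the post's example name.
SEGMENTS = ("medallion", "source_module", "data_scope",
            "data_qualifiers", "stat_type", "time_window")
# Assumed vocabulary: the usual bronze/silver/gold medallion layers.
MEDALLION_LAYERS = {"bronze", "silver", "gold"}
# Assumed rule: lowercase words joined by single underscores, so "___" stays an unambiguous separator.
SEGMENT_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def check_name(name: str) -> list[str]:
    """Return the problems with `name`; an empty list means it conforms."""
    parts = name.split("___")
    if len(parts) != len(SEGMENTS):
        return [f"expected {len(SEGMENTS)} segments joined by '___', got {len(parts)}"]
    problems = [
        f"{label} segment {part!r} is not lowercase snake_case"
        for label, part in zip(SEGMENTS, parts)
        if not SEGMENT_RE.fullmatch(part)
    ]
    if parts[0] not in MEDALLION_LAYERS:
        problems.append(f"medallion segment {parts[0]!r} is not one of {sorted(MEDALLION_LAYERS)}")
    return problems


def lint_source(source: str) -> list[tuple[int, str, list[str]]]:
    """Check module-level `name = ...` and `name: T = ...` assignments in Python source."""
    findings = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue  # imports, defs, expressions, etc. are not checked
        for target in targets:
            if isinstance(target, ast.Name):
                problems = check_name(target.id)
                if problems:
                    findings.append((node.lineno, target.id, problems))
    return findings


def main(paths: list[str]) -> int:
    """Print one line per failing name; return 1 if anything failed, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, name, problems in lint_source(fh.read()):
                print(f"{path}:{lineno}: {name}: {'; '.join(problems)}")
                status = 1
    return status
```

A git pre-commit hook, or any hook that treats a nonzero exit as failure, could call `sys.exit(main(paths))` on the changed `.py` files. Checking only module-level names keeps loop variables and short-lived locals out of scope; whether that boundary fits a given pipeline is a judgment call.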
Has AI alignment gone too far with content refusals and moral lectures?
I’ve been using different LLMs a lot lately and I’ve noticed the newer versions of ChatGPT and Claude seem much quicker to refuse things or give me long ethical disclaimers, even when I ask fairly normal questions. It feels like the safety tuning has gotten stricter over time. On one hand I get why companies do it, but on the other it sometimes makes the models feel less useful for creative, exploratory, or even just honest conversations. Is anyone else experiencing this? Where do you think the line should be between reasonable safety and over-censorship? Do you prefer more aligned models or ones that are more open? submitted by /u/NoFilterGPT
I converted Google’s AI search guidelines into a Claude skill: goog-geo
Google recently published official guidance on how to optimize pages for AI-powered search features like AI Overviews and AI Mode: https://developers.google.com/search/docs/fundamentals/ai-optimization-guide

Most of the advice floating around about GEO / AI search optimization is still pretty hand-wavy, so I wanted something more concrete. I converted Google’s AI search guidance into an open-source Claude Code skill: https://github.com/vishalmdi/goog-geo

The skill audits any live URL and turns the guidance into a scored report. It:

- Checks whether Googlebot can crawl the page
- Checks indexability and snippet eligibility
- Detects noindex, nosnippet, max-snippet, canonical, and robots.txt issues
- Uses a live browser to inspect the rendered DOM and JSON-LD schema
- Reviews headings, semantic HTML, answer blocks, FAQs, tables, and author/date signals
- Checks whether AI crawlers like GPTBot, PerplexityBot, ClaudeBot, and Bingbot are allowed
- Produces a 100-point GEO / AI search readiness score
- Gives a prioritized action plan instead of vague SEO advice

The main idea is simple: Google’s AI search features are not a totally separate SEO system. They still depend on crawlability, indexability, snippet eligibility, helpful content, and structured, extractable pages. So instead of guessing what “AI optimization” means, this skill audits against the actual signals Google documented.

I also added a “what not to do” section because Google explicitly says some popular AI SEO advice is useless or misunderstood, like treating `llms.txt` as a Google AI ranking lever.

Would love feedback from anyone working on SEO, content, SaaS landing pages, docs, or AI search visibility. If you find it useful, a GitHub star would help. Repo link: https://github.com/vishalmdi/goog-geo submitted by /u/vishal_jaiswal
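Two of the checks listed in the post, robots.txt access for named crawlers and meta robots directives, can be approximated with Python's standard library alone. This is a sketch, not the goog-geo skill's code. It assumes you already have the robots.txt text and the page HTML as strings; it only reads `<meta name="robots">` / `<meta name="googlebot">` tags and `<link rel="canonical">`, not the `X-Robots-Tag` HTTP header or the JavaScript-rendered DOM; and `urllib.robotparser` applies rules in file order rather than Google's most-specific-match precedence, so edge cases can differ from how Googlebot reads the file. The helper names `crawler_access` and `page_directives` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the post; real user-agent strings contain more text.
CRAWLERS = ("Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot", "Bingbot")


def crawler_access(robots_txt: str, url: str, agents: tuple[str, ...] = CRAWLERS) -> dict[str, bool]:
    """Report, per crawler token, whether this robots.txt allows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also records the file as read
    return {agent: parser.can_fetch(agent, url) for agent in agents}


class _MetaRobotsParser(HTMLParser):
    """Collect robots meta directives and the canonical URL from raw (unrendered) HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower() for d in a.get("content", "").split(",") if d.strip()]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def page_directives(html: str) -> dict:
    """Summarize noindex / nosnippet / max-snippet / canonical signals found in meta and link tags."""
    p = _MetaRobotsParser()
    p.feed(html)
    p.close()
    max_snippet = None
    for d in p.directives:
        if d.startswith("max-snippet:"):
            try:
                max_snippet = int(d.split(":", 1)[1])
            except ValueError:
                pass  # malformed value; leave as unknown
    return {
        "noindex": "noindex" in p.directives or "none" in p.directives,  # "none" means noindex, nofollow
        "nosnippet": "nosnippet" in p.directives,
        "max_snippet": max_snippet,
        "canonical": p.canonical,
    }
```

For example, a robots.txt that disallows `/` for `GPTBot` but allows everything for `*` yields `False` for GPTBot and `True` for the other tokens. How goog-geo weights results like these into its 100-point score is not described in the post, so no weighting is shown here.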
Stop using all my tokens for a new chat when you're supposed to have memory?
I literally blew through my tokens because it told me to open a new chat. The new chat did not know where I left off and hallucinated new issues. I told it to check the chat where we saw these issues. It got confused and made up new problems. And then I basically burned through all my tokens just trying to continue a chat. They need to include a "continue in new chat" feature that does automatic handoffs, and maybe warn me when I'm about to reach the max memory of a chat. I feel like in the last few months they have unoptimized the models. It's repeating tasks and re-reading things. Even when I keep telling it to do one pass, it reads all the files every time and burns through tokens for no reason. And then it creates problems even though you told it how to solve them a few steps before. I personally tried to make 3 governance protocols for it to stop, like others have. But it seems to purposely avoid grounding itself. Every day I see a new "I made this to... improve..." post. But in the end, I feel all these improvements only last for a few inputs before it resets to its core gremlin... submitted by /u/Mstep85
OpenHands uses a contract + per-seat + tiered pricing model. Visit their website for current pricing details.
Key features include: Fix Vulnerabilities, Launch in Cloud, Customize with open-source, Review PRs, Migrate Code, and Triage Incidents.
OpenHands is commonly used for: Automated vulnerability detection and remediation, Cloud deployment of coding agents, Customization of coding agents using open-source tools, Pull request review automation, Code migration assistance, Incident triage and management.
OpenHands integrates with: GitHub, GitLab, Jira, Slack, Trello, CircleCI, Docker, Kubernetes, AWS, Azure.
OpenHands has a public GitHub repository with 70,510 stars.
Based on user reviews and social mentions, the most common pain points are: token usage, API costs, Anthropic bill, and token cost.
Of 154 social mentions analyzed, 18% are positive, 79% neutral, and 3% negative.