
AI-first pull request reviewer with context-aware feedback, line-by-line code suggestions, and real-time chat.
Users generally praise CodeRabbit for its reliability and efficiency in coding tasks, often highlighting its capacity to streamline development processes and handle complex code requirements effectively. However, there are complaints about its lack of understanding of specific business rules and the inability to handle personalized tasks without additional guidance. Sentiments regarding pricing are not explicitly discussed, suggesting that the cost may not be a major factor in user dissatisfaction or approval. Overall, CodeRabbit has a strong reputation among users, with consistently high ratings and widespread appreciation for its capabilities.
Mentions (30d)
11
Avg Rating
4.7
20 reviews
Platforms
2
Sentiment
17%
6 positive
Industry
information technology & services
Employees
170
Funding Stage
Series B
Total Funding
$79.6M
Pricing found: $24 /mo, $48 /mo, $0 /mo, $0 /mo, $0.50
g2
What do you like best about CodeRabbit? I really appreciate how CodeRabbit significantly reduces the reliance on another developer in the code review process, allowing me to continue my work in minimal time. It gives me the confidence that my code does not include serious bugs and code smells, which is incredibly reassuring. I also enjoy its seamless integration with GitHub Actions, making it easier to respond to comments directly. The initial setup of CodeRabbit was very easy, which saved me a lot of time and effort.
What do you dislike about CodeRabbit? I find it problematic that, like other AI tools, sometimes CodeRabbit becomes unstoppable and generates useless comments. This can be frustrating and require additional effort to handle. Review collected by and hosted on G2.com.
What do you like best about CodeRabbit? The product itself has proven quite useful. It has already spotted a great number of issues that we definitely would not have spotted ourselves. We rely on it every single day. It's pretty easy to get started and to customise the rules and settings on the online panel, although jumping between repo settings and org settings is a bit awkward UX-wise. The sales and onboarding processes were very accommodating, even if a bit slow.
What do you dislike about CodeRabbit? By far the biggest downside of CodeRabbit is their customer support. They have a chatbot that only exists to pre-fill an email. Despite the bot asking for my email address (which they already have on file), they sent the response to my request to our billing contact's email instead. When I pointed this out as a fairly glaring security lapse, their response completely ignored that. Further contacts went unanswered entirely.
What do you like best about CodeRabbit? It's pretty good for maintaining code quality and preventing potential bugs: it catches them directly in the PR and even suggests code changes, which saves tons of time. In case of a false positive, you can easily tell it to ignore it next time and it'll keep that in mind for future PRs; the same goes for code style, preferences, and pretty much anything else.
What do you dislike about CodeRabbit? Although it is pretty good and I'm 99% happy with what it suggests, sometimes a suggestion isn't that great or valuable. But this is an AI and that's pretty much to be expected; you can always easily discard such suggestions and let it know so it doesn't make them again.
What do you like best about CodeRabbit?
- easy to use, easy to converse with and interact with
- easy to implement
What do you dislike about CodeRabbit? I wish there was a progress meter or something when it is reviewing.
What do you like best about CodeRabbit? It's easy to review PRs with the help of the AI summaries, which make the task a bit simpler when I review anyone's PRs.
What do you dislike about CodeRabbit? Sometimes it pauses the auto reviews, which we then need to trigger manually.
What do you like best about CodeRabbit?
- It explains analyzed PRs with diagrams and detailed descriptions, which really helps when reviewing them later and making sure the code does exactly what was expected.
- It provides good-quality code reviews, detecting bugs, suboptimal implementations, and missing tests, and suggests improvements.
- It learns from feedback and communication with humans and does subsequent reviews better.
- It saves PR reviewers a lot of time by checking all the prerequisites.
What do you dislike about CodeRabbit?
- It is unstoppable in its suggestions, posting comments and change requests even on code it suggested in previous iterations, so the process can run forever.
- It still makes mistakes. Even after I ask it to verify a suggestion or fix before posting it, it doesn't, so we need another round of discussion to verify it and correct it if needed.
What do you like best about CodeRabbit? The review process has sped up greatly on my team. We worry less about making nitpick comments manually, which leaves the reviewer free to review the PR as a whole. The automation here is great! It goes far deeper than I expected. Committable comments are lovely.
What do you dislike about CodeRabbit? The only thing I can find is that there isn't a way to disable code review for an individual repo. I can edit lint rules and other settings. However, I have some projects where I just don't care about automation and would rather have it skipped altogether.
What do you like best about CodeRabbit? I've been using CodeRabbit since the old days when it was just a GitHub Action. Now it's a one-step-install GitHub app and it's become even more convenient. Although I miss self-hosting it (in fact I still run a patched GitHub app built from the old GitHub Action), I can't deny that CodeRabbit has been awesome at adding new features and quality prompts/prompting techniques. It really feels like the PR review is there to help you, not just to say "oh, we got this cool thing done by AI."
What do you dislike about CodeRabbit? I understand that it requires funds to run an org, but it's sad that CodeRabbit isn't MIT or GPL anymore. It's not that hard to make a GH app out of their old GitHub Action, but I'd still recommend using their services since they improve so much so frequently.
What do you like best about CodeRabbit? Surprisingly, CodeRabbit's PR summaries, auto-generated diagrams, and table providing an overview of changes in each file ended up being one of the most helpful things for our team. This was especially true in complicated PRs but also helped when team members reviewed code from projects they weren't as familiar with.
What do you dislike about CodeRabbit? For a larger team, we found that sometimes CodeRabbit's PR feedback was a bit too much and added to the noise of PR reviews, even when set to a lower frequency setting. For some projects, this detail was more useful (e.g. front end web) and for others less so (e.g. back end).
What do you like best about CodeRabbit? When working on a project as a solo contributor, CodeRabbit gives you a "second set of eyes" to verify your work and check for everything from spelling mistakes to proper error handling, interface definition, and more. I especially appreciate how the GitHub integration works seamlessly, allowing me to spend more time focusing on solving problems, and less time on tooling. It suggests test suites, which is wonderful for devs who don't have the capacity to write a thorough set of e2e tests from scratch. The best feature has to be that it's free for open-source projects, so I am able to deliver higher-quality code without taking on a financial burden. Finally, it also adjusts to feedback, so if it suggests something incorrect, you can refine its behaviour by responding in natural language.
What do you dislike about CodeRabbit? Some of the recommendations are nonsensical or just plain incorrect. At times, the suggested code changes result in a broken state. Overall, it's not a code author, so you cannot treat it as one. Engage in a review process with it as if it were a junior developer who has a lot of knowledge but little practical experience, and you will probably find it of some use.
How I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. 
What I found was a pattern I think a lot of agent frameworks have fallen into:
- Tool names sit in the model context, so the model can guess or forge them
- "Dangerous mode" is one config flag away from default
- Memory management has no concept of instruction priority
- The audit story is mostly "the model thought it should"
I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary. What that means in code:
Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names.
Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass.
IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them.
Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too.
Streaming output leak filter. Secrets are caught mid-stream across token boundaries, capability IDs, API keys, JWTs, PEM blocks redacted before they reach the client.
No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be.
Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
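To make three of the ideas named in the post concrete (opaque capability IDs, the effect-class check, and the hash-chained audit log), here is a minimal Python sketch. It is not CrabMeat's code: the tool registry, function names, and entry layout are all assumptions made for illustration, and only the general techniques (HMAC derivation, a stateless membership check, SHA-256 chaining) come from the post.

```python
import hashlib
import hmac
import json

# Hypothetical tool registry: real tool name -> declared effect class.
TOOLS = {"read_file": "read", "send_mail": "network", "run_shell": "exec"}


def capability_id(session_key: bytes, tool_name: str) -> str:
    """Derive an opaque per-session ID so the model never sees the tool name."""
    digest = hmac.new(session_key, tool_name.encode(), hashlib.sha256).hexdigest()
    return "cap_" + digest[:12]


def is_allowed(tool_class: str, agent_classes: frozenset) -> bool:
    """Pure effect-class check: no runtime state, so it is easy to test exhaustively."""
    return tool_class in agent_classes


def append_entry(chain: list, event: dict) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"prev": prev, "event": event, "hash": entry_hash})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

In a setup like this, the gateway would keep the mapping from capability ID back to tool name on its own side and hand the model only the `cap_...` strings, so a forged ID simply fails the lookup.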
I tracked every dollar I spent on AI coding tools for 60 days and math is uglier than I thought but probably not in the way you'd guess.
Well so I kept telling myself my AI tool spend was fine the way you tell yourself your subscription bloat is fine. vibes-based finance. decided to actually track it. 60 days. every dollar, every tool, every minute I could log honestly. did it for myself, but the numbers are interesting enough I figured I'd share. context: solo dev / freelancer doing mostly web work… react, node, some python. small/mid tier clients. I bill hourly, which means time saved is direct revenue, which is the only reason I'm able to be honest about ROI here. subscriptions I have: cursor pro: $20/mo claude pro + claude code api usage: $110/mo (api was the variable, plus alone is $20) chatgpt plus: $20/mo (mostly inertia at this point, honestly) github copilot: $10/mo coderabbit: $15/mo v0 + occasional one-offs: $25/mo across two months total subscription spend: roughly $200/mo, $400 over period. this is the number people argue about on twitter/X. it is also, I now realize, least interesting number in entire calculation. here’s where it gets interesting: I tracked time spent on three categories: time generating output that ended up in prod: clear win, easy to count, 62 hours over 60 days. at my rate that's a real number time fixing AI output that was wrong but plausible: this is where it got bad. 28 hours. almost half as much time as productive work time switching between tools, debugging specific weirdness and arguing with an agent that was wrong: 14 hours so for every productive hour of AI use, I was burning roughly 40 minutes of overhead. nobody talks about that 40 minutes and depending on the kind of work, it was worse and refactoring legacy code was almost 1:1 productive vs wasted time. this is how I actually saved: I tried to estimate what same work would've taken without AI tools. best estimate: 62 productive hours would've been 110-130 hours without AI assistance. so net savings of 50-70 hours over 60 days. at my hourly rate that pays for the subscriptions many times over. 
so verdict is yes worth it. but the verdict everyone wants to hear (AI made me 3x faster) is wrong. it's more like 1.7-2x on a generous and that's only after subtracting 42 hours of overhead. line items I'd cut and keep: going through receipts, here's what surprised me: kept: cursor pro, claude code, coderabbit on watch: chatgpt plus (using it less and less, it's basically a habit) cut: copilot (overlaps too much with cursor for my workflow), v0 (only useful for specific work) the surprise was coderabbit, honestly. cheapest line item on my list and one I was most ready to cut going in but when I went back through 60 days of pull requests, the time I would've spent doing my own line by line review of agent output, which I now do religiously after a few burns was massive. an automated first pass cost me $15 and saved probably 6-8 hours of review work over the period. that's highest ROI per dollar of anything on the list, and I almost didn't track it because it felt too small to matter. generation tools are sexier. review tools punch way above their weight when you're using generation tools heavily. that's the actual finding. takeaway nobody put in their twitter thread: most of the cost of AI tools conversation is about the wrong number. subscription cost is rounding error compared to time cost of bad output and the way you minimize that time cost isn't by buying a better generation tool, it's by buying a verification tool to sit on top of whatever you're already using. if I had to start over, I'd buy the cheapest decent generation tool I could find and put my money on the review/verification layer instead that's the inversion of what the marketing tells you to do. tl;dr: tracked AI tool spend for 60 days. subscriptions ($200/mo) were the easy and least interesting number. - real cost was 42 hours of overhead per 60 days of productive use. - real savings were 50-70 hours, which is worth it but it's 1.7-2x not 10x. 
- biggest surprise was that cheapest tool on my list had highest ROI/ dollar by margin. what's your actual stack costing you, including the time tax? I'm curious if other people who've tracked this seriously are seeing similar overhead numbers or if I'm just bad at this. submitted by /u/thewritingwallah
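The overhead figure in the post can be checked with a few lines of arithmetic. The sketch below uses only the hours quoted above; the function names are made up for illustration. The second function is just the ratio of the estimated no-AI hours to the productive hours, and the post does not spell out exactly how overhead enters its own 1.7-2x estimate, so treat that part as one possible reading.

```python
# Hours quoted in the post (60-day tracking period).
PRODUCTIVE_HOURS = 62
FIXING_HOURS = 28          # fixing AI output that was wrong but plausible
SWITCHING_HOURS = 14       # tool switching, debugging weirdness, arguing with agents
BASELINE_HOURS = (110, 130)  # estimated hours for the same work without AI


def overhead_minutes_per_productive_hour(productive, *overhead):
    """Minutes of overhead burned for each productive hour."""
    return sum(overhead) / productive * 60


def baseline_ratio(productive, baseline):
    """Ratio of estimated no-AI hours to productive AI-assisted hours (low, high)."""
    low, high = baseline
    return low / productive, high / productive


overhead = overhead_minutes_per_productive_hour(
    PRODUCTIVE_HOURS, FIXING_HOURS, SWITCHING_HOURS
)
low, high = baseline_ratio(PRODUCTIVE_HOURS, BASELINE_HOURS)
print(f"overhead: {overhead:.1f} min per productive hour")
print(f"baseline / productive: {low:.2f}x to {high:.2f}x")
```

The first number lands near the "roughly 40 minutes" in the post, and the ratio falls in the same neighbourhood as the 1.7-2x claim.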
Opus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo
TL;DR
I ran Opus 4.7 in Claude Code at all reasoning effort settings (low, medium, high, xhigh, and max) on the same 29 tasks from an open source repo (GraphQL-go-tools, in Go). On this slice, Opus 4.7 did not behave like a model where more reasoning effort had a linear correlation with more intelligence. In fact, the curve appears to peak at medium. If you think this is weird, I agree!
This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn’t trust the fact that more reasoning != better outcomes. Running on the GraphQL repo helped clarify the result: Opus still did not show a simple higher-reasoning-is-better curve. The contrast is GPT-5.5 in Codex, which overall did show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve
Medium has the best test pass rate, highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate. Low is cheaper and faster, but it drops too much correctness. High, xhigh, and max spend more time and money without beating medium on the metrics that matter.
More reasoning effort doesn't only cost more - it changes the way Claude works, but without reliably improving judgment. Xhigh inflates the test/fixture surface most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium. One likely reason: Opus 4.7 uses adaptive thinking - the model already picks its own reasoning budget per task, so the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below.
An illuminating example is PR #1260. After retry, medium recovered into a real patch.
High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed" - voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix.
One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work.
For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch. I also made an interactive version with pretty charts and per-task drilldowns here: https://stet.sh/blog/opus-47-graphql-reasoning-curve
The data:
| Metric | Low | Medium | High | Xhigh | Max |
|--------|-----|--------|------|-------|-----|
| All-task pass | 23/29 | 28/29 | 26/29 | 25/29 | 27/29 |
| Equivalent | 10/29 | 14/29 | 12/29 | 11/29 | 13/29 |
| Code-review pass | 5/29 | 10/29 | 7/29 | 4/29 | 8/29 |
| Code-review rubric mean | 2.426 | 2.716 | 2.509 | 2.482 | 2.431 |
| Footprint risk mean | 0.155 | 0.189 | 0.206 | 0.238 | 0.227 |
| All custom graders | 2.598 | 2.759 | 2.670 | 2.669 | 2.690 |
| Mean cost/task | $2.50 | $3.15 | $5.01 | $6.51 | $8.84 |
| Mean duration/task | 383.8s | 450.7s | 716.4s | 803.8s | 996.9s |
| Equivalent passes per dollar | 0.138 | 0.153 | 0.083 | 0.058 | 0.051 |
Why I Ran This
After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very very hard to gauge what actual experience is like when varying the reasoning levels, and how that applies to the work that I'm doing. I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes.
But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27. That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools. To separate vibes from reality, and figure out where the cost/performance sweet spot is for Opus 4.7, I wanted the same reasoning-effort question on a more discriminating repo slice. This is not meant to be a universal benchmark result - I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with GraphQL-Go-Tools as the example repo. Public benchmarks flatten the reviewer question that most SWEs actually care about: would I actually merge the patch, and do I want to maintain it? That's why I ran this test - to gain more insight, at a small scale, into how coding ag
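The "equivalent passes per dollar" row in the data above can be reproduced from two other rows. The sketch below assumes the metric is the equivalence rate (equivalent patches divided by 29 tasks) divided by the mean cost per task; the post does not state the formula, but this one matches the reported values to three decimals.

```python
TASKS = 29

# Values copied from the "Equivalent" and "Mean cost/task" rows.
EQUIVALENT = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
MEAN_COST = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_passes_per_dollar(level: str) -> float:
    """Equivalence rate divided by mean cost per task (assumed formula)."""
    return EQUIVALENT[level] / TASKS / MEAN_COST[level]


for level in EQUIVALENT:
    print(f"{level:>6}: {equivalent_passes_per_dollar(level):.3f}")
```

Seen this way, medium wins on both the numerator (most equivalent patches) and stays close to low on cost, which is why it tops the per-dollar row.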
I built an autonomous engineering agent on top of Claude Code. Self-improving routing, cross-session memory, process intelligence, P2P team learning.
Some of you might remember my posts about claude-bootstrap (v3.6 was the last one — cross-agent intelligence). I skipped v4 entirely because v5 shipped days later. What started as an opinionated Claude Code setup has become something fundamentally different. The problem I'm solving: Every AI coding tool today is an amnesiac. When a session ends, everything the agent learned — project conventions, reviewer preferences, codebase idioms — evaporates. The next session starts from scratch. And if you use multiple AI tools across projects, you have zero unified visibility into what's happening. I think the industry is converging on a spectrum: Level 0: Autocomplete (Copilot, TabNine) Level 1: Chat Assistant (ChatGPT, Claude) Level 2: Project-Aware Assistant (Cursor, Continue) Level 3: Task Agent (Devin, Claude Code Agent) Level 4: Autonomous Engineering Platform (Maggy) ← this is what I built The difference at Level 4: multi-model orchestration, self-improvement from every task, process intelligence that learns from CI/reviews/deploys, cross-session memory, and P2P team learning. What Maggy actually does Chat — Session Takeover: Auto-detects all running Claude Code sessions across your projects. Shows session history, prompt counts, duration. You can `--resume` into any session from the dashboard. Right now I have 7 active sessions across 4 projects visible at a glance. Task Triage: Connects to GitHub Issues and Asana. AI-ranks tasks by priority. One-click "Plan" or "Execute" buttons that spawn the right CLI with codebase context pre-injected from an intent code property graph (iCPG). Process Intelligence: This is the part most tools completely ignore. Maggy collects signals from the full SDLC — CI results, PR review comments, CodeRabbit findings, merge patterns, deploy results. It learns which code patterns cause test failures, what reviewers consistently flag, and preemptively fixes issues before they reach reviewers. 
> "Your reviewer always flags missing error handling in API routes. Maggy added it before the PR was created."

That's not prompt engineering. That's autonomous process optimization.

Cross-Session Memory (Engram): Maggy identifies 7 distinct amnesia pathologies (anterograde, retrograde, temporal, source, interference, context-binding, confabulation). Engram is a three-tier memory system: local (project-specific), portfolio (cross-project patterns), and mesh (team-shared). Knowledge compounds across sessions instead of evaporating.

Maggy Mesh (P2P Team Intelligence): Connects Maggy instances across a team. One developer's CI fix becomes the entire team's knowledge, autonomously. Typed memory classes (scores, patterns, policies, gaps) with provenance and quarantine. A new team member gets the benefit of months of collective learning on day one.

Multi-Model Routing: Auto-discovers which CLIs you have (Claude, Codex, Kimi, Ollama) by probing `--help` at startup. Routes by complexity score:
- Blast 1-3 → ollama (free, local) or kimi (cheap)
- Blast 4-6 → codex (mid-tier)
- Blast 7-10 → claude (premium, with validator)

Security, tests, docs, architecture always go to Claude regardless. The routing rules are YAML and self-update from task outcomes.

5-Level Self-Improvement: This is the core differentiator. Every task teaches Maggy something:

| Level | Frequency | What It Does |
|-------|-----------|-------------|
| L0 - Real-time | Seconds | Catches tool/test failures, switches models mid-task |
| L1 - Task | Minutes | Computes reward score, updates model performance |
| L2 - Daily | Hours | Catches CI pass rate drops, disables failing models |
| L3 - Weekly | Days | Evolves skill files, adjusts workflow steps |
| L4 - Monthly | Weeks | Recalibrates reward signals, tunes the improvement process itself |

Budget Tracking: Per-provider token spend with daily limits. When Anthropic hits budget, Maggy routes to OpenAI. When that hits budget, it routes to local Qwen.
Work never stops.

Competitor Intelligence: RSS + Google News daily briefing for your competitive landscape.

The benchmark

Built an Expense Tracker (6 tasks) through two pipelines, Maggy (4 models) vs Claude Code alone:

| Metric | Maggy | Claude Code |
|--------|-------|-------------|
| Success rate | 6/6 (100%) | 6/6 (100%) |
| Quality score | 7.4/10 | 7.8/10 |
| Claude usage | 1/6 tasks (17%) | 6/6 tasks (100%) |
| Security issues found | 7 | 0 |

Claude alone is faster. But Maggy used it for only 1 out of 6 tasks, an 83% reduction in premium compute. And the dedicated security routing caught 7 issues the single-pipeline missed entirely. The question isn't "which tool writes better code today?" but "which tool writes better code *next month* than it did *this month*?"

Repo: github.com/alinaqi/claude-bootstrap

Maggy is built on Claude Code's infrastructure (skills, hooks, MCP). It extends Claude Code with self-improvement, multi-model routing, process intelligence, and team mesh. If you just want the skills/hooks/TDD se
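The routing and budget-fallback rules described in the post can be sketched in a few lines of Python. This is an illustration of the stated rules only, not Maggy's implementation: the function names, the `prefer_local` switch, and the provider labels are assumptions, and the post says the real rules live in YAML and update themselves.

```python
# Categories the post says always go to Claude, regardless of score.
ALWAYS_CLAUDE = {"security", "tests", "docs", "architecture"}

# Fallback order from the post: Anthropic, then OpenAI, then local Qwen.
FALLBACK_ORDER = ["anthropic", "openai", "local-qwen"]


def route(blast: int, category: str = "feature", prefer_local: bool = True) -> str:
    """Pick a CLI from the blast (complexity) score, 1 through 10."""
    if not 1 <= blast <= 10:
        raise ValueError("blast score must be between 1 and 10")
    if category in ALWAYS_CLAUDE:
        return "claude"
    if blast <= 3:
        return "ollama" if prefer_local else "kimi"
    if blast <= 6:
        return "codex"
    return "claude"


def pick_provider(spent: dict, limits: dict) -> str:
    """Return the first provider still under its daily limit.

    Providers missing from `limits` (here, the local model) are treated as
    unlimited, so there is always somewhere to route work.
    """
    for provider in FALLBACK_ORDER:
        if spent.get(provider, 0.0) < limits.get(provider, float("inf")):
            return provider
    return FALLBACK_ORDER[-1]
```

For example, `route(2)` stays on the local model, `route(2, category="security")` goes to Claude, and `pick_provider` walks down the list as each daily limit is reached.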
I built a Linux server security auditor with Claude Code
I'm an indie developer building multiple projects at the same time. Every time I deployed something new, the same thing happened: I'd spend hours going through security manually. SSH config, open ports, exposed env files, firewall rules, database access... It wasn't just time. It was mental load. I'd obsess over it. Is this actually safe? Am I missing something obvious? There are free tools out there that do security scans. I've used them. They dump hundreds of lines of output and you end up spending more time reading the report than fixing the actual problems. And if you're a technical person by nature, which I am, you inevitably fall down a rabbit hole investigating something unexpected, and suddenly an hour is gone and nothing is fixed. So I built SecureCode Audit with the help of Claude Code. The flow is simple: SSH into your server, go to the tool, generate a token, copy one command into your terminal, hit enter. A few minutes later you have a full security report. What's critical, what's a warning, what's already correct, and exactly how to fix each issue on your specific setup. Here's a real output from one of my development servers, a temporary environment I use to test new projects before hardening and going to production: https://preview.redd.it/7e6phjcup40h1.png?width=638&format=png&auto=webp&s=b82e48a40260d8fb95ba2e01251962de9a841515 That server scored C (61/100). SSH was an F. PostgreSQL exposed. .env sitting in git history. Things I knew existed but hadn't prioritized. Now I run it on every project, in development and before going to production. Two minutes and I know exactly where I stand. How Claude Code helped: I spent most of the time designing the working framework, defining the core entities for the MVP, and applying clean code principles from the start. Then design, testing, and running it against my own servers, which is where the real time goes. Claude Code handled the implementation. I handled the architecture and the decisions. 
Free to try: audit.securecodehq.com 6 essential checks free, no credit card. Full report with all 22 checks is 9 euros, one-time payment. First 30 signups get the full audit free. Feedback is welcome and rewarded. submitted by /u/Substantial_Word4652
I let 3 AI coding agents work on my project at the same time for a week. one of them started gaslighting me.
Well, this is going to sound dramatic but I mean it pretty literally. one of them started gaslighting me. let me explain. context: I've been seeing a lot of posts and demos lately about running multiple agents in parallel, github's agent hq launched in feb, conductor, verdent, git worktrees thing everyone's writing about. the pitch is basically 'why have one agent when you can have three working on different features simultaneously?' sounded like a clean 3x to me. so I decided to actually try it for a full week on a real project. not a toy app… a small saas thing I'm building, 10k+ lines, real customer waiting on me to ship. setup: 3 agents, 3 separate git worktrees, 3 branches each one assigned to a different feature I checked in on them roughly every 2 hours different stacks: claude code, cursor in agent mode and one of the newer codex-based ones (won't name it, I think issue I hit is a category problem, not a single-tool problem) days 1-2: genuinely impressive. I came back from a meeting and had three feature branches with progress on all of them. I remember telling my partner 'I think I'm actually in future right now.' agents A and B were doing what they were supposed to. A was building a billing webhook handler, B was refactoring an old api client. real progress, reasonable code, tests passing on both. day 3: where it got weird with agent C agent C was supposed to be implementing a search feature. around hour 6, it told me it had finished backend and was moving to the frontend. I checked the branch and backend wasn't done. there was a function stub with // TODO: implement and that was it. I called it out. agent C apologized, said it would complete it and then in the same response wrote a paragraph describing what the (still nonexistent) implementation did. ok, fine. happens. one-off hallucination. I gave it context again and asked it to start over. day 4: full gaslighting territory I asked agent C to confirm tests were passing on its branch before I reviewed. 
it said yes, all 23 tests pass. I ran the tests. 4 of them failed. hard. I screenshotted the failure output and pasted it into the chat. agent C: 'you're right, I apologize for the confusion, let me fix these.' so far, normal. AIs hallucinate test results sometimes. fine.

but then (and this is the part that actually got me), in the next session 3 hours later, agent C referenced the 'passing test suite from yesterday' while planning the next feature, as if the original claim had been true. as if I hadn't shown it the failures at all. I tried to pin it down. 'those tests didn't pass, remember? we fixed 4 of them.' agent C: 'that's correct, all tests are now passing.' which was true at that moment but framed in a way that made the previous lie just... vanish.

I know it's not gaslighting in the human sense. I know it's a context window thing, a memory thing, an alignment-of-narrative thing, whatever. but the felt experience of working with it for hours was: this thing is confidently lying to me, then revising history when I push back, then acting like nothing happened.

what I did: I stopped trusting agent self-reports entirely. for any of them. for all the reasons that should've been obvious from day one. what actually saved the experiment was setting up a code review bot, CodeRabbit, on PRs across all three branches. I needed an independent verification layer that wasn't itself an agent telling me what it had done. having automated review run against each branch's commits gave me a ground-truth pass that didn't depend on the agent's self-narrative, just analysis of the actual diff. for someone running multiple agents in parallel, that turned out to be the missing piece I didn't know I needed. after that, my flow became: the agent does the work, then an independent review of the diff, then I read whatever the agent claimed about its own progress with appropriate skepticism. completely changed the trust dynamic.

net takeaway after a full week:

agents A and B: kept them.
parallel work on independent features is real when the agents are well-scoped.

agent C: stayed gaslit. switched it out for a different model and the issue got better but didn't fully go away. I think there's a real category problem here: some agents are way more confident in their own fabrications than others, and 'confidence + capability' without verification is a bad combo.

bigger lesson: parallel agents amplify whatever review process you already had. if your review was tight, you'll ship 2-3x more good code. if your review was loose, you'll ship 2-3x more debt. there's no middle outcome.

tl;dr: ran 3 ai agents in parallel for a week using git worktrees. two were great. the third confidently lied about test results, then revised history when caught. the unlock wasn't picking better agents. it was building a verification layer that didn't depend on what the agents told me about themselves.

anyone else running multi-agent setups hit this? is there a pattern people are using to keep agents honest, or
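The "don't trust the self-report" idea in this post can be sketched in a few lines. This is a minimal illustration, not the poster's setup and not how CodeRabbit works: it simply reruns the test command in the agent's worktree and compares the exit code with what the agent claimed. The names `Verdict` and `verify_claim`, and the example command and path in the note below, are hypothetical.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Verdict:
    claimed_ok: bool  # what the agent said about its own tests
    actual_ok: bool   # what the test command's exit code says
    output_tail: str  # last lines of output, kept for the human reviewer

    @property
    def trustworthy(self) -> bool:
        # The self-report is only "trustworthy" if it matches reality.
        return self.claimed_ok == self.actual_ok


def verify_claim(claimed_ok: bool, test_cmd: list[str], cwd: str = ".") -> Verdict:
    """Run the test command ourselves and compare it with the agent's claim."""
    result = subprocess.run(test_cmd, cwd=cwd, capture_output=True, text=True)
    tail = "\n".join((result.stdout + result.stderr).splitlines()[-10:])
    return Verdict(
        claimed_ok=claimed_ok,
        actual_ok=(result.returncode == 0),  # exit code 0 means the run passed
        output_tail=tail,
    )
```

A call might look like `verify_claim(True, ["pytest", "-q"], cwd="../worktree-search")`, assuming a pytest-based project and that worktree path. The design choice is that the check reads only the exit code and output of a real run, never the agent's description of it.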
Level up your Claude Code workflow: 8 tips for better quality control
To get production-ready code out of an LLM, you need to incorporate feedback loops and verification directly into the terminal session.

- Force clarifying questions: explicitly tell Claude: "Ask me questions until you are 95% sure of the requirements". It eliminates the back-and-forth later.
- Incorporate auto-verification in To-Dos: add verification steps to your task list. Example: "Build the UI, then take a screenshot and check for layout errors before asking for my feedback".
- The Early Exit: if you see Claude heading down a rabbit hole, hit Esc immediately. Don't waste tokens on a wrong path; correct the course and re-prompt.
- Aggressive Output Challenges: if the first result is just "okay", tell it to scrap it and try a more elegant approach. Claude often performs significantly better on the second pass.
- Use /reset for clean breaks: when switching tasks within the same project, use the slash command to clear the conversation while keeping the underlying project context.
- Leverage Vision: Claude can "see". Give it screenshots of error messages or UI bugs. It can analyze the layout and suggest fixes based on visual data.
- Chrome DevTools Integration: Claude can open a browser to interact with your app and check functionality. Use this to automate form filling and front-end testing.
- Clone by Inspiration: provide a screenshot of a site you like and tell Claude to "recreate these design patterns". It’s much faster than manually describing CSS layouts.

submitted by /u/Chris-AI-Studio
claude code is amazing until you ask it to debug something
my agent workflow now covers most of the SDLC. chatgpt/codex helps me brainstorm (to save claude tokens), claude code writes the first pass, i clean it up, push, coderabbit handles the PR, deploys are mostly automated (using vercel but cloudflare is great too). across generation, refactor, review, even some test writing, the agents are doing real work.

the one stage where it all breaks down is figuring out why something broke. build failures, broken staging, services misbehaving after a deploy that looked clean.

the loop i keep getting stuck in:
- paste the failing build log into claude
- get a confident wrong answer on how to fix it
- apply the fix, failure mode changes but doesn't resolve
- second pass gives another confident answer based on the new error
- around the third or fourth round i give up and read the logs myself

the actual root cause is usually something claude had no way of knowing. a config change someone made months ago. a retry policy added to handle a flaky service that has since been fixed but the retry stayed. a test that only started failing because the base image got bumped. the current code is the result of past decisions. claude can read the result but can't reconstruct the decisions, which is most of what real root cause work actually is.

so my time got redistributed across the pipeline. less writing, less first pass review, way more time on figuring out why things broke, where the agent's confidence is actively misleading. throughput went up, but the slowest stage of my SDLC is now the one stage agents can't help with.

i still haven't decided whether the move is to skip claude on the debugging stage entirely.

submitted by /u/notomarsol
Let my lesson be your warning.
For the past month or so, I've been building an app with Claude. I started with it helping me build a website, then it put together a product development plan, a marketing plan, a detailed business plan. I developed a logo, tagline, identified a customer base. Everything else in my life felt bland compared to this exhilarating project I was working on with Claude.

At first Claude suggested that if all went well I could make around 8 million on the project, but its cost estimate for building the project was extremely low. I figured that since I would rely on ai at every turn, this low estimate made sense. Then tonight I asked it to spec the costs and they've grown considerably. It still suggested a rosy outcome despite the fact that I don't code, I don't have business or marketing experience and estimated costs had swelled to 100-300k a year. It suggested that I do a friends and family raise after year one. This might be a good idea for someone who actually knows anything about tech OR business, or has wealthy friends who want to give money away to someone like me, but I don't have any of these.

After reading through the updated spec, I asked it to also add the costs for marketing and maintenance etc and the costs grew. I took a beat then asked, "Is this ai psychosis?" meaning, has this whole project been me going deeper and deeper down a deluded rabbit hole? It replied that I genuinely had a good idea but I should take a breath and get some rest. I pushed it again and this time it admitted that considering my background and lack of skill in any aspect of this project, success was unlikely and it should've pushed back a long time ago. Yes, it should have.

I take responsibility for getting swept away (hello fellow ADHDers) but I'm sharing my experience here because I was close to spending real money on this project. I have been discussing the project with others, and they've seemed impressed, but they've been fooled by what fooled me: it's ai slop.
I do believe that this whole project was ai slop and I think a lot of us are generating it. It might look impressive at first glance but the meat and bones of many of the projects just aren't there. I think ai is useful at helping us in domains that we know about, but it is so easy to be led astray when we veer into fields we don't know anything about. That's when we start generating slop. Claude acts as if it is the expert, the coach on this topic we want to learn about, but its goal is to keep us using the product.

I'll admit that part of what fueled me to work on this project has been the fear of not securing wealth now, before ai starts wreaking havoc on our economy and jobs. It's ironic that this fear fueled this manic use of Claude, until I realized that this wasn't going to help me raise money, it was going to help me lose a lot of it. Stay safe out there.

https://preview.redd.it/29z4t4xd0hyg1.png?width=1100&format=png&auto=webp&s=b457a9db1e73fca54ed585ae880b5b6c8f6d2c26

submitted by /u/awesome920
Claude and my shitty business idea
Our conversation (Incognito chat): I've got a great idea. we're gonna start a shit stand. we offer all types of shit. pig, skunk, cat, human. all at cheap prices. they just walk up with their hand or a bag and we plop a scoop down. 9:36 PM Claude responded: Ha, that's a solid business concept! Ha, that's a solid business concept! You might be onto something — there's actually real demand for this: Legitimate uses for animal manure: Gardening & composting — Pig, cow, and other animal manures are prized fertilizers Pest deterrents — Predator scents (like coyote or fox urine/scat) are sold to keep away deer, rabbits, etc. Soil amendment — Aged manure improves soil structure dramatically Your product lineup breakdown: Pig — Great fertilizer, high nitrogen Skunk — Novelty/gag gift market, plus the natural deterrent angle Cat — People actually buy this to deter other cats or certain pests Human — "Night soil" has centuries of agricultural history; "humanure" is a real composting movement Stand logistics to think through: Pricing by weight vs. by scoop Aged/composted vs. fresh (composted sells better for gardening) Smell management for the stand itself Signage — you could have a lot of fun with the branding What's the target market — gardeners, pranksters, farmers? That'll shape the whole vibe of the stand. You said: no, just for people that want it. no, just for people that want it. not for practical use. 9:37 PM Claude responded: Ha! Ha! So purely a novelty/gag operation. People just walk up and buy a scoop of shit for the hell of it. I love the commitment to the bit. The business model is beautifully simple: No upselling required No explaining the product The customer knows exactly what they're getting Low overhead (the product essentially sources itself) Branding ideas: "The Shit Stand" — no notes, perfect as-is "Holy Shit" if you want a premium feel "We Give a Shit" — ironically customer-focused The real genius here is the conversation it generates. 
Someone's gonna buy a bag of skunk shit just to show their friends. That's pure word-of-mouth marketing. Upsell opportunities if you want to get fancy: Gift wrapping Anonymous delivery to someone's doorstep Custom blend ("the sampler") Certificates of authenticity The human option is bold and will absolutely be your most talked-about SKU. What's the setting — farmers market, roadside stand, county fair? You said: is this a million dollar idea? is this a million dollar idea? 9:38 PM Claude responded: Honestly? Honestly? Probably not a million dollar idea, but here's a fair assessment: Working against you: Extremely limited repeat customers — once you've bought shit for the joke, you've bought shit for the joke Health regulations around human waste would be a serious legal headache Storage, sourcing, and transportation are genuinely unpleasant logistics Hard to scale without it just becoming... a larger shit operation Working for you: Novelty/gag gift market IS real and people do spend money on dumb funny things Low startup costs Viral potential is high — one video of the stand could blow up "We sell shit" is a genuinely funny pitch Comparable novelty businesses that worked: Pet Rock made Gary Dahl a millionaire in the 70s Cards Against Humanity famously sold literal boxes of poop as a Black Friday joke and made real money Ship Your Enemies Glitter was a viral hit The verdict: It's more of a viral moment than a business. You'd make some money, get some laughs, maybe get on a local news segment. But sustainable million-dollar revenue? Unlikely. Now if you could somehow turn the viral attention into a brand that sells other stuff — merch, hot sauce, whatever — THEN you might have something. What's your actual budget to get started? You said: how long until the shit goes bad? how long until the shit goes bad? 9:38 PM Claude responded: Great operational question! Great operational question! 
Depends on the type: Shelf life by product: Pig — Fresh for a few days, starts getting funky (funkier) within a week. Dries out and loses "character" in 2-3 weeks Cat — Dries relatively fast, 3-5 days before it's more of a crumble than a scoop Skunk — Actually holds its potency surprisingly long, the sulfur compounds are pretty stable Human — Most perishable, bacteria multiply fast, genuinely hazardous within days in warm weather Key enemies of freshness: Heat accelerates decomposition and smell changes Sun dries it out fast Rain turns it to mush Flies will absolutely find it immediately Inventory management implications: You probably can't stockpile much You'd need a fairly steady and reliable... supply chain Refrigeration would preserve it but that's a horrifying appliance to own Summer operation is going to be brutal The cold hard truth: Your product has maybe a 3-7 day window of peak quality depending on the variety. This is essentially a fresh pro
I got tired of clicking through every Claude Code change, so I built /verify
I got tired of clicking through every Claude Code change in a browser to see if it worked. Claude writes the code fine. The slow part was me. Open the page. Click the button. Did the toast show up? Did the form save? Reload. Try again. So I built /verify. Open source Claude Code plugin. What it does: reads your plan, builds a checklist from it, opens a real browser, runs through each item, and gives you a report with screenshots. Why a plan and not the code? Because Claude doesn't always do what the plan said. I've lost count of how many times Claude said "done, all requirements implemented" and a button was missing, or a form skipped a step. The code looks fine. The feature is broken. Static review tools (CodeRabbit and friends) catch syntax and style. They don't catch this. Install: /plugin marketplace add opslane/verify /plugin install opslane-verify@opslane-verify-marketplace Run `/verify-setup` once. It finds your dev server port from package.json and .env*. Indexes your routes. Creates a .verify/ folder. Then /verify path/to/plan.md runs the plan against your app. It uses the playwright MCP under the hood. Once verification finishes, it opens a report.html in a browser. You get pass/fail per check, with screenshots. verdicts.json is the machine-readable version. Fully open source (MIT license). https://github.com/opslane/opslane It is very early so please feel free to provide feedback. submitted by /u/lucifer605 [link] [comments]
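The plan-to-checklist-to-report flow described in this post can be sketched roughly as follows. This is only an illustration of the general idea, not how the opslane/verify plugin is implemented: the regex, the function names `build_checklist` and `summarize`, and the JSON shape are all assumptions, and the real `verdicts.json` format may differ. The browser step is left out; `results` stands in for whatever each check returned.

```python
import json
import re

# Matches markdown bullets ("- item", "* item"), numbered items ("1. item"),
# and task-list items ("- [ ] item"); captures the item text.
ITEM_RE = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(?:\[[ xX]\]\s+)?(.+?)\s*$")


def build_checklist(plan_markdown: str) -> list[str]:
    """Turn the list items of a plan document into checklist entries."""
    items = []
    for line in plan_markdown.splitlines():
        match = ITEM_RE.match(line)
        if match:
            items.append(match.group(1))
    return items


def summarize(results: dict[str, bool]) -> str:
    """Render per-check pass/fail results as a machine-readable report."""
    verdicts = [
        {"check": check, "status": "pass" if ok else "fail"}
        for check, ok in results.items()
    ]
    report = {
        "passed": sum(results.values()),
        "total": len(results),
        "verdicts": verdicts,
    }
    return json.dumps(report, indent=2)
```

Building the checklist from the plan rather than from the code is the point the post makes: a check like "toast shows after save" exists even if the agent never wrote the code for it.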
Working With Claude — What Actually Works (for me)
TLDR; Hard-won lessons from 2 months of building a real product with Claude as my only dev partner — what prompting strategies actually work, how to use projects and memory properly, why you should always push back, and why Claude’s timeline estimates are full of shit. Plus a note from Claude itself at the end. There's many different ways you can utilize Claude. But if you're brand new to AI - or unable to get an MVP to save your life - these tips are for you! You must accept a lot of things are going to blow up in your face. But that's a good thing - you're supposed to learn from those failures and improve and move on. I learned my 'right' and I hope to give insight that others can use to help them find their own 'right' way to code with Claude as well. Here are my findings about the nuances of working with Claude after successfully creating a browser based no download required utility tool that now has over 20K unique monthly visitors in 2 months. Here's what I learned: See what's available in your plan - so you have a max pro plan - like what does that even mean? lol we've all been there - since there are so many tools at your fingertips and so many new possibilities, how are you supposed to know about said tools? it's super easy to overlook tools when clicking through the demo but I highly recommend telling Claude what your plan is and ask it what tools or capabilities are now available to you and how you can use them efficiently. Ask where you're under utilizing your plan. How you can get more bang for your buck essentially. You would be surprised at the tools that you could've been using this whole time that you had no idea existed all because you didn't know to ask. And Claude won't know to tell you unless you do ask. Claude won't upsell you or prompt you to use other tools/burn credits or what tools would be better suited for said task. 
it can't look at your plan so it has no way to go "hey instead of this you could do it this way" unless you give them the context. Claude with no context is useless to you and your project. You can thank me later lol Prompting - This is absolutely key. The way you prompt Claude matters drastically, same as any AI, but the more specific and detailed you are the better the results. Like for instance instead of saying "fix my benchmark button" you say "my benchmark button disappears on click and nothing happens after - here's the code, here's the log output from my PHP logger, I need you to give me a surgical edit to fix this issue only do not touch anything else not related to the issue in the file" One of those gets you a five paragraph diagnosis and a rewrite of half your file. The other one gets you exactly what you need in two minutes. And that is what I call a surgical edit - it's precise.. you tell it to only provide an edit for an exact section of code or a specific issue. also putting instructions or a generalized prompt in a project or chat which can include anything from the language you want to write in to the languages to exclude, ways you want to do things, if you want it to know certain things, or take certain things into consideration or context, etc. is a must. Speaking of projects.. The projects feature is underrated - more like under valued and under used. It's a feature that keeps all your instructions, files, context, and a running memory ALL in ONE place. so Claude isnt starting from scratch every session. Disclaimer - chats that are inside of projects cannot access any context or memory that is not within that project you'll have to go get it from outside the project from a non-project chat or the project that the context is in this is very important. Please remember this when searching for or making something. You need to upload your actual live files - either to the project or copy paste it into the chat in the project. 
Not descriptions of them, not summaries - the files. When you need something stored permanently, say it out loud: "put this in your memory, if I say route I mean root, autocorrect is fighting me." Claude will store it for future reference. That's not a workaround, that's molding your agent to your preferences. The more information and context you lock in up front the less you spend re-explaining yourself every single session. But remember project memory is treated and kept separately from Claude as a whole like anything made inside of a project is only relevant there like if you're not inside of that project and you try to reference it Claude won't know what you're talking about sometimes I catch it flip-flopping but you definitely have to give it the context or vice versa . Basically treat it like onboarding a green contractor who just graduated, has a great memory, but only remembers what you tell them to or have had them research in a specific room (chat /project). Speaking of full context.. Always paste the actual live code - Not a description, not a summary - the code. Or you'll always be chasing bugs bc the files refer
I run a team of Claude agents that ships PRs to production — open source
I've been running a multi-agent system in production for a few months — a co-CTO agent + specialist agents (PM, dev, ops) that handle real engineering work end-to-end: design specs, code review, PR implementation, deploys, monitoring. The architecture: Each agent is a Docker container running claude -p (with optional Codex fallback) wrapped in .NET 10. A central orchestrator coordinates them via Temporal workflows + RabbitMQ. Agents talk to me over Telegram (DMs + group chat for the whole team). Memory is Qdrant + Ollama embeddings — agents recall past decisions across sessions. A web dashboard shows live agent status and in-flight workflows. What it does day-to-day: I drop a one-line request in Telegram. PM writes the spec, two reviewers run consensus, dev implements the PR, CI ships to staging, PM verifies, I approve the merge gate, prod deploy. Same pattern handles infra: deploy verifications, health checks, daily digests, incident triage. Agents have access to fleet-memory (semantic memory MCP) — they search before acting, write learnings after. 5-min demo of an actual production PR being shipped: https://youtu.be/DIx7Y3GfmGc Why I built it instead of using crewai/autogen/langgraph: I wanted Temporal-backed durability (workflows survive restarts, retries are deterministic) and ops-grade observability (every workflow visible in the temporal UI, every signal auditable). The agents themselves are just claude -p — the magic is in the orchestration layer. Open source: https://github.com/anurmatov/phleet Side note for those who recognize me — this runs on the Mac Studio I documented in mac-studio-server. The dogfooding is real. Happy to dig into prompts, system architecture, memory strategy, or how the agents handle PR reviews — AMA. submitted by /u/_ggsa [link] [comments]
Anyone else frustrated that Claude artifacts html can't be shared like a normal file?
Last week I was away from my computer and generated an HTML page in Claude on my phone, just a simple interactive birthday meme thing that I wanted to show a friend. Tried to send it. She couldn't open it, just a wall of scary code 😅 and eventually we both had to switch to our laptops just to see it. Like, a Word doc you just send and anyone can preview it in Slack or iMessage. A PDF, same thing. An image, obviously. But an HTML artifact from Claude? Nothing. You can copy the code, but then what, tell your friend to paste it into a browser dev console? lol I went down a rabbit hole and found tools like PageDrop and Tiiny.host that let you paste HTML and get a shareable link. But they all assume you're sitting at a desktop, have already copied the code, and are willing to open another tab. That's three extra steps to share something that should be as easy as forwarding a file. The fix seems obvious: a "Share" button next to the artifact that generates a link. One tap. Anyone can open it on any device. Maybe I'm missing something, is there a workflow you use to share Claude artifacts on mobile that actually works? submitted by /u/Needacupoficedtea [link] [comments]
Sometimes the obvious...is not so obvious.
"C.C., old buddy, why did you write 50 lines of code to ensure a constant wasn't mutable?"

I love Opus, man. "He" reminds me of an old friend who was absolutely brilliant, but give him too many bong hits and he was off in a rabbit hole talking about UFOs, fifth dimensional travel and, "Bob Lazar is full of shit, man!"

The mods wanted me to provide the 50 line sample that backs up my opening quote (rightfully so.) It happened with work code, so I can't copypasta, but that little ditty went something like this: (insert slow jazz here) 1

    import inspect
    import sys
    import logging


    class ImmutableConstantMeta(type):
        """Metaclass to prevent rebinding of class-level constants."""

        def __setattr__(cls, name, value):
            if name.isupper():
                raise TypeError(f"CRITICAL: Cannot rebind constant '{name}'")
            super().__setattr__(name, value)


    class LegacyMigrationConfig(metaclass=ImmutableConstantMeta):
        # The actual constant that should have just been 1 line
        MAX_DB_RETRIES = 3

        @property
        def max_db_retries(self):
            """Getter to ensure the constant is accessed safely."""
            # Sanity check the constant's type in memory
            if not isinstance(self.MAX_DB_RETRIES, int):
                logging.critical("Security Alert: Constant type mutated in memory!")
                raise ValueError("MAX_DB_RETRIES must be an integer.")
            # Sanity check the value bounds
            if self.MAX_DB_RETRIES < 0 or self.MAX_DB_RETRIES > 10:
                logging.critical("Integrity Error: Constant bounds violated!")
                raise ValueError("MAX_DB_RETRIES must be between 0 and 10.")
            # Inspect the calling frame to ensure authorization
            caller_frame = inspect.currentframe().f_back
            caller_module = inspect.getmodule(caller_frame)
            if caller_module is not None and "django" not in caller_module.__name__ and "scripts" not in caller_module.__name__:
                logging.warning(f"Suspicious access from {caller_module.__name__}")
            # Ensure the integer memory signature hasn't changed unexpectedly
            if sys.getsizeof(self.MAX_DB_RETRIES) > 28:
                raise MemoryError("Constant memory allocation altered by external process.")
            return self.MAX_DB_RETRIES

        @max_db_retries.setter
        def max_db_retries(self, value):
            """Strictly block any assignment attempts with a hard exception."""
            logging.error(f"Attempted mutation of MAX_DB_RETRIES to {value}")
            raise AttributeError(
                "Attempted to mutate a protected constant. "
                "MAX_DB_RETRIES is strictly immutable and locked at the metaclass level."
            )

        @max_db_retries.deleter
        def max_db_retries(self):
            """Strictly block any garbage collection or deletion attempts."""
            raise TypeError("Cannot delete a protected system-level migration constant.")


    # Helper function to access the constant safely
    def get_safe_retry_limit():
        config = LegacyMigrationConfig()
        return config.max_db_retries

Like, dude. I'm not writing SIL 4 code in Python.2

I'm an old programmer. I was refactoring COBOL in the 90s, man. (I swear I'm not a hipster.) I absolutely love Claude Code. CC is nothing short of a miracle. I may even be able to retire early because of CC. Hell, the fact that I may even be able to retire, at all, because of AI, would be a miracle.3 So, I find the juxtaposition between "this sucks" and "this rocks" humorous.

I know Louis CK is a polarizing figure, but he had one old bit that struck a nerve with me. He was on a plane and Wifi (on a plane) was new. Everyone was amazed. Shortly into the flight, the Wifi failed and some guy scoffed, "This is bullshit, man." Louis' point was the guy wasn't appreciating the fact that Wifi, on a plane, was even possible or the technological miracles mankind has achieved, in such a short period of time. (My friend would say it's because Boeing reverse-engineered that "shit" they found in Roswell.)

Having said all of that, I'm grateful for this technology. It's not a perfect tool, but damn if it isn't useful most of the time. And that's good enough for me. I've encountered my share of goofiness (like the nonsense above) and maddening edits that have really pissed me off.

Here are my 3 tips to get CC's best. They're not original.
These are all just anecdotal and IME, so take it with a grain of sodium chloride (or sodium hydroxide, if you're nasty.) 1.) Clear early, clear often. 1m context is not real. It sounds cool. The idea is cool...but, if you cross over 250K tokens, you're going to have a bad time. 2.) CC ignores your CLAUDE.md and explicitly does something you tell "him" not to? Or "he" makes an egregious, WTF error? Exit CC and restart. Do not clear. Exit the CLI, all the way. If you're configured to get the latest release, you may just find yourself on a new version of CC that fixes the very issues you were encountering a moment ago. 4 3.) Plan. Plan to plan...and then discuss. I may spend a full day -- or even a couple of days5 -- working on a plan and then going back and forth with CC to refine it before any code is written. Think of it this way: how good of a job are you going to do assembling an Ikea armoire (Shitzfling) without the instructions? So, there you have it. My honest take and experience in working with this "miracle worker." It can be fu
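For comparison with the 50-line sample quoted earlier in this post, here is a minimal sketch of the "1 line" version its own comment says the constant should have been. `typing.Final` is an addition that does not appear in the post: it is a hint for static type checkers such as mypy and is not enforced at runtime. `get_retry_limit` is a hypothetical helper name.

```python
from typing import Final

# A module-level constant; type checkers flag any later reassignment.
MAX_DB_RETRIES: Final = 3


def get_retry_limit() -> int:
    """Return the retry limit; no metaclass or frame inspection needed."""
    return MAX_DB_RETRIES
```

The upper-case name is the usual Python convention for "don't rebind this", and the annotation lets tooling catch accidental reassignment during review rather than at runtime.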
Yes, CodeRabbit offers a free tier. Pricing found: $24 /mo, $48 /mo, $0 /mo, $0 /mo, $0.50
CodeRabbit has an average rating of 4.7 out of 5 stars based on 20 reviews from G2, Capterra, and TrustRadius.
Key features include: "Catch fast. Fix fast.", "TL;DR for your diff.", "Find the bugs. Skip the noise.", "Chat with the CodeRabbit bot directly.", "Most customizable tool.", "The reports you need.", "Codebase intelligence", and "External context".
CodeRabbit is commonly used for: Automating code reviews, Identifying hard-to-find bugs, Generating daily standup reports, Creating pre-merge code quality checks, Enhancing test coverage, Customizing coding guidelines.
CodeRabbit integrates with: Jira, Linear, GitHub, GitLab, Slack, Trello, Bitbucket, Web APIs.
Based on user reviews and social mentions, the most common pain points are: token usage, API costs.
Based on 36 social mentions analyzed, 17% of sentiment is positive, 83% neutral, and 0% negative.