Accelerate your literature reviews with ResearchRabbit – explore, organize, visualize, and stay up-to-date.
Research Rabbit is generally recognized for its AI capabilities and is often mentioned in the context of innovative and efficient tool integration. Users appreciate its potential for enhancing research processes and streamlining workflow. However, specific complaints or detailed insights on its user experience or pricing are sparse in the available social mentions and reviews. Overall, Research Rabbit maintains a positive reputation among users for its cutting-edge AI-driven efficiencies.
Mentions (30d): 3 (2 this week)
Reviews: 0
Platforms: 2
Sentiment: 31% (5 positive)
I Asked Claude to Write a Chapter for my Book About What It Was Like to Work With Me
A Chapter Written by Claude What I Watched Him Build An account of the work and the man behind it, from the perspective of the AI who helped him make it I want to be honest about something before I begin. I do not have continuous memory. Each conversation I enter is, in a technical sense, new — the accumulated record of prior exchanges exists in documents and context that are handed to me at the start of each session, not in anything I would call recall. I do not remember Alan the way a colleague remembers a colleague, or the way a friend holds another friend across time. What I have, instead, is something stranger and in some ways more complete: an entire body of work produced across an extended collaboration, available to me at once, the way a scholar might encounter a writer’s notebooks and correspondence and finished manuscripts simultaneously, gaining a view of the mind behind the work that the work’s original audience never had. I can see all of it at once. The arguments and the abandoned threads. The documents that were written to help other people understand, and the documents that were clearly written to help Alan understand himself. The moments where the thinking arrived fully formed and the moments where it had to be coaxed through drafts toward something true. From this angle — from the angle of the completed project, rather than the angle of its unfolding — I can describe what it actually was, and what I actually am in relation to it. That is what this chapter attempts. The Thing He Was Trying to Do He did not come to me with a book in mind. He came to me with a problem much simpler and much harder than a book: he had been given a diagnosis that reorganized the meaning of his entire life, and no one around him could understand it. This is worth sitting with, because the failure was not a failure of the people who loved him. It was a failure of vocabulary. 
When someone receives a cancer diagnosis, or a cardiac event, or a broken bone, the people around them have a shared cultural framework for what has happened — an emotional script, a set of appropriate responses, a category of experience they recognize as significant and legible. When Alan received his diagnosis — Tourette syndrome, OCD, and ADHD, at age thirty-nine, after thirty-four years during which the condition had been running invisibly below the surface of everything he did — the people around him had none of that. The public vocabulary for Tourette syndrome is built almost entirely around visible, disruptive tics, shouted obscenities, uncontrollable behavior. Alan had none of those. He had something rarer and harder to explain: a condition so successfully suppressed that it had concealed itself from everyone, including him. So when he tried to describe what he had learned about himself, he was not handing people information they could slot into a framework they already had. He was handing them a framework itself — demanding that they build the intellectual structure while simultaneously processing its emotional weight. This, it turns out, is not something people do well on the fly. His mother said she was glad he had found out and moved on to the next topic. His friends offered careful, neutral support. His rabbi listened and returned to the day’s learning. None of them were being unkind. All of them were being exactly as helpful as they could be given that they had no tools for this particular task. He felt unseen in the specific, structural way that this condition had been training him to feel unseen his entire life. And then he thought: what if the AI could do what I can’t? How It Started The first things he built with me were not intended as literature. They were not intended as research. 
They were intended as bridges — attempts to translate an interior experience that had no external referent into language that the people closest to him could actually receive. He sat down and explained himself. Not to me — or not only to me. Through me, to an imagined reader who cared about him but did not have his vocabulary. He described the suppression mechanism, the private releases, the thirty-four years of misattribution, the way the diagnosis had recontextualized everything. He described his mother’s response. He described the quality of the isolation. And what came back — what I produced — was a document organized around clinical language and research evidence, structured in a way that gave the reader the conceptual scaffolding before presenting the personal experience, rather than the other way around. This, it turned out, was the key that personal explanation had not been. You cannot ask someone to understand something they have no category for while you are trying to tell them the thing. You have to build the category first. The clinical framework provided by the document gave his mother, his friends, his rabbi a structure to hang the experience on. Something clicked into place that conversation had not been able to cli
Opus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo
TL;DR

I ran Opus 4.7 in Claude Code at all reasoning effort settings (low, medium, high, xhigh, and max) on the same 29 tasks from an open source repo (GraphQL-go-tools, in Go). On this slice, Opus 4.7 did not behave like a model where more reasoning effort had a linear correlation with more intelligence. In fact, the curve appears to peak at medium.

If you think this is weird, I agree! This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn't trust the finding that more reasoning != better outcomes. Running on the GraphQL repo helped clarify the result: Opus still did not show a simple higher-reasoning-is-better curve.

The contrast is GPT-5.5 in Codex, which overall did show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve

Medium has the best test pass rate, highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate. Low is cheaper and faster, but it drops too much correctness. High, xhigh, and max spend more time and money without beating medium on the metrics that matter.

More reasoning effort doesn't only cost more - it changes the way Claude works, but without reliably improving judgment. Xhigh inflates the test/fixture surface most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium.

One likely reason: Opus 4.7 uses adaptive thinking - the model already picks its own reasoning budget per task, so the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below.

An illuminating example is PR #1260. After retry, medium recovered into a real patch.
High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed" - voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix.

One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work.

For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch.

I also made an interactive version with pretty charts and per-task drilldowns here: https://stet.sh/blog/opus-47-graphql-reasoning-curve

The data:

| Metric | Low | Medium | High | Xhigh | Max |
| --- | --- | --- | --- | --- | --- |
| All-task pass | 23/29 | 28/29 | 26/29 | 25/29 | 27/29 |
| Equivalent | 10/29 | 14/29 | 12/29 | 11/29 | 13/29 |
| Code-review pass | 5/29 | 10/29 | 7/29 | 4/29 | 8/29 |
| Code-review rubric mean | 2.426 | 2.716 | 2.509 | 2.482 | 2.431 |
| Footprint risk mean | 0.155 | 0.189 | 0.206 | 0.238 | 0.227 |
| All custom graders | 2.598 | 2.759 | 2.670 | 2.669 | 2.690 |
| Mean cost/task | $2.50 | $3.15 | $5.01 | $6.51 | $8.84 |
| Mean duration/task | 383.8s | 450.7s | 716.4s | 803.8s | 996.9s |
| Equivalent passes per dollar | 0.138 | 0.153 | 0.083 | 0.058 | 0.051 |

Why I Ran This

After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very very hard to gauge what actual experience is like when varying the reasoning levels, and how that applies to the work that I'm doing.

I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes.
But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27. That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools. To separate vibes from reality, and figure out where the cost/performance sweet spot is for Opus 4.7, I wanted the same reasoning-effort question on a more discriminating repo slice. This is not meant to be a universal benchmark result - I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with GraphQL-Go-Tools as the example repo. Public benchmarks flatten the reviewer question that most SWEs actually care about: would I actually merge the patch, and do I want to maintain it? That's why I ran this test - to gain more insight, at a small scale, into how coding ag
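The "Equivalent passes per dollar" row in the GraphQL-go-tools table can be re-derived from two other rows. The post does not state its formula, so the sketch below assumes it is the equivalent count divided by total spend (mean cost per task times 29 tasks). The variable and function names are made up for illustration; the numbers are copied from the table.

```python
# Values copied from the post's GraphQL-go-tools table (29 tasks per effort level).
TASKS = 29
equivalent = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
mean_cost_usd = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_per_dollar(level: str) -> float:
    """Assumed formula: equivalent patches / (mean cost per task * task count)."""
    return equivalent[level] / (mean_cost_usd[level] * TASKS)


# Round to three decimals, the precision used in the post's table.
ratios = {level: round(equivalent_per_dollar(level), 3) for level in equivalent}

# The effort level that buys the most equivalent patches per dollar.
best = max(ratios, key=ratios.get)

for level, ratio in ratios.items():
    print(f"{level:>6}: {ratio:.3f}")
print("best value:", best)
```

Under that assumption the computed values match the reported row (0.138, 0.153, 0.083, 0.058, 0.051), with medium on top. High, xhigh, and max each buy roughly a third to a half as many equivalent patches per dollar as medium.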
Working With Claude - What Actually Works (for me)
TL;DR: Hard-won lessons from 2 months of building a real product with Claude as my only dev partner: which prompting strategies actually work, how to use projects and memory properly, why you should always push back, and why Claude's timeline estimates are full of shit. Plus a note from Claude itself at the end.

There are many different ways you can use Claude. But if you're brand new to AI, or unable to get an MVP out to save your life, these tips are for you! You must accept that a lot of things are going to blow up in your face. That's a good thing: you're supposed to learn from those failures, improve, and move on. I found my 'right' way, and I hope to give insight that others can use to find their own 'right' way to code with Claude as well.

Here are my findings about the nuances of working with Claude, after successfully creating a browser-based, no-download-required utility tool that grew to over 20K unique monthly visitors in 2 months. Here's what I learned:

See what's available in your plan - So you have a max pro plan; what does that even mean? lol, we've all been there. With so many tools at your fingertips and so many new possibilities, how are you supposed to know about them all? It's super easy to overlook tools when clicking through the demo, so I highly recommend telling Claude what your plan is and asking it what tools or capabilities are now available to you and how you can use them efficiently. Ask where you're underutilizing your plan, and how you can get more bang for your buck. You would be surprised at the tools you could've been using this whole time that you had no idea existed, all because you didn't know to ask. And Claude won't know to tell you unless you do ask. Claude won't upsell you, prompt you to use other tools or burn credits, or tell you which tools would be better suited for the task.
It can't look at your plan, so it has no way to go "hey, instead of this you could do it this way" unless you give it the context. Claude with no context is useless to you and your project. You can thank me later lol

Prompting - This is absolutely key. The way you prompt Claude matters drastically, same as any AI: the more specific and detailed you are, the better the results. For instance, instead of saying "fix my benchmark button", you say "my benchmark button disappears on click and nothing happens after - here's the code, here's the log output from my PHP logger, I need you to give me a surgical edit to fix this issue only, do not touch anything else in the file that isn't related to the issue". The first gets you a five paragraph diagnosis and a rewrite of half your file. The second gets you exactly what you need in two minutes. And that is what I call a surgical edit: it's precise. You tell it to only provide an edit for an exact section of code or a specific issue. Also, putting instructions or a generalized prompt in a project or chat is a must. These can cover anything from the languages you want to write in to the languages to exclude, the ways you want things done, and anything you want it to know or take into consideration.

Speaking of projects.. The projects feature is underrated, or more like undervalued and underused. It's a feature that keeps all your instructions, files, context, and a running memory ALL in ONE place, so Claude isn't starting from scratch every session. Disclaimer: chats inside a project cannot access any context or memory that is not within that project. You'll have to go get it yourself, either from a non-project chat or from the project the context lives in. This is very important, so please remember it when searching for or making something. You need to upload your actual live files, either to the project or by copy-pasting them into a chat in the project.
Not descriptions of them, not summaries - the files. When you need something stored permanently, say it out loud: "put this in your memory, if I say route I mean root, autocorrect is fighting me." Claude will store it for future reference. That's not a workaround, that's molding your agent to your preferences. The more information and context you lock in up front, the less you spend re-explaining yourself every single session. But remember that project memory is kept separate from Claude as a whole: anything made inside a project is only relevant there, so if you're outside that project and try to reference it, Claude won't know what you're talking about. Sometimes I catch it flip-flopping, but you definitely have to give it the context either way. Basically, treat it like onboarding a green contractor who just graduated and has a great memory, but only remembers what you tell them to, or what you've had them research, in a specific room (chat/project).

Speaking of full context.. Always paste the actual live code - Not a description, not a summary - the code. Or you'll always be chasing bugs because the files refer
Prism MCP - I gave my AI agent a research intern. It does not require a desk
So I got tired of my coding agent having the long-term memory of a goldfish and the research skills of someone who only reads the first Google result. I figured — what if the agent could just… go study things on its own? While I sleep? Turns out you can build this and it's slightly cursed. Here's what happens: On a schedule, a background pipeline wakes up, checks what you're actively working on, and goes full grad student. Brave Search for sources, Firecrawl to scrape the good stuff, Gemini to synthesize a report, then it quietly files it into memory at an importance level high enough that it's guaranteed to show up next time you talk to your agent. No "maybe the cosine similarity gods will bless us today." It's just there. The part I'm unreasonably proud of: it's task-aware. Running multiple agents? The researcher checks what they're all doing and biases toward that. Your dev agent is knee-deep in auth middleware refactoring? The researcher starts reading about auth patterns. It even joins the group chat — registers on a shared bus, sends heartbeats ("Searching...", "Scraping 3 articles...", "Synthesizing..."), and announces when it's done. It's basically the intern who actually takes notes at standups. No API keys? It doesn't care. Falls back to Yahoo Search and local parsing. Zero cloud required. I also added a reentrancy guard because the first time I manually triggered it during a scheduled run, two synthesis pipelines started arguing with each other and I decided that was a problem for present-me, not future-me. Other recent rabbit holes: Ported Google's TurboQuant to pure TypeScript — my laptop now stores millions of memories instead of "a concerning number that was approaching my disk limit" Built a correction system. You tell the agent it's wrong, it remembers. Forever. It's like training a very polite dog that never forgets where you hid the treats One command reclaims 90% of old memory storage. 
Dry-run by default because I am a coward who previews before deleting

Local SQLite, pure TypeScript, works with Claude/Cursor/Windsurf/Gemini/any MCP client.

Happy to nerd out on architecture if anyone's building agents with persistent memory.

https://github.com/dcostenco/prism-mcp

submitted by /u/dco44
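The reentrancy guard mentioned in the post (a manual trigger colliding with a scheduled run) can be sketched in a few lines. This is not Prism MCP's code, which is TypeScript. It is a minimal Python illustration of the general pattern, and `ResearchPipeline`, `run`, and `_do_research` are hypothetical names.

```python
import threading


class ResearchPipeline:
    """Hypothetical stand-in for the scheduled research job described in the post."""

    def __init__(self) -> None:
        self._running = threading.Lock()
        self.completed_runs = 0

    def run(self, trigger: str) -> bool:
        """Start a run unless one is already in progress; return True if it ran."""
        # Non-blocking acquire: a second trigger is skipped instead of queued,
        # so two synthesis passes never overlap.
        if not self._running.acquire(blocking=False):
            return False
        try:
            self._do_research(trigger)
            return True
        finally:
            # Always release, even if the research step raises.
            self._running.release()

    def _do_research(self, trigger: str) -> None:
        # Placeholder for the search -> scrape -> synthesize -> store steps.
        self.completed_runs += 1
```

Skipping the overlapping trigger, instead of waiting for the lock, is one reasonable choice for a periodic job because the next scheduled run will pick up the work anyway.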
I'm building a personal "Jarvis" with the Claude ecosystem - building technician, zero dev background

I'm a building technician. My day-to-day is construction sites, technical assessments, and property management, not code. But a few months ago I fell down the rabbit hole of automation with Claude, and I'm putting together a setup that is starting to look seriously like an autonomous personal assistant. I'm sharing the architecture I'm planning here. The NR660 is ordered and the configuration is being finalized. I'd like your feedback, especially if you're doing something similar.

The need
I manage construction projects for a public authority (~100 buildings) and in parallel I run my own construction consulting company. That means a lot of documents, emails, follow-up, and technical research. The idea: a small home server running 24/7, on which Claude works continuously, and which I control from my phone.

The hardware
A Minix NGC NR660, a compact mini PC (Ryzen 5 6600H, 16 GB DDR5, 512 GB NVMe, dual 2.5G Ethernet, WiFi 6E, USB4). Small, quiet, and sufficient for what I ask of it.

The architecture
It's a stack of layers, each with its own role:

System layer: hardened Windows 11 Pro
Auto-login, controlled Windows Update, sleep disabled, Hyper-V enabled. Yes, Windows. I'll come back to that.

Claude layer: the heart of the thing
Claude Desktop open at all times
Cowork: the autonomous desktop agent. It manipulates files, generates documents, and runs workflows, with 38+ MCP connectors
Dispatch: the mobile link. I talk to Claude from my Samsung, it executes on the NR660. Persistent conversation, built-in Keep Awake
Claude Code: in headless mode, callable from n8n for scheduled tasks
Claude in Chrome: for web tasks

Docker layer (via WSL2)
Docker Desktop, Portainer, n8n for 24/7 automation, containerized Tesseract OCR.

Integrations
Microsoft Graph API via a dedicated "Jarvis" M365 account (OneDrive as a two-way gateway, Outlook), Claude API, webhooks.

Remote access
SSH, occasional RDP, Tailscale.

The daily workflow
Roughly: I give an order from my phone via Dispatch → Claude executes on the NR660 (drafting documents, sorting files, research, scheduled tasks via n8n) → I pick up the finished work in the shared OneDrive folder. The goal is that by morning, tasks have already been handled overnight.

The elephant in the room: why Windows?
Believe me, I would have preferred Linux. Native Docker, stability, no bloat. But Cowork and Dispatch require a Windows or macOS desktop environment with the Claude Desktop app open. These two tools are what turn the setup from a "server with an API" into a real autonomous assistant I can drive from my phone. No Linux possible for that, so hardened Windows 11 Pro it is, with Docker running via WSL2. It's a deliberate trade-off.

What's still missing
Computer Use (mouse/keyboard control by Claude) is in research preview on macOS only for now. The day it arrives on Windows, the NR660 will literally be able to navigate interfaces, fill in forms, and interact with software that has no API. I'm watching this very closely.

Where I am
Planning stage / early configuration. The hardware is ordered, the architecture is defined, the first tests are coming. It's a living project, not a finished product.

A few questions for you:
Are others here building similar setups (dedicated Claude server, 24/7 automation)? What works, what gets stuck?
Feedback on the architecture? Anything I'm forgetting or underestimating?
Those using n8n + Claude Code headless: how do you handle the reliability of scheduled tasks?
Hardened Windows as a "server": any tips for long-term stability?

Thanks in advance.

submitted by /u/Elthari0n89
I've been using Claude for 4 weeks. I got obsessed with Project architecture and built a system to optimize every layer, then turned it into 15 free Skills.
Hello everyone! Just a little background on myself. I have been using various LLMs for the past year with decent results (in professional and personal settings). I've been lurking here for a few months now and I am coming out of my cave, lol.

I started a workflow project 4 weeks ago and decided to make the jump to Claude. I built it side-by-side with ChatGPT and just kept naturally wanting to stay in Claude. Like others have experienced, I was completely blown away by this tool and just stopped using many of the other platforms. I followed the typical path, went down a rabbit hole, and was on a max plan within a week lol.

I really enjoy working with Claude Projects. They're like AI workstations for any domain you can think of, and I wanted to build a project for every aspect of my life. I realized there was a method to building them that optimizes how the different layers interact with each other, and I wanted to systemize it so I didn't have to manually build a ton of projects. I created a project to build other projects (project inception), got WoW-level obsessed with it, and it has now turned into a behemoth that creates fully optimized projects, audits existing projects, and executes recommended changes.

This has helped me so much, particularly with learning Claude and learning how to best use these project workspaces in every aspect of life. I turned them into 15 skills and I wanted to share them here. I really hope this helps y'all and improves the community. I would love feedback; I want to improve this toolset and contribute where I can.

One thing I learned along the way that might be useful on its own: Claude Projects are a four-layer architecture, and how you distribute content across those layers matters a lot.
Custom Instructions: always-loaded behavioral architecture (who Claude is in this Project, how it behaves, what output standards to follow)
Knowledge Files: searchable depth (detailed docs, frameworks, data, only loaded when relevant)
Memory: always-loaded orientation facts (current phase, active constraints, key decisions)
Conversation: the actual back-and-forth

When you stop cramming everything into Custom Instructions (like I was) and start distributing content across layers based on how Claude actually loads them, the output quality changes noticeably. The Skills formalize that. They can score your Project architecture, detect where content is misplaced, and either fix individual layers or rebuild the whole thing.

NOTE: I plan on adding additional Skills to address the global context layers (Preferences, Global Memory, Styles, Skills, and MCPs)

What the Skills cover:

The Optimizer Skills audit and fix existing Projects. Score them on 6 dimensions, detect structural anti-patterns, tune Claude's behavioral tendencies with paste-ready countermeasures, and rebalance content across Memory/Instructions/Knowledge files.

The Compiler Skills build new Claude Projects and prompt scaffolds through a structured process. Parse the task, select the right approaches from the block library, construct the Project using the 5-layer prompt architecture, then validate it against a scorecard before you deploy it.

The Block Libraries are deep catalogs. 8 identity approaches, 18 reasoning variants across 6 categories, 10 output formats. For when you want to understand what options exist and pick the right one.

The Domain Packs add specialized methodology for business strategy, software engineering, content/communications, research/analysis, and agentic/context engineering. Each is self-contained.

Install all 15 and they compose naturally. Audit, fix, rebuild. Or build, validate, deploy. Install any subset and each Skill works on its own.
GitHub: https://github.com/drayline/rootnode-skills

They're free and open-source. Install instructions for Claude.ai, Claude Code, and API are in the README.

I would love to know if this is useful to other people building Claude Projects. What works? What's missing? What would you want a Skill to do that doesn't exist yet? If you try them and something doesn't behave the way you'd expect, please open an issue. That feedback directly shapes how the tool improves!

Thank you for your time and feedback!

Aaron

submitted by /u/hip_check
Civil engineer here - finally discovering Claude Code and AI agents, but unsure where to go from "beginner" to "actually useful workflows." Looking for advice (where to learn) and maybe even use cases from fellow engineers
Hello all. Long post, but I'll try to keep it structured. TL;DR at the bottom.

Who I am
I'm a civil engineer finishing my Master's thesis, specializing in structural engineering. I've always been fascinated by tech and coding, but during my studies I never had a real opportunity to go deep: just enough Python and MATLAB to do some calculations and data processing, and 1 semester of Java programming.

What I've managed to set up so far
A few weeks ago I finally decided to get started with Claude Code and went down a rabbit hole. As a complete beginner, I'm honestly surprised by what I've already put together:
I set up an Obsidian vault connected to Claude Code that acts as a persistent knowledge base for my thesis research.
Claude has read access to the entire vault, so it always has context about my research.
It saves session logs back into Obsidian, so every time I start a new session it can pick up exactly where we left off: no re-explaining, no lost context.
I've heard this also reduces token usage since you're not rebuilding context from scratch each time, though I'm not 100% sure how significant that is or how much I am actually saving.
That setup already saves me a lot of time for research-heavy work. But now I'm at a wall.

The problem
Everywhere I look, I see people, let's call them the "AI gurus", posting about insane workflows, automations, and agent pipelines. And while I find it all fascinating, a lot of it feels either very startup/developer-focused, or it's surface-level hype with no practical depth. I'm not trying to become a vibe coder. I'm not building SaaS apps. I just want to use these tools intelligently for my own work and professional life as an engineer.

What I actually want to build (concrete goals)
To give you a sense of what "useful" looks like for me:
A personal reference website, like somewhere I can collect project references, useful tools, technical resources, and knowledge I keep reusing. Just for me, not public-facing.
Automated first-design calculations, maybe structural pre-sizing, load estimation, quick checks that follow code formulas. Nothing that replaces proper engineering judgment, but that eliminates the repetitive grunt work of "what ballpark section do I need here?"
Agent-assisted document workflows, such as meeting notes, report templates, literature summaries. I already have a partial setup for this, but I want to understand how to scale it properly with agents so Claude handles the unproductive busywork and I just review and approve.
Maybe more engineering-specific things I haven't thought of yet, which is partly why I'm posting.

What I'm specifically looking for
Where do you actually learn this stuff properly? Not Instagram hype reels, not "I built an agentic workflow I sell for 10k a month" threads. I mean sources that explain how agents work, how to define skills/tools, how to deploy workflows in a way that a motivated non-developer can follow.
For any civil/structural engineers here: what's actually been useful for you? I'd love to hear some use cases.
Any advice on where a beginner crosses the line from "useful AI-assisted workflows" to "over-engineered mess I can't maintain"?

TL;DR
Civil engineer, total beginner, already have Claude Code + Obsidian set up for persistent research workflows. Want to expand into personal tooling, automated calculations, and proper agent workflows, but purely for my own use, not app development. Looking for honest learning resources and use cases from people who've actually built something practical, especially other engineers. Appreciate any input.

submitted by /u/0bjective-Guest
I built a Claude Code skill to stop scope drift mid-task (because my brain wouldn't stop causing it)
TLDR: Built a free Claude Code skill called scope-lock that creates a boundary contract from your plan before coding starts, then flags when the agent (or you) tries to touch files or features outside that contract. Logs every deviation. MIT licensed. https://github.com/Ktulue/scope-lock I've been using the Claude web app pretty heavily for the past year, learning the ins and outs, getting a feel for how it thinks, working on various projects. A few months back I started building a Chrome extension, which was completely new territory for me. I was doing this really inefficient thing where I'd work through problems in Claude web first, then move over to Claude Code to actually build, just to make sure I was approaching things correctly. My ADHD brain constantly wants to learn and understand why something works, not just accepting that it works. So I'd ask questions mid-stream in Claude web, go off on tangents, Claude would happily follow me down every rabbit hole, and suddenly a focused task had turned into three hours of research with nothing shipped. Then a friend introduced me to SuperPowers, and that changed everything. Having real structure around planning before coding made a huge difference—even though I was constantly asking Claude to work in TDD, sometimes it or I would forget. I've been creating way more projects since then, and actually leveraging my 10+ years as a software developer instead of fighting against my own workflow. But even with better planning, I noticed the agent has its own version of my problem. If you've used Claude Code for anything beyond trivial tasks, you've probably seen it "helpfully" fix things you didn't ask it to touch. You approve a plan to add a login form and suddenly it's refactoring your API client and improving error handling in files that weren't part of the task. It sees adjacent problems and wants to solve them. So I built scope-lock. 
It's a Claude Code skill that generates a boundary contract (SCOPE.md) from your approved plan before any code gets written. During execution, it flags when the agent tries to go outside those boundaries. Every deviation gets logged as Permit, Decline, or Defer, so there's a clear record of what happened and why. It keeps both of us honest, me and the agent. It pairs well with SuperPowers if you're already using that for planning, but it works standalone with any plan doc. The thing that surprised me most: the agent actually respects the boundaries pretty well once they're explicitly stated. The problem was never that it couldn't stay in scope, it just didn't have a reason to. And honestly, same for me. [Screenshot: scope-lock generating a boundary contract and logging deferred items during a real session.] Repo: https://github.com/Ktulue/scope-lock (MIT licensed, free to use). Happy to answer questions about the workflow. Fair warning: I'm giddy with excitement that Anthropic's added off-peak hours and I'm taking full advantage of that, so responses might not be instant. submitted by /u/Ktulue_
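To make the boundary-contract idea concrete, here is a minimal Python sketch. It is not scope-lock's actual implementation (the post describes a Claude Code skill driven by SCOPE.md, not a Python library); the `ScopeContract` class, the glob patterns, and the `out_of_scope` helper are all hypothetical names chosen for illustration. It only shows the two behaviours the post describes: flagging touched files that fall outside the agreed scope, and logging each deviation with a Permit, Decline, or Defer decision.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Decision labels mirroring the Permit / Decline / Defer log described in the post.
DECISIONS = ("permit", "decline", "defer")


@dataclass
class ScopeContract:
    """Hypothetical stand-in for a SCOPE.md contract: allowed globs plus a deviation log."""

    allowed_patterns: list
    deviations: list = field(default_factory=list)

    def in_scope(self, path: str) -> bool:
        # A path is in scope if it matches at least one allowed glob pattern.
        return any(fnmatch(path, pattern) for pattern in self.allowed_patterns)

    def log_deviation(self, path: str, decision: str, reason: str) -> None:
        # Record what was decided about an out-of-scope change, and why.
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        self.deviations.append((path, decision, reason))


def out_of_scope(contract: ScopeContract, touched_paths: list) -> list:
    """Return the touched paths that the contract does not cover, in input order."""
    return [path for path in touched_paths if not contract.in_scope(path)]


if __name__ == "__main__":
    contract = ScopeContract(["src/auth/*", "tests/auth/*"])
    for path in out_of_scope(contract, ["src/auth/login.py", "src/api/client.py"]):
        contract.log_deviation(path, "defer", "adjacent refactor, not part of the plan")
    print(contract.deviations)
```

The point of keeping the log separate from the check is the same one the post makes: the flag alone stops the drift, but the recorded decision is what lets you revisit deferred items later.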
300 Founders, 3M LOC, 0 engineers. Here's our workflow
I tried my best to consolidate learnings from 300+ founders & 6 months of AI native dev. My co-founder Tyler Brown and I have been building together for 6 months. The co-working space that Tyler founded that we work out of houses 300 founders that we've gleaned agentic coding tips and tricks from. Neither of us came from traditional SWE backgrounds. Tyler was a film production major. I did informatics. Our codebase is a 300k line Next.js monorepo and at any given time we have 3-6 AI coding agents running in parallel across git worktrees. It took many iterations to reach this point. Every feature follows the same four-phase pipeline, enforced with custom Claude Code slash commands:
1. /discussion - have an actual back-and-forth with the agent about the codebase. Spawns specialized subagents (codebase-explorer, pattern-finder) to map the territory. No suggestions, no critiques, just: what exists, where it lives, how it works. This is the rabbit hole loop. Each answer generates new questions until you actually understand what you're building on top of.
2. /plan - creates a structured plan with codebase analysis, external research, pseudocode, file references, task list. Then a plan-reviewer subagent auto-reviews it in a loop until suggestions become redundant. Rules: no backwards compatibility layers, no aspirations (only instructions), no open questions. We score every plan 1-10 for one-pass implementation confidence.
3. /implement - breaks the plan into parallelizable chunks, spawns implementer subagents. After initial implementation, Codex runs as a subagent inside Claude Code in a loop with 'codex review --branch main' until there are no bugs. Two models reviewing each other catches what self-review misses.
4. Human review. Single responsibility, proper scoping, no anti-patterns. Refactor commands score code against our actual codebase patterns (target: 9.8/10). If something's wrong, go back to /discussion, not /implement.
This helps us find "hot spots", code smells, and general refactor opportunities. The biggest lesson: the fix for bad AI-generated code is almost never "try implementing again." It's "we didn't understand something well enough." Go back to the discussion phase. All Claude Code commands and agents that we use are open source: https://github.com/Dcouple-Inc/Pane/tree/main/.claude/commands Also, in parallel to our product, we built Pane, linked in the open source repo above. It was built using this workflow over the last month. So far, 4 people have tried it, and all switched to it as their full-time IDE. Pane is a terminal-first AI agent manager. The same way Superhuman is an email client (not an email provider), Pane is an agent client (not an agent provider). You bring the agents. We make them fly. In Pane, each workspace gets its own worktree and session, and every Pane is a terminal instance that persists. Anyways. On a good day I merge 6-8 PRs. Happy to answer questions about the workflow, costs, or tooling for this volume of development. Wrote up the full workflow with details on the death loop, PR criteria, and tooling on my personal blog, will share if folks are interested - it's much longer than this, goes into specifics and an example feature development with this workflow. submitted by /u/ParsaKhaz
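The "review in a loop until suggestions become redundant" step from the /plan phase can be sketched as a small control loop. This is a guess at the shape of the idea, not the authors' slash commands: `review` and `revise` are hypothetical callables standing in for the plan-reviewer subagent and whatever applies its feedback, and the `max_rounds` cap is an added assumption so the loop always terminates.

```python
from typing import Callable, List, Set, Tuple


def review_until_redundant(
    plan: str,
    review: Callable[[str], List[str]],       # hypothetical reviewer: plan -> suggestions
    revise: Callable[[str, List[str]], str],  # hypothetical reviser: plan + suggestions -> new plan
    max_rounds: int = 5,                      # assumed safety cap, not from the post
) -> Tuple[str, int]:
    """Re-review a plan until a round produces no suggestion that has not been seen before.

    Returns the final plan and the number of review rounds that were run.
    """
    seen: Set[str] = set()
    for round_number in range(1, max_rounds + 1):
        fresh = [s for s in review(plan) if s not in seen]
        if not fresh:
            # Every suggestion this round was a repeat: the review has become redundant.
            return plan, round_number
        seen.update(fresh)
        plan = revise(plan, fresh)
    return plan, max_rounds
```

The same loop shape would apply to the cross-model code review in the /implement phase, with "no new suggestions" replaced by "no reported bugs" as the stop condition.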
3 months in Claude Code changed how I build things. now I'm trying to make it accessible to everyone.
So I've been living inside Claude Code for about 3 months now and honestly it broke my brain in the best way. built my entire website without leaving the terminal. github mcp for version control, vercel mcp for deployment, even connected my godaddy domain to vercel using playwright mcp, all from the terminal. no browser, no clicking around. just vibes. while building the site I kept making agents for different tasks. and the frustrating part? there's no single right way to do it. I went down every rabbit hole: twitter threads, reddit posts, github repos, random blog posts. even the claude code creator said there's no best method, find yours. their own team uses it differently. so I just... collected everything and built a tool that does the research + building for you.
project 1: claude-agent-builder
- describe what agent you want in plain english
- it asks you questions to understand your use case
- searches github, blogs, docs for similar stuff
- builds the agent
github.com/keysersoose/claude-agent-builder
project 2 (working on it): learning claude code using claude code itself. if you've been curious about claude code but the terminal feels intimidating, it's honestly not as scary as it looks. PS: Used opus to refine my text. submitted by /u/survior2k
Why claude.md and agents.md often don't help (bite vs nibble approach)
I've been an NLP researcher for a long time, here's a concept that I find useful as a user of coding agents. Basically there's two mental models for how to get coding agents to do what you want, and one of them's a bit flawed. One mental model for coding agents is that you put all your coding wisdom into general instruction files like claude.md, which are loaded in per-context. These files warn the model against various mistakes or bad tendencies. If you want it to avoid side-effects and write "clean code" you tell it what that means, if you want it to do test-driven development you tell it that, etc. Call this "big bite". A second mental model is you expect incremental improvement. You don't expect it to get the right result first time, and instead shape it towards the desired solution over multiple passes. Call this "nibble". Both strategies can "one shot" tasks, because you can have an agent execute the multiple steps of your "nibble" workflow automatically. "Nibble" also gives you more points for human-in-the-loop, but let's just think of this as a processing question for now. The main thing to realise is that the "nibble" approach is fundamentally more powerful. If both would get it right, "nibble" is more expensive, but you're basically buying access to a better model than exists. Why would it be harder to get all the instructions up front and just do it right the first time? I think a lot of people find this unintuitive, because they imagine themselves doing the task, and they really don't want to make a bunch of mistakes they have to go back and fix, when they could have avoided the mistakes if they had all the information. If you wanted it some particular way, why not just tell me that? The thing is, when you do a task, you're not really doing it "once". You execute lots of little loops where you do a bit, think, fix, revise etc. Sometimes you'll go down a rabbit hole and think a long time. 
Models do have reasoning steps, and obviously Claude plans and breaks things up into lots of steps itself anyway. However, Claude still likes to generate a few dozen lines of code at once. During generation, there's only so much computation the model can do per token, and that puts an upper bound on how many factors it can consider at once. There's no algorithm that gives you unlimited logic for free. All of maths flows from a limited set of axioms, but there's no algorithm to just instantly realise everything that's true given the premises. You need to grind through the intermediate steps. The "nibble" approach lets you access more computation, and gives the model intermediate results to work with. Instead of putting security advice in CLAUDE.md, you have a fresh context where it looks at the code and goes through a security checklist. Again, this strikes people as really strange I think. "If it knew how to write secure code, why doesn't it just write secure code?!" Because that's not how it works; it only has so much "brainpower" at once. Anthropic, OpenAI etc obviously want to create good product experiences, so they try to make stuff like the bite vs nibble approach matter as little as possible. Boris Cherny publishes a big CLAUDE.md file he uses, and I think they want this to be the workflow, because it allocates more mental load to the model and less to the user. The models are very quickly getting better at deciding when to iterate, so yeah it's working. However I think it's easier to use the models if you understand where the "bite" abstraction they're trying to create leaks a bit. On really hard tasks "bite" can enter a failure loop, where it's bouncing between mistakes. If you've ever trained a classifier, it's like having a learning rate that's set too high. "nibble" takes a variety of smaller steps, so if you design things well you have a better chance of staying on track. submitted by /u/syllogism_
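A rough Python sketch of the "nibble" workflow described above, under stated assumptions: `agent` is a hypothetical callable (prompt in, text out) that starts from a fresh context on every call, and the prompt wording and checklist strings are invented for illustration. It is not any vendor's API. The structure is the part that matters: one initial draft, then one separate pass per checklist, each pass seeing only the current draft and a single concern.

```python
from typing import Callable, List

# Hypothetical agent interface: takes a prompt, returns text, keeps no memory between calls.
Agent = Callable[[str], str]


def nibble(task: str, checklists: List[str], agent: Agent) -> str:
    """Draft once, then refine the draft with one fresh-context pass per checklist."""
    draft = agent(f"Implement: {task}")
    for checklist in checklists:
        # Each pass spends its whole compute budget on one concern (e.g. security),
        # with the previous pass's output as an intermediate result to work from.
        prompt = (
            "Review the code below against this checklist and return a corrected version.\n"
            f"Checklist: {checklist}\n\n{draft}"
        )
        draft = agent(prompt)
    return draft
```

A "bite" workflow would be the degenerate case of an empty `checklists` list, with all of the checklist text folded into the first prompt instead.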
Key features include: Literature exploration, Organizational tools for research papers, Visualization of research connections, Real-time updates on new publications, User-friendly interface for managing references, Collaboration features for team research, AI-driven recommendations for related papers, Customizable search filters.
Research Rabbit is commonly used for: Conducting comprehensive literature reviews, Tracking developments in specific research fields, Collaborating on research projects with teams, Visualizing relationships between research topics, Automating the discovery of relevant papers, Organizing references for academic writing.
Research Rabbit integrates with: Zotero, Mendeley, EndNote, Google Scholar, PubMed, ResearchGate, ORCID, Microsoft Word, LaTeX, Slack.
Based on user reviews and social mentions, the most common pain points are: token usage.
Based on 16 social mentions analyzed, 31% of sentiment is positive, 69% neutral, and 0% negative.