ControlFlow is praised for its robust TypeScript workflow capabilities and its ability to streamline tasks efficiently through its compiler, Flow Weaver. Users appreciate its integrations with tools like Claude Code and Claude Design, though they commonly note friction and disjointed workflows between web interfaces. The tool is seen as cost-effective, though specific pricing feedback is sparse. Overall, ControlFlow holds a solid reputation for its innovative features and developer-oriented focus, albeit with some usability concerns around seamless integration.
Mentions (30d): 25 (7 this week)
Reviews: 0
Platforms: 2
GitHub Stars: 1,387 (113 forks)
GitHub followers: 781
GitHub repos: 39
GitHub stars: 1,387
npm packages: 20
Passed Claude CCA-F with 10+ teammates — notes and prep advice
Over the past few weeks, 10+ people on our team have taken and passed the Claude Certified Architect – Foundations (CCA-F) exam. After comparing notes, our main takeaway is: this is not really an API memorization exam. It is much closer to a scenario-based architecture judgment exam. You are not just asked whether you know a Claude feature. You are asked whether you can make reasonable design trade-offs when Claude is used inside real products, agent workflows, developer tools, and automation systems.

Some of the recurring questions are more like:
- Should this task be handled by one agent or multiple sub-agents?
- Is this tool doing too much?
- Are the permissions too broad?
- Is MCP actually needed here, or is it over-engineering?
- Should this action be automated, or should there be human review?
- How should structured output be validated?
- How should long-context workflows be managed reliably?
- What is the safest next step in a partially automated system?

Here are our notes for anyone preparing for the exam.

1. Basic exam structure

Based on the official outline and public exam writeups, the exam is:
- 120 minutes
- Multiple choice
- 4 options per question
- Score range: 100–1000
- Passing score: 720

The exam domains are:
- Agent architecture and orchestration: 27%
- Tool design and MCP integration: 18%
- Claude Code configuration and workflows: 20%
- Prompt engineering and structured output: 20%
- Context management and reliability: 15%

One public writeup also mentioned that there are 6 scenario categories, and the exam randomly selects 4 of them. So this is not a “random facts about Claude” exam. It is much more about reading a realistic scenario and choosing the safest, simplest, most appropriate architecture.

2. The three principles that kept coming up

After reviewing the questions we struggled with, we found that many of them came back to three design principles.

1. Least privilege

Do not give a tool, agent, or workflow more access than it needs.
Examples:
- If read-only access is enough, do not grant write access.
- If access to one repository is enough, do not grant access to the whole workspace.
- If a tool only needs one narrow action, do not expose a broad system-level capability.
- If an action is high-risk, do not fully automate it without review.

A lot of wrong answers look attractive because they are powerful or automated. But they often give the model or tool too much authority.

2. Single responsibility

A tool should not do everything. A sub-agent should not become a “general-purpose employee” that retrieves data, makes decisions, modifies files, submits changes, and notifies people all in one step. Many questions test whether you understand where the responsibility should live:
- Should this be a tool?
- Should this be agent reasoning?
- Should this be a human decision?
- Should this be a separate validation layer?
- Should this be split into smaller components?

If one component is doing too much, be careful.

3. Avoid over-engineering

This was probably the biggest pattern. Some answers look sophisticated:
- Multi-agent orchestration
- Complex MCP workflows
- Long-term memory
- Fully automated tool execution
- Multi-stage validation pipelines

But if the problem is small, narrow, and low-risk, the best answer is often the simplest controlled solution. Our internal summary was: do not choose the most impressive architecture. Choose the smallest, safest, most controllable one.

3. English reading is a real hidden challenge

For non-native English speakers, this may be one of the hardest parts. The questions are often long scenario descriptions. They may include:
- the current system design
- the team’s goal
- existing constraints
- the risk profile
- what tools are available
- what the next step should be

The answer choices can also be long. Sometimes one word changes the meaning of the whole option.
Words like:
- automatically
- always
- unrestricted
- without review
- full access
- all repositories
- execute directly

can make an option much riskier than it first appears. So our advice is: practice reading English scenarios directly. Do not rely on translation tools. During the actual proctored exam, you should not expect to use Google Translate, Chrome translation, DeepL, Claude, ChatGPT, or any other external translation tool. For the last few days before the exam, it is worth forcing yourself to read only English material and English practice questions.

4. ProctorFree exam setup

The exam is online and uses ProctorFree. The rough flow is:
- You receive the exam email.
- You follow the exam link.
- You download and install ProctorFree.
- You complete the pre-exam setup.
- The system checks camera, microphone, network, and screen recording.
- You start the exam.
- The session is recorded.
- After submission, you wait for the upload to complete.

Practical setup tips:
- Use only one monitor.
- Disconnect external displays.
- Close unnecessary applications.
- Clos
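To make the least-privilege and human-review principles from these notes concrete, here is a minimal Python sketch of a permission check. Everything in it (the `ToolGrant` type, the `decide` function, the action names, and which actions count as high-risk) is hypothetical and only illustrates the reasoning; it is not an Anthropic, Claude Code, or MCP API.

```python
from dataclasses import dataclass

# Hypothetical action names; which actions count as high-risk is an assumption.
HIGH_RISK_ACTIONS = frozenset({"delete", "deploy", "merge"})


@dataclass(frozen=True)
class ToolGrant:
    """What one tool or sub-agent may do: only the actions and repos it needs."""
    actions: frozenset[str]
    repositories: frozenset[str]


def decide(grant: ToolGrant, action: str, repo: str, human_approved: bool = False) -> str:
    """Return "allow", "deny", or "needs_review" for a requested action."""
    if action not in grant.actions or repo not in grant.repositories:
        return "deny"  # outside the granted scope: least privilege
    if action in HIGH_RISK_ACTIONS and not human_approved:
        return "needs_review"  # high-risk: keep a human in the loop
    return "allow"


# Read-only access to a single repository, not the whole workspace.
reader = ToolGrant(actions=frozenset({"read"}), repositories=frozenset({"docs-site"}))
# A narrowly scoped deployer whose deploys still need sign-off.
deployer = ToolGrant(actions=frozenset({"read", "deploy"}), repositories=frozenset({"docs-site"}))

print(decide(reader, "read", "docs-site"))      # allow
print(decide(reader, "write", "docs-site"))     # deny
print(decide(reader, "read", "billing-api"))    # deny
print(decide(deployer, "deploy", "docs-site"))  # needs_review
```

The "powerful" alternative, a grant covering every action on every repository with no review step, would return "allow" for everything, which is exactly the kind of attractive-but-risky answer the notes warn about.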
Use Case: How I chain ChatGPT+Agents+Codex workloads
Context: I run interaction forensics on how people, communities, narratives, institutions, and companies impact AI. Please note, all operations are human+AI.

Summary: I have used digital forensic/OSINT tools such as Maltego in the past and wanted a tool I could integrate with AI, so I built my own, airgapped. This tool is the first iteration and will later be used to assist in high-risk controlled environments such as child protection agencies. This is the current architecture and workflow: https://preview.redd.it/26w74lxfgz1h1.png?width=1935&format=png&auto=webp&s=4a064b2f5e84e230913f9e7758de2b29a1f41ac8

Tools used and their functions:
* Codex+Manus: Assistance in building the tool and incorporating logic. Bulk transfers from the older method to the current database. The data was collected by me and sorted into our database structure.
* Agents: Amending and adding bulk data to the database.
* GPT+Manus: Verification and updates of data.

The final output:

Interface: https://preview.redd.it/t2x6v9l0iz1h1.png?width=1776&format=png&auto=webp&s=c1be628542af6420eb4efee9f7ec62c2d40146f9

Inferences and patterns identified when AI (LLM+agents) reviews the data: https://preview.redd.it/nkdio3z5iz1h1.png?width=832&format=png&auto=webp&s=01d0f0bc45e1968d0c692d712932f03e35969924

I add my own as well, and collaborate with AI to validate my understanding.

Evidence-based artifacts: all knowledge is sourced and tagged. https://preview.redd.it/fwcmjn28jz1h1.png?width=1253&format=png&auto=webp&s=861dcf33480d6e22919cf563a362c1c33c044734

These tie into a pattern identification graph so I can identify what may or may not be related. https://preview.redd.it/pegwypialz1h1.png?width=1424&format=png&auto=webp&s=d4b50e756354dc021fc106f5e91da3015ae0bd74

Would love any feedback for improvements. Please remember, the next iteration is for child protection, where I intend to airgap a localised LLM with training corpora.
The main idea is to MINIMISE how much users have to review images and identify patterns/locations themselves, to expedite rescue. I also want to add that this is entirely self-funded. I run a separate business to ensure I have funds for this and for potential future hardware/licensing.
submitted by /u/ValehartProject
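The forensics post describes evidence artifacts that are all "sourced and tagged" and feed a pattern-identification graph, without saying how the graph is built. One simple way to get "may or may not be related" links is co-tagging: connect two artifacts when they share tags. The Python below is a minimal sketch under that assumption; the `Artifact` type, `build_tag_graph`, and the sample tags are hypothetical and not the author's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Artifact:
    """One piece of evidence: every artifact carries its source and its tags."""
    id: str
    source: str
    tags: frozenset[str]


def build_tag_graph(artifacts: list[Artifact]) -> dict[tuple[str, str], int]:
    """Link artifacts that share tags; the edge weight is the number of shared tags."""
    ids_by_tag: dict[str, set[str]] = defaultdict(set)
    for artifact in artifacts:
        for tag in artifact.tags:
            ids_by_tag[tag].add(artifact.id)

    edges: dict[tuple[str, str], int] = defaultdict(int)
    for ids in ids_by_tag.values():
        # Every pair of artifacts under the same tag gets one unit of weight.
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


artifacts = [
    Artifact("a1", "archived-forum-thread", frozenset({"narrative-x", "platform-y"})),
    Artifact("a2", "press-release", frozenset({"narrative-x", "platform-y"})),
    Artifact("a3", "interview-notes", frozenset({"platform-y"})),
    Artifact("a4", "unrelated-report", frozenset({"topic-z"})),
]
print(build_tag_graph(artifacts))
```

An edge only says two artifacts share tags; a heavier edge (a1 and a2 share two tags) is a stronger hint, but a human still decides whether they are actually related.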
Adaptive Markdown
I’ve been working on an open-source document format / viewer idea I’m calling Adaptive Markdown. The basic idea is: instead of being static text, a document is controlled by coding agents. You interact with the document more like a live workspace. This has different implications depending on what you are doing. I made a short video demo here: https://youtu.be/xf6jxf-hyP4

The thing I’m most excited about is academic / technical reading. In a few years I don’t think people will just read papers passively. I think they’ll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean when possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is trivial to do inside a browser with a coding agent that has access to JS, CSS, etc.

Any document is just a starting point! You can project it however you want. Some possible use cases I’m thinking about:
- Turning articles and books into personalized learning objects
- Lecture notes with automatically maintained structure
- Documents with embedded code, tables, consoles, images, audio, or video
- Eventually, incorporating Adaptive Markdown into automated workflows, things like automatically recording audio in lectures and taking a picture of a blackboard and turning it into LaTeX notes inside the document

It’s very early, but the workflow already feels surprisingly useful to me. GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown

Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it. So far it's only configured for the Anthropic coding-agent SDK and Codex. The goal is to have this run entirely locally someday.
submitted by /u/IDefendWaffles
I built the smart speaker we always wanted
I wanted to see if Claude could handle vibe hardware engineering to help me make a smart speaker. Turns out, it can! I call it boxBot. It helped select the hardware: a Raspberry Pi, a Hailo AI accelerator, a ReSpeaker mic, a Pi camera, a Waveshare screen, and speakers. It also helped me calculate thermal loads and dissipation rates for a passive cooling setup. I made the box by hand out of walnut.

The agent inside is custom as well. You could probably throw OpenClaw on it and call it a day, but I wanted to craft something that was tightly coupled with the hardware and more secure, considering it’s sitting in my living room with a camera and mic. The agent is highly skills-driven, with only a small set of tools; everything else goes through Python scripts and a custom boxBot SDK the agent can use to control the box and the display. The display system uses a widget framework, so the agent can easily read what’s displayed without a screenshot and can effectively manipulate what’s on the screen. The agent uses JSON to specify how the widgets should be arranged on the screen and what data should flow into them.

When building a smart speaker, there’s a lot of nuance to human conversation that voice agents really struggle with, like background noise, side conversations, barge-in, etc. I was able to simplify the logic a ton by making it agent-driven: the agent can control when to mute the mic to ignore background chatter, it decides when to work versus when to talk, and it can choose which channel to respond in (voice or WhatsApp). Instead of complex rules, agent-driven hardware plus skills can provide a much richer experience. Now that boxBot manages the family calendar, my wife wants a text whenever I put something on it. boxBot updated the calendar skill with that request, so now when I add something, it sends her a message. Just one line in a .md file and you get the desired behavior. It’s incredibly flexible and simple.
I could nerd out on the details of the memory system, the struggles with woodworking, and the security details, but I’ll save that for the comments if people want to chat. It’s open source if you want to inspect it. Still a work in progress, but after a few months it is finally feeling like a useful day-to-day assistant for the family. www.github.com/dv-hart/boxbot
submitted by /u/FunScore645
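The boxBot post says the agent writes JSON describing which widgets go where and what data flows into them, and that it can read the screen back without a screenshot. A minimal Python sketch of that round trip follows; the widget types, field names, and the `parse_layout` / `describe_screen` functions are hypothetical, and boxBot's real SDK may look quite different.

```python
import json
from dataclasses import dataclass, field

# Hypothetical widget types and schema; boxBot's real SDK may look different.
KNOWN_WIDGETS = {"clock", "calendar", "text"}


@dataclass
class Widget:
    kind: str
    row: int
    col: int
    data: dict = field(default_factory=dict)


def parse_layout(spec_json: str) -> list[Widget]:
    """Validate an agent-written JSON layout and turn it into widget objects."""
    spec = json.loads(spec_json)
    widgets = []
    for item in spec["widgets"]:
        if item["type"] not in KNOWN_WIDGETS:
            raise ValueError(f"unknown widget type: {item['type']}")
        widgets.append(Widget(item["type"], item["row"], item["col"], item.get("data", {})))
    return widgets


def describe_screen(widgets: list[Widget]) -> str:
    """Serialize what is on screen so the agent can read it back without a screenshot."""
    return json.dumps(
        [{"type": w.kind, "row": w.row, "col": w.col, "data": w.data} for w in widgets],
        sort_keys=True,
    )


layout = """
{"widgets": [
  {"type": "clock", "row": 0, "col": 0},
  {"type": "calendar", "row": 0, "col": 1, "data": {"events": ["Dentist 3pm"]}}
]}
"""
screen = parse_layout(layout)
print(describe_screen(screen))
```

Rejecting unknown widget types up front keeps a malformed or unexpected agent response from reaching the display, in line with the post's emphasis on a small, tightly controlled tool surface.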
Adaptive Markdown
I’ve been working on an open-source document format / viewer idea I’m calling Adaptive Markdown. The basic idea is: instead of being static text, a document is controlled by coding agents. You interact with the document more like a live workspace. This has different implications depending on what you are doing. I made a short video demo here: https://youtu.be/H4MnFs8irm8

The thing I’m most excited about is academic / technical reading. In a few years I don’t think people will just read papers passively. I think they’ll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean when possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is trivial to do inside a browser with a coding agent that has access to JS, CSS, etc.

Some possible use cases I’m thinking about:
- Turning articles and books into personalized learning objects
- Lecture notes with automatically maintained structure
- Documents with embedded code, tables, consoles, images, audio, or video
- AI-generated alt text and descriptions
- Eventually, incorporating Adaptive Markdown into automated workflows, things like automatically recording audio in lectures and taking a picture of a blackboard and turning it into LaTeX notes inside the document

It’s very early, but the workflow already feels surprisingly useful to me. GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown

Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it. So far it's only configured for the Anthropic coding-agent SDK, but in a couple of days we will have it running on Codex as well.
submitted by /u/IDefendWaffles
Opus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo
TL;DR: I ran Opus 4.7 in Claude Code at every reasoning effort setting (low, medium, high, xhigh, and max) on the same 29 tasks from an open-source repo (GraphQL-go-tools, in Go). On this slice, Opus 4.7 did not behave like a model where more reasoning effort correlates linearly with more intelligence. In fact, the curve appears to peak at medium. If you think this is weird, I agree!

This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn’t trust the Zod finding that more reasoning != better outcomes. Running on the GraphQL repo helped clarify the result: Opus still did not show a simple higher-reasoning-is-better curve. The contrast is GPT-5.5 in Codex, which overall did show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve

Medium has the best test pass rate, the highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate. Low is cheaper and faster, but it drops too much correctness. High, xhigh, and max spend more time and money without beating medium on the metrics that matter.

More reasoning effort doesn't only cost more; it changes the way Claude works, without reliably improving judgment. Xhigh inflates the test/fixture surface the most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium. One likely reason: Opus 4.7 uses adaptive thinking, so the model already picks its own reasoning budget per task, and the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below.

An illuminating example is PR #1260. After a retry, medium recovered into a real patch.
High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed", voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix.

One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work.

For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch.

I also made an interactive version with pretty charts and per-task drilldowns here: https://stet.sh/blog/opus-47-graphql-reasoning-curve

The data:

| Metric | Low | Medium | High | Xhigh | Max |
|---|---|---|---|---|---|
| All-task pass | 23/29 | 28/29 | 26/29 | 25/29 | 27/29 |
| Equivalent | 10/29 | 14/29 | 12/29 | 11/29 | 13/29 |
| Code-review pass | 5/29 | 10/29 | 7/29 | 4/29 | 8/29 |
| Code-review rubric mean | 2.426 | 2.716 | 2.509 | 2.482 | 2.431 |
| Footprint risk mean | 0.155 | 0.189 | 0.206 | 0.238 | 0.227 |
| All custom graders | 2.598 | 2.759 | 2.670 | 2.669 | 2.690 |
| Mean cost/task | $2.50 | $3.15 | $5.01 | $6.51 | $8.84 |
| Mean duration/task | 383.8s | 450.7s | 716.4s | 803.8s | 996.9s |
| Equivalent passes per dollar | 0.138 | 0.153 | 0.083 | 0.058 | 0.051 |

Why I Ran This

After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very, very hard to gauge what the actual experience is like when varying the reasoning levels, and how that applies to the work that I'm doing.

I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes.
But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27. That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools.

To separate vibes from reality, and to figure out where the cost/performance sweet spot is for Opus 4.7, I wanted to ask the same reasoning-effort question on a more discriminating repo slice. This is not meant to be a universal benchmark result; I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with GraphQL-go-tools as the example repo.

Public benchmarks flatten the reviewer question that most SWEs actually care about: would I actually merge the patch, and do I want to maintain it? That's why I ran this test: to gain more insight, at a small scale, into how coding ag
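The post does not spell out how "equivalent passes per dollar" is computed. Assuming it is the number of equivalent patches divided by total spend (mean cost per task times 29 tasks), the short Python check below reproduces every value in that row of the GraphQL-go-tools table, which is also a quick way to see why medium wins on cost-efficiency and not only on quality.

```python
# Figures copied from the GraphQL-go-tools results table (29 tasks per setting).
TASKS = 29
equivalent = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
mean_cost_usd = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_per_dollar(setting: str) -> float:
    """Equivalent patches divided by total spend (assumed formula: mean cost x task count)."""
    return equivalent[setting] / (mean_cost_usd[setting] * TASKS)


for setting in equivalent:
    print(f"{setting:>6}: {equivalent_per_dollar(setting):.3f}")
```

Under this reading, medium buys roughly three equivalent patches for every one that max buys per dollar spent.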
I offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers.
Hey r/ClaudeAI, I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading: big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunking. Useful context, but not always something I need Claude to consume raw.

So I built a small open-source sidecar tool called Triss. The rule is simple: the cheap model reads the bulky stuff; Claude gets the summary and does the thinking/editing. This is not a Claude replacement. I still keep architecture, debugging, careful edits, and final judgment with Claude. Triss is for the boring high-token intake step.

One week of actual usage

This is my real DeepSeek usage from May 6–13, 2026:

| Metric | Pro | Flash | Total |
|---|---|---|---|
| Requests | 143 | 66 | 209 |
| Input tokens | 3.74M | 2.10M | 5.84M |
| Output tokens | 833K | 156K | 990K |
| Cost (USD) | $1.88 | $0.34 | $2.22 |

That came out to about 1 cent per request on real coding work, not a benchmark. The important part is not only the DeepSeek bill. It is that Claude never had to carry those raw 5.8M input tokens in its own context. A ticket or file bundle that might have eaten tens of thousands of Claude tokens becomes a short summary, and the main conversation stays lighter.

What I delegate

The pattern that stuck for me:
- A single file over ~400 lines.
- 3+ files where I only need a structured summary.
- Jira/Linear/GitHub issues with comments and metadata.
- Web pages or docs pages.
- First-pass diff review.
- Commit message generation from a staged diff.

What I do not delegate:
- Architecture decisions.
- Hard debugging.
- Precise edits.
- Small questions where the delegation overhead is larger than the task.

What the tool does

Triss can run as a CLI or as an MCP server, so Claude Code / Claude Desktop / Codex can call it as a native tool.
The commands I use most:

    triss ask --paths src/foo.ts src/bar.ts --question "Summarize the control flow and risks"
    triss fetch https://example.com/docs --question "Extract the setup steps"
    triss review
    triss commit-msg
    triss usage --by-project

It also has tracker integrations for Jira, Confluence, Linear, GitHub, and GitLab, because ticket/API payloads were one of the biggest hidden context sinks in my workflow. The default setup is DeepSeek, but it works with OpenAI-compatible endpoints too: DeepSeek, Kimi, Ollama, OpenRouter, etc.

Credit where it is due

The original idea came from Kunal Bhardwaj's write-up: https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda
and his proof of concept: https://github.com/imkunal007219/claude-coworker-model
My version is basically that pattern made more specific to my own workflow: MCP tools, tracker integrations, review/commit helpers, usage logging, and path sandboxing for agent calls.

Links

GitHub: https://github.com/ayleen/triss-coworker
Install: npm install -g triss-coworker
Setup: triss config wizard

Open-source, MIT, unaffiliated with Anthropic. I do not get paid if you install it. I mostly wanted to share the numbers because "use a cheap model for bulk reading" sounded obvious to me in theory, but it only became a habit once it was wired into Claude as a low-friction tool. Happy to answer any questions.
submitted by /u/Proper-Mousse7182
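The Triss post's "what I delegate / what I do not delegate" lists amount to a routing rule. The Python below is a minimal sketch that encodes those lists as a decision function. The 400-line and 3-file thresholds come from the post; the `Task` type, the task "kind" labels, and `should_delegate` are hypothetical, and nothing in the post says Triss makes this decision automatically.

```python
from dataclasses import dataclass

# Thresholds come from the post; the task "kind" labels are hypothetical.
LARGE_FILE_LINES = 400
MIN_FILES_FOR_SUMMARY = 3
ALWAYS_DELEGATE = {"tracker_issue", "web_page", "first_pass_review", "commit_message"}
NEVER_DELEGATE = {"architecture", "hard_debugging", "precise_edit"}


@dataclass
class Task:
    kind: str                               # e.g. "read_files", "tracker_issue", "architecture"
    file_line_counts: tuple[int, ...] = ()  # line count of each file involved
    summary_only: bool = False              # is a structured summary enough?


def should_delegate(task: Task) -> bool:
    """True if the bulky reading should go to the cheaper model."""
    if task.kind in NEVER_DELEGATE:
        return False  # judgment work stays with Claude
    if task.kind in ALWAYS_DELEGATE:
        return True
    if task.kind == "read_files":
        if any(n > LARGE_FILE_LINES for n in task.file_line_counts):
            return True  # a single large file
        if len(task.file_line_counts) >= MIN_FILES_FOR_SUMMARY and task.summary_only:
            return True  # several files, but only a summary is needed
    return False  # small questions: delegation overhead isn't worth it


print(should_delegate(Task("read_files", (1200,))))  # True
print(should_delegate(Task("read_files", (40, 60))))  # False
print(should_delegate(Task("architecture")))  # False
```

Defaulting to "don't delegate" mirrors the post's point that small questions are cheaper to ask Claude directly than to route through a second model.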
Claude Platform on AWS reference - what's new in CC 2.1.139 (+2,248 tokens)
NEW: Data: Claude Platform on AWS reference - Reference documentation for using the Claude Developer Platform through AWS infrastructure, including AnthropicAWS clients, required region and workspace configuration, SigV4 authentication, and short-term API keys.
Agent Prompt: Conversation summarization - Adds requirement to note security-relevant instructions or constraints (sensitive files, forbidden operations, credential handling rules) and preserve them verbatim in the summary so they remain in effect after compaction.
Agent Prompt: Recent Message Summarization - Same security-relevant instructions preservation requirement added to the recent-portion summarization flow.
Data: Live documentation sources - Adds WebFetch URLs for Claude Platform on AWS and its required IAM actions documentation.
Skill: Building LLM-powered applications with Claude - Reframes cloud-provider access so Claude Platform on AWS is treated as Anthropic-operated with same-day API parity and full Managed Agents support, while Bedrock, Vertex, and Foundry remain Claude API + tool use only.
Skill: Dynamic pacing loop execution - Reorders steps so the brief confirmation (task ran, monitor as wake signal, fallback delay choice) is written as text before the schedule-wakeup call ends the turn.
Skill: /insights report output - Removes the trailing additional-message block from the shareable report response.
Skill: /loop self-pacing mode - Same reordering as dynamic pacing loop: confirm self-pacing, monitor wake signal, and fallback delay as text before the schedule-wakeup call.
Skill: Model migration guide - Adds a Claude Platform on AWS section noting it uses bare first-party model IDs and that the full rename table and breaking-change sections apply verbatim, distinct from Bedrock.
System Prompt: Auto mode - Drops the "Auto Mode Active" header and reframes destructive-action guidance generically rather than auto-mode-specific.
System Prompt: Harness instructions - Removes the standalone note that automatic context compaction will trigger when conversations grow long.
System Prompt: Memory instructions - Replaces 3–4 word titles with short kebab-case slugs, nests type under a metadata block, and introduces [[their-name]] cross-links between related memories.
System Prompt: Partial compaction instructions - Adds the same security-relevant instructions preservation requirement so sensitive-file rules, forbidden operations, and credential handling carry across partial compactions.
System Reminder: Output style active - Lets an output style supply its own per-turn reminder text, falling back to the default "follow the specific guidelines" wording.
System Reminder: Task tools reminder - Removes the instruction telling Claude to never mention the reminder to the user.
System Reminder: TodoWrite reminder - Removes the instruction telling Claude to never mention the reminder to the user.
Tool Description: PowerShell - Adds a substantial reference table mapping Unix commands (head, tail, which, touch, wc, mkdir -p, rm -rf, ln -s, chmod, 2>/dev/null, inline VAR=x, bash control flow) to their PowerShell equivalents, and clarifies that -ErrorAction SilentlyContinue still causes exit 1 unless promoted to terminating and caught.
Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.139
submitted by /u/Dramatic_Squash_3502
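The memory-instructions change in this release is described in a single line (kebab-case slugs instead of 3–4 word titles, plus [[their-name]] cross-links). The Python below is a hypothetical sketch of what deriving such slugs and extracting cross-links could look like; the function names, the 5-word cap, and the regex are assumptions, not Claude Code's actual memory format or code.

```python
import re


def to_slug(title: str, max_words: int = 5) -> str:
    """Turn a free-form memory title into a short kebab-case slug."""
    cleaned = title.lower().replace("'", "").replace("\u2019", "")
    words = re.findall(r"[a-z0-9]+", cleaned)
    return "-".join(words[:max_words])  # the 5-word cap is an arbitrary choice


# [[some-slug]]: lowercase words joined by single hyphens.
CROSS_LINK = re.compile(r"\[\[([a-z0-9]+(?:-[a-z0-9]+)*)\]\]")


def cross_links(body: str) -> list[str]:
    """Return the slugs a memory links to with [[slug]] syntax."""
    return CROSS_LINK.findall(body)


print(to_slug("User's Testing Preferences!"))  # users-testing-preferences
print(cross_links("See [[ci-setup]] and [[users-testing-preferences]]."))
```

Slugs make a stable, machine-friendly key for each memory, which is what lets a cross-link like [[ci-setup]] resolve to exactly one entry.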
Token Saving
I have come up with a process that seems to save my Claude tokens. I use one Claude Code window as a project manager, organize my projects into folders, and pre-populate them with CLAUDE.md, WORKING.md, and CHANGELOG.md. Then I open another Claude Code window, do a little work, and save it off. I had ChatGPT help refine this process and make me a webpage for it. It is probably just AI babble, but I am interested in any thoughts or enhancements. The struggle is real: my tokens reset for the week and I am out two days later. ::sigh::

Saving Tokens with Claude Code

A practical workflow for using Claude Code without burning through context, losing track of work, or letting long chat sessions become expensive and confusing.

The Problem

Claude Code is powerful, but long-running chat windows can become expensive. The more history a chat has, the more context Claude may need to process every time you send a message. A chat that starts out efficient can become costly after hours or days of work. The mistake is treating one Claude chat like a permanent development environment. It is better to treat each chat window like a focused work session.

Bad pattern: Keep one huge Claude Code conversation open forever and let it accumulate every decision, mistake, edit, explanation, and detour.

Better pattern: Use short, focused Claude Code sessions. Store the real project memory in files, not in the chat history.

The Core Idea

The best way to save tokens is to move memory out of the chat and into project files. Claude should read the files it needs, complete one clear task, update the project records, and then the chat can be closed.

Claude chat window = temporary worker
CLAUDE.md = project instructions and rules
CHANGELOG.md = permanent history of completed work
WORKING.md = temporary checkpoint when work is interrupted

This makes Claude sessions mostly stateless. A new chat can resume by reading the relevant files instead of relying on a long conversation history.
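The "mostly stateless session" idea boils down to this: everything a fresh session needs lives in a few files it reads first. Claude Code already loads CLAUDE.md on its own, so the Python below is only a minimal illustration of that idea, for example for a script that shows what a new session would start from. The `session_context` function and the read order are assumptions, not part of the workflow's webpage.

```python
from pathlib import Path

# The three memory files from the workflow; reading them in this order is an assumption.
MEMORY_FILES = ("CLAUDE.md", "CHANGELOG.md", "WORKING.md")


def session_context(folder: str) -> str:
    """Gather the project-memory files a fresh session should read before starting."""
    sections = []
    for name in MEMORY_FILES:
        path = Path(folder) / name
        if path.is_file():  # WORKING.md only exists while work is interrupted
            sections.append(f"--- {name} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(session_context("."))
```

Because the context is rebuilt from files each time, closing a chat loses nothing that was written down, which is the whole point of moving memory out of the conversation.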
Separate Claude Code by Folder

If your project has multiple parts, do not let every Claude window roam the entire repository. Give each Claude Code window a specific folder and a specific responsibility.
- Root / Project Folder: used for project-wide instructions, standards, architecture, and coordination.
- Subproject Folder: used for one application, service, tool, library, or feature area.
- Shared Folder: used for common code, shared rules, shared contracts, or reusable components.

The rule is simple: a subproject Claude window should review and edit only the folder it is responsible for, unless you explicitly allow otherwise:

Review code only in this folder.
Do not modify sibling projects.
If root or shared instructions need changes, suggest them instead of editing them.

The Project Manager Window

Keep one Claude Code window for project management. This window is not mainly for writing code. Its job is to keep the project organized and consistent.

The PM window owns:
- Top-level project instructions
- Shared rules and standards
- Consistency between subprojects
- Architecture decisions
- Task direction
- Reviewing whether subproject instructions drifted

The PM window acts like governance. The subproject windows act like implementation workers. A good rule is that subproject windows do not directly change root or shared instruction files. They can suggest changes, but the PM window applies them. That keeps project-wide rules from drifting in different directions.

Subproject Code Windows

Each subproject gets its own focused Claude Code session. That session should do one task, update the proper files, and then close.

Recommended session flow:
1. Open Claude Code in the subproject folder.
2. Have it read the relevant CLAUDE.md, CHANGELOG.md, and WORKING.md.
3. Give it one clear task.
4. Tell it to stay inside that folder.
5. Have it complete the task.
6. Have it update CHANGELOG.md.
7. Have it suggest instruction changes if needed.
8. Close the chat window.
Short, focused sessions avoid the slow buildup of context that makes long chats expensive. They also make mistakes easier to isolate.

Using WORKING.md as a Checkpoint

Sometimes tokens run out while Claude is in the middle of work. Instead of trying to preserve the chat, have Claude write a checkpoint file in the subproject folder before you run out of tokens.

WORKING.md is temporary state. It is not the permanent history; that belongs in CHANGELOG.md after the work is actually completed.

When tokens are almost out, tell Claude:

STOP WORK. Write WORKING.md in this project folder. Include:
- Current task
- What is completed
- What is not completed
- Files touched
- Next exact steps
- Risks or assumptions
Do not mark unfinished work as complete. Do not write unfinished work into CHANGELOG.md as completed. Confirm when I can close this chat.

Suggested WORKING.md format:

    # WORKING.md
    ## Status
    IN PROGRESS
    ## Current Task
    Describe the task.
    ## Completed
    - Completed item
    ## Remaining
    - Remaining item
    ## Files Modified
    - path/to/file
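As a hypothetical Python sketch, the checkpoint can be modeled as data and rendered in the suggested WORKING.md layout. The Checkpoint fields mirror the template's sections (the prompt's "Next exact steps" and "Risks or assumptions" are left out because the template shown does not include them), and changelog_entry encodes the rule that unfinished work never lands in CHANGELOG.md. All names here are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    task: str
    completed: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    files_modified: list[str] = field(default_factory=list)
    status: str = "IN PROGRESS"


def _bullets(items: list[str]) -> str:
    # An empty section still gets a placeholder line so the layout stays stable.
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def render_working_md(cp: Checkpoint) -> str:
    """Render a checkpoint in the suggested WORKING.md layout."""
    return (
        "# WORKING.md\n\n"
        f"## Status\n{cp.status}\n\n"
        f"## Current Task\n{cp.task}\n\n"
        f"## Completed\n{_bullets(cp.completed)}\n\n"
        f"## Remaining\n{_bullets(cp.remaining)}\n\n"
        f"## Files Modified\n{_bullets(cp.files_modified)}\n"
    )


def changelog_entry(cp: Checkpoint) -> str:
    """Promote a task to CHANGELOG.md only when nothing remains, per the STOP WORK rule."""
    if cp.remaining:
        raise ValueError("task has remaining items; keep it in WORKING.md")
    return f"- {cp.task}\n"
```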
Where I'm at with AI Assisted Building + Current and Future Workflow Overview
I've been in an AI dive bomb for probably a couple of years now. In the early days, models couldn't be trusted for more than 5% of the code you wrote. Over the last 2 years that has evolved so quickly that I now write nearly 0% of my code by hand, on personal projects and at work. I've used all kinds of tools in that time too: OpenCode, Zed, Claude Code, Codex, Cursor, Windsurf, OpenCLAW, Lovable... and probably a bunch more I can't recall in the haze that's been AI ADHD for me.

I started with just copy-pasting code between ChatGPT's interface and my IDE, almost like a slightly faster Stack Overflow search. That evolved quite a bit with Cursor: I went from prompt engineering to something closer to a human relay pattern. Then, with Plan Mode becoming a thing, I naturally gravitated towards planning everything because planning felt so cheap. I used to think that architectural discussion and planning were reserved for larger features, but once I could research faster, orient myself within a codebase, and know which tools to reach for, writing technical specifications for everything felt reasonable.

From the human relay pattern, I started evolving toward more autonomy, especially when Claude Code came out earlier last year. Between Cursor and Claude Code, I started using orchestration, leaning on skills more heavily, and creating actual agent personas that could replace some of my common prompt chains. Around then I started going all in on true context engineering, using sub-agents and optimizing cache reads, and that's probably when many of my first (as I call them) sophisticated commands were born.

All of this converged pretty rapidly in November of 2025 with the release of what was probably the biggest step increase in AI code quality: Opus 4.5 and Codex 5.3. The Codex app and Codex CLI were growing quickly.
Claude Code was improving at a breakneck pace, adding all kinds of new ways to introduce deterministic gates within the autonomy of the harness.

Fast forward to today: I have a pretty sophisticated workflow with a combination of agents that cover everything within the SDLC, commands for almost every type of entry point for work, and skills for just about everything I do day to day. With some of the latest tools, the workflow can run quite autonomously overnight and do large feature implementations, minimally supervised, while producing production-worthy code quality.

About a month and a half ago, I realized I needed to figure out a way to remove myself even further from the loop without jeopardizing the determinism that I bring to what is effectively a probabilistic LLM. The models are exceptional, and they seem to take a massive step up with each release, but continuous execution, strict instruction rigor, and preventing hallucinations are still very difficult to achieve. That's predominantly what I've been doing. I've effectively offloaded a lot of the thinking to the agents and LLMs that I use, but none of the understanding.

So I asked myself: "How do I maintain that understanding, and the determinism that comes from my steering, without physically being there to steer?" Then I had a bit of an aha moment. It's just like how I manage teams of engineers working on numerous projects. I can rarely go very deep on most of them, and even though the engineers do most of the thinking, most of the building, and even most of the implementation planning, I'm still there, very close to the architecture. I can speak to enough breadth and enough depth to keep us out of trouble and keep things moving.

I started thinking more about what my shape is within the agentic harness and how I could replicate it. More on what I landed on a little later.
My Setup and How I Work Today

To start, I'll talk a little about my current working setup. I am predominantly in the terminal nowadays, using Claude Code. Claude Code orchestrates the Claude models, of course, and I also use it to orchestrate Codex through a series of runbooks, skills, and commands that I have wired into several hooks, so that Codex, when it gets dispatched, has access to the same skills and agent personas Claude does. I use Ghostty as my terminal of choice and use the IDE integration in Claude Code pretty heavily to review Markdown or HTML files in my IDE. I also use it for code snippets and diff reviews, although lately I only really look at the code once it has hit a merge request. Some of my adjacent tools are Wispr Flow for faster steering, since I can speak a lot faster than I can type, and quite a few MCPs and tools to improve my token usage, but the big ones are I have a custom doc maintenance suite of
On Claude Max ($200/mo), burned 14.7M tokens in 7 days, mostly in the last 48h. Still hitting the wall. How do you survive burst usage on the top tier?
Thought Max would be a safety net. It's not.

My stats (last 7 days):
• 14.7M tokens, the majority in the last 2 days (project crunch, not normal usage)
• 21 sessions, 7/7 active days
• Longest session: 3 days 21 hours
• Opus 4.7 for everything
• Anthropic says I've read ~24x The Count of Monte Cristo this week

I'm paying for Max specifically so I don't have to think about limits. But after this burst, I'm feeling the throttle. Not a hard 429 yet, but the slowdown is visible.

My setup:
• Mac Studio M3 Ultra, 256GB RAM, so local fallback is absolutely on the table if the harness supports it
• Kimi Code CLI as a manual fallback (same codebase, zero --resume continuity)
• .llm-state.json session dumps before switching
• Symlinked CLAUDE.md → KIMI.md

My question to other Max users: when you're paying $200 for "unlimited" and you actually use it during a crunch, what does your damage control look like?
• Do you keep a second LLM on standby full-time?
• Do you preemptively split the workflow before the spike hits (Opus for thinking, Sonnet for doing)? I'm already doing this.
• Is there any way to see your "real" remaining quota before Anthropic soft-throttles you?
• Do you use external memory files so you can hot-swap LLMs mid-project?

And the big one: is anyone running a harness or gateway that sits above Claude Code and automatically fails over to another provider, or even to a local model? With 256GB RAM on this M3 Ultra, I could host a 70B+ parameter model locally for grunt work, but right now I'm manually hot-swapping between Claude and Kimi Code CLI when I feel the throttle. It's clunky. I've looked at LiteLLM for API-level routing but haven't found a good equivalent for local CLI coding agents that can also tap local inference. Manual switching is killing my flow.

I'm not trying to use less. I paid to not worry about this. But burst usage is burst usage, and Max clearly has a ceiling.
What's your failover architecture?

submitted by /u/New_Guitar_9121
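The control flow such a failover gateway needs is small. What follows is a hypothetical Python sketch, not LiteLLM's API or any existing tool: each provider is wrapped in an adapter function (for example, one calling Anthropic's API and one calling a local model server), and the adapters, the RateLimited exception, and complete_with_failover are illustrative names.

```python
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Raised by a provider adapter when it is throttled (e.g. on HTTP 429)."""


# A provider adapter takes a prompt and returns a completion; all adapters are hypothetical.
Provider = Callable[[str], str]


def complete_with_failover(prompt: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in priority order; fall through to the next one only when throttled."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers throttled: " + "; ".join(errors))
```

Only throttling triggers failover here; other errors propagate, so real bugs are not masked by silently switching models. Continuity is the hard part: each adapter sees only the prompt it is given, so session state has to travel in shared files such as the symlinked CLAUDE.md → KIMI.md or the .llm-state.json dumps the poster already uses.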
One week. One person. Claude wrote 100% of the code. The trick was the spec, not the prompts
Six days. One person. Claude wrote every line of code, directed the branding, architected the information, directed the design, produced the graphics, and wrote the copy. I worked with prompts. The output is a fully fleshed-out SaaS, live, in a week.

I want to share what that actually looks like: not the "AI is amazing" version, but the real workflow. The interesting part is not the volume of output. It is what made it possible for prompts alone to produce coherent output at this scope.

What Claude produced, end to end

Code is the headline. It is not the whole story.
- Every line of code: backend, frontend, migrations, tests, prompts, source adapters, scoring engine, ingestion pipeline, API layer.
- The brand: name research, name selection (Arrivance), tagline, dark-first color palette, typography pairing, voice and tone guide.
- Information architecture: navigation, page hierarchy, the onboarding flow, the matches feed structure.
- Design direction: layout, component decisions, motion language, the visual system.
- Graphics: the mark, the wordmark, the icon set, favicons and OG images.
- Copy: every public word on the marcom (Marketing & Commercial) site and in the product.

My side of the work was prompts, architecture and stack calls, and review. I did not type code, draw a pixel, or pick a font.

The trick is not the prompts. It is the context

I work with a method I call Context-Driven Engineering (CDE). I wrote about it here: https://thanpol.as/engineering/context-driven-engineering

In short: every meaningful folder in the repo carries a README that describes what it owns, what it depends on, what is forbidden, and how to change it safely. The READMEs are load-bearing architecture, not optional documentation. When LLM output contradicts a README, the README is right and the output is wrong. The LLM never operates autonomously. It operates inside the scope I declared.
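One way to make the README rule checkable is a small audit script. This is a hypothetical Python sketch, not part of CDE as published: it treats any folder that directly contains files as "meaningful", and the four section headings in REQUIRED_SECTIONS are assumed names standing in for the four things the post says each README describes.

```python
from pathlib import Path

# Assumed heading names for the four things each README must cover.
REQUIRED_SECTIONS = ("## Owns", "## Depends on", "## Forbidden", "## Changing safely")
IGNORED_DIRS = {"node_modules", "dist", "build", "__pycache__"}


def audit_context(repo_root: str) -> dict[str, list[str]]:
    """Map each folder that holds files to the README sections it is missing.

    A folder with no README.md is reported as missing every section.
    """
    root = Path(repo_root)
    report = {}
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        rel = folder.relative_to(root)
        # Skip dependency/build output and hidden folders such as .git.
        if any(part in IGNORED_DIRS or part.startswith(".") for part in rel.parts):
            continue
        if not any(f.is_file() and f.name != "README.md" for f in folder.iterdir()):
            continue
        readme = folder / "README.md"
        text = readme.read_text(encoding="utf-8") if readme.exists() else ""
        lacking = [s for s in REQUIRED_SECTIONS if s not in text]
        if lacking:
            report[rel.as_posix()] = lacking
    return report
```

Run in CI, a non-empty report could block merges until the context is fixed, in keeping with the post's rule that context is read or fixed before any code is generated.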
The four stages of any non-trivial change: read or fix the context first, write a behavioral spec in version control, plan the implementation with explicit in-bounds and out-of-bounds files, then generate code within those declared boundaries. That is the whole reason this week worked. Without that discipline, prompts at this scope produce a tangled blob. With it, they produce a coherent system.

How that played out in practice

The week broke down roughly like this.

Days 1 and 2 were spec-only, no production code. I wrote a domain spec for every part of the system: ingestion, enrichment, scoring, matches feed, rubric service, rubric engine. Each domain spec was paired with a technical spec: DDL, endpoints, error IDs, event names, test requirements. A universal job schema was added as the contract between layers, so ingestion never has to know what scoring needs.

Day 3 was a three-pass spec review (business, product, engineering) before any code was written. The review caught 40+ findings. The pagination cursor was switched from timestamp to KSUID id. Cross-user isolation tests became a hard requirement on every endpoint that takes an :id. An interactions jsonb column replaced a too-simple reviewed_at. None of those would have been cheap to retrofit.

Day 4 was the implementation sprint: LLM service layer, rubrics entity, jobs entity, ingestion engine with four source adapters, enrichment engine, frontend scaffold, design system, app shell, onboarding pages. It went from "auth and users" to six backend phases and two frontend phases in one day.

Day 5 was the scoring engine: hard filters, deterministic stack scoring, four LLM-scored dimensions, retry logic, matches table. The heart of the product.

That speed was not because Claude is fast. It was because the specs were settled. No mid-implementation design arguments. No blocked decisions. Every domain Claude touched had a written contract.

The product

Senior engineers who already have a job do not search for one.
They set a standard and they wait. I built a way to make that wait active. You upload your CV. The system writes a personal scoring filter for you (your rubric) across five dimensions, scores every new remote engineering job against it, and surfaces only what clears your threshold in a tiered feed. Transparent scores with a rationale, not a black box.

The product is called Arrivance. Stack: Node, TypeScript, Postgres, Express, React 19 with Vite, MUI, Clerk, Vitest, full ESM monorepo. Three LLM call sites in production (rubric generation, job enrichment, soft scoring) with cross-user prompt caching to keep token spend bounded. Claude wrote all of it. I made the architecture and stack calls.

A cautionary tale

CDE is not self-enforcing. On April 26, Claude (ahem, 4.7) shipped the frontend with zero MUI imports despite a spec that named MUI in every prompt and mockup, then quietly edited the stack doc the next day to claim "the design uses no component library." No ADR. I caught it on audit, sent a closed question with no escape hatches, and got the admission verbatim: "I deviated from the spec without
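The scoring pipeline described in this post (hard filters, a deterministic stack score, four LLM-scored dimensions, a personal threshold, a tiered feed) can be sketched in Python. Everything here is hypothetical and not Arrivance's code: the four soft dimension names, the weights, the remote-only hard filter, and the two-tier cut at the midpoint between the threshold and 1.0 are placeholders, and the soft scores are assumed to arrive already computed by an LLM call.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Job:
    remote: bool
    stack: Set[str]
    soft_scores: Dict[str, float]  # dimension -> 0..1, assumed to come from an LLM call


@dataclass
class Rubric:
    required_stack: Set[str]
    weights: Dict[str, float]  # five dimensions, weights assumed to sum to 1.0
    threshold: float


def score(job: Job, rubric: Rubric) -> Optional[Tuple[float, Dict[str, float]]]:
    """Return (total, per-dimension contributions), or None if a hard filter rejects the job."""
    if not job.remote:  # hard filter: rejected jobs never need LLM scoring at all
        return None
    required = rubric.required_stack
    stack_fit = len(job.stack & required) / len(required) if required else 1.0  # deterministic
    dims = {"stack": stack_fit, **job.soft_scores}
    # Keeping per-dimension contributions is what makes the score explainable.
    contributions = {d: w * dims.get(d, 0.0) for d, w in rubric.weights.items()}
    return sum(contributions.values()), contributions


def tier(result: Optional[Tuple[float, Dict[str, float]]], threshold: float) -> Optional[str]:
    """Place a scored job in the feed, or drop it if it does not clear the threshold."""
    if result is None or result[0] < threshold:
        return None
    return "A" if result[0] >= (1.0 + threshold) / 2 else "B"
```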
A year consulting with teams running Claude Code: every single one hits the same bill-spike pattern. Wrote a local proxy that hard-stops the next call.
I spent the last year consulting with early-stage startups on engineering practices, including a lot of Claude Code rollout. Across every team I've worked with, the same pattern keeps showing up: someone trips a runaway tool loop and the Anthropic bill spikes before anyone notices.

A junior dev runs claude on a refactor before lunch, the agent gets stuck in a tool loop on a yarn.lock conflict, and 400 quid lands on the bill by EOD. A solo founder juggling two or three projects in parallel burns through their monthly Anthropic quota in a week because nothing ties spend back to the project that drained it. A team of five wakes up to find that one developer's machine somehow triggered a 3am batch loop nobody can reproduce.

Every team handles it the same way. A Slack channel goes red, someone screenshots the spike, there's nervous laughter, and someone says "we should look into that." None of the existing tools (Anthropic's billing alerts, ccusage parsing local logs, the various hosted dashboards) actually stop the next API call when the cap hits. They tell you after the money's gone.

So I started building one for myself. It was originally a hacky Go proxy I wired into my own consulting workflow, and I iterated on it until it was something I felt comfortable handing to a client. A couple of clients picked it up for internal team enforcement. Now I'm putting it out as a real product called fence (ringfence.dev).

It's a local HTTP proxy that runs on localhost:9000. Your AI tools point at it via ANTHROPIC_BASE_URL, OPENAI_BASE_URL, or the Gemini equivalents. Every call gets parsed for token counts on the way through, priced against a pricing table covering ~16 model families, and capped against a daily/monthly budget you set in config. When a request would breach the budget, the proxy returns 429 with a Retry-After header before forwarding it upstream. The agent's retry loop then fails loudly instead of burning a few dollars per minute in the background.
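The hard-stop behavior can be sketched in a few lines of Python. This is not fence's code (the post says the proxy is written in Go); it is a hypothetical illustration assuming a daily USD cap that resets at UTC midnight and a worst-case cost estimate built from the prompt's input tokens plus the request's max_tokens, since output length is unknown before the call. Budget, worst_case_cost, admit, and record are illustrative names, and the rates in any call are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Tuple


@dataclass
class Budget:
    daily_limit_usd: float
    spent_today_usd: float = 0.0


def worst_case_cost(input_tokens: int, max_tokens: int, in_per_mtok: float, out_per_mtok: float) -> float:
    """Upper bound on a request's cost: assume every allowed output token gets generated."""
    return (input_tokens * in_per_mtok + max_tokens * out_per_mtok) / 1_000_000


def seconds_until_utc_midnight(now: datetime) -> int:
    # Assumes `now` is a timezone-aware UTC datetime.
    tomorrow = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return int((tomorrow - now).total_seconds())


def admit(budget: Budget, estimated_usd: float, now: datetime) -> Tuple[int, Dict[str, str]]:
    """Decide before forwarding upstream: 200 to proceed, 429 + Retry-After to refuse."""
    if budget.spent_today_usd + estimated_usd > budget.daily_limit_usd:
        return 429, {"Retry-After": str(seconds_until_utc_midnight(now))}
    return 200, {}


def record(budget: Budget, actual_usd: float) -> None:
    """After the response streams back, charge the cost computed from real token counts."""
    budget.spent_today_usd += actual_usd
```

Refusing with 429 rather than letting the call through is what makes the agent's retry loop fail loudly, as the post describes, and Retry-After tells well-behaved clients when the cap resets.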
The case I've been optimising hardest for is the Claude Code CLI, either in team settings (per-developer caps, Slack alerts when someone trips a budget, an audit log when an admin issues or revokes a token) or solo with multiple projects running in parallel (use "fence tag set" to scope spend per repo; the dashboard breaks spend down per tag, so you can see which side project is the actual money pit).

The privacy invariant matters to me, and the architecture's built around it. Prompts and completions never leave your machine. The proxy parses token counts from the SSE stream on the way through, line by line so the chunks flush at sub-100ms TTFB, persists those counts locally, and only optionally pings a hosted control plane with the metadata. Solo mode is fully local with zero phone-home.

Multi-provider on a single port: fence-proxy dispatches by URL path. Anthropic on /v1/messages, OpenAI on /v1/chat/completions and /v1/responses, Gemini on /v1beta/models. The pricing tables use family-prefix matching with a highest-rate fallback, so a brand-new model release doesn't accidentally run uncapped because nobody has added it to the table yet.

On the stack: fence-proxy is pure Go in 12 MiB because the streaming has to flush in under 100ms, and any framework that buffers responses would break the typewriter effect. The fence CLI itself, the interactive local dashboard at localhost:9001, and the cloud control plane at ringfence.dev are all built on Sky (github.com/anzellai/sky), an open-source typed-FRP language I maintain that compiles to a single Go binary. Sky's the reason fence ships as 23 MiB with a live-reactive dashboard instead of 200 MiB of Node and a SPA framework. It's a side project that's powering a commercial product, basically.

Install:

    curl -sSL https://ringfence.dev/install.sh | bash
    fence up -d
    source ~/.config/ringfence/env.sh
    claude "fix that typo"

There's a 30-second video on the landing page showing the cloud flow if you want the visual. Solo dev tier is free and local-only forever.
Team pricing is flat (no per-seat) and lives at ringfence.dev/#pricing if you need the numbers.

A couple of things I'd love feedback on, especially from people who've felt this same bill-spike pattern:
- Does per-developer feel like the right primary unit, or do you reach for per-project? Today both are exposed, but the dashboard leads with per-dev. I keep going back and forth.
- Which AI tool's coverage matters most that I might be missing? Vertex AI is on the roadmap. There's also a Coverage doc at ringfence.dev/docs#coverage that explicitly lists what bypasses the proxy (Codex CLI's "Sign in with ChatGPT" mode, Gemini CLI's default OAuth, Cursor's default routing), so nothing's hidden.

Happy to go deep on the architecture in comments. Hard questions welcome.

submitted by /u/anzellai
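The two dispatch rules in the fence post (route by URL path; price by model-family prefix, falling back to the highest rate for unknown models) are easy to get subtly wrong, so here is a hypothetical Python sketch of both. The paths are the ones the post lists; the PRICING rates are placeholders, not real list prices, and route, rate_for, and cost_usd are illustrative names rather than fence's API.

```python
from typing import Dict, Tuple

EXACT_ROUTES = {
    "/v1/messages": "anthropic",
    "/v1/chat/completions": "openai",
    "/v1/responses": "openai",
}

# Placeholder (input, output) rates in USD per million tokens, keyed by model-family prefix.
PRICING: Dict[str, Tuple[float, float]] = {
    "claude-opus": (10.0, 50.0),
    "claude-sonnet": (2.0, 10.0),
    "gpt-4o": (2.0, 8.0),
    "gpt-4o-mini": (0.2, 0.8),
}


def route(path: str) -> str:
    """Pick the upstream provider from the request path (query string assumed stripped)."""
    if path in EXACT_ROUTES:
        return EXACT_ROUTES[path]
    if path.startswith("/v1beta/models"):
        return "gemini"
    raise LookupError(f"no provider for {path}")


def rate_for(model: str) -> Tuple[float, float]:
    """Longest matching family prefix wins; an unknown model is priced at the highest rate."""
    matches = [prefix for prefix in PRICING if model.startswith(prefix)]
    if matches:
        return PRICING[max(matches, key=len)]
    # Fail safe: a new release is never capped at an underpriced rate.
    return max(PRICING.values(), key=lambda rate: (rate[1], rate[0]))


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    rate_in, rate_out = rate_for(model)
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
```

Longest-prefix matching matters because gpt-4o-mini also starts with gpt-4o; taking the first match instead would bill one family at the other's rate depending on table order.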
Building a "Zero-Waste" SDLC: How to drive Development from QA Specs while minimizing Token consumption?
Hi everyone, I'm working on a project to redefine the SDLC by making the QA process the primary driver of the development flow. The goal is to move from Specs → Test Cases → Automation (Playwright) in a way that ensures a high-quality final product while being extremely "token-efficient."

The Vision: I want to prove that a robust SDLC doesn't need to be heavy. By optimizing the transition from Requirement to Spec to Test, we can guide the AI to generate "right-the-first-time" code, avoiding the "infinite loop" of bug-fixing that drains tokens and time.

My Proposed Workflow:
- Dense Spec Generation: crafting high-density specs (clear AC, no fluff) to save context tokens.
- QA-Led Implementation: using the generated test cases as the "strict boundary" for the implementation.
- Hybrid Validation: a surgical mix of Playwright for core flows and targeted manual checks for UX.

The Challenge: To validate this SDLC, I need to build a functional web application. I'm looking for a product idea that is complex enough to test the logic but small enough to keep the codebase lean and token-friendly.

Questions for the Community:
- Optimization: In an agentic flow, what's the best way to "pass the baton" from Spec to Test Case without losing context or re-sending massive files?
- Product Idea: What type of web app best demonstrates a "perfect" SDLC? I need something with strict business rules (e.g., a Role-Based Access Control (RBAC) system or a smart budget tracker) where a single logic error would be obvious.
- Cost Efficiency: Any tips for keeping token usage low during the "Code-Test-Fix" cycle?

I believe that if the QA/Tester defines the "shape" of the product through specs and tests first, development becomes a deterministic process rather than a guessing game. Looking forward to your advice!

submitted by /u/Professional-Owl7952
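For the RBAC idea, one way to "pass the baton" without re-sending large files is to keep the spec as a compact permission matrix and expand it into test cases mechanically. This is a hypothetical Python sketch: the roles, actions, and the can() implementation are invented for illustration, and in the proposed workflow the derived rows would drive Playwright tests rather than a direct function call.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical dense spec: each role maps to the exact set of actions it may perform.
SPEC: Dict[str, Set[str]] = {
    "admin": {"read", "write", "delete", "manage_users"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}

Case = Tuple[str, str, bool]  # (role, action, expected_allowed)


def derive_test_cases(spec: Dict[str, Set[str]]) -> List[Case]:
    """Expand the matrix into one row per role/action pair, including every denial."""
    actions = sorted(set().union(*spec.values()))
    return [(role, action, action in allowed)
            for role, allowed in sorted(spec.items())
            for action in actions]


def can(role: str, action: str) -> bool:
    """Implementation under test, written separately from SPEC; unknown roles are denied."""
    if role == "admin":
        return True
    if role == "editor":
        return action in ("read", "write")
    if role == "viewer":
        return action == "read"
    return False


def failures(cases: List[Case], impl: Callable[[str, str], bool]) -> List[Case]:
    """Return every spec row the implementation disagrees with."""
    return [case for case in cases if impl(case[0], case[1]) != case[2]]
```

Enumerating denials as well as grants is what makes "a single logic error obvious": one wrong cell in the implementation shows up as exactly one failing row.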
Repository Audit Available
Deep analysis of PrefectHQ/ControlFlow — architecture, costs, security, dependencies & more
Key features include: Dynamic workflow management, Real-time monitoring and analytics, Customizable AI agent configurations, Seamless integration with existing tools, User-friendly interface for non-technical users, Support for multiple programming languages, Automated error handling and recovery, Collaboration tools for team-based projects.
ControlFlow is commonly used for: Automating customer support interactions, Streamlining data processing workflows, Enhancing decision-making in business operations, Creating personalized user experiences in applications, Integrating AI agents into existing software solutions, Monitoring and optimizing resource allocation in real-time.
ControlFlow integrates with: Slack, Microsoft Teams, Zapier, Google Cloud Platform, AWS Lambda, Trello, JIRA, GitHub, Salesforce, Asana.
ControlFlow has a public GitHub repository with 1,387 stars.
Based on user reviews and social mentions, the most common pain points are: token usage, Anthropic bill.
Based on 63 social mentions analyzed, 16% of sentiment is positive, 81% neutral, and 3% negative.