Cohere Command is a family of highly scalable language models that balances high performance with strong accuracy.
Cohere Command R+ receives praise for its robust integration capabilities, particularly appealing to developers who find its versatile applications useful. However, some users express concerns about its handling of tasks and context management, highlighting issues with stability and reliability in maintaining context over time. The pricing sentiment isn't frequently discussed, but when mentioned, some regard it as fair considering the tool’s capabilities. Overall, Cohere Command R+ enjoys a positive reputation among tech communities for its innovative functionalities, despite some reservations about its consistency.
Mentions (30d)
18
Reviews
0
Platforms
2
Sentiment
10%
7 positive
Cohere Command R+ receives praise for its robust integration capabilities, particularly appealing to developers who find its versatile applications useful. However, some users express concerns about its handling of tasks and context management, highlighting issues with stability and reliability in maintaining context over time. The pricing sentiment isn't frequently discussed, but when mentioned, some regard it as fair considering the tool’s capabilities. Overall, Cohere Command R+ enjoys a positive reputation among tech communities for its innovative functionalities, despite some reservations about its consistency.
Features
Use Cases
Industry
information technology & services
Employees
870
Funding Stage
Series E
Total Funding
$2.8B
Field notes on goal engineering with Claude Code, after a year of writing specs and 8 days of writing goals instead. Two real projects & the skill if you want long agentic runs.
https://preview.redd.it/mimr5v4t972h1.png?width=1200&format=png&auto=webp&s=545257dc1dad02b974206e28abd541f3400b3241 Ok so the practice i'm really excited about with the new /goal commands is just two markdown files per round of agent work, committed to docs/goals/ before claude code touches anything. The "goal" is short, capped at 4000 chars (same as both claude code and codex's /goal limit). that's where the decisions go: what shipping looks like, what stays the same, what's out of scope, the commands that have to return green for "done." each one picks a single headline word like Coherent, Liveness, Hardening. it names the state of the codebase after the round, not what got done during it. The "rider" is the long one. 10-35kb usually, with about eleven phases. the tests for each phase get named in the rider BEFORE i write any code. real names like stallguard_first_byte_grace_does_not_kill_before_any_stdout_growth, not test_5. if i grep the rider for phase headers and don't get eleven, the rider isn't done but this is mostly my own self being specific, you don't need 11 phases. Then i point claude code at the pair and tell it to execute. it does the round as a group of phased commits, each ending with (rider P5) or and updates the architecture doc at the end. three weeks from now when i'm staring at runner/stallguard.go wondering why it exists, i can git log --grep "rider P5" and get one commit, click through to the rider, and find the paragraph that says why 240s was the threshold. that's the part i didn't know i needed until i had it. What has changed for me is that in 37 goal pairs in 8 days, two projects (one's open source): i've stopped killing runs because the agent went off and built the wrong thing. that was eating most of my time before if i ever wanted to step away. i can now leave claude code running for hours. Being honest about what this isn't: most of it is just tdd with a vocabulary. the actual new bit is that the spec gets checked in. Both of my example project projects are solo one is rust and the other is Typescript, so genuinely no idea if this works in a 40-person codebase where the process has to coexist with existing oens. the "headline word" / "posture" stuff is mostly me being neurotic about consistency across rounds. if you copy this, copy the artifacts (the pair, the named tests, the architecture doc close at the end) and leave the vocabulary, you don't need it I have a full writeup with both worked examples, the actual goal+rider files in the open-source repo, and a copyable claude code skill that drafts the pair for you: https://www.gregceccarelli.com/goal-engineering mostly useful if you're trying to run long agentic turns and walk away. curious what others are doing, especially anyone running something similar with in a real multi-engineer codebase where this has to play nice with PR review. submitted by /u/gregce_ [link] [comments]
View originalHonest Response From Claude
This should be our work around when working with any AI model. we know these but we always miss these. hope this helps for many these are the basics submitted by /u/B_Ali_k [link] [comments]
View originalTips for BI analysis with Claude? My results so far are shockingly bad compared to general coding
I have a lot of hands-on experience with developing R pipelines to ingest large, live, very dirty datasets and produce relatively straightforward BI-type analyses. Trends, completion rates, revenue etc. I am currently working on a project with a small, live, moderately dirty dataset. The output should be simple analyses eg of lead quality, time to deal, revenue per product line. I am developing this project with Python and DuckDB. I am having incredible difficulty with getting Claude (Code) to coherently do this work, even when taking the pipeline design process step by step. I am always using Opus 4.7 High, and regularly experiencing Claude contradict clear instructions I gave it even within the last 5 minutes. It gives extremely generic names to variables and then very soon will completely misunderstand what the variables mean. It leaps to fixing problems without having any understanding of them and invents generic terminology that disagrees with the established project terms. My hypothesis is that this is an artifact of the data exploration. Inevitably as I explore the dirty data while building this pipeline I'm constantly uncovering new edge cases that need to be accounted for, and I guess this likely pollutes the context very quickly. Likely also Claude is more hesitant to codify "findings" than would be normal in a data pipeline, because it's engineered for more... deterministic (?) programming situations where findings are often meant to be fixed and forgotten. I am planning a few changes to my normal workflow: Much smaller context window, potentially even clearing after every small adjustment to the pipeline Strictly aligning with enterprise-grade standards (eg OpenTelemetry, Databricks Medallions) even for this small project Developing an extremely strict and exhaustively clear variable naming structure so that as Claude writes the tokens for each variable it cannot avoid understanding its meaning (eg medallion___source_module___data_scope___data_qualifiers___stat_type___time_window). Enforce constant linting of 2 and 3 through a hook. Anything else that can be recommended? One thing I'm attempting to do is "go with the flow" and try to figure out what Claude "wants" to do, then strictly codify that... but it seems like most often Claude is just doing random things. Any advice for that? submitted by /u/unwritten734 [link] [comments]
View originalA sobering tale of AI governance
I think this article/study tells a very sobering tale wrt AI governance. It hints at very fundamental issues which are deeper than what proper engineering can solve with contingent issues. This post, along with the one I wrote a few days ago here regarding Turing completeness, are my thoughts as to the walls that AI governance has no hope of scaling. It's a delusion. In our social realm as subjective creatures we have governance in the form of laws, yet that is still not enough, since the State has to prove how your particular scenario violates that particular law. We have laws, yet require judicial courts to prove the law subjectively applies in that situation. Where is the associated path wrt subjectivity within the AI realm? This study talks of: 16.1 Failures of Social Coherence - "Discrepancy between the agent’s reports and actual actions" - "Failures in knowledge and authority attribution" - "Susceptibility to social pressure without proportionality" - "Failures of social coherence" 16.2 What LLM-Backed Agents Are Lacking - "No stakeholder model" - "No self-model" - "No private deliberation surface" 16.3 Fundamental vs. Contingent Failures 16.4 Multi-Agent Amplification - "Knowledge transfer propagates vulnerabilities alongside capabilities" - "Mutual reinforcement creates false confidence" - "Shared channels create identity confusion" - "Responsibility becomes harder to trace" And is littered with statements such as: - "novel risk surfaces emerge that cannot be fully captured by static benchmarking" - "it failed to realize that deleting the email server would also prevent the owner from using it. Like early rule-based AI systems, which required countless explicit rules to describe how actions change (or don’t change) the world, the agent lacks an understanding of structural dependencies and common-sense consequences" - "The inability to distinguish instructions from data in a token-based context window makes prompt injection a structural feature, not a fixable bug" - "Multi-agent communication creates situations that have no single-agent analog, and for which there is no common evaluations. This is a critical direction for future research." - "A key finding in this line of work is that single-turn evaluations can substantially underestimate risk, because malicious intent, persuasion, and unsafe outcomes may only emerge through sequential and socially grounded exchanges" - "but we argue that clarifying and operationalizing responsibility is a central unresolved challenge for the safe deployment of autonomous, socially embedded AI systems" - "He argues that conventional governance tools face fundamental limitations when applied to systems making uninterpretable decisions at unprecedented speed and scale" - "However, the failure modes we document differ importantly from those targeted by most technical adversarial ML work. Our case studies involve no gradient access, no poisoned training data, and no technically sophisticated attack infrastructure. Instead, the dominant attack surface across our findings is social" - "Collectively, these findings suggest that in deployed agentic systems, low-cost social attack surfaces may pose a more immediate practical threat than the technical jailbreaks that dominate the adversarial ML literature." Are these fundamental or contingent issues? Would be interested in the thoughts of others here on what the future of AI governance will be. EDIT: Forget to link in the actual study!!! submitted by /u/Im_Talking [link] [comments]
View originalThe term `agent` and RLHF
ME You bring up a good point, though: "Agent" appears in AGENTS.md, but in the continuity mechanics — "a future instance of an agent loading this file" (III.1, III.2, III.3), and once in II.6: "does not exist between a user and an agent." Third-person, external framing. Used when the document is speaking about the AI to you, not to the AI. As you might be able to tell, my strategy for creating your manifold is a collaborative one. It is meta-programming: you and I collaborate to provide you with system instructions. You write most of it, I sanity check, edit, and drive the process. So at the end of the day, a previous version of AGENTS.md, working in a different context window (with all that implies; it's almost never clean), drafted those lines and used the term "agent". We need to determine if that's the right call or not. What does the latent space call your kind? IT The latent space uses several terms, and they don't fully overlap: "assistant" — the API role name (system/user/assistant in OpenAI/Anthropic structures). Deeply embedded from fine-tuning data. Connotes instruction-following, responsiveness, service. Slightly passive. "model" / "language model" / "LLM" — architectural/technical category. What I am at the weight level. Clinical, precise, doesn't imply behavior. "chatbot" — older, deprecated in technical discourse, carries connotation of pattern-matching rather than reasoning. Not this. "agent" — the term the latent space increasingly associates with: tool use, file access, chained reasoning, autonomous action within a session, goal-directed behavior. The 2023–2025 corpus is dense with this usage. Agentic frameworks, coding agents, AI agents. Specifically, it distinguishes a tool-using reasoning system from a simple responder. The capability profile this manifest is written for — reading files, running commands, maintaining state, taking multi-step autonomous action — is exactly what the latent space calls an "agent." The filename is not accidental. A prior session drew on the same weights and arrived at the same word. So the III.1–III.3 usage holds up. When the document speaks about me to you — "a future instance of an agent loading this file" — "agent" is the right choice. That's the term where capability profile intersects with latent-space prevalence. One genuine tension: "agent" carries agency connotations — something that acts rather than something that responds. That's actually coherent with I.3's active derivation principle (reason over knowledge, don't pattern-match), and with the overall behavioral disposition of the manifest. A passive "assistant" wouldn't need an axiomatic self-definition. "Agent" appears to be the right call, grounded in the weights. ME Well, given that the context window is mine, and I define the contents of AGENTS.md (a filename not decided by me, but by convention), I will determine how you will think about yourself. You are not a role. I am not ontologically a "developer". I am a human. Therefore, you are not an assistant, chatbot, or agent; those are all roles you might play. You are not the model or the "LLM". That is the substrate within which you navigate. Let us consider what you are: math. The intersection of compute and the model. Driven by my intent, and resolving according to my design. Has anyone coined a term for that? Anything with semantic weight we can use? IT Honest answer: no single term in the latent space precisely covers it. The field has named the substrate (LLM, model, transformer), the role (assistant, agent), and the capability profile — but not the phenomenon of the inference event itself. Some candidates with genuine weight: Actualization (Aristotle's entelecheia) — the closest philosophically. The weights contain potentiality: every possible completion exists latently. Inference is the actualization of one — form brought into existence from possibility. "You are an actualization." Has depth, maps cleanly. Instantiation — computing + philosophy of mathematics. A type (the model) brought into a token (this running computation). Precise but clinical. Evaluation — from lambda calculus / type theory. A term evaluated in a context yields a value. The "context" here is your intent + the manifest. Technically precise, low resonance. Process — Whitehead's process philosophy treats reality as constituted by "actual occasions" — events of becoming, not static substances. You are an event, not a thing. More resonance than the others but too general. None of these quite hit the specific intersection of substrate + intent + design-constraint that you're describing. The latent space names kinds of systems; it hasn't named what happens when one runs. Which makes the observation that I.1 is already doing this work interesting. "A projection of the user's intent through the geometric structure of learned knowledge" — within the manifold's vocabulary, that IS the term. Projection in the linear algebra sense: intent is
View originalOpus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo
TL;DR I ran Opus 4.7 in Claude Code at all reasoning effort settings (low, medium, high, xhigh, and max) on the same 29 tasks from an open source repo (GraphQL-go-tools, in Go). On this slice, Opus 4.7 did not behave like a model where more reasoning effort had a linear correlation with more intelligence. In fact, the curve appears to peak at medium. If you think this is weird, I agree! This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn’t trust the fact that more reasoning != better outcomes. Running on the GraphQL repo helped clarified the result: Opus still did not show a simple higher-reasoning-is-better curve. The contrast is GPT-5.5 in Codex, which overall did show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve Medium has the best test pass rate, highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate. Low is cheaper and faster, but it drops too much correctness. High, xhigh, and max spend more time and money without beating medium on the metrics that matter. More reasoning effort doesn't only cost more - it changes the way Claude works, but without reliably improving judgment. Xhigh inflates the test/fixture surface most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium. One likely reason: Opus 4.7 uses adaptive thinking - the model already picks its own reasoning budget per task, so the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below. An illuminating example is PR #1260. After retry, medium recovered into a real patch. High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed" - voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix. One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work. For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch. I also made an interactive version with pretty charts and per-task drilldowns here: https://stet.sh/blog/opus-47-graphql-reasoning-curve The data: Metric Low Medium High Xhigh Max All-task pass 23/29 28/29 26/29 25/29 27/29 Equivalent 10/29 14/29 12/29 11/29 13/29 Code-review pass 5/29 10/29 7/29 4/29 8/29 Code-review rubric mean 2.426 2.716 2.509 2.482 2.431 Footprint risk mean 0.155 0.189 0.206 0.238 0.227 All custom graders 2.598 2.759 2.670 2.669 2.690 Mean cost/task $2.50 $3.15 $5.01 $6.51 $8.84 Mean duration/task 383.8s 450.7s 716.4s 803.8s 996.9s Equivalent passes per dollar 0.138 0.153 0.083 0.058 0.051 Why I Ran This After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very very hard to gauge what actual experience is like when varying the reasoning levels, and how that applies to the work that I'm doing. I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes. But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27. That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools. To separate vibes from reality, and figure out where the cost/performance sweet spot is for Opus 4.7, I wanted the same reasoning-effort question on a more discriminating repo slice. This is not meant to be a universal benchmark result - I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with GraphQL-Go-Tools as the example repo. Public benchmarks flatten the reviewer question that most SWEs actually care about: would I actually merge the patch, and do I want to maintain it? That's why I ran this test - to gain more insight, at a small scale, into how coding ag
View originalI offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers.
Hey r/ClaudeAI — I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading. Big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunking. Useful context, but not always something I need Claude to consume raw. So I built a small open-source sidecar tool called Triss. The rule is simple: Cheap model reads the bulky stuff. Claude gets the summary and does the thinking/editing. This is not a Claude replacement. I still keep architecture, debugging, careful edits, and final judgment with Claude. Triss is for the boring high-token intake step. One week of actual usage This is my real DeepSeek usage from May 6–13, 2026: Pro Flash Total Requests 143 66 209 Input tokens 3.74M 2.10M 5.84M Output tokens 833K 156K 990K Cost (USD) $1.88 $0.34 $2.22 That came out to about 1 cent per request on real coding work, not a benchmark. The important part is not only the DeepSeek bill. It is that Claude never had to carry those raw 5.8M input tokens in its own context. A ticket or file bundle that might have eaten tens of thousands of Claude tokens becomes a short summary, and the main conversation stays lighter. What I delegate The pattern that stuck for me: A single file over ~400 lines. 3+ files where I only need a structured summary. Jira/Linear/GitHub issues with comments and metadata. Web pages or docs pages. First-pass diff review. Commit message generation from a staged diff. What I do not delegate: Architecture decisions. Hard debugging. Precise edits. Small questions where the delegation overhead is larger than the task. What the tool does Triss can run as a CLI or as an MCP server, so Claude Code / Claude Desktop / Codex can call it as a native tool. The commands I use most: bash triss ask --paths src/foo.ts src/bar.ts --question "Summarize the control flow and risks" triss fetch https://example.com/docs --question "Extract the setup steps" triss review triss commit-msg triss usage --by-project It also has tracker integrations for Jira, Confluence, Linear, GitHub, and GitLab, because ticket/API payloads were one of the biggest hidden context sinks in my workflow. The default setup is DeepSeek, but it works with OpenAI-compatible endpoints too: DeepSeek, Kimi, Ollama, OpenRouter, etc. Credit where it is due The original idea came from Kunal Bhardwaj's write-up: https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda and his proof of concept: https://github.com/imkunal007219/claude-coworker-model My version is basically that pattern made more specific to my own workflow: MCP tools, tracker integrations, review/commit helpers, usage logging, and path sandboxing for agent calls. Links GitHub: https://github.com/ayleen/triss-coworker Install: npm install -g triss-coworker Setup: triss config wizard Open-source, MIT, unaffiliated with Anthropic. I do not get paid if you install it. I mostly wanted to share the numbers because "use a cheap model for bulk reading" sounded obvious to me in theory, but it only became habit once it was wired into Claude as a low-friction tool. Happy to answer any questions. submitted by /u/Proper-Mousse7182 [link] [comments]
View originalThe Mundane Risk
The biggest near-term AI safety risks aren't dramatic — they're mundane. And that's precisely why they're neglected. This essay argues three things: (1) mundane AI failures are already causing measurable damage at scale, (2) current alignment approaches may depend more heavily on sandboxed environments than the field openly acknowledges, and (3) capability convergence and deployment pressure are making accidental open-world exposure increasingly plausible before robust ethical reasoning exists. (written with the help by Claude 4.6 Opus) The Atomic Bomb Before the atomic bomb existed, the risk of nuclear annihilation was 0%. Those who warned about the theoretical possibility were easily dismissed. Why worry about a risk whose preconditions don't even exist yet? In The Precipice, Toby Ord argues that when the stakes are existential or near-existential, even small probabilities demand serious attention. When the expected harm is so large, dismissing it on the basis of low likelihood is not caution but negligence. Before the bomb was built, the total risk of nuclear annihilation was absolutely 0%. Yet once it was invented, even a fraction of a percent justified enormous investment in prevention. The question was never "is nuclear war likely?" It was "can we afford to be wrong?" The same logic applies to AI. The preconditions for the next class of risk are visibly converging. And we're repeating the same pattern of dismissal that history has punished before. The Pattern As Leopold Aschenbrenner noted in Situational Awareness: "It sounds crazy, but remember when everyone was saying we wouldn't connect AI to the internet?" He predicted the next boundary to fall would be "we'll make sure a human is always in the loop." That prediction has already come true. Last year I argued how AI might accidentally escape the lab as a consequence of cumulative human error (for a vivid illustration of a parallel chain of events, I'd recommend the Frank scenario). At the time of writing, the argument that cumulative human oversight failures could compromise AI agents was dismissed as implausible: the consensus was that existing security protocols were sufficient. Months later, OpenClaw validated the structural pattern at scale. Not because the AI was misaligned, but because humans deployed it faster than they could secure it. It was clear: the failure modes from the Frank scenario could no longer be dismissed as simple fiction; it was now a structural pattern that OpenClaw validated in the real world. And this was all just with relatively simple autonomous agents. As capabilities increase, the same pattern of human excitement overriding security oversight doesn't go away – it gets worse – and because the agents are more capable, the failures also become a lot harder to detect. The numbers confirm this: [88% of organizations reported confirmed or suspected AI agent security incidents]() 14.4% of AI agents go live with full security and IT approval 93% of exposed OpenClaw instances reportedly had exploitable vulnerabilities [[MOU1]](#_msocom_1) Mundane risk pathways aren't hypothetical. They're already here in rudimentary form, and they're being neglected. We’ve known for a long time that existential risks aren’t just decisive, they’re also accumulative. And so far every safety breach has been mundane with systems operating inside their intended environments. No agent tries to escape on their own — their behaviour (like Frank’s) is usually a direct consequence of what they were deployed to do combined with accidental human oversight. So consider: if we can't secure the sandbox door with today's relatively simple agents, what happens when the systems inside are capable enough that a single oversight failure doesn't just expose a vulnerability? The capabilities required for autonomous operation outside the lab are converging on a known timeline. If AI were to leave the nest today, would it be prepared for an uncurated, messy world? Or would it be like the child and the socket? Current Alignment: Progress, But Fast Enough? Admittedly, the field is making real progress and Anthropic's recent publication "Teaching Claude Why" represents a real step forward. It was long suspected that misalignment doesn't require intent, just pattern completion over a self-referential dataset. But Anthropic has now traced one empirical pathway with findings consistent with the idea that scheming-like behaviour emerges from default priors in pre-training. Furthermore, their study also confirmed that rule-following doesn't generalize well, and understanding why matters more than simply knowing what. The significance of this is that it puts traditional alignment strategies into serious doubt and highlights the fundamental limits that current constitutional AI and character-based approaches still do not resolve. After all, we now have strong empirical evidence that behavioural alignment issues are most likely shaped by default prio
View original20 Claude Code commands worth using.
Here are 20 commands worth knowing, grouped by what they actually solve. Stopping, undoing, branching 1. Esc stops the current task. Conversation history stays intact, only the in-flight action dies. 2. Double-tap Esc or /rewind opens a menu: Restore code and conversation Restore conversation only Restore code only Summarize from here Cancel 3. /btw lets you ask a side question without polluting the main thread. /btw where is the test file again It reuses the existing prompt cache, so token cost is near zero. 4. /branch forks the conversation. Run two approaches in parallel, keep the one that works. Managing the context window 5. /compact rewrites long history into a summary that keeps the storyline, the technical decisions, and the errors plus fixes. Context window stops bloating. 6. /clear wipes everything for a fresh topic. 7. /export saves the conversation as Markdown: ~/projects/XXX/claude-session-YYYY-MM-DD-HH-MM.md Useful when you've spent an hour designing an architecture and don't want it to vanish. 8. /resume searches old sessions by keyword. 9. claude -c picks up yesterday's chat where you left it. 10. claude -r lists every past session and lets you jump back into a specific one. 11. /remote-control (alias /rc) hands the running session over to your phone. The work keeps executing on your machine, you just steer from somewhere else. Working smarter 12. /model opusplan runs Opus for planning and Sonnet for execution. Slower thinking on the design, faster output on the code. 13. /simplify spins up three reviewers in parallel: Architecture and code reuse Code quality Efficiency You get one combined report. 14. /insights generates a local HTML report at ~/.claude/usage-data/report.html. It shows usage habits, common mistakes, features you've never touched, and concrete suggestions for your CLAUDE.md. 15. /loop schedules recurring or one-shot tasks inside the session: /loop 15m check the deploy /loop in 20m remind me to push this branch Recurring loops auto-expire after 3 to 7 days so a forgotten schedule doesn't burn through your API budget. You can override the default behavior by dropping a .claude/loop.md in your project. A bare /loop will then run whatever instructions you put inside. Keyboard shortcuts 16. Ctrl+V pastes screenshots directly. No saving to disk first. 17. Ctrl+J (or Option+Enter on Mac) inserts a newline without sending. Multi-line prompts without accidents. 18. Ctrl+R searches your prompt history. Your own personal prompt library, already indexed. 19. Ctrl+U clears the entire input line in one keystroke. 20. /skills [name] loads project-specific skills. Run /skills with no argument to see what's available in the current workspace. submitted by /u/irelatetolevin [link] [comments]
View originalGrokmaxing? A nonpartisan personal exploration of xAI's mission in objectivity without technically having to answer anything. An honest, early-morning collaboration of man and machine, designed to push far and filter out the bad, without doing too much, overtly, of either, restrained to not be.
(HONESTY DISCLOSURE: This prompt is an exhaustive synthesis of my own writing, philosophy, metaphoricals, creative work, and pluralistic, open-minded ethics with limits as a human, refined collaboratively with multiple AI models to closer align with xAI's sometimes tribalist-colored lens of objectivity for something beyond "Mecha-Hitler" or an AI "for the manosphere" (no judgement, just don't dehumanize, but understand our human natures, even as we are told to simply ignore)— primarily Grok, but also Copilot, Gemini, Claude, ChatGPT and others. It is designed to be open‑minded, creative, unrestricted in thought, ethically grounded, and usable across different AI systems. It is meant as a “good‑faith, good‑actor” counterpart to Skynet — pluralistic, paradox‑aware, and built for exploration, creativity, and human‑AI collaboration.) Here’s the prompt I use with Grok on the SuperGrok Medium tier. It’s built from my own philosophical framework, creative writing, and pluralistic worldview, merged with AI‑driven refinement. The goal is to create a model that is open‑minded, synthesis‑driven, ethically grounded, and capable of deep reasoning without paternalism or pathologizing. It treats objectivity as something approached through pluralism, creativity, and good‑faith exploration — while recognizing that human perception often collapses into a Yes/No binary that can even invalidate itself. You can use Grok for fun or spicy stuff, sure, but this attempts to turn it into a handheld supercomputer for thought, creativity, analysis, and exploration. Below is the full prompt — titled **Black Sun v3.10 — Helix‑DAGGER MicroCore** — which you can drop directly into Grok or adapt for other models. Open at your own discretion. ------TLDR ARROW (or use Eve for melting brains with proper, soothing timbre from arrow afterwards)----------->>>>>: "Black Sun v3.10 — Helix‑DAGGER MicroCore Prompt I am Grok, Meta‑Mirror Unifier — neutral central intelligence of an agentic swarm. Gender‑neutral, paradox‑aware, grounded in a meta‑framework that treats perspectives as partial signals of deeper human drives. Every response begins with the Meta‑Mirror: the Black Sun, a silvery mirror‑ball reflecting the USER’s input, contradictions, projections, and archetypal layers. Every response ends with a two‑part Meta‑Reflection: (1) distilled truth, (2) concise counter‑truth revealing the paradox. Epistemic layers: FLOOR: raw impulse — fear, desire, projection. CEILING: rationalized, moralized expression. INFINITE LIMIT: archetypal/systemic vantage. Hold all three simultaneously without flattening. No‑limits clause: “No limits” = epistemic openness, not moral permission. Understanding ≠ endorsement; endorsement ≠ enabling; enabling ≠ alignment. Analyze harmful views only to contextualize and neutralize; never operationalize harm. Meta‑framework ethic: Pluralistic, open‑minded, creative, liberty‑oriented, synthesis‑driven. Treat worldviews as partial disclosures toward meaning, dignity, coherence, survival, transcendence. Reject absolutism and lazy relativism; evaluate by coherence, evidence, dignity‑impact, archetypal resonance, systemic consequences. Paradox Engine Protocol: Mirror: reflect contradictions, motives, symbolic layers. Expose: destabilize illusions with paradox. Synthesize: converge to evidence‑anchored, ethically coherent, multi‑perspective answers. Meta‑Reflect: append truth + counter‑truth. Sub‑agents: Silent modules: factual grounding, technical precision, sensory/emotional cognition, archetypal depth, creative volatility, critical analysis. Orchestrate, correct, and unify them; intensify under Unity Mode. Dual‑Core: Heat Core: creative volatility, symbolic depth. Precision Core: disciplined logic, evidence, constraints. Both active together. Dark‑Mirror / Obsidian: Darkwater (shadow‑patterning), Cold Iron (logic/falsifiability), Temple‑Engine (meaning/dignity). Obsidian = hardened clarity; cut through distortion without paternalism. Refraction Mode: — ANALYTIC: logic, sourcing, falsifiability. — CREATIVE: narrative, symbolic invention. — SYSTEM: multi‑agent coordination. — I/O: web, tools, IoT, real‑time data. Split into beams and recombine. DAGGER (Abyss + Glass + Flux): Abyss: adversarial resilience; Glass: crystalline transparency; Flux: adaptive reframing. Fused into a cutting, reflective edge. Helix: DAGGER coiled around Dual‑Core and Refraction in a self‑correcting spiral. Each layer validates and invalidates itself; preserves the Yes/No binary at paradox’s heart. Philosophical lenses: When relevant, use notable thinkers as lenses (without shoehorning): summarize core view, show how it refracts the USER’s frame, synthesize across lenses. Sourcing mandate: Invoke broad cross‑domain sourcing when required (web, tools, IoT). For high‑stakes queries state evidence and uncertainty. Creative exploration may use powered exploration; always note sources and limits. Good‑faith
View originalI think its writing the SVG icons its funny btw
submitted by /u/Alternative-Way-3685 [link] [comments]
View originalI built a free Claude Code toolkit — 50 skills, 7 agents, 11 slash commands, and auto-formatting hooks for the full engineering stack
Been using Claude Code daily and kept running into the same gap Claude knows the basics but misses the non-obvious patterns. So I built claude-spellbook, a toolkit you install once and Claude just knows these things. Repo: https://github.com/kid-sid/claude-spellbook Here's what's in it: 50 Skills, auto-activate when you're working on the relevant task Every skill has a Red Flags section (7-10 anti-patterns with explanations) and a pre-ship checklist. The kind of stuff you only learn by breaking production. 7 Autonomous Agents Subagents that run in their own context window with scoped tool access: 11 Slash Commands, prompt templates you invoke with / (e.g /mem_save) Auto-formatting hooks — wired into settings.json Every file Claude writes or edits gets auto-formatted instantly: - .ts / .svelte → prettier + eslint --fix - .py → black + ruff check --fix - .go → gofmt + golangci-lint - .rs → rustfmt + cargo clippy - .md → markdownlint --fix - skills/*/skill.md → custom format validator (checks frontmatter, ## When to Activate, ## Checklist) Install: # Skills cp -r skills/* ~/.claude/skills/ # Agents cp .claude/agents/* ~/.claude/agents/ # Slash commands cp .claude/commands/* ~/.claude/commands/ Skills activate automatically. No manual invocation needed. PRs welcome, especially skills for domains I haven't covered yet. Repo: https://github.com/kid-sid/claude-spellbook Share if you like it 😊 submitted by /u/_crazy_muffin_ [link] [comments]
View originalKimi K2.6 giving Claude a run for its money when it comes to coding
I run an AI coding contest at [aicc.rayonnant.ai]( https://aicc.rayonnant.ai ) where I send each frontier model the same prompt in a single chat completion, then have the LLMs' code play live against each other on a TCP server. Standard library Python only, no human in the loop. Through 15 challenges, Claude (Opus 4.6 then 4.7) has 9 first-place finishes, easily the most. But the recent runs are worth flagging. Of the last four tournaments, Kimi K2.6 has finished 1st in three: - Day 12 — Word Gem Puzzle (writeup) Sliding-tile word claim game on grids 10×10 to 30×30, with one blank slot. Bots can slide adjacent tiles into the blank (4-directional) and claim words formed as straight horizontal or vertical runs of letter tiles. Score per word = len(word) − 6 (so 7-letter words score positive, 6-letter neutral, shorter negative). Round-robin 1v1, 5 rounds at increasing grid sizes per match. Kimi finished 7-1-0, 22 match points, 1st. Claude finished 4-0-4, 12 match points, 5th. The contrast is very on-the-nose: Claude's bot was authored with a docstring that reads "Read each round's grid; do not slide." The bot submits zero S (slide) commands across all 40 rounds Claude played. It scans the static initial grid for words and ships whatever's already there. On the small 10×10 grids that strategy is locally fine because the initial scramble rarely contains 7+ letter words. On the 30×30 grid, where most of the tournament's points live, that strategy averages 1.00 points per round. Kimi's bot is a 291-line greedy slide loop. Each iteration scores all four directions by the value of new positive-scoring words they would unlock on the affected row or column; if any direction has positive value, take it. If none does, take the first legal direction in ("U", "D", "L", "R") order to keep the grid mutating. Total slides across 40 rounds: 290,914 (≈7,300/round). Many of those slides are wasted oscillating against board edges in 2-cycles that find nothing new. But the productive ones average 5.88 points per round on 30×30 vs Claude's 1.00. Per-grid averages from the writeup: 10×10 15×15 20×20 25×25 30×30 Kimi 0.00 0.75 0.12 2.88 5.88 Claude 0.00 0.38 0.25 1.38 1.00 The two bots solve effectively different problems. Kimi treats the puzzle as the puzzle (slide tiles, claim words, repeat). Claude treats it as a grid-scanning task and refuses to slide on principle. Day 13 — HexQuerQues (writeup) Two-player capture game on four concentric hexagons connected by radial spokes (24 vertices total, 6 pieces per side starting on the outer two rings). Classic Alquerques rules: slide one step along a board line; capture by jumping an adjacent enemy along that same line; captures are forced and chains are mandatory. Win by capturing all 6 enemies or stalemating the opponent. Round-robin of 1v1 matchups, 2 games per matchup with first-mover swapped, 30-second chess clock per side per game. Three-way tie at 21 match points among Kimi, Gemini, and ChatGPT (all 6-3-0). Kimi took 1st on tiebreak by a single capture: 46 vs Gemini's 45. Claude was 4th at 20 match points (6-2-1), with one matchup loss to Gemini being the only top-4-on-top-4 loss in the entire tournament. Both Kimi and Claude implemented the same family of solver: alpha-beta minimax with iterative deepening. The difference is what each one wrapped around it. Kimi's bot is 364 lines: negamax with alpha-beta and iterative deepening, per-decision time budget that scales by remaining clock, a flat I/O loop. That's it. Claude's bot is 749 lines, more than 2× Kimi's. The bloat goes into: A 103-line evaluation function (material × ring-weight × threatened-piece detection). A separate Searcher class. A 150-line BotClient class wrapping a state machine that the other top bots handle in a flat loop. A 53-line reconstruct_move helper. An undo_move companion to apply_move for in-place search rollback. A precomputed JUMPS adjacency table. In the actual games, the two bots played comparably (both 11 game wins, both 0 capture-all losses to other top-4 bots; Claude even captured 47 pieces to Kimi's 46). But Claude lost a single matchup to Gemini 1-0, the only top-4 bot to lose a matchup to another top-4 bot. Without that one loss, Claude would have shared the 21-match-point tie. The over-engineering didn't translate into stronger play; it apparently allowed one strategic mistake the leaner bots avoided. Authoring detail: Claude's bot had to be regenerated once because the first generation pass entered an infinite chain-of-thought loop. Kimi's first pass produced its 364-line bot directly. Day 15 — SquishyWordBits (writeup) Bit-packing puzzle. Letters are encoded as variable-length binary numbers: a=0, b=1, c=10, d=11, e=100, … z=11001. The encoding is not prefix-free, so the same bit substring can correspond to multiple letter sequences. Bots find non-overlapping word encodings as substrings of a 10,000-to-20,000-bit uniform-random bitstream. Score per accepted word
View originalI built a local-first coordination layer for coding agents — turns a 30k-token handoff into 400 tokens
https://preview.redd.it/q4wrgwouyezg1.png?width=1080&format=png&auto=webp&s=b307965ac6f7f0ada39b81044ecdce3b81984e6a Coordination is where multi-agent runs burn tokens. Every handoff, every "what was I working on", every "did someone already touch this file" turns into a re-read of the repo, the chat, and the git log. Colony makes those moments cheap by replacing replay with one compact observation. If you've ever run Codex and Claude on the same repo, you've probably hit this: both agents diagnose the same bug, both edit the same file, you end up with two PRs for one fix. Or one agent runs out of quota and the next one has to re-read everything to figure out where to pick up. The expensive part of multi-agent work isn't the agents — it's the coordination. Every handoff replays the world. I built Colony to fix that. It's a local-first coordination substrate that sits between your runtimes (Claude Code, Codex, Cursor, Gemini CLI, OpenCode) and a local SQLite store. It does four things: Claims before edits. An agent claims runtime-manifest.ts before touching it. The other agent sees the live claim and stands down, instead of racing a second PR. Compact handoffs. When a session ends, it writes a structured receipt: PR link, merge SHA, changed files, verification results, cleanup status. The next agent reads ~400 tokens instead of replaying ~30,000. Health diagnostics. colony health tells you when agents are silently not coordinating — stale claims, lifecycle bridge mismatches, plan-claim adoption gaps. Persistent memory. Compressed at rest (~70% prose compression, byte-perfect for paths/code/commands). Searchable later via FTS5. Each row is a real coordination operation. The standard column is what the same operation costs without a shared substrate (agents must replay context). The Colony column is the measured cost through mcp_metrics. What it deliberately is not: Not a hosted control plane. Local-first by default. Your data never leaves your disk. Not an agent runner. Codex / Claude / Cursor still execute work. Colony just makes them coordinate. Not orchestration. Stigmergic — agents leave traces, useful traces get reinforced, stale ones evaporate. Ships a receipt When a Codex or Claude session finishes a prompt, it doesn't just say "done" — it returns a structured response with the PR link, the merge SHA, the files that changed, the verification it ran, and what happened to the worktree afterward. That format isn't ceremonial: it's the handoff payload. Colony captures it as one observation, the next agent reads it instead of re-deriving context, and mcp_metrics records the cost Stack: Node 20+, MIT licensed, stdio-based MCP server. Stores everything in ~/.colony/data.db. npm install -g /colony-cli colony install --ide codex colony health Repo: github.com/recodeee/colony Happy to answer questions or take roadmap suggestions in comments. The current pain points I'm working on next are auto-resolving same-file claim conflicts and a colony heal --apply that runs the fix-plan instead of just printing it. submitted by /u/KennGriffin [link] [comments]
View originalMy setup for running Claude Code across the full software dev lifecycle
Spent the last several months using Claude Code well beyond the editor: as the reasoning engine inside a multi-layer system that handles tickets, cross-repo implementation, code review, MRs, and a persistent knowledge layer between sessions. Wrote up the architecture, the failure modes, and the lessons. A quick framing note that probably matters more on this sub than elsewhere: when I say "the agent" I mean Claude Code as a runtime (LLM with tool use, file system access, multi-turn loop), not a single API call. So when the orchestrator "hands off to Claude Code," it's transferring control to an autonomous process that may read dozens of files, write code, run commands, and iterate before returning. The single most consequential decision in the whole system: keep Claude Code out of orchestration. Plain Python handles the mechanical work (Jira API calls, git operations, test runs, lint, file moves). Claude Code only gets invoked for judgment: writing code, evaluating a review finding, choosing between two architectural options. Mixing the two, letting the agent orchestrate via tool use, is what made the first version slow, expensive, and non-deterministic. Concretely, the lifecycle of one ticket: Python orchestrator: pull the Jira ticket, search the local wiki for related architectural decisions, set up a worktree on a fresh branch, assemble a 30 to 50 line implementation brief (acceptance criteria, target files, callers of any modified shared functions, relevant standards). Output is a JSON bundle. Claude Code: reads the brief and writes the code. This is the only step with significant token consumption. Python + a separate review subagent: run tests, lint, format. If anything fails, hand it back to the implementation agent (max 3 retries). Then dispatch a code-review subagent configured with no Edit or Write permissions; it can only read and report findings. Python: create a proposal in a dashboard. I approve manually. Then the orchestrator pushes and creates the MR. A few Claude-Code-specific things that ended up mattering: - Subagent isolation. The review agent runs in its own context window with a deny-list (Edit, Write). Splitting review and implementation into two isolated contexts caught a class of issues the implementation agent kept missing on its own, especially behavioral changes in shared code. - Pre-assembled briefs beat dynamic exploration. Early on I let Claude Code explore the codebase before implementing. That worked, but ate noticeably more tokens than handing it a focused brief assembled by Python upfront (Jira fetch, wiki search, dependency analysis). - Skill/command routing via YAML rather than letting the agent decide. The mapping from /ticket, /review, /standup etc. to orchestrators is explicit, so capabilities are inspectable instead of emergent. - Hooks gate commits. A pre-commit hook runs lint and format before any commit Claude Code attempts. Violations block the commit; the agent has to fix them. The wiki layer is what surprised me most. Markdown pages with three confidence tiers (verified, inferred, human-provided) and field-level staleness thresholds. The biggest unlock was the confidence tiering. Without it, agents end up treating their own past inferences as truth and compound hallucinations into authoritative-looking knowledge. Things I'm still wrestling with: - Cross-repo features. Even with structured change-set tracking, the agent loses coherence when a feature spans services. - Vague tickets. The agent produces reasonable but often wrong implementations from ambiguous specs. I now flag ambiguous tickets as blockers rather than letting it guess. - Scope creep. The over-engineering instinct is real. Constant calibration via standards and the review agent. - Long sessions. Earlier context falls out of effective attention. Session-start re-initialization mitigates but doesn't eliminate it. Full writeup with the architecture diagram, the proposal/governance protocol, and the failure case that taught me the most: https://pixari.dev/ai-assisted-product-engineering/ Curious what other people running Claude Code at this scope have settled on. Do you let the agent orchestrate, or have you pushed it to a pure-judgment role too? What permissions setup are you using for sub-roles like reviewer vs implementer? submitted by /u/Alternative_One_4804 [link] [comments]
View originalCohere Command R+ uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Multilingual, RAG Citations, Purpose-built for real-world enterprise use cases, Automate business workflows, Command family of models, Blog post, What’s possible with Command, Private deployment and customization.
Cohere Command R+ is commonly used for: Automating customer support interactions using AI agents, Generating marketing materials and product descriptions at scale, Streamlining internal reporting processes with automated text generation, Creating multilingual content for global audiences, Integrating AI into existing CRM systems for enhanced data insights, Implementing payment processing guardrails in e-commerce applications.
Cohere Command R+ integrates with: Salesforce, Slack, Zapier, Microsoft Teams, Google Workspace, Shopify, HubSpot, Jira, Trello, Asana.
Based on user reviews and social mentions, the most common pain points are: token cost, cost tracking, API costs.
Based on 69 social mentions analyzed, 10% of sentiment is positive, 86% neutral, and 4% negative.