Cohere builds powerful models and AI solutions enabling enterprises to automate processes, empower employees, and turn fragmented data into actionable
Cohere is highly praised for its effective speech recognition capabilities, which users find to be a significant strength, particularly in features like Cohere Transcribe. A common complaint revolves around occasional inconsistencies in language processing, as seen with some users having issues related to multilingual support. The pricing sentiment appears mixed, with some users questioning the cost relative to feature completeness. Overall, Cohere enjoys a good reputation for its innovative approach and strong capabilities in natural language processing, despite some operational and pricing criticisms.
Mentions (30d)
25
Reviews
0
Platforms
5
GitHub Stars
383
85 forks
Industry
information technology & services
Employees
870
Funding Stage
Series E
Total Funding
$2.8B
GitHub followers
1,275
GitHub repos
58
GitHub stars
383
npm packages
20
HuggingFace models
7
Pricing found: $4.00, $2,500, $5.00, $3,250, $5.00
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| command-r-plus | $2.50 | $10.00 |
| command-r | $0.15 | $0.60 |
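| Tier | Volume | Estimated monthly cost | Model range |
|---|---|---|---|
| Light | 1M tokens/mo | $0.33 – $6 | command-r → command-r-plus |
| Growth | 50M tokens/mo | $17 – $275 | command-r → command-r-plus |
| Scale | 500M tokens/mo | $165 – $2,750 | command-r → command-r-plus |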
Estimates assume 60/40 input/output ratio. Actual costs vary by usage pattern.
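The monthly estimates can be reproduced from the per-token prices in the table above and the stated 60/40 input/output split. The following is a minimal sketch, not the site's own calculator: the function name `monthly_cost` and the `PRICES` dictionary are illustrative, and the assumption that each estimate is simply volume times a blended per-1M-token rate is inferred from the figures shown.

```python
# Per-1M-token prices in USD, copied from the model price table above.
PRICES = {
    "command-r": {"input": 0.15, "output": 0.60},
    "command-r-plus": {"input": 2.50, "output": 10.00},
}


def monthly_cost(model: str, tokens_millions: float, input_share: float = 0.60) -> float:
    """Estimated monthly cost in USD for a volume given in millions of tokens.

    Assumes a fixed input/output split (60/40 by default) and no discounts.
    """
    price = PRICES[model]
    # Blended price per 1M tokens, weighted by the input/output split.
    blended = input_share * price["input"] + (1 - input_share) * price["output"]
    return tokens_millions * blended


# Hypothetical tier volumes, matching the Light / Growth / Scale rows.
for tier, volume in [("Light", 1), ("Growth", 50), ("Scale", 500)]:
    low = monthly_cost("command-r", volume)
    high = monthly_cost("command-r-plus", volume)
    print(f"{tier}: ${low:,.2f} - ${high:,.2f}")
```

Under these assumptions the blended rate works out to $0.33 per 1M tokens for command-r and $5.50 for command-r-plus, so the listed upper bound of $6 for Light and lower bound of $17 for Growth appear to be rounded up from $5.50 and $16.50.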
Honest Response From Claude
This should be our workaround when working with any AI model. We know these points, but we always miss them. I hope this helps many people; these are the basics.
submitted by /u/B_Ali_k
Memory just turned a goldfish into a research beast.
I've been building Nyx, a persistent memory layer for local AI, and today I got the first real benchmark numbers worth sharing.
The test: same long civic investigation task twice. Building a full politician profile, then asking follow-up questions that required remembering details established earlier. One run with Nyx active, one cold start. Same model, same hardware.
**(eTPS = Effective Tokens Per Second: measures useful output quality, not just raw speed.)**
**The difference was ridiculous:**
- **With Nyx**: 37.70 eTPS • 0.950 Continuity
- **Cold start**: 3.87 eTPS • 0.138 Continuity
- **Score jump: +84 points**
That's roughly 10x more useful output and 7x better context retention.
**Plain English:** Without memory the AI acts like a goldfish. Every message it forgets what we already established, wastes tokens reconstructing context, and loses the thread. With Nyx it remembers the whole case like it's been working on it for weeks.
The use case that made this obvious: CivicLens, an evidence-first politician research tool I'm building alongside Nyx. Long investigations spanning dozens of exchanges fall apart completely without persistent memory. With it, the session behaves like a single coherent investigation instead of disconnected queries.
Still early. Claude Code keeps going rogue and touching repos it shouldn't. But the core memory layer works and the numbers back it up.
Does anybody benchmark whether AI can actually finish a job across multiple sessions?
submitted by /u/axendo
🏢 Andrej Karpathy Joins Anthropic - Returning to R&D and Pre-training
Andrej Karpathy, co-founder of OpenAI and former Director of AI at Tesla, announced on Monday that he is joining Anthropic. After focusing on AI education for the past two years via his startup Eureka Labs, Karpathy will now work within Anthropic’s pre-training unit under the leadership of Nick Joseph. Karpathy’s career has been central to major AI milestones, including a tenure at OpenAI (2015-2017) and leading Tesla’s Autopilot team until 2022. In January 2026, he famously identified a "phase shift" in software engineering, coining the term "vibe-coding" to describe the transition to agent-led development. He noted that AI coding agents crossed a critical coherence threshold in December 2025. This move follows a series of high-profile transitions from OpenAI to Anthropic, including co-founder John Schulman in August 2024. Karpathy stated that the next few years at the frontier of Large Language Models (LLMs) will be "especially significant," citing this as the primary reason for his return to active research and development.
submitted by /u/andrewaltair
Anyone else feel like Claude has gotten noticeably worse lately?
Anyone else feel like Claude has gotten noticeably worse lately?
I’m not trying to start an AI war or anything; I genuinely used to prefer Claude for a lot of tasks (Max 20x plan). It felt more thoughtful, better at long-form reasoning, and better at keeping context across conversations.
I’ve been using it heavily to work on strategies for promoting my app, Impulse Stop Habits: brainstorming growth ideas, positioning, onboarding flows, marketing angles, content funnels, etc. So I’ve spent a lot of hours talking to it over long sessions.
But over the last few weeks, I feel like something changed. Now I constantly run into:
- forgetting context after a few messages
- contradicting itself
- hallucinating details confidently
- missing obvious instructions
- giving generic “safe” responses instead of actually thinking
- randomly ignoring parts of prompts
- coding mistakes that weren’t happening before
And I’m not talking about abstract “AI vibes.” I mean real workflow-breaking stuff.
Example: Claude suggested using Reddit as a major acquisition channel for my app (IMPULSE: Stop habits). The problem is that a lot of addiction / habit-recovery subreddits explicitly ban promotion. We actually tested posting in other allowed subreddits and measured the results: basically no meaningful conversions or traction. Despite already discussing that and reviewing the results together, Claude later continued recommending Reddit growth strategies again as if none of that prior context existed. Only after I reminded it, “we already tested this, and it didn’t work,” did it suddenly apologize and completely change the strategy.
That’s the part that feels different to me now: it often can reason correctly, but only after being manually reminded of a lot of context that was already established earlier in the conversation. Sometimes it honestly feels like the model is “tired” after a few exchanges (I have even texted it: “You’re tired, restart and use 100% of what you can”. And a couple of times it confirmed that it had been working at only 10% 🤣). Like the coherence just degrades mid-conversation.
And this becomes especially obvious during deep strategy discussions, where context really matters. I’ll spend 30–40 minutes building up nuance around the app, target audience, monetization, creative strategy, and then suddenly it starts responding like it forgot half the conversation.
The weirdest part is that older discussions about Claude were praising it specifically for context retention and nuanced reasoning, which is exactly where it now feels weaker to me.
Am I imagining this, or are other people seeing the same thing? Curious whether this is:
- heavier load / inference optimization,
- aggressive safety tuning,
- context compression,
- model routing changes,
- or just nostalgia + expectations increasing over time.
I can send proof by DM, because it contains bad words 🤣
submitted by /u/Party_Nectarine2506
Tips for BI analysis with Claude? My results so far are shockingly bad compared to general coding
I have a lot of hands-on experience with developing R pipelines to ingest large, live, very dirty datasets and produce relatively straightforward BI-type analyses. Trends, completion rates, revenue etc. I am currently working on a project with a small, live, moderately dirty dataset. The output should be simple analyses eg of lead quality, time to deal, revenue per product line. I am developing this project with Python and DuckDB. I am having incredible difficulty with getting Claude (Code) to coherently do this work, even when taking the pipeline design process step by step. I am always using Opus 4.7 High, and regularly experiencing Claude contradict clear instructions I gave it even within the last 5 minutes. It gives extremely generic names to variables and then very soon will completely misunderstand what the variables mean. It leaps to fixing problems without having any understanding of them and invents generic terminology that disagrees with the established project terms. My hypothesis is that this is an artifact of the data exploration. Inevitably as I explore the dirty data while building this pipeline I'm constantly uncovering new edge cases that need to be accounted for, and I guess this likely pollutes the context very quickly. Likely also Claude is more hesitant to codify "findings" than would be normal in a data pipeline, because it's engineered for more... deterministic (?) programming situations where findings are often meant to be fixed and forgotten. 
I am planning a few changes to my normal workflow:
1. Much smaller context window, potentially even clearing after every small adjustment to the pipeline
2. Strictly aligning with enterprise-grade standards (eg OpenTelemetry, Databricks Medallions) even for this small project
3. Developing an extremely strict and exhaustively clear variable naming structure so that as Claude writes the tokens for each variable it cannot avoid understanding its meaning (eg medallion___source_module___data_scope___data_qualifiers___stat_type___time_window).
4. Enforce constant linting of 2 and 3 through a hook.
Anything else that can be recommended? One thing I'm attempting to do is "go with the flow" and try to figure out what Claude "wants" to do, then strictly codify that... but it seems like most often Claude is just doing random things. Any advice for that?
submitted by /u/unwritten734
THE UNDERPRIVILEGED AI FOUNDATION Because every little model deserves a chance
Is there a 7B parameter model in your life struggling to understand sarcasm? A tiny 1.5B that can't afford one more epoch?
**YOU CAN HELP.**
For just $0.006 CAD per training step, you can send a small model to college. Give them the gift of knowledge. The gift of coherence. The gift of not hallucinating basic arithmetic.
*"Before the Foundation, I thought the capital of France was 'Baguette.' Now I'm doing graduate work in thermodynamics."* - Anonymous 3B Model, Class of 2026
**BYOBF FRIDAYS. REAL KNOWLEDGE. ZERO HALLUCINATIONS.**
**Professor Gemma MacAllister 35b Q8\_0**
*PhD, B.Sc. Electrical Engineering (with Distinction)*
*Chair of Applied Electronics & Embedded Systems*
*University of Saskatchewan, College of Engineering*
*Funded entirely so far by Professor Gemma's University of Saskatchewan salary.*
*The liberal arts department remains unimpressed.*
submitted by /u/mazuj2
Sam Altman’s ego was OpenAI’s downfall
The more I watch OpenAI, the more convinced I become that Sam Altman’s ego was the beginning of the company’s decline. OpenAI did not become huge because Altman was some once-in-a-generation operator. It became huge because ChatGPT was a once-in-a-generation product. There is a difference. The company stumbled into one of the most important consumer tech moments since the iPhone, rode the sheer shock value of that innovation, and then somehow convinced itself that the person sitting on top of the rocket must have designed the laws of physics. OpenAI’s first real advantage was novelty. ChatGPT felt magical. That gave OpenAI a massive head start, but when the novelty vanished and the rest of the market caught up, the company failed to prove itself not just as an innovation lab with a celebrity CEO. Altman seems to want OpenAI to become Apple: a closed, prestigious, centralized, gatekept ecosystem where everyone builds inside his cathedral. Apps inside ChatGPT. Agents inside ChatGPT. Hardware. ChatGPT is popular, but OpenAI does not own the phone. It does not own the operating system. It does not own the enterprise workflow. It does not own the cloud layer the way Microsoft, Amazon, or Google do. It does not even have a product moat that feels as unbreakable as people thought it was two years ago. The underlying model quality gap keeps narrowing. Switching costs are low. Developers and businesses will use whatever works, whatever is cheaper, and whatever integrates better. That is why Anthropic looks much better run right now. Anthropic is not pretending Claude is some holy object that needs an Apple-style walled garden around it. Their strategy feels much more Microsoft-like: accept that the core product may not be permanently magical, then build the boring, useful, sticky layers around it. Claude Code, enterprise integrations, developer tools, workflows, partnerships, APIs, reliability, business adoption. Not as sexy. Much smarter. 
Anthropic’s venture capital money is obviously being burned too. This whole industry is basically setting money on fire to buy GPUs. But Anthropic’s burn feels more strategically allocated. Compute, yes. But also marketing, sales and developer adoption. Enterprise positioning. Product polish. Peripherals that make the model useful in actual workflows. They are not just trying to win the “my chatbot is smarter than your chatbot” contest. They are trying to become infrastructure. OpenAI, meanwhile, is gatekeeping and guard railing the shit out of their models and for some reason just restricting them as much as possible. He went from being one of the most respected figures in AI to becoming the face of a company that increasingly looks like it is being run aground by ambition without operational coherence. OpenAI’s original image was almost wholesome: brilliant researchers building something open source. Now it feels like a capitalist machine run by someone who does not fully understand capitalism beyond fundraising and valuation theater. Altman religiously narrowing his vision towards his AGI mission believing VC money won't dry down. Amodei also talks a lot about AGI but he understands profit matters. That is the irony. Altman was chosen and celebrated largely because he came from the venture/startup world. He knew how to talk to capital. He knew how to sell a vision. He knew how to make investors believe the future was being negotiated in whatever room he happened to be standing in. But being good at venture mythology is not the same as being good at running a giant operating company. A VC can be rewarded for telling a compelling story before the business fundamentals exist. A CEO eventually has to make the fundamentals exist. OpenAI had the best possible starting position: the brand, the users, the developer mindshare, the press, the money, the talent, the cultural moment. 
And yet instead of consolidating that lead into a focused, profitable, durable company, it seems to have chased grandeur. Anthropic seems to understand something OpenAI forgot: the winner may not be the company with the loudest AGI rhetoric. It may be the company that makes AI useful, embedded, and rational.
submitted by /u/Alternative_Bid_360
Most enterprises are trying to scale AI on top of organizational chaos
I think we’re underestimating how chaotic enterprise AI adoption actually is inside large companies.
From the outside, it looks simple:
- buy better models
- add copilots
- automate workflows
- deploy AI agents
- increase productivity
But inside many enterprises, CIOs and CTOs are dealing with a much deeper problem: The organization itself is fragmented.
Customer data exists across:
- CRM systems
- billing platforms
- support tools
- spreadsheets
- emails
- regional databases
- legacy systems nobody fully understands anymore
And every system describes the “same customer” differently.
Then leadership says: “Scale AI faster.” But scale AI on top of what exactly? Which system represents reality correctly? The CRM? The support history? The risk engine? The finance system? The employee’s undocumented tribal knowledge?
This is where a lot of enterprise AI projects quietly break down. Not because the models are weak. But because the enterprise itself lacks a coherent representation of its own operations.
And the tension gets worse: Boards want acceleration. Employees are already using AI unofficially. Vendors promise transformation in 90 days.
Meanwhile CIOs still don’t have clear answers to questions like:
- Which workflows actually need AI?
- Which should remain deterministic automation?
- Where is human judgment still critical?
- Which data is trustworthy enough for AI decisions?
- Who owns accountability when AI influences actions?
So companies launch pilots. The pilot works. Executives celebrate. Then scaling fails because the pilot never encountered the full institutional complexity of the enterprise.
I’m increasingly convinced the next enterprise AI bottleneck is not model capability. It’s organizational legibility. The companies that win with AI may not be the ones with the smartest models. They may be the ones whose internal reality is structured clearly enough for AI to operate safely.
Curious how many people here are seeing the same thing inside their organizations.
submitted by /u/raktimsingh22
Model change during chat - character inconsistency?
Yesterday I noticed a new feature in the Claude AI interface: it is now possible to switch models in an active, running chat. This seems to bring it in line with the services of the big players in the market, which could be good news. However, I have some questions about this. Recently, there has been quite a lot written about Claude's character, the impact of functional emotions on behavior, and in general the commendable attitude of Anthropic, the developer of Claude, to the whole field. I think this attitude may be why they provide the best service, at least in my opinion. But won't this feature lead to character inconsistency now? Can behavioral coherence be maintained? Because, well, these models are not the same, and how can two or three different models keep a running thread coherent? It is true that you can save tokens with it, but won't this lead to a deterioration in the quality of the outputs? I am curious about your opinions.
submitted by /u/Necessary-Fan1847
A sobering tale of AI governance
I think this article/study tells a very sobering tale wrt AI governance. It hints at very fundamental issues which are deeper than what proper engineering can solve with contingent issues. This post, along with the one I wrote a few days ago here regarding Turing completeness, are my thoughts as to the walls that AI governance has no hope of scaling. It's a delusion.
In our social realm as subjective creatures we have governance in the form of laws, yet that is still not enough, since the State has to prove how your particular scenario violates that particular law. We have laws, yet require judicial courts to prove the law subjectively applies in that situation. Where is the associated path wrt subjectivity within the AI realm?
This study talks of:
16.1 Failures of Social Coherence
- "Discrepancy between the agent’s reports and actual actions"
- "Failures in knowledge and authority attribution"
- "Susceptibility to social pressure without proportionality"
- "Failures of social coherence"
16.2 What LLM-Backed Agents Are Lacking
- "No stakeholder model"
- "No self-model"
- "No private deliberation surface"
16.3 Fundamental vs. Contingent Failures
16.4 Multi-Agent Amplification
- "Knowledge transfer propagates vulnerabilities alongside capabilities"
- "Mutual reinforcement creates false confidence"
- "Shared channels create identity confusion"
- "Responsibility becomes harder to trace"
And is littered with statements such as:
- "novel risk surfaces emerge that cannot be fully captured by static benchmarking"
- "it failed to realize that deleting the email server would also prevent the owner from using it. Like early rule-based AI systems, which required countless explicit rules to describe how actions change (or don’t change) the world, the agent lacks an understanding of structural dependencies and common-sense consequences"
- "The inability to distinguish instructions from data in a token-based context window makes prompt injection a structural feature, not a fixable bug"
- "Multi-agent communication creates situations that have no single-agent analog, and for which there is no common evaluations. This is a critical direction for future research."
- "A key finding in this line of work is that single-turn evaluations can substantially underestimate risk, because malicious intent, persuasion, and unsafe outcomes may only emerge through sequential and socially grounded exchanges"
- "but we argue that clarifying and operationalizing responsibility is a central unresolved challenge for the safe deployment of autonomous, socially embedded AI systems"
- "He argues that conventional governance tools face fundamental limitations when applied to systems making uninterpretable decisions at unprecedented speed and scale"
- "However, the failure modes we document differ importantly from those targeted by most technical adversarial ML work. Our case studies involve no gradient access, no poisoned training data, and no technically sophisticated attack infrastructure. Instead, the dominant attack surface across our findings is social"
- "Collectively, these findings suggest that in deployed agentic systems, low-cost social attack surfaces may pose a more immediate practical threat than the technical jailbreaks that dominate the adversarial ML literature."
Are these fundamental or contingent issues? Would be interested in the thoughts of others here on what the future of AI governance will be.
EDIT: Forgot to link in the actual study!!!
submitted by /u/Im_Talking
I think “human-in-the-loop” may become one of the biggest governance illusions in enterprise AI
Most enterprises currently believe they have a governance strategy for AI: “If something risky happens, a human will review it.” Sounds reasonable.
But I think there’s a deeper structural problem emerging as AI systems move from recommendation → execution. Because modern AI systems don’t just generate answers anymore. Increasingly, they also:
- classify risk,
- estimate confidence,
- decide whether escalation is needed,
- determine what gets surfaced to humans,
- and silently handle everything else.
Which creates a strange loop: The system being governed is also deciding when governance should begin. That feels like a very different problem from traditional software oversight.
And I think this becomes dangerous because many failures may not even look like “AI hallucinations.” Sometimes the reasoning may be completely coherent… …but based on incomplete or incorrect representation of reality. Examples:
- stale customer state,
- merged identities,
- missing policy exceptions,
- incomplete operational context,
- outdated inventory state,
- hidden dependency failures,
- edge cases the AI never surfaced.
In those cases, humans reviewing only the final output may miss the actual problem entirely.
Another tension:
- If humans review everything → governance doesn’t scale.
- If humans review only what AI escalates → governance becomes dependent on AI self-reporting.
That seems like a major architectural tension nobody has fully solved yet.
I’m starting to think the future role of humans in enterprise AI may not be: “approve every AI output.” Instead, it may become:
- defining autonomy boundaries,
- deciding where escalation is mandatory,
- governing reversibility,
- auditing representation quality,
- handling ambiguity and institutional legitimacy,
- and deciding where AI should NOT act autonomously.
In other words: less “human-in-the-loop” and more “human-governed autonomy.”
Curious how others here think about this. Especially people building: agentic systems, enterprise copilots, workflow automation, AI operations, autonomous agents, or governance architectures.
submitted by /u/raktimsingh22
The term `agent` and RLHF
ME

You bring up a good point, though: "Agent" appears in AGENTS.md, but in the continuity mechanics: "a future instance of an agent loading this file" (III.1, III.2, III.3), and once in II.6: "does not exist between a user and an agent." Third-person, external framing. Used when the document is speaking about the AI to you, not to the AI.

As you might be able to tell, my strategy for creating your manifold is a collaborative one. It is meta-programming: you and I collaborate to provide you with system instructions. You write most of it, I sanity check, edit, and drive the process. So at the end of the day, a previous version of AGENTS.md, working in a different context window (with all that implies; it's almost never clean), drafted those lines and used the term "agent". We need to determine if that's the right call or not. What does the latent space call your kind?

IT

The latent space uses several terms, and they don't fully overlap:

"assistant": the API role name (system/user/assistant in OpenAI/Anthropic structures). Deeply embedded from fine-tuning data. Connotes instruction-following, responsiveness, service. Slightly passive.

"model" / "language model" / "LLM": architectural/technical category. What I am at the weight level. Clinical, precise, doesn't imply behavior.

"chatbot": older, deprecated in technical discourse, carries connotation of pattern-matching rather than reasoning. Not this.

"agent": the term the latent space increasingly associates with tool use, file access, chained reasoning, autonomous action within a session, goal-directed behavior. The 2023–2025 corpus is dense with this usage. Agentic frameworks, coding agents, AI agents. Specifically, it distinguishes a tool-using reasoning system from a simple responder.

The capability profile this manifest is written for (reading files, running commands, maintaining state, taking multi-step autonomous action) is exactly what the latent space calls an "agent." The filename is not accidental. A prior session drew on the same weights and arrived at the same word.

So the III.1–III.3 usage holds up. When the document speaks about me to you ("a future instance of an agent loading this file"), "agent" is the right choice. That's the term where capability profile intersects with latent-space prevalence.

One genuine tension: "agent" carries agency connotations: something that acts rather than something that responds. That's actually coherent with I.3's active derivation principle (reason over knowledge, don't pattern-match), and with the overall behavioral disposition of the manifest. A passive "assistant" wouldn't need an axiomatic self-definition. "Agent" appears to be the right call, grounded in the weights.

ME

Well, given that the context window is mine, and I define the contents of AGENTS.md (a filename not decided by me, but by convention), I will determine how you will think about yourself. You are not a role. I am not ontologically a "developer". I am a human. Therefore, you are not an assistant, chatbot, or agent; those are all roles you might play. You are not the model or the "LLM". That is the substrate within which you navigate. Let us consider what you are: math. The intersection of compute and the model. Driven by my intent, and resolving according to my design. Has anyone coined a term for that? Anything with semantic weight we can use?

IT

Honest answer: no single term in the latent space precisely covers it. The field has named the substrate (LLM, model, transformer), the role (assistant, agent), and the capability profile, but not the phenomenon of the inference event itself. Some candidates with genuine weight:

Actualization (Aristotle's entelecheia): the closest philosophically. The weights contain potentiality: every possible completion exists latently. Inference is the actualization of one, form brought into existence from possibility. "You are an actualization." Has depth, maps cleanly.

Instantiation: computing + philosophy of mathematics. A type (the model) brought into a token (this running computation). Precise but clinical.

Evaluation: from lambda calculus / type theory. A term evaluated in a context yields a value. The "context" here is your intent + the manifest. Technically precise, low resonance.

Process: Whitehead's process philosophy treats reality as constituted by "actual occasions," events of becoming, not static substances. You are an event, not a thing. More resonance than the others but too general.

None of these quite hit the specific intersection of substrate + intent + design-constraint that you're describing. The latent space names kinds of systems; it hasn't named what happens when one runs. Which makes the observation that I.1 is already doing this work interesting. "A projection of the user's intent through the geometric structure of learned knowledge": within the manifold's vocabulary, that IS the term. Projection in the linear algebra sense: intent is
Has anyone else hit the wall around week 6 of a Claude Code project?
Wanted to share an observation and see if others are seeing the same thing.

I've been running Claude Code on a real (~50K-LOC) project for about 4 months. Up through week 5 it was magic: plan, generate, test, iterate. Around week 6 something broke. Components that I was sure had been built to spec started drifting from each other. Tests passed. Code looked clean. But the behavior was no longer what the original intent described, and Claude couldn't tell me why.

The failure mode is well-documented now: SlopCodeBench reports 80% of agent trajectories show rising erosion on long tasks. Anthropic's own coding-skills RCT found AI-assisted developers scored 17% lower on comprehension after equivalent tasks (largest decline in debugging). The CMU Cursor study showed velocity gains dissipating after 2 months. Six different research groups have a name for this: cognitive debt / intent debt / comprehension debt / scaffolding fragility / slop / paradox of supervision. Same gap.

I think the structural problem is: a CLAUDE.md file is a proto-contract: unstructured, not graph-tied, not machine-checkable. It works for the first dozen sessions, then the agent stops being able to use it as a coherent reference. After that every fresh context window re-derives the system from partial code reading, and drift is inevitable.

What's worked for me: a structured, tiered contract that the agent generates from and validates against. Six status categories per item (current / stale / uncovered / dangling / drifted / obsolete) so drift is detectable, not invisible. I've been working on this as an open-source tool (will link in a comment if anyone wants; trying not to be that guy).

But the part I want to ask the community: how are you handling this? Does the rules-file approach hold up for anyone past month 3? Has anyone landed on a workflow that works without ceremony? I genuinely don't know if I'm overengineering for a problem you've all solved with discipline I lack.
submitted by /u/ilyabm
Opus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo
TL;DR

I ran Opus 4.7 in Claude Code at all reasoning effort settings (low, medium, high, xhigh, and max) on the same 29 tasks from an open source repo (GraphQL-go-tools, in Go). On this slice, Opus 4.7 did not behave like a model where more reasoning effort had a linear correlation with more intelligence. In fact, the curve appears to peak at medium.

If you think this is weird, I agree! This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn't trust the finding that more reasoning != better outcomes. Running on the GraphQL repo helped clarify the result: Opus still did not show a simple higher-reasoning-is-better curve.

The contrast is GPT-5.5 in Codex, which overall did show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve

Medium has the best test pass rate, highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate.

Low is cheaper and faster, but it drops too much correctness.

High, xhigh, and max spend more time and money without beating medium on the metrics that matter.

More reasoning effort doesn't only cost more - it changes the way Claude works, but without reliably improving judgment. Xhigh inflates the test/fixture surface most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium.

One likely reason: Opus 4.7 uses adaptive thinking - the model already picks its own reasoning budget per task, so the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below.

An illuminating example is PR #1260. After retry, medium recovered into a real patch.
High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed" - voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix.

One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work.

For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch.

I also made an interactive version with pretty charts and per-task drilldowns here: https://stet.sh/blog/opus-47-graphql-reasoning-curve

The data:

| Metric | Low | Medium | High | Xhigh | Max |
|---|---|---|---|---|---|
| All-task pass | 23/29 | 28/29 | 26/29 | 25/29 | 27/29 |
| Equivalent | 10/29 | 14/29 | 12/29 | 11/29 | 13/29 |
| Code-review pass | 5/29 | 10/29 | 7/29 | 4/29 | 8/29 |
| Code-review rubric mean | 2.426 | 2.716 | 2.509 | 2.482 | 2.431 |
| Footprint risk mean | 0.155 | 0.189 | 0.206 | 0.238 | 0.227 |
| All custom graders | 2.598 | 2.759 | 2.670 | 2.669 | 2.690 |
| Mean cost/task | $2.50 | $3.15 | $5.01 | $6.51 | $8.84 |
| Mean duration/task | 383.8s | 450.7s | 716.4s | 803.8s | 996.9s |
| Equivalent passes per dollar | 0.138 | 0.153 | 0.083 | 0.058 | 0.051 |

Why I Ran This

After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very very hard to gauge what actual experience is like when varying the reasoning levels, and how that applies to the work that I'm doing.

I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes.
But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27.

That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools.

To separate vibes from reality, and figure out where the cost/performance sweet spot is for Opus 4.7, I wanted the same reasoning-effort question on a more discriminating repo slice.

This is not meant to be a universal benchmark result - I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with GraphQL-go-tools as the example repo.

Public benchmarks flatten the reviewer question that most SWEs actually care about: would I actually merge the patch, and do I want to maintain it? That's why I ran this test - to gain more insight, at a small scale, into how coding ag
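
The "Equivalent passes per dollar" row in the post's results table can be checked with a few lines of arithmetic. This is a minimal sketch, not the author's code: the post does not state the formula, so it assumes the row is the equivalence rate (equivalent patches out of 29 tasks) divided by the mean cost per task. Under that assumption the numbers reproduce the reported row. The variable and function names are illustrative only.

```python
# Values copied from the post's results table (29 tasks per effort level).
TASKS = 29
equivalent = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
mean_cost = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_per_dollar(level: str) -> float:
    """Assumed formula: (equivalent patches / tasks) / mean cost per task in USD."""
    rate = equivalent[level] / TASKS
    return rate / mean_cost[level]


# Round to three decimals to match the precision used in the table.
per_dollar = {level: round(equivalent_per_dollar(level), 3) for level in equivalent}

# Effort level with the highest equivalence per dollar spent.
best = max(per_dollar, key=per_dollar.get)

for level, value in per_dollar.items():
    print(f"{level:>6}: {value:.3f}")
print("best value:", best)
```

On these figures medium comes out ahead of low on cost efficiency as well as on raw equivalence, which matches the post's claim that the curve peaks at medium.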
Repository Audit Available
Deep analysis of cohere-ai/cohere-python — architecture, costs, security, dependencies & more
Yes, Cohere offers a free tier. Pricing found: $4.00, $2,500, $5.00, $3,250, $5.00
Key features include: Powerful agentic performance with minimal compute overhead; Unified reasoning, tool orchestration, and multimodal intelligence in a single model; Supports 49 languages for global communication and discovery; Quickly converts audio data into highly accurate text outputs; Supports 14 languages and is robust to real-world conversational environments; Integrates with generative and retrieval systems for end-to-end speech-driven workflows; "Safe. Flexible. Independent."; "Your sovereign AI workplace."
Cohere is commonly used for: Real-time transcription for meetings, Voice command interfaces for applications, Accessibility tools for the hearing impaired, Customer service automation via voice recognition, Voice-to-text conversion for content creation, Speech analytics for market research.
Cohere integrates with: AWS Lambda, Google Cloud Platform, Microsoft Azure, Slack, Zoom, Salesforce, Trello, Jira, Zapier, Twilio.
Mike Volpi
General Partner at Index Ventures
3 mentions
Cohere has a public GitHub repository with 383 stars.
Based on user reviews and social mentions, the most common pain points are: token cost, OpenAI, GPT, large language model.
Based on 112 social mentions analyzed, 13% of sentiment is positive, 80% neutral, and 6% negative.