A guidance language for controlling large language models. - guidance-ai/guidance
"Guidance" software is praised for its ability to support advanced and multi-step tasks effectively, benefiting from integrations with tools like GitHub Copilot. Users appreciate its strong performance in complex coding environments and agentic execution capabilities. However, some users express concerns about its move to a usage-based billing model, indicating that cost could become a significant factor for some. Overall, it maintains a solid reputation for enhancing developer workflows, though pricing remains a sensitive area for users.
Mentions (30d): 72 (14 this week)
Reviews: 0
Platforms: 5
GitHub stars: 21,364 (1,157 forks)
Industry: information technology & services
Employees: 6,200
Funding stage: Other
Total funding: $7.9B
GitHub followers: 236
GitHub repos: 11
GitHub stars: 21,364
npm packages: 20
HuggingFace models: 18
Brazil, Indonesia, Japan, Germany, and India fueled a massive surge in 2025, adding nearly 36 million new developers to GitHub. 🌏 India alone added 5.2 million. 🇮🇳
Managed Agents endpoint reference - what's new in CC 2.1.144 (-105 tokens)
Data: Managed Agents endpoint reference. Drops the type: "model_config" wrapper from the model config shorthand example, so the full config object is now just {id: "claude-opus-4-6", speed: "fast"}.

Tool Description: CronCreate. Adds a "Not for live watching" section (shown when the Monitor tool is enabled) clarifying that CronCreate re-runs prompts at fixed wall-clock intervals and pointing users to the Monitor tool for streaming log/process/command output as it changes, since cron polls on a schedule. Refactors the durability and runtime-behavior copy so the durable-vs-session-only guidance is sourced from shared snippets rather than inlined conditionals.

Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.144 submitted by /u/Dramatic_Squash_3502
If you're NOT having usage or drift issues, have you turned off auto-memory?
There's a running debate in this community: some people say Opus is nerfed, usage evaporates after two prompts, and sessions drift and get "stupid." Others say everything's fine. The common theory is that Anthropic is A/B testing or ranking preferred customers. I think there's a simpler explanation, and I'd like the community's help testing it.

The hidden variable: Claude Code's auto-memory directory

Claude Code has a feature (on by default since v2.1.59) that silently creates individual .md files in ~/.claude/projects/*/memory/ every time it decides something is worth remembering about you or your project. Each memory gets its own file. There's no consolidation, no deduplication, and no size management. These files load as instructions at the start of every session: not as conversation, but as instructions. The model weighs them heavily.

What I found in my projects

I audited every project on my machine:
- 136 memory files across 18 projects
- 432KB total (~108-140K tokens of instruction overhead)
- One project alone had 41 files
- Direct contradictions between files: one file listed brand terms as approved, while another (written later) said those same terms were explicitly rejected by the client

When you have 20+ feedback files giving slightly different guidance about how to approach your work, the model tries to honor all of them simultaneously. It averages across conflicting signals, and that averaging is what people experience as drift. It's not that Opus got dumber; it's being pulled in 20 directions by its own instruction set.
Check yours right now:

    # For each project's memory directory, count the .md files and their total size,
    # then print one line per project, sorted by file count (largest first).
    for dir in ~/.claude/projects/*/memory/; do
      if [ -d "$dir" ]; then
        project=$(basename "$(dirname "$dir")")
        count=$(find "$dir" -name "*.md" 2>/dev/null | wc -l | tr -d ' ')
        size=$(find "$dir" -name "*.md" -exec cat {} + 2>/dev/null | wc -c | tr -d ' ')
        if [ "$count" -gt 0 ]; then
          echo "$count files, $((size / 1024))KB - $project"
        fi
      fi
    done | sort -t, -k1 -rn

The question for this community

People who say they have NO issues with usage limits or drift: have you also turned off auto-memory ("autoMemoryEnabled": false in settings), or do you actively manage your memory files? If there's a strong correlation between clean or disabled memory and good session quality, that's a signal that this is a real contributing factor.

And for people who ARE hitting usage walls or experiencing drift: run that diagnostic. If you're sitting on 30+ memory files with contradictions you didn't know about, that's worth knowing.

I'm not claiming this explains everything. Model changes, server-side factors, and plan differences are all real variables. But memory hygiene is the one variable you can actually control, and I don't see anyone talking about it.

The fix

I built a Claude Code skill (/memory-cleanup) that:
- Audits your memory directory and reports what's there
- Consolidates everything into 2 managed files (MEMORY.md + feedback.md)
- Surfaces contradictions for your review
- Installs write-mode instructions that prevent re-bloating

Yes, it works retroactively as well. I tested it on a 7-file project and a 41-file project: both were cleaned up, with contradictions resolved and no data loss.

To install (one command):

    mkdir -p ~/.claude/commands && curl -sL https://gist.github.com/evanvandyke/a7063a8e5c838673a55df0be10f4892c/raw -o ~/.claude/commands/memory-cleanup.md

Then run /memory-cleanup in any project.

What this doesn't fix

This manages the content quality of your memory files: contradictions, redundancy, and bloat.
It can't change the system-level instructions that Anthropic bakes into Claude Code, and it can't address model-level changes or server-side throttling. But it removes one real source of noise from your sessions.

Note: Anthropic has added an "Auto Dream" consolidation feature that prunes memory between sessions. This skill goes further: it restructures memory into a managed 2-file system with write-mode guardrails that prevent the accumulation pattern from recurring.

Built collaboratively with Claude (Opus 4.7). I drove the diagnosis and design decisions; Claude did the auditing and skill construction. I'm sharing because the diagnostic is free and takes 10 seconds; if it helps even a few people, it's worth the post. submitted by /u/really_evan
Claude is genuinely amazing - appreciation post
This silly robot on the other side of my computer has helped me in some really hard-to-describe ways. Even when discussing personal matters where I've needed guidance on how to think about issues and perspectives, it has not at any moment tried to drag me into an endless conversation. It has constantly pushed back against narratives that didn't make sense and told me to leave and disconnect. Claude has really pushed me to get distance from it, to be pragmatic, and to look at the things that have value outside the conversation with it. Truly incredible work the Anthropic team has done with Claude's personality and alignment. submitted by /u/weichafediego
Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]
Hi everyone, I'm starting a research project on financial time-series forecasting, using LSTM and Transformer models to predict S&P 500 market direction. Right now, I'm struggling to obtain reliable long-term historical data. I tried Yahoo Finance, but downloads are inconsistent or failing for me, and most Kaggle datasets I found only contain around 5–10 years of data.

I specifically need:
- Around 30 years of historical S&P 500 data
- Preferably daily OHLCV data
- A reliable, clean source suitable for ML research
- Ideally free or student-friendly

I also want to understand what researchers typically use in academic work on financial forecasting: Yahoo Finance? Alpha Vantage? WRDS/CRSP? Polygon? Kaggle? Something else?

Additionally: is using only S&P 500 index data enough for a Master's-level research project, or should I include technical indicators, macroeconomic data, sentiment, or constituent stock data?

I'd appreciate guidance from people who've actually worked on financial ML projects. Thanks. submitted by /u/stickPotatoe
I told Claude to stop using em dashes. It happily obliged...
Instructions for Claude: "Do not use em dash." Now Claude is using double hyphens in lieu of em dashes. Technically correct, I guess... I'm open to any suggestions to get rid of both! submitted by /u/salty_dragonfly1
Claude AI failed to render guiding steps
When I asked Claude AI to give me step-by-step guidance and visuals on how to edit code, it couldn't render them no matter how many times I asked. Switching browsers and using the Claude app didn't help either. Does anyone know a workaround? submitted by /u/tommy7611
Can Claude generate output in CoWork and copy paste it into Outlook?
Heya, as the title suggests, I'm wondering whether Claude can generate a body of text in CoWork and then (a) copy and paste the body of text into a draft email in Outlook, (b) attach a file to the draft email, and (c) populate the subject line. I have a Pro subscription and have built some extensive skills that consistently generate the output I need, so I'm now looking for an efficient way to get that output into Outlook. Any tips or guidance would be greatly appreciated. submitted by /u/Outrageous-Way6102
MCP server for the TLA+ model checker tla-rs
Hi all, I just shipped an MCP server some of you might find useful: tla-mcp.

TLA+ is a formal specification language for designing concurrent and distributed systems. You describe what your protocol should do, and a model checker tries every reachable state to catch invariant violations, deadlocks, and race conditions you didn't see coming.

With tla-mcp registered, Claude Code can call the checker as a first-class tool: validate a spec, run a bounded check with a counterexample trace, and replay specific scenarios, all from inside the chat. The tool descriptions are deliberately opinionated about how the model should use the checker (budget all limits upfront, treat limit_reached as inconclusive, look at the last transition of a trace first) so the guidance survives context truncation.

Install instructions, a client config snippet, and a tour of the four tools are on the landing page: https://fabracht.github.io/tla-rs/

It's an experiment. Feedback and bug reports welcome. submitted by /u/Anxious_Tool
Agent Terraform Skill for Codex (Agentic Skill)
I added dedicated backend-state safety support to TerraShark. Mini recap: TerraShark is my Terraform and OpenTofu skill for Claude Code and Codex. LLMs hallucinate a lot with Terraform. They often produce HCL that looks correct, but is actually risky: unstable resource identity, missing moved blocks, secrets leaking into state, huge root modules, unsafe production applies, weak CI pipelines, missing policy checks, or rollback plans that are basically useless once something goes wrong. TerraShark is meant to fix that by making the AI reason in a failure-mode-first way. It does not just tell the model “write good Terraform”. It makes the model ask what can go wrong before generating code. Is this an identity-churn risk? A secret-exposure risk? A blast-radius risk? A CI drift risk? A compliance-gate risk? Then it loads only the references that matter for that task and returns the answer with assumptions, tradeoffs, validation steps, and rollback guidance. That matters because Terraform mistakes can look totally fine at first. A plan can look normal while replacing important infrastructure. A refactor can look clean while changing resource addresses. A secret can be marked sensitive and still live in state. A pipeline can pass validation and still apply in an unsafe way. Repo: https://github.com/LukasNiessen/terrashark Now what’s new: TerraShark now has dedicated backend-state safety support. Terraform keeps a state file. That state file is basically Terraform’s memory: it maps the code you wrote to the real infrastructure that already exists. The backend is where that state lives, for example in S3, Azure Blob Storage, GCS, Terraform Cloud, PostgreSQL, Consul, or locally on disk. When the task involves backend config, backend migration, state storage, locking, force-unlock, backup, restore, S3, AzureRM, GCS, Terraform Cloud/remote, PostgreSQL, Consul, or local state, TerraShark now switches into backend-aware guidance. 
This matters because state is one of the highest-impact parts of Terraform. If state is lost, corrupted, unlocked, migrated badly, or readable by the wrong people, Terraform can make very dangerous assumptions. It may try to recreate infrastructure that already exists. It may allow two applies to run at the same time. It may leak sensitive values. It may turn a backend migration into a production incident. So TerraShark now keeps the boring but critical backend details in mind: S3 needs versioning, encryption, public access blocking, narrow IAM, locking, and clean state keys per environment. AzureRM needs storage encryption, blob recovery/versioning where available, lease-based locking, network restrictions, and narrow RBAC. GCS needs versioning, uniform bucket-level access, encryption, narrow IAM, and clean prefixes. Terraform Cloud needs workspace boundaries, restricted state sharing, sensitive variables, and approved execution mode. It also knows the common LLM mistakes here: suggesting local state for a team setup, forgetting state locking, creating backend storage inside the same root module that uses it, recommending force-unlock too casually, mixing backend migration with unrelated refactors, skipping state backups, or assuming encrypted state is safe for anyone to read. TerraShark applies progressive disclosure pretty strictly and stays very token lean. The core skill stays small and procedural. Deeper backend-state guidance is only loaded when the task actually touches backend or state risk. So instead of generic Terraform advice, you get backend-aware Terraform guidance exactly when the risk appears. Compared to Anton Babenko’s Terraform skill: Anton Babenko’s Terraform skill is more like a broad Terraform reference manual. It includes a lot of useful Terraform material up front, but that also means the model carries a lot more general context from the beginning. 
His skill burned through my tokens incredibly fast, and for my use case that just was not needed. TerraShark takes a different approach. It keeps activation much leaner and is built around a diagnostic workflow. First it identifies the likely failure mode, then it loads the specific reference material needed for that risk. That is the core difference: TerraShark is not trying to be the biggest Terraform knowledge dump. It is trying to be a focused safety layer for LLM-assisted Terraform work. Feedback and PRs are highly welcome! submitted by /u/trolleid
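The failure-mode-first, progressive-disclosure flow described above can be sketched in a few lines. This is a hypothetical illustration, not TerraShark's code: the risk categories, keywords, and reference file names are invented for the example. The point is the shape of the workflow: classify the task's likely failure modes first, then load the small core plus only the references that match.

```python
# Hypothetical sketch of failure-mode-first routing with progressive disclosure.
# Categories, keywords, and reference paths are illustrative assumptions,
# not taken from the TerraShark repository.

RISK_KEYWORDS = {
    "identity-churn": {"rename", "refactor", "moved", "count", "for_each"},
    "secret-exposure": {"password", "secret", "token", "sensitive"},
    "backend-state": {"backend", "state", "lock", "force-unlock", "s3", "gcs", "azurerm"},
    "blast-radius": {"production", "prod", "apply", "destroy"},
}

REFERENCES = {
    "identity-churn": "references/identity.md",
    "secret-exposure": "references/secrets.md",
    "backend-state": "references/backend-state.md",
    "blast-radius": "references/blast-radius.md",
}

CORE = "SKILL.md"  # small procedural core, always loaded


def classify(task: str) -> list[str]:
    """Return the risk categories whose keywords appear as words in the task."""
    words = set(task.lower().replace(",", " ").replace(".", " ").split())
    return [risk for risk, keys in RISK_KEYWORDS.items() if words & keys]


def files_to_load(task: str) -> list[str]:
    """Core skill first, then only the references for the detected risks."""
    return [CORE] + [REFERENCES[risk] for risk in classify(task)]
```

For example, files_to_load("Migrate the state backend from local to S3") selects only the core file and the backend-state reference, so unrelated guidance never enters the context. A real skill would likely use richer signals than keyword matching.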
I converted Google’s AI search guidelines into a Claude skill goog-geo
Google recently published official guidance on how to optimize pages for AI-powered search features like AI Overviews and AI Mode: https://developers.google.com/search/docs/fundamentals/ai-optimization-guide

Most of the advice floating around on GEO / AI search optimization is still pretty hand-wavy, so I wanted something more concrete. I converted Google’s AI search guidance into an open-source Claude Code skill: https://github.com/vishalmdi/goog-geo

The skill audits any live URL and turns the guidance into a scored report. It:
- Checks whether Googlebot can crawl the page
- Checks indexability and snippet eligibility
- Detects noindex, nosnippet, max-snippet, canonical, and robots.txt issues
- Uses a live browser to inspect the rendered DOM and JSON-LD schema
- Reviews headings, semantic HTML, answer blocks, FAQs, tables, and author/date signals
- Checks whether AI crawlers like GPTBot, PerplexityBot, ClaudeBot, and Bingbot are allowed
- Produces a 100-point GEO / AI search readiness score
- Gives a prioritized action plan instead of vague SEO advice

The main idea is simple: Google’s AI search features are not a totally separate SEO system. They still depend on crawlability, indexability, snippet eligibility, helpful content, and structured, extractable pages. So instead of guessing what "AI optimization" means, this skill audits against the actual signals Google documented.

I also added a "what not to do" section, because Google explicitly says some popular AI SEO advice is useless or misunderstood, like treating llms.txt as a Google AI ranking lever.

Would love feedback from anyone working on SEO, content, SaaS landing pages, docs, or AI search visibility. If you find it useful, a GitHub star would help. Repo link: https://github.com/vishalmdi/goog-geo submitted by /u/vishal_jaiswal
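One of the checks listed above, whether AI crawlers are allowed by robots.txt, can be approximated with Python's standard urllib.robotparser module. This is a minimal sketch, not the goog-geo skill's implementation; the sample robots.txt and URL are made up, and it evaluates robots.txt rules only (not noindex, nosnippet, or rendered content).

```python
from urllib.robotparser import RobotFileParser

# Crawler user-agent tokens named in the post, plus Googlebot.
AI_CRAWLERS = ("GPTBot", "PerplexityBot", "ClaudeBot", "Bingbot", "Googlebot")


def crawler_access(robots_txt: str, url: str, agents=AI_CRAWLERS) -> dict[str, bool]:
    """Parse robots.txt text and report whether each user agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt: blocks GPTBot, allows every other crawler.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for agent, allowed in crawler_access(SAMPLE_ROBOTS, "https://example.com/docs/page").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

One caveat: Python's parser applies the first rule that matches within a group, whereas Google documents most-specific (longest path) precedence, so results can differ for overlapping Allow and Disallow rules.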
What's new in CC 2.1.143 (+302 tokens)
Agent Prompt: Hook condition evaluator (stop). Adds a third response shape, {"ok": false, "impossible": true, "reason": ...}, for conditions that can never be satisfied (self-contradictory, missing capability, or the assistant has exhausted its approaches). Cautions the evaluator to verify impossibility independently rather than trust the assistant's self-assessment, and not to mark conditions impossible just because progress is slow or the goal hasn't been reached yet.

Skill: Verify skill. Reframes the "don't run tests" rationale from "CI already ran them" to "running them proves you can run CI, not that the change works," so the rule applies even when there's no CI. Generalizes the workflow beyond PRs: the scope can be a diff or just "does X work," and "PR description" becomes "any description." Expands the change-discovery section with commands for repos without an upstream (git diff origin/HEAD...), uncommitted changes (git diff HEAD), and a fallback that asks the user to name the scope when there's no repo at all. Adds a "Destructive path?" guard telling the verifier not to drive code live when it deletes, publishes, sends, or writes outside the workspace without a dry-run, and to call out which path went unexercised. Swaps the /init-verifiers follow-up suggestion for a note to capture the working build/launch recipe so it can become a verifier-* skill later, and trims the report-formatting guidance (drops the "hoisted above the PR comment fold" detail).

Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.143 submitted by /u/Dramatic_Squash_3502
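The evaluator's response contract can be illustrated with a small parser that maps each JSON shape to an outcome. Only the {"ok": false, "impossible": true, "reason": ...} shape comes from the changelog above; the {"ok": true} and {"ok": false, "reason": ...} shapes and the outcome names are assumptions made for this sketch, not Claude Code's actual implementation.

```python
import json
from enum import Enum


class Outcome(Enum):
    SATISFIED = "satisfied"    # condition met: allow the session to stop
    CONTINUE = "continue"      # not met yet: keep working
    IMPOSSIBLE = "impossible"  # can never be met: stop and surface the reason


def interpret(raw: str) -> tuple[Outcome, str]:
    """Map an evaluator JSON response (assumed shapes) to an outcome and reason."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("ok"), bool):
        raise ValueError("response must be an object with a boolean 'ok'")
    reason = data.get("reason", "")
    if data["ok"]:
        return Outcome.SATISFIED, reason
    # Only an explicit impossible=true escalates; slow progress stays CONTINUE.
    if data.get("impossible") is True:
        return Outcome.IMPOSSIBLE, reason
    return Outcome.CONTINUE, reason
```

Keeping IMPOSSIBLE distinct from CONTINUE mirrors the changelog's caution that slow progress or an unreached goal is not the same as impossibility.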
The Frontier-Only Narrative Is a Financing Story, Not an Architecture Story
The frontier-only narrative is an artifact of how AI infrastructure is being financed, not of how production systems are being built.

The setup. Q1 2026 disclosed $112B in hyperscaler capex in a single quarter, $650–725B in 2026 guidance, and Alphabet's 100-year bond, the first from a tech company since Motorola's in 1997 (see a0109). The story that underwrites that paper is that every query needs a bigger model.

The architecture says the opposite. Microsoft's Phi-4 (14B parameters) exceeds its teacher GPT-4o on graduate STEM and competition math. Phi-4-reasoning is competitive with DeepSeek-R1 at roughly one forty-eighth of the parameter count. Claude Haiku 4.5 is positioned by Anthropic and AWS for "economically viable agent experiences." None of this is a benchmark teaser; it is the production toolkit, available today.

Routing is the missing component. RouteLLM (UC Berkeley, Anyscale) demonstrated over 2x cost reduction without sacrificing response quality. AWS Bedrock Intelligent Prompt Routing, which is generally available and officially supported, claims up to 30% cost reduction within a single model family without compromising accuracy. The Flagship Tax (see a0085) didn't just die; it left a vacancy at the architecture layer.

The bookkeeping nobody wants to do. Operator audits suggest that 40–60% of token budgets in production LLM applications are waste, dominated by default-to-frontier routing. Roughly 37% of enterprises with production AI workloads run five or more models in their stack. The rest are still defaulting to one.

Why the story isn't being told. Hundred-year bonds don't pencil out on "use less compute per query." They pencil out on "every query needs a bigger model." The opacity in the harness (see a0107) is the symptom; the underwriting is the disease.

What you do Monday morning. Treat model selection as a dependency-graph decision, not a vendor decision. Add a complexity classifier. Default to small. Cascade up when verification fails.
Instrument model mix as a first-class production metric.

Bottom line. You are not behind because you have not bought the biggest model. You are behind because you have not built the router. submitted by /u/gastao_s_s
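The "add a complexity classifier, default to small, cascade up when verification fails" advice above can be sketched as a simple cascade. The tiers, costs, keyword heuristic, and verify function are hypothetical stand-ins (plain Python callables, not any vendor SDK). Production routers such as RouteLLM use trained routing models rather than keyword rules, so treat this only as an outline of the control flow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tier:
    """One model tier in the cascade (hypothetical wrapper, not a vendor SDK)."""
    name: str
    cost_per_call: float            # illustrative relative cost
    generate: Callable[[str], str]  # stand-in for a model call


def complexity(prompt: str) -> int:
    """Toy complexity score standing in for a trained classifier."""
    hard_markers = ("prove", "refactor", "multi-step", "architecture")
    return len(prompt) // 200 + sum(marker in prompt.lower() for marker in hard_markers)


def route(prompt: str, tiers: list[Tier], verify: Callable[[str], bool]) -> tuple[str, str, float]:
    """Start at the tier the classifier suggests; escalate while verification fails.

    Returns (name of the tier that answered, answer, total cost spent).
    """
    if not tiers:
        raise ValueError("need at least one tier")
    start = min(complexity(prompt), len(tiers) - 1)
    spent = 0.0
    answer = ""
    for tier in tiers[start:]:
        answer = tier.generate(prompt)
        spent += tier.cost_per_call
        if verify(answer):
            return tier.name, answer, spent
    # Nothing verified: return the largest tier's answer as a best effort.
    return tiers[-1].name, answer, spent
```

Logging the returned tier name and cost for every request is one straightforward way to track model mix as a production metric.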
Need help picking the right emoji (like we did for this post)? 🤔 @cassidoo made an emoji list generator with Copilot CLI. Learn how she did it and pick up tools and tricks for your next project. 👇 https://t.co/13xwmu6tE9 https://t.co/pCy8PGfUIE
New to Claude
Hey guys, I see videos and people talking about how using Claude has helped them create things, advance, and solve problems in their business. I own a roofing company, and it's just me in the company. My website is kinda whack, I'm struggling to generate leads, and so on and so forth: normal small-business issues when you're starting out. Now, as the title states, I'm new to Claude, but I'd actually fall into the category of being new to using AI effectively and efficiently as a whole. I could binge some YouTube, Google, or even TikTok videos about Claude, but I'm not sure whether what I'd find is what I actually need. So I've come to Reddit to ask you guys for some guidance. I want to hear what steps you took to learn this tool and what benefit it has brought you. My main goal is to learn how to use this tool the right way to help improve my business. Thank you guys for your time and the read. I'd really appreciate any suggestions and feedback! submitted by /u/Gydn-
Cooking up something new 🧑🍳 Join the waitlist for early access to technical preview of the GitHub Copilot app 👇 https://t.co/ODODKdvzOA https://t.co/1h7AJPAhiH
Guidance itself is an open-source library (MIT license), so there are no pricing tiers for the library; costs depend on the model backend or hosted API you connect it to.
Key features include: a Pythonic interface for language models; guaranteed output syntax with constrained generation; generating JSON; setting the temperature of the generation; capturing the generated page from the Model object; creating your own Guidance functions; and debugging grammars offline (no model API calls).
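To illustrate what "guarantee output syntax with constrained generation" means, here is a toy, library-free sketch. At each step only continuations that keep the output a prefix of an allowed option are permitted, so the result is always one of the options. This is not Guidance's API or internals (Guidance applies constraints to a real model's token choices); the character-level steps and the score function, which stands in for model logits, are assumptions for illustration.

```python
from typing import Callable

EOS = ""  # end-of-sequence marker (sorts before every character)


def constrained_select(options: list[str], score: Callable[[str, str], float]) -> str:
    """Greedily build a string that is guaranteed to be one of `options`.

    At each step only continuations that keep the text a prefix of some option
    are allowed (the "mask"); EOS is allowed once the text equals an option.
    `score(prefix, candidate)` stands in for the model's preference.
    """
    if not options:
        raise ValueError("need at least one option")
    text = ""
    while True:
        allowed = {o[len(text)] for o in options if o.startswith(text) and len(o) > len(text)}
        if text in options:
            allowed.add(EOS)
        # Highest score wins; sorting first makes ties deterministic.
        choice = max(sorted(allowed), key=lambda c: score(text, c))
        if choice == EOS:
            return text
        text += choice


def prefers_nope(prefix: str, candidate: str) -> float:
    """A toy 'model' that would write "nope" if it were unconstrained."""
    wanted = "nope"
    if candidate == EOS:
        return 0.5
    i = len(prefix)
    return 1.0 if i < len(wanted) and wanted[i] == candidate else 0.0


if __name__ == "__main__":
    print(constrained_select(["yes", "no"], prefers_nope))  # prints: no
```

The unconstrained "model" would have produced "nope", but the mask forces it to stop at "no", one of the allowed options.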
Guidance is commonly used for: Text generation for chatbots, Automated content creation for blogs, Code generation and assistance, Data analysis and report generation, Natural language understanding tasks, Interactive storytelling applications.
Guidance integrates with: Transformers, llama.cpp, OpenAI, Hugging Face, TensorFlow, PyTorch, FastAPI, Flask, Django, Streamlit.
Guidance has a public GitHub repository with 21,364 stars.
Cristiano Amon, President and CEO at Qualcomm: 3 mentions
Based on user reviews and social mentions, the most common pain points are downtime, token cost, token usage, and breaking changes.
Based on 188 social mentions analyzed, sentiment is 5% positive, 95% neutral, and 0% negative.