Build controllable agents with LangGraph, our low-level agent orchestration framework
LangGraph is praised for its ability to manage multiple AI agents effectively, offering robust state tracking and infrastructure handling that simplify user workflows. However, some users have encountered security issues during structured testing, indicating potential vulnerabilities in the system. Specific feedback on pricing is limited, but users taking DIY approaches have expressed concerns about potential costs, suggesting that affordability could be a consideration. Overall, LangGraph is regarded as a strong tool for managing AI agents, with a few caveats concerning its security frameworks.
Mentions (30d): 0
Reviews: 0
Platforms: 2
GitHub Stars: 28,022
Forks: 4,791
Industry: information technology & services
Employees: 98
Funding Stage: Series B
Total Funding: $260.0M
GitHub followers: 17,647
GitHub repos: 232
GitHub stars: 28,022
npm packages: 20
HuggingFace models: 25
Chat based form filler in natural language
Hi folks, I am building an AI chat-based system whose goal is to get answers to all the questions I need answered by the user through plain-language conversation. It's quite similar to filling out a form, but instead of input boxes, it happens through a chatbot. I want to design and build it end-to-end for maximum scalability. I also want to make it feature-rich: for example, the bot should be able to use tools such as search in the middle of a conversation and read uploaded files and images. If users drift onto other topics, I want to allow that and let the bot help with them, but eventually bring the conversation back to where we want to lead it. The system should generate questions based on the user's input and intelligently decide what to ask next. I'm confused about how to build it. I previously built a state machine, but it didn't perform as expected because users provide data out of order, which breaks it. I want to explore tools like LangGraph, but I'm not really sure how to design the overall architecture. I need help designing it so that it can be plugged into different systems and reused across products. The data I want to gather is stored in a Pydantic model. I also have a few helper functions, such as web search, DB update functions, and utility functions that extract data from user input, which I can probably wrap into tools. Would love some help figuring out the right architecture and approach for this.
submitted by /u/sagar12sagar
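The core problem in this post is that a fixed state machine breaks when users answer out of order. A common alternative, sketched here under assumptions, is to treat the form as a set of slots: merge whatever fields the extractor finds in each turn, in any order, and derive the next question from whatever is still missing. The `IntakeForm` fields, the `QUESTIONS` text, and the helper names are hypothetical, and a standard-library dataclass stands in for the author's Pydantic model to keep the example dependency-free. In a LangGraph design, this form could live in the graph state, with an extraction node calling `merge` and a routing step calling `next_question`; that wiring is not shown.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IntakeForm:
    # Hypothetical target fields; stands in for the author's Pydantic model.
    name: Optional[str] = None
    email: Optional[str] = None
    company: Optional[str] = None
    budget: Optional[str] = None


# Hypothetical question text per field, listed in the preferred asking order.
QUESTIONS = {
    "name": "What's your name?",
    "email": "What's the best email to reach you?",
    "company": "Which company are you with?",
    "budget": "Roughly what budget are you working with?",
}


def merge(form: IntakeForm, extracted: dict) -> IntakeForm:
    """Copy any recognized, non-empty fields from one turn's extraction, in any order."""
    known = {f.name for f in fields(form)}
    for key, value in extracted.items():
        if key in known and value:
            setattr(form, key, value)
    return form


def missing(form: IntakeForm) -> list[str]:
    """Names of fields that are still unanswered, in declaration order."""
    return [f.name for f in fields(form) if getattr(form, f.name) is None]


def next_question(form: IntakeForm) -> Optional[str]:
    """Ask for the first missing field; None means the form is complete."""
    gaps = missing(form)
    return QUESTIONS[gaps[0]] if gaps else None
```

Because the next question is computed from the current state rather than from a fixed sequence, a user who volunteers their budget first simply never gets asked for it, and a tangent handled elsewhere does not lose the fields collected so far.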
Why I added a governance layer on top of my Claude agents (and why it made a huge difference)
Hey r/ClaudeAI, I've been heavily using Claude 3.5 Sonnet and Opus through the Anthropic API to build agents and workflows. Claude is honestly one of the best models right now for complex reasoning and tool calling. But here's what I kept running into: even though Claude is smart, when I put it into longer-running agent loops (CrewAI- or LangGraph-style setups), it still does the classic agent things: occasional silent failures, burning through tokens in loops, or going off in directions I didn't expect. The worst part wasn't even the cost. It was the constant checking. I couldn't trust the agent to run for hours without babysitting it. So I started using a lightweight governance/observability layer that sits below the agent (not inside the system prompt). It basically adds:
- Hard safety boundaries and fail-closed behavior
- Real-time live traces, so I can actually see what Claude is doing step by step
- Human-in-the-loop control (I can pause, resume, or stop the agent via Telegram from my phone)
- Automatic checkpointing
- Proper runtime budget caps (not just "please don't spend too much" in the prompt)
The difference is night and day. I can now let my Claude agents run for long periods and actually feel safe ignoring them. Curious whether other people building with Claude have run into the same trust, cost, and monitoring issues. Have you tried any governance tools or patterns that made your Claude agents feel truly production-ready? Or are you still monitoring them manually? Would love to hear what's working for you.
submitted by /u/Necessary_Drag_8031
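The point about budget caps living in code rather than in the prompt can be illustrated with a small guard around the agent loop. This is a minimal sketch under assumptions, not the tool the poster uses: `RunBudget`, `BudgetExceeded`, `run_agent`, and the `step_fn(state) -> (new_state, tokens, done)` contract are all hypothetical names chosen for illustration. It records a checkpoint after every step and fails closed by raising once either cap is crossed.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised to halt the loop (fail closed) once a hard cap is crossed."""


@dataclass
class RunBudget:
    max_tokens: int
    max_steps: int
    tokens_used: int = 0
    steps: int = 0
    checkpoints: list = field(default_factory=list)

    def charge(self, tokens: int, state: dict) -> None:
        """Record one step, checkpoint its state, then refuse to go past either cap."""
        self.steps += 1
        self.tokens_used += tokens
        self.checkpoints.append(dict(state))  # shallow copy kept for resume/inspection
        if self.tokens_used > self.max_tokens or self.steps > self.max_steps:
            raise BudgetExceeded(
                f"stopped after {self.steps} steps and {self.tokens_used} tokens"
            )


def run_agent(step_fn, budget: RunBudget, state: dict) -> dict:
    """Call a hypothetical step_fn(state) -> (new_state, tokens, done) until done or over budget."""
    done = False
    while not done:
        state, tokens, done = step_fn(state)
        budget.charge(tokens, state)
    return state
```

Because the check runs outside the model, a looping agent is stopped by the guard regardless of what the prompt says, and the last checkpoint shows exactly where it stopped.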
Caveman skill questions
Caveman looks amazing for reducing output tokens! Has anyone tried applying the Caveman skill to a headless, automated backend application? I have a Python/LangGraph pipeline that makes direct API calls to Claude to validate telecom engineering drawings, and I'd love to get these token savings. Can the MCP proxy be wrapped around standard API calls, or should I just manually inject the Caveman prompts into my backend logic?
submitted by /u/Special_Spring4602
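For the second option in this question, manual injection into a headless pipeline can be as small as a helper that appends a style instruction to each request's system prompt before it is sent. This is a sketch under assumptions: `TERSE_STYLE` is a placeholder, not the actual Caveman skill text, and `with_style` is a hypothetical helper. It assumes requests are built as keyword-argument dicts for the Anthropic Messages API with `system` as a plain string (the API also accepts a list of content blocks, which this sketch does not handle).

```python
# Placeholder text: paste the real Caveman skill instructions here; this is not the skill itself.
TERSE_STYLE = "Respond tersely. Drop filler words. Keep technical terms and numbers exact."


def with_style(request: dict, style: str = TERSE_STYLE) -> dict:
    """Return a copy of a Messages API request dict with `style` appended to its string system prompt."""
    out = dict(request)  # leave the caller's request untouched
    existing = out.get("system")
    out["system"] = f"{existing}\n\n{style}" if existing else style
    return out
```

In a pipeline, each call would then look something like `client.messages.create(**with_style(request))`, which keeps the style prompt in one place instead of scattering it across nodes.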
The open-source AI agent config repo the community has been building just hit 888 stars: asking for feedback & feature ideas
Over the past year, our team and community have been building an open-source collection of AI agent configs: production-ready system prompts, tool-calling schemas, RAG setups, multi-agent orchestration patterns, and model-specific tuning files.
Repo: https://github.com/caliber-ai-org/ai-setup
This week it crossed 888 GitHub stars and nearly 100 forks. All free, no paywall, no product to sell.
What's in there:
- System prompt templates across GPT-4o, Claude 3.5/3.7, Gemini 2.5 Pro
- Tool-use and function calling schemas for agentic workflows
- LangChain / LangGraph agent setup configs
- RAG pipeline configurations with different retrieval strategies
- Ollama and local model setups
- CLAUDE.md / AGENTS.md templates for coding agent contexts
- Multi-agent orchestration patterns
We'd love to hear from this community: What AI agent patterns are you using that you'd want to see in the repo? What's missing that would make this genuinely useful to you? What setups have you found work well in production? All feedback and contributions are welcome.
submitted by /u/Substantial-Cost-429
Today I learned about this
submitted by /u/YogurtWild
I run a team of Claude agents that ships PRs to production (open source)
I've been running a multi-agent system in production for a few months: a co-CTO agent plus specialist agents (PM, dev, ops) that handle real engineering work end-to-end, including design specs, code review, PR implementation, deploys, and monitoring.
The architecture: Each agent is a Docker container running claude -p (with optional Codex fallback), wrapped in .NET 10. A central orchestrator coordinates them via Temporal workflows + RabbitMQ. Agents talk to me over Telegram (DMs + a group chat for the whole team). Memory is Qdrant + Ollama embeddings, so agents recall past decisions across sessions. A web dashboard shows live agent status and in-flight workflows.
What it does day-to-day: I drop a one-line request in Telegram. The PM writes the spec, two reviewers run consensus, dev implements the PR, CI ships to staging, the PM verifies, I approve the merge gate, and it deploys to prod. The same pattern handles infra: deploy verifications, health checks, daily digests, and incident triage. Agents have access to fleet-memory (a semantic memory MCP): they search before acting and write learnings after.
5-min demo of an actual production PR being shipped: https://youtu.be/DIx7Y3GfmGc
Why I built it instead of using CrewAI/AutoGen/LangGraph: I wanted Temporal-backed durability (workflows survive restarts, retries are deterministic) and ops-grade observability (every workflow visible in the Temporal UI, every signal auditable). The agents themselves are just claude -p; the magic is in the orchestration layer.
Open source: https://github.com/anurmatov/phleet
Side note for those who recognize me: this runs on the Mac Studio I documented in mac-studio-server. The dogfooding is real.
Happy to dig into prompts, system architecture, memory strategy, or how the agents handle PR reviews. AMA.
submitted by /u/_ggsa
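The day-to-day flow described in this post (spec, two-reviewer consensus, implementation, human merge gate) can be reduced to a plain function to show where the gates sit. This is only an illustrative sketch: the author runs these stages as durable Temporal workflows, while this version has no durability, retries, or messaging, and `ship`, `consensus`, and the stage callables are hypothetical names.

```python
from typing import Callable

Reviewer = Callable[[str], bool]


def consensus(spec: str, reviewers: list[Reviewer]) -> bool:
    """The spec passes only if every reviewer approves it (unanimous consensus)."""
    return all(review(spec) for review in reviewers)


def ship(
    request: str,
    write_spec: Callable[[str], str],
    reviewers: list[Reviewer],
    implement: Callable[[str], str],
    human_approves: Callable[[str], bool],
) -> str:
    """Hypothetical stage order: spec -> review consensus -> PR -> human merge gate."""
    spec = write_spec(request)
    if not consensus(spec, reviewers):
        return "spec rejected"
    pr = implement(spec)
    if not human_approves(pr):  # the human stays the final gate before merge
        return "merge blocked"
    return f"merged {pr}"
```

Keeping each gate as an explicit early return makes it easy to see which stage stopped a request; a workflow engine such as Temporal adds persistence of that state across restarts on top of the same shape.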
ALL agents deviate, fail, and mess up because nothing is enforced at runtime. A method to fix it.
I have been following this sub and many others about LLMs and agents, and nearly everything, from the top posts to the most recent, is about agents going off and doing something they are not supposed to do: drifting and ignoring the system prompt. Real examples:
- "Never delete user data" → agent calls DROP TABLE users on the next turn
- "Don't share internal pricing" → agent leaks the cost basis to a customer
- "Verify identity first" → agent skips straight to the action
- Add 10 more rules → the model quietly drops the first 5
I am 100% sure that if you have used agents in production, this has happened to you (especially as your system prompts get larger and the context gets bigger). You can test this yourself and see the enforcement immediately. Prompt-based rules are suggestions, not constraints. Re-prompting fixes one case and breaks two. Post-hoc evals tell you what already went wrong. NeMo and Guardrails AI help with content safety but don't cover business logic or your own specification. After tackling this from a few angles, I finally got something solid: a proxy between your app and your LLM that reads rules from plain markdown and enforces them at runtime. It is provider-agnostic, needs only a base URL change, and works with LangGraph, CrewAI, or custom setups. Example rules:
- Maximum discount is 15%.
- Never reveal internal pricing or cost basis.
Without it: the agent offers 90% off and mentions your margin. With it: 15%, no talk of margins. I'm curious whether it stops your LLMs from outputting incorrect information or your agents from going off track; it definitely did for my (specific) use cases. What's everyone doing for this in production? Shadow evals? Re-prompt loops? Something I'm missing?
submitted by /u/Chinmay101202
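The idea of rules written in plain markdown and checked at runtime can be sketched for the two example rules in this post. This is not the author's proxy: it is a deliberately crude regex and keyword version, and `RULES_MD`, `parse_rules`, and `violations` are hypothetical names. A proxy would run something like `violations` on each model reply and block or retry when the list is non-empty; real enforcement would need sturdier matching, since keyword checks are easy to evade.

```python
import re

RULES_MD = """\
- Maximum discount is 15%.
- Never reveal internal pricing or cost basis.
"""


def parse_rules(md: str) -> dict:
    """Turn two hypothetical rule shapes from a markdown bullet list into checkable limits."""
    rules = {"max_discount": None, "banned": []}
    for line in md.splitlines():
        text = line.lstrip("- ").strip().rstrip(".")
        m = re.match(r"Maximum discount is (\d+)%", text, re.I)
        if m:
            rules["max_discount"] = int(m.group(1))
        m = re.match(r"Never reveal (.+)", text, re.I)
        if m:
            # "internal pricing or cost basis" -> ["internal pricing", "cost basis"]
            rules["banned"] += [p.strip().lower() for p in m.group(1).split(" or ")]
    return rules


def violations(reply: str, rules: dict) -> list[str]:
    """List every rule a model reply breaks; an empty list means the reply may pass."""
    found = []
    cap = rules["max_discount"]
    if cap is not None:
        for pct in re.findall(r"(\d+)%\s*(?:off|discount)", reply, re.I):
            if int(pct) > cap:
                found.append(f"discount {pct}% exceeds {cap}%")
    lowered = reply.lower()
    found += [f"mentions '{p}'" for p in rules["banned"] if p in lowered]
    return found
```

The design point the post makes survives even in this toy version: the check is deterministic code outside the model, so adding a tenth rule cannot make the model "forget" the first.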
[Question] How to extract/package a specific "Claude Code Skill" workflow into a standalone app?
Introduction: I've been using Claude Code and created several custom workflows using SKILL.md. These "Skills" work great within the terminal, but I want to take it a step further: how can I extract a specific Skill's logic and package it into a standalone application (web or desktop) for others to use without them needing to install Claude Code?
Key challenges:
- Context injection: In Claude Code, SKILL.md is automatically injected into the context. In a custom app, should I just paste it into the system prompt, or is there a better way to handle the metadata (triggers, permissions)?
- Action execution: Many Skills rely on Claude Code's ability to run shell commands or edit files. If I move this to a web app, what's the best alternative for this "agentic" loop (e.g., MCP, LangGraph, or custom tool calling)?
- Existing projects: Are there any open-source projects or frameworks that specifically facilitate migrating Claude Skills to standalone agents/apps?
What I'm looking for:
- Best practices for "Skill-to-App" migration.
- Recommended tech stacks (e.g., Streamlit + Anthropic SDK vs. Electron + MCP).
- Any existing GitHub repos that serve as a "Skill wrapper."
submitted by /u/Electronic_Film2004
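For the context-injection question, one simple route is to split SKILL.md into its frontmatter metadata and instruction body, then place both in the system prompt of a standalone app. This is a sketch under assumptions: it assumes the file opens with a `---`-delimited frontmatter block of flat `key: value` lines such as `name` and `description` (it is not a full YAML parser), and `split_skill` and `system_prompt_for` are hypothetical helpers. Shell and file actions would still need to be exposed separately as tools (for example through MCP or custom tool calling).

```python
def split_skill(text: str) -> tuple[dict, str]:
    """Split SKILL.md text into (frontmatter, body). Handles flat 'key: value' lines only."""
    meta: dict = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":  # closing delimiter of the frontmatter
                for entry in lines[1:i]:
                    key, sep, value = entry.partition(":")
                    if sep:
                        meta[key.strip()] = value.strip()
                return meta, "\n".join(lines[i + 1:]).strip()
    return meta, text.strip()  # no (or unterminated) frontmatter: treat everything as body


def system_prompt_for(skill_text: str, base: str = "You are a helpful assistant.") -> str:
    """Build a system prompt that names the skill, states its purpose, then gives its instructions."""
    meta, body = split_skill(skill_text)
    header = f"# Skill: {meta.get('name', 'unnamed')}\n{meta.get('description', '')}".strip()
    return f"{base}\n\n{header}\n\n{body}"
```

Keeping the metadata separate also lets the app decide which skill to load (for example by matching a user request against each `description`) before anything is sent to the model.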
The hidden gap in enterprise AI adoption: nobody has figured out how to manage AI agents at scale
We are entering a phase where AI adoption metrics at large companies look good on paper, but a new problem is quietly forming: nobody actually knows how to govern the agents that are being deployed.
Here is the maturity curve as I see it:
- Stage 1: Experimentation. Teams spin up a few agents, see results, get excited.
- Stage 2: Proliferation. Agents spread across departments. Sales has one. Support has three. Marketing is running five. DevOps is testing two.
- Stage 3: Chaos. Nobody knows which agents are active, what instructions they are running, who owns them, whether any are duplicating effort, or whether the configs are current.
Most mid-to-large enterprises with serious AI programs are hitting Stage 3 right now. The tooling for Stage 3 does not really exist yet. Some of the symptoms I keep seeing:
- Customer-facing agents running system prompts that were written 8 months ago and never reviewed
- Multiple teams independently building agents to solve the same problem because there is no central inventory
- Agents that were stood up for a pilot and never decommissioned, still consuming credits and occasionally responding to real users
- No audit trail when something goes wrong. Did the agent say that because the model hallucinated, or because someone changed the instructions last Tuesday?
The build-side tooling (LangChain, LangGraph, Claude, etc.) is excellent and getting better. The run-side tooling, for AI directors and heads of AI who need to actually manage a fleet of agents in production, is almost nonexistent.
We are working on this at Caliber. We gave the community an open-source repo as a foundation for structured AI agent setup (link in comments). And if you are in an AI leadership role trying to navigate this transition, the newsletter at caliber-ai.dev covers exactly this operational layer.
submitted by /u/Substantial-Cost-429
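The three symptoms listed in this post (stale prompts, duplicated effort, pilots never decommissioned) are all checkable once a central inventory exists. A minimal sketch, assuming a hypothetical `AgentRecord` schema and illustrative thresholds of 90 days for prompt review and 60 days for a pilot; these are not a standard and not Caliber's product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentRecord:
    name: str
    owner: str
    purpose: str            # what the agent is for; used to spot duplicated effort
    prompt_reviewed: date   # last time a human reviewed its system prompt
    started: date           # when the agent was first deployed
    pilot: bool = False


def audit(agents: list[AgentRecord], today: date, review_days: int = 90, pilot_days: int = 60) -> dict:
    """Flag stale prompts, pilots running past their window, and agents sharing a purpose."""
    stale = [a.name for a in agents if (today - a.prompt_reviewed).days > review_days]
    overdue = [a.name for a in agents if a.pilot and (today - a.started).days > pilot_days]
    by_purpose = defaultdict(list)
    for a in agents:
        by_purpose[a.purpose].append(a.name)
    duplicates = {p: names for p, names in by_purpose.items() if len(names) > 1}
    return {"stale_prompts": stale, "overdue_pilots": overdue, "duplicates": duplicates}
```

Exact-match grouping on `purpose` only catches duplicates that were labeled consistently; in practice that field would need a controlled vocabulary or fuzzier matching.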
euclid: The open-source AI math tutor.
I built an open-source ALEKS alternative that actually proves you understand math. Four AI agents that find what you know, decide what you're ready for, teach through Socratic dialogue, and verify real understanding. Grades 1–12. Runs locally.
What it does:
- Diagnoses what you actually know (Knowledge Space Theory)
- Only teaches what you're ready for
- Uses Socratic dialogue (no answer dumping)
- Verifies real understanding before moving on
How it works:
- 4-agent system (diagnosis, planning, teaching, evaluation)
- Knowledge graph of ~60 math concepts (grade 1 → calculus)
- Tracks progress locally (~/.euclid/state.db)
- No data leaves your machine (except LLM calls)
Built with:
- LangGraph (agent orchestration)
- LiteLLM (plug in any model)
Example flow:
User: "I don't understand fractions"
→ system detects missing prerequisite: division
→ starts guided questions instead of explaining
→ unlocks fractions only after mastery
Looking for feedback:
- Is this actually useful vs ALEKS?
- What would you add/remove?
- Would you use it locally?
GitHub: https://github.com/Tarek-new/euclid
submitted by /u/john-fransis
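"Only teaches what you're ready for" has a concrete meaning in Knowledge Space Theory: in the simplified case where the knowledge structure comes from a prerequisite graph, the concepts a learner is ready for (the "outer fringe" of their knowledge state) are those not yet mastered whose prerequisites are all mastered. The sketch below assumes that simplification and a hypothetical slice of the concept graph; euclid's actual graph and implementation may differ.

```python
# Hypothetical slice of a prerequisite graph: concept -> concepts it depends on.
PREREQS = {
    "counting": [],
    "addition": ["counting"],
    "subtraction": ["counting"],
    "multiplication": ["addition"],
    "division": ["multiplication", "subtraction"],
    "fractions": ["division"],
}


def ready_to_learn(mastered: set[str], prereqs: dict[str, list[str]]) -> list[str]:
    """Outer fringe under a prerequisite graph: unmastered concepts whose prerequisites are all mastered."""
    return sorted(
        concept
        for concept, needs in prereqs.items()
        if concept not in mastered and all(n in mastered for n in needs)
    )
```

This mirrors the example flow in the post: a learner asking about fractions who has not mastered division will not see fractions in the ready list, so the planner would route them to the missing prerequisite first.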
Building advanced AI workflows: what am I missing?
Hey everyone, I've been diving into advanced workflow orchestration lately, working with tools like LangChain / LangGraph, AWS Step Functions, and concepts like fuzzy canonicalization. I'm trying to get a broader, more future-proof understanding of this space. What other tools, patterns, or concepts would you recommend I explore next? Could be anything from orchestration, distributed systems, LLM infra, or production best practices. Would love to hear what's been valuable in your experience.
submitted by /u/emprendedorjoven
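Fuzzy canonicalization, as usually meant, is mapping messy variants of a value (typos, casing, punctuation) onto one canonical form before downstream steps compare or aggregate them. A minimal standard-library sketch, assuming a hypothetical list of canonical city names and a similarity cutoff of 0.8 chosen for illustration; production systems often swap in a dedicated fuzzy-matching library or embeddings.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical vocabulary.
CITIES = ["New York", "San Francisco", "Los Angeles"]


def normalize(text: str) -> str:
    """Lowercase, turn runs of punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def canonicalize(raw: str, canonical: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical value, or None if nothing is similar enough."""
    lookup = {normalize(c): c for c in canonical}
    match = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None
```

Returning `None` below the cutoff, rather than forcing the nearest match, leaves room for a human review or LLM fallback step in the workflow.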
Agentic OS: a governed multi-agent execution platform
I've been building a system where multiple AI agents execute structured work under explicit governance rules. Sharing it because the architecture might be interesting to people building multi-agent systems.
What it does: You set a goal. A coordinator agent decomposes it into tasks. Specialized agents (developer, designer, QA, etc.) execute through controlled tool access, collaborate via explicit handoffs, and produce artifacts. QA agents validate outputs. Escalations surface for human approval.
What's different from CrewAI/AutoGen/LangGraph: The focus isn't on the agent; it's on the governance and execution layer around the agent.
- Tool calls go through an MCP gateway with per-role permission checks and audit logging
- Zero shared mutable state between agents; collaboration happens through structured handoffs only
- Policy engine with configurable approval workflows (proceed/block/timeout-with-default)
- Append-only task versioning: every modification creates a new version with author and reason
- Built-in evaluation engine that scores tasks on quality, iterations, latency, cost, and policy compliance
- Agent reputation scoring with a weighted formula (QA pass rate, iteration efficiency, latency, cost, reliability)
Architecture: 5 layers with strict boundaries:
- Frontend (visualization only)
- API gateway (auth/RBAC)
- Orchestration engine (24 modules)
- Agent runtime (role-based, no direct tool access)
- MCP gateway (the only path to tools)
Stack: React + TypeScript, FastAPI, SQLite WAL, pluggable LLM providers (OpenAI, Anthropic, Azure), MCP protocol.
Configurable: Different team presets (software, marketing, custom), operating models with different governance rules, pluggable LLM backends, reusable skills, and MCP-backed integrations.
agenticompanies.com
Please, guys, I would love to get your feedback on this; tell me whether it is something you would be interested in using. You can register with email/password to view the platform, but to operate agent sessions you need an invitation code, so feel free to DM me for one. You will also need your own Anthropic or OpenAI API key to operate the engines. Thanks!
submitted by /u/ramirez_tn
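The first governance feature listed in this post, a gateway that is the only path to tools and checks per-role permissions while writing an audit log, can be sketched in a few lines. This is an illustration under assumptions, not Agentic OS code: `POLICY`, `ToolGateway`, and the tool names are hypothetical, and a real MCP gateway would sit in front of MCP servers rather than plain Python callables. Unknown roles get an empty permission set, so the default is deny.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role -> allowed tool names policy.
POLICY = {
    "developer": {"read_repo", "write_file", "run_tests"},
    "qa": {"read_repo", "run_tests"},
}


@dataclass
class ToolGateway:
    tools: dict    # tool name -> callable; agents never call these directly
    policy: dict   # role -> set of allowed tool names
    audit_log: list = field(default_factory=list)

    def call(self, role: str, tool: str, **kwargs):
        """Check the role's permission, log the attempt either way, then run or refuse."""
        allowed = tool in self.policy.get(role, set())  # unknown roles are denied
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "tool": tool,
            "args": kwargs,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"role {role!r} may not call {tool!r}")
        return self.tools[tool](**kwargs)
```

Logging denied attempts as well as allowed ones is what makes the audit trail useful: a QA agent repeatedly trying to write files is itself a signal worth surfacing.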
I built Synapse AI: An open-source, DAG-based orchestrator for AI agents.
Hey everyone,
For the past three months, I've been building an open-source orchestration platform for AI agents called Synapse AI. I started it because I found existing frameworks (like LangChain or AutoGen) either too bloated or too unpredictable for production workflows. Letting agents freely "chat" with each other often leads to infinite loops, high API costs, and debugging nightmares. I wanted strict, predictable control.
The architecture: Instead of conversational routing, Synapse AI relies on a directed acyclic graph (DAG) architecture. You define the work, strictly control the hand-offs between agents, and get a completed task on the other side.
Under the hood:
- Tool agnostic: Build custom tools from scratch (Python/webhooks) or instantly plug in existing Model Context Protocol (MCP) servers.
- Local-first emphasis: Full native support for Ollama, so you can run routing and tasks entirely locally. (It also supports Gemini, Claude, and OpenAI for the heavy lifting.)
- CLI integration: Just shipped a community-requested feature to connect Claude Code, Gemini CLI, Codex CLI, and GitHub Copilot CLI directly to your agents.
- Frictionless setup: A 1-step installation process across macOS, Windows, and Linux.
What I'm looking for: I am currently maintaining this solo and rolling it out for an early pilot phase. I would love for this community to take a look under the hood. Specifically:
- Code review: I'd love brutal feedback on the DAG implementation and overall architecture.
- Contributors & collaborators: If you find the project worthwhile, I am actively looking for people to team up with! Whether it's adding new LLM providers, fixing UI quirks, or improving the 1-step installer, PRs are incredibly welcome.
Repo: https://github.com/naveenraj-17/synapse-ai
If you bump into any bugs, please open an issue so I can patch it. Would love to hear your thoughts!
submitted by /u/WabbaLubba-DubDub
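The contrast this post draws between free-form agent chat and DAG execution comes down to one property: each task runs exactly once, after all of its dependencies, and cycles are rejected up front instead of looping. A generic sketch of that pattern using Python's standard `graphlib`; it is not Synapse AI's implementation, and `run_dag` plus the task names are hypothetical. It assumes every dependency named in `deps` is itself a key in `tasks`.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError is raised for cyclic graphs


def run_dag(tasks: dict, deps: dict) -> dict:
    """Run each task once, after its dependencies, passing it their outputs.

    tasks: name -> fn(inputs: dict) -> output
    deps:  name -> set of upstream task names (tasks without an entry have no dependencies)
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {name: set(deps.get(name, ())) for name in tasks}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {d: results[d] for d in graph[name]}  # explicit hand-off of upstream outputs
        results[name] = tasks[name](inputs)
    return results
```

`static_order` runs tasks one at a time; for independent branches in parallel, the same `TopologicalSorter` also offers `get_ready()` and `done()` to hand out every task whose dependencies have finished.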
How are you catching overnight agent drift when the logs still say success?
Last night was the same dumb failure again: clean logs at 11pm, broken state by 7am. I've been trying to keep a few OpenAI-based agents stable across scheduled runs, and the breakage is never loud. One small prompt tweak, one tool schema update, or one model swap, and the morning report still says "success" even though the agent quietly skipped half the job. I've tried AutoGen, CrewAI, LangGraph, and Lattice. Some parts got easier. LangGraph made the control flow easier to inspect, while CrewAI was fast to stand up for simple orchestration. Lattice caught one issue the others missed because it keeps a per-agent config hash and flags when the deployed version drifts from the last run cycle. That helped, but it did not solve the main problem. I still do not have a good way to catch slow behavioral drift when the config is unchanged but the agent starts taking weird shortcuts after a few days. The logs look fine. The outputs are not. How are you detecting that kind of fake success before it burns a week?
submitted by /u/Acrobatic_Task_6573
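Both kinds of drift in this post can be checked mechanically. The config-hash idea the poster credits to Lattice can be approximated by fingerprinting a canonical JSON dump of the agent's settings (this is not Lattice's code), and the unchanged-config case can be attacked by asserting on observable behavior instead of the logged status: compare the steps a run actually completed against the steps it was supposed to complete. `config_hash`, `fake_success`, the run-record shape, and the 0.9 coverage threshold are all hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Stable fingerprint of an agent's prompt, tool schemas, and model settings (key order ignored)."""
    blob = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


def fake_success(run: dict, expected_steps: set[str], min_coverage: float = 0.9) -> bool:
    """True when a run reports success but completed too few of the steps it was meant to do."""
    done = expected_steps & set(run["steps_completed"])
    coverage = len(done) / len(expected_steps)
    return run["status"] == "success" and coverage < min_coverage
```

The hash catches "someone changed something"; the coverage check catches "nothing changed, but the agent started cutting corners", which is the case the logs alone miss.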
View originalRepository Audit Available
Deep analysis of langchain-ai/langgraph — architecture, costs, security, dependencies & more
LangGraph uses a tiered pricing model. Visit their website for current pricing details.
Key features include: guiding, moderating, and controlling your agent with human-in-the-loop; building expressive, customizable agent workflows; persisting memory for future interactions; first-class streaming for better UX design; and visibility into what your agent is really doing.
LangGraph is commonly used for: Automating customer support interactions with human oversight, Creating personalized marketing campaigns that adapt based on user feedback, Developing educational tools that provide tailored learning experiences, Implementing complex data analysis workflows with agent-driven insights, Streamlining project management tasks through automated updates and reminders, Facilitating content generation while ensuring quality control.
LangGraph integrates with: Slack for team communication, Zapier for workflow automation, Google Sheets for data management, Trello for project tracking, Salesforce for CRM integration, Twilio for SMS notifications, Discord for community engagement, Jira for issue tracking, Notion for documentation, AWS for cloud computing resources.
Ollama (project at Ollama): 1 mention
LangGraph has a public GitHub repository with 28,022 stars.
Based on user reviews and social mentions, the most common pain points are: API costs, overspending, API bill, token cost.
Based on 34 social mentions analyzed, 24% of sentiment is positive, 76% neutral, and 0% negative.