LLM Gateway (OpenAI Proxy) to manage authentication, loadbalancing, and spend tracking across 100+ LLMs. All in the OpenAI format.
LiteLLM is generally appreciated for its capabilities as an AI coding tool, particularly among users with AWS credits. However, it has recently faced significant criticism due to a security breach involving credential-stealing malware linked to a malicious package release. Users show concerns about the safety and reliability of the software in light of these events. The overall sentiment on pricing is mostly neutral as the primary focus remains on addressing security issues, impacting its reputation negatively.
Mentions (30d)
0
Reviews
0
Platforms
5
GitHub Stars
41,659
6,878 forks
Features
Use Cases
Funding Stage
Venture (Round not Specified)
815
GitHub followers
40
GitHub repos
41,659
GitHub stars
20
npm packages
10,659
npm downloads/wk
391,582,157
PyPI downloads/mo
Malicious litellm_init.pth in litellm 1.82.8 PyPI package – credential stealer
Pricing found: $0, $0
Agentic Workflow Visualization and API Gateway
I am building an API gateway for agents that can make your agentic AI code model- and provider-agnostic. I am also grouping agent runs that show multiple LLM calls and tool calls in the visualization piece. It gives details on tokens, cost, and model latency. I am doing this without requiring any instrumentation in the agentic code. The agents (Python for now) are started by a Rust correlator that assigns a job_id to each agent so we can track API and tool calls (inferred from HTTP requests and responses) across the entire agentic run. The servers are also in Rust. I also have an implementation where, instead of the Rust correlator, Python and other platform shims do the same job, and the servers are in Go. I would appreciate comments from people in AI ops who use tools like LiteLLM and Helicone and can provide feedback or complicated use cases. I plan to make everything open source, so I am looking for collaborators too. submitted by /u/High-Speed-Diesel
LLM-Rosetta: format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools — Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week — each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result. It's a Python library that converts between LLM API formats — OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters. The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests — one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need. Use it as a library — pip install and call convert() directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep (https://github.com/Oaklight/zerodep), another project of mine — stdlib-only single-file modules, not someone else's library repackaged. The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar — admin panel, dashboard, request log, config management all included. 
Screenshots: https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/ We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform — needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized. The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion. There's a shim layer for provider-specific quirks — built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively. 24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation. GitHub: https://github.com/Oaklight/llm-rosetta Docs: https://llm-rosetta.readthedocs.io arXiv: https://arxiv.org/abs/2604.09360 Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378 submitted by /u/Oaklight_dp [link] [comments]
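The hub-and-spoke idea described in this post can be sketched in a few lines of Python. This is a minimal illustration and not llm-rosetta's actual API: `IRMessage`, the converter functions, and this `convert` signature are all hypothetical, and only simplified text-only message shapes are handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IRMessage:
    """Hypothetical intermediate representation: one text message."""
    role: str  # "user" or "assistant" in this sketch
    text: str


# Each format supplies exactly one decoder (native -> IR) and one encoder (IR -> native).
def openai_chat_to_ir(payload: dict) -> List[IRMessage]:
    return [IRMessage(m["role"], m["content"]) for m in payload["messages"]]


def ir_to_openai_chat(msgs: List[IRMessage]) -> dict:
    return {"messages": [{"role": m.role, "content": m.text} for m in msgs]}


def anthropic_to_ir(payload: dict) -> List[IRMessage]:
    out = []
    for m in payload["messages"]:
        content = m["content"]
        if isinstance(content, list):  # list of content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        out.append(IRMessage(m["role"], content))
    return out


def ir_to_anthropic(msgs: List[IRMessage]) -> dict:
    return {"messages": [{"role": m.role, "content": [{"type": "text", "text": m.text}]}
                         for m in msgs]}


def google_to_ir(payload: dict) -> List[IRMessage]:
    roles = {"user": "user", "model": "assistant"}  # Google uses "model" for the assistant
    return [IRMessage(roles[c["role"]], "".join(p["text"] for p in c["parts"]))
            for c in payload["contents"]]


def ir_to_google(msgs: List[IRMessage]) -> dict:
    roles = {"user": "user", "assistant": "model"}
    return {"contents": [{"role": roles[m.role], "parts": [{"text": m.text}]} for m in msgs]}


DECODERS: Dict[str, Callable[[dict], List[IRMessage]]] = {
    "openai_chat": openai_chat_to_ir, "anthropic": anthropic_to_ir, "google": google_to_ir,
}
ENCODERS: Dict[str, Callable[[List[IRMessage]], dict]] = {
    "openai_chat": ir_to_openai_chat, "anthropic": ir_to_anthropic, "google": ir_to_google,
}


def convert(payload: dict, source: str, target: str) -> dict:
    """Any-to-any conversion: decode into the IR, then encode out of it."""
    return ENCODERS[target](DECODERS[source](payload))
```

With three formats this needs six small functions rather than a dedicated adapter for each of the six ordered pairs, and the gap widens as formats are added: 2N converters against N(N-1) pairwise adapters.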
Anthropic just banned "claude -p" from their Quota - BIG MISTAKE!
So Anthropic just announced that starting June 15, claude -p, Agent SDK usage, Claude Code GitHub Actions, and third-party Agent SDK apps will stop counting against the normal Pro/Max interactive Claude usage. Instead, they now go into a separate monthly Agent SDK credit bucket. For Max 5x, that is apparently $100/month. Which sounds fine until you realize any serious autonomous agent setup can burn through that very fast.

So yeah, if you built anything around:

tickets -> agents -> hooks -> executor -> claude -p -> background automation

you are probably cooked.

I was building exactly this kind of thing with AgentiBridge / AgentiCore / AgentiHooks. Basically a framework for orchestrating Claude Code agents at scale. The idea was simple: run Claude Code not as a human sitting in the terminal, but as a worker inside a larger production system. And now Anthropic basically said: “Nice automation stack bro, please move to the paid SDK/API bucket.” FML.

But I don’t think the solution is to cry forever or keep playing cat-and-mouse with tmux hacks. The real solution is model routing. My plan is this: keep Claude for interactive operator work. Use Claude where the reasoning actually matters:

- architecture decisions
- debugging hard shit
- reviewing plans
- high-context coding
- anything that needs taste and judgment

But for background agents, automation loops, disposable workers, CI-style jobs, and dumb task execution? Fuck burning premium Claude credits on that. Put LiteLLM, Portkey, or another LLM gateway in front. Then route the worker swarm to cheaper models:

- Gemini
- DeepSeek
- Qwen
- OpenAI-compatible models
- local/self-hosted models where possible

Claude Code already supports custom model options through environment variables. So in theory, you can have different profiles/scripts/aliases that swap model routing depending on what you are doing. One profile for interactive Claude. Another profile for automation. Another profile for cheap background agents.

So instead of every autonomous goblin using the expensive brain, you send the cheap goblins to cheap models and keep Claude for the operator layer. This was always where agent orchestration was going anyway. One model for everything is stupid. The future is gateways, routing, workload separation, and not letting every background agent torch your best model quota because it decided to rewrite the same YAML file 11 times.

Anthropic didn’t kill agent orchestration. They just made the architecture more obvious.

submitted by /u/nestorcolt
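The three-profile routing idea in this post might look roughly like the following. It is only a sketch under stated assumptions: the profile names, the workload labels, the placeholder model names, and the URLs are all hypothetical and do not come from Claude Code, LiteLLM, or Portkey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    model: str     # placeholder model identifier, not a real model name
    base_url: str  # placeholder endpoint, e.g. a self-hosted gateway


# Hypothetical profiles: one premium, two cheaper ones behind a gateway.
PROFILES = {
    "interactive": Profile("interactive", "premium-model", "https://api.example.invalid"),
    "automation": Profile("automation", "mid-tier-model", "http://localhost:4000"),
    "background": Profile("background", "cheap-model", "http://localhost:4000"),
}

# Hypothetical workload labels mapped onto profiles.
ROUTES = {
    "architecture": "interactive",
    "debugging": "interactive",
    "plan_review": "interactive",
    "ci_job": "automation",
    "automation_loop": "automation",
    "disposable_worker": "background",
}


def pick_profile(workload: str) -> Profile:
    """Return the profile for a workload; unknown workloads get the cheapest one."""
    return PROFILES[ROUTES.get(workload, "background")]
```

Defaulting unknown workloads to the cheapest profile is a deliberate choice in this sketch: a mislabeled job then costs quality rather than premium quota.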
On Claude Max ($200/mo), burned 14.7M tokens in 7 days, mostly last 48h. Still hitting the wall. How do you survive burst usage on the top tier?
Thought Max would be a safety net. It's not.

My stats (last 7 days):
• 14.7M tokens, the majority in the last 2 days (project crunch, not normal usage)
• 21 sessions, 7/7 active days
• Longest session: 3 days 21 hours
• Opus 4.7 for everything
• Anthropic says I've read ~24x The Count of Monte Cristo this week

I'm paying for Max specifically so I don't have to think about limits. But after this burst, I'm feeling the throttle. Not a hard 429 yet, but the "slow down" is visible.

My setup:
• Mac Studio M3 Ultra, 256GB RAM, so local fallback is absolutely on the table if the harness supports it
• Kimi Code CLI as a manual fallback (same codebase, zero --resume continuity)
• .llm-state.json session dumps before switching
• Symlinked CLAUDE.md → KIMI.md

My question to other Max users: when you're paying $200 for "unlimited" and you actually use it during a crunch, what does your damage control look like?
• Do you keep a second LLM on standby full-time?
• Preemptively split workflow before the spike hits? (Opus for thinking, Sonnet for doing?) Already doing this.
• Any way to see your "real" remaining quota before Anthropic soft-throttles you?
• External memory files so you can hot-swap LLMs mid-project?

And the big one: is anyone running a harness or gateway that sits above Claude Code and auto-fails over to another provider, or even a local model? With 256GB RAM on this M3 Ultra, I could host a 70B+ parameter model locally for grunt work, but right now I'm manually hot-swapping between Claude and Kimi Code CLI when I feel the throttle. It's clunky. I've looked at LiteLLM for API-level routing but haven't found a good equivalent for local CLI coding agents that can also tap local inference. Manual switching is killing my flow.

I'm not trying to use less. I paid to not worry about this. But burst usage is burst usage, and Max clearly has a ceiling.

What's your failover architecture?

submitted by /u/New_Guitar_9121
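For readers wondering what automatic failover could look like at its simplest, here is a hedged sketch. The `Throttled` exception and the backend callables are hypothetical stand-ins; a real harness would have to map each provider's rate-limit signal (for example an HTTP 429) onto something like it.

```python
from typing import Callable, List, Tuple


class Throttled(Exception):
    """Hypothetical signal that a backend is rate-limited or soft-throttled."""


# A backend is a (name, callable) pair; the callable takes a prompt and returns text.
Backend = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try backends in priority order; return (backend_name, completion)."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Throttled as exc:
            # Only throttling triggers failover; other errors propagate unchanged.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends throttled: " + "; ".join(errors))
```

Session continuity, the harder problem raised in the post, is out of scope here: this only decides which backend answers a single request.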
My Mac Mini kernel-panicked twice. Turned out MCP servers were eating 1.5 GB at idle, leaving no headroom for anything else. So I built a process supervisor
tl;dr (Claude caveman edition): MCP servers sit around doing nothing, eat 1.5 GB. Machine angry. Machine crash. I make tool. Tool only run server when you use it. Server stop when you leave. 16 MB when idle. Go binary. Free. https://github.com/surgifai-com/mcprt -- I've been working on my project, Surgifai, after work. It's in stealth, but building it means running a bunch of MCP servers on a Mac Mini M2 with 16 GB - embeddings server, code RAG, Chrome DevTools, a couple others. All via launchd, all 24/7. The machine kernel-panicked twice during a Next.js build. I assumed it was the build itself, but a process audit told a different story. Chrome DevTools MCP had somehow spawned duplicate instances - two server processes, two npm parents, two node watchdogs - 1.2 GB for one tool. Vault-mcp, code RAG server, colab-mcp, LiteLLM, the Claude session itself. Nearly 3 GB of resident memory before the build even started. On unified memory that's competing directly with GPU allocation. The build needed burst memory on a machine that had none left to give. Stopping the MCP services eliminated the panics. They were the easiest ~1.5 GB to reclaim without losing anything I was actively using. But now I had no MCP servers. I looked at what existed. mcp-on-demand does manual start/stop via CLI commands - it's solving context window token pollution, not memory. mcp-hub keeps everything running and connected. microsoft/mcp-gateway is Kubernetes + Redis + Azure. Nobody had a tool that just... watches whether a client is connected, and only runs the server while it is. So I built mcprt. It's a reverse proxy that uses connection refcounting instead of timeouts. It watches SSE streams and session headers from the Streamable HTTP transport. First client connects to a server's route, mcprt spawns the upstream process. Last client disconnects, it stops the process after a 5-second grace period. 
A server can sit silent for an hour mid-session and mcprt won't touch it - the SSE stream is still open. Refcount ≥ 1 = alive. Refcount 0 for 5s = stop.

Why not idle-timeout? Because it fails in both directions. Too aggressive and you kill a server mid-reasoning. Too lax and you barely save memory. A server being silent and a session being over are different things. Only connection close is the reliable signal.

Idle footprint for the mcprt daemon: 16.6 MB. At peak concurrent load across 4 servers the daemon grew by less than 1 MB - all the memory is in the child processes, fully reclaimed when they exit. Cold start is ~500ms-800ms. That's the tradeoff. I've been running it daily while building Surgifai and honestly don't notice it - there's always a beat before the first tool call anyway.

One other thing - mcprt refuses STDIO transport at the config level. Hard validator error, not a toggle. After the OX Security disclosure in April (14 CVEs, 200K+ server deployments affected), I don't think STDIO MCPs should be normalized anymore. Every npx @modelcontextprotocol/server-whatever in your mcp.json runs with your full user context. mcprt catches those patterns before any process spawns. And the duplicate Chrome DevTools instances? That's the kind of silent failure STDIO transport makes easy and invisible.

Single Go binary. Apache 2.0. One TOML config file. Works with Claude Code, Cline, Continue - anything that speaks Streamable HTTP. It lives under the Surgifai org on GitHub because I use it as part of my stack, but I'm open-sourcing it because the problem isn't specific to what I'm building. If you're running multiple MCP servers on a resource-constrained machine, it might save you some grief.

GitHub: https://github.com/surgifai-com/mcprt

Happy to answer questions about the architecture or the STDIO stance - this is my fork of Anthropic's mcp-builder if you want to dig into it.
https://github.com/victorqnguyen/skills/tree/main/skills/mcp-builder submitted by /u/winwinwinguyen [link] [comments]
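The refcount-plus-grace-period rule from this post ("Refcount ≥ 1 = alive. Refcount 0 for 5s = stop.") can be modeled as a small state machine. The sketch below is not mcprt's code (mcprt is a Go binary); the class and method names are hypothetical, process spawning is reduced to a boolean, and time is passed in explicitly so the logic is easy to follow.

```python
from typing import Optional


class RefcountedServer:
    """Toy model of one upstream server kept alive by connection refcounting."""

    def __init__(self, grace_seconds: float = 5.0) -> None:
        self.grace = grace_seconds
        self.clients = 0
        self.running = False
        self.idle_since: Optional[float] = None  # when the refcount last hit zero

    def connect(self, now: float) -> None:
        self.clients += 1
        self.idle_since = None   # a new client cancels any pending stop
        if not self.running:
            self.running = True  # a real supervisor would spawn the process here

    def disconnect(self, now: float) -> None:
        if self.clients == 0:
            raise ValueError("disconnect without a matching connect")
        self.clients -= 1
        if self.clients == 0:
            self.idle_since = now  # start the grace period

    def tick(self, now: float) -> None:
        """Called periodically; stops the server once the grace period has elapsed."""
        if (self.running and self.idle_since is not None
                and now - self.idle_since >= self.grace):
            self.running = False  # a real supervisor would stop the process here
            self.idle_since = None
```

Note that `tick` never looks at traffic, only at whether a client is still connected, which is the distinction the post draws between a silent server and a finished session.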
claudely: launch Claude Code against Local LLM provider like LM Studio / Ollama / llama.cpp without trashing your real claude config
Plenty of CLI coding agents will talk to a local LLM, but the catch is the ecosystem. Skills, slash commands, MCP servers, plugins, hooks: all the interesting tooling has been built specifically for Claude Code, and parity on every other agent is patchy at best. Trying to reuse a Claude-shaped workflow on a different agent quickly turns into "rewrite all the plugins" or "do without."

claudely skips that fight. You keep Claude Code as the client (and its whole plugin / skill / MCP ecosystem with it), and just point it at a model running on your own hardware. Pick a provider, and claudely spawns `claude` with the right base URL, auth, and cache fix wired up for that one session. Your shell and the regular `claude` command stay untouched, so you can flip between local and the real Anthropic API without thinking about it.

It also quietly fixes a prompt-cache bug that otherwise tanks local-model speed by ~90%, and handles the per-provider env-var differences for you. Works with LM Studio, Ollama, llama.cpp, or any Anthropic-compatible endpoint (point it at a litellm or claude-code-router proxy for OpenAI-protocol backends like vLLM).

    npm i -g claudely
    claudely                             # LM Studio, picker over your downloaded models
    claudely -p ollama -m gpt-oss:20b    # Ollama, skip the picker
    claudely -p llamacpp                 # whichever GGUF llama-server is serving

MIT, Node 20+, unaffiliated community helper. Built with Claude Code's help, fittingly. Feedback welcome.

Repo: https://github.com/mforce/claudely
NPM: https://www.npmjs.com/package/claudely

submitted by /u/mforce22
LLM proxy that lets Claude Code talk to any model
I built rosetta-llm, an open-source multi-format LLM proxy that acts as a drop-in Claude Code gateway.

- Works as a Claude Code LLM gateway: set `ANTHROPIC_BASE_URL` and all configured models appear in the `/model` picker
- Translates between formats: Anthropic Messages ↔ OpenAI Chat ↔ OpenAI Responses at the wire level
- Thinking blocks round-trip correctly: this is the hard part and why I built this
- Provider routing: `openai/gpt-5.4`, `anthropic/claude-opus-4-7`, `groq/llama-4` all through one endpoint
- Streaming on everything: passthrough fast path + cross-format translation with proper SSE handling

The thinking-block problem

Most proxies lose reasoning continuity. LiteLLM has had open PRs for thinking block handling for a long time, some dating back months, and they're still not merged. Without proper round-tripping, prompt caching breaks across turns and Claude Code loses context. Rosetta encodes encrypted reasoning into Anthropic's `signature` field and decodes it back, so multi-turn agentic workflows keep their prompt-cache hits.

Zero-setup Hugging Face Space

Literally a two-line Dockerfile:

    FROM ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
    COPY --chown=app:app config.json /app/config.json

Add a config.json file and the above Dockerfile to an HF Space (Docker SDK) and it's running. No clone, no build, no venv. The GHCR image has everything baked in. Make your HF Space private and add API keys in the HF Space secrets. Check the README on GitHub.

Also works with

    # No install, ephemeral
    uvx rosetta-llm

    # Persistent install
    uv tool install rosetta-llm
    rosetta-llm --config ~/.rosetta-llm/config.json

    # Docker
    docker run -p 7860:7860 \
      -v ~/.rosetta-llm/config.json:/app/config.json \
      ghcr.io/lokesh-chimakurthi/rosetta-llm:main

Why another proxy?

I looked at existing solutions:

- LiteLLM: thinking block round-trip PRs going nowhere, too many abstractions
- OpenRouter: great but closed-source, no self-hosting
- Direct passthrough proxies: don't translate between formats

Nothing gave me lossless cross-format translation with proper reasoning fidelity.

Links

GitHub: https://github.com/Lokesh-Chimakurthi/rosetta-llm
PyPI: https://pypi.org/project/rosetta-llm/

Contributions welcome

I built this for myself and it works for my use cases. But there's a lot more it could do: better multimodal handling, embeddings support, rate limiting, an admin UI. If any of this sounds interesting, PRs are absolutely welcome. Happy to answer questions in the comments.

submitted by /u/DataNebula
1M context beta retired yesterday on Sonnet 4.5 / 4. Here's the actual fix if you missed it.
In case you missed the email or woke up to a spike in 400 errors, the context-1m-2025-08-07 beta header officially stopped working for Sonnet 4.5 and Sonnet 4 as of midnight UTC yesterday. Anything over 200K tokens on those models now returns a 400.

The migration is simple but not zero-effort:

- Swap to claude-sonnet-4-6 (1M is GA there, no header needed)
- Drop the beta header from your requests

The long-context surcharge is gone too. Anthropic killed the 2x premium back in March.

If you haven't updated yet, here is likely why you're seeing failures:

- Your code branches on the beta header (if context > 200K, send beta). That branch no longer gets you 1M. Nothing fails up front; the first long prompt just comes back as a 400.
- Long-running chat sessions where cumulative history grew past 200K. Those start erroring on the next call.
- Agents with verbose tool-call histories. Tool outputs accumulate faster than you'd expect, especially with reflection steps.

If you are running a gateway, now is the time to audit your per-model context limits. Bifrost (github.com/maximhq/bifrost) and LiteLLM both let you set hard caps per model so you get a clean error at the proxy instead of a surprise 400 from Anthropic.

Bottom line: if you have production traffic failing right now, the model string change is your #1 priority.

submitted by /u/Character-File-6003
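A per-model hard cap at the proxy, as suggested in this post, boils down to one check before the request is forwarded. The sketch below is not Bifrost or LiteLLM configuration; the function and exception names are hypothetical, the cap table is illustrative (the `older-sonnet-placeholder` key is not a real model ID), and the token count is assumed to come from whatever tokenizer the gateway already uses.

```python
from typing import Dict


class ContextLimitExceeded(Exception):
    """Raised at the proxy instead of letting the upstream API return a 400."""


# Illustrative caps based on the figures quoted in the post.
CONTEXT_CAPS: Dict[str, int] = {
    "claude-sonnet-4-6": 1_000_000,
    "older-sonnet-placeholder": 200_000,  # hypothetical key, stands in for the older models
}


def check_context(model: str, prompt_tokens: int,
                  caps: Dict[str, int] = CONTEXT_CAPS) -> None:
    """Reject a request locally if it would exceed the model's configured cap."""
    cap = caps.get(model)
    if cap is None:
        raise KeyError(f"no context cap configured for {model!r}")
    if prompt_tokens > cap:
        raise ContextLimitExceeded(
            f"{model}: {prompt_tokens} prompt tokens exceeds the cap of {cap}"
        )
```

Failing on an unconfigured model, rather than letting it through, keeps a forgotten table entry from turning back into an upstream 400.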
Built + open sourced anti-slopsquatting CLI
TL;DR: built an open source CLI that scans your repository's manifest files (package.json, requirements.txt, go.mod) for indicators of slopsquatting and other supply chain attacks. Repo: https://github.com/zhendahu/dep-doctor

There have been a ton of supply chain attacks recently (Axios, LiteLLM, and Trivy, to name a few), and attackers don't seem to be slowing down: PyTorch Lightning just got hit with one today. AI coding makes us increasingly susceptible to such attacks for a couple of reasons:

1. We get lazy and don't review command line output warnings when our agent installs like 47 different packages at once.
2. AI agents can hallucinate package names that sound correct (e.g. it might try to pip install lightllm instead of litellm).

Number 2 in particular opens up an opportunity for a new kind of attack called "slopsquatting", where bad actors intentionally register malicious packages that sound similar to legitimate, widely used ones.

I'm hoping this Rust CLI that I built and open-sourced can help make developers less susceptible to these kinds of attacks. It scans manifest files (currently package.json, requirements.txt, and go.mod) and, for each dependency, queries the respective registry (e.g. PyPI for Python, npm for JavaScript) for package metadata. It then evaluates the metadata against a list of heuristic checks for existence, newness, number of downloads, most recent maintenance, and version drift. Finally, it queries the OSV API for that package name and version. It surfaces warnings and remediation steps as necessary.

Feel free to use, share, contribute, make fun of, report, or whatever your heart desires :) Not asking for anything in return, hoping this can be helpful to as many people as possible. Thanks for reading!

submitted by /u/doomkaiser21
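To make the slopsquatting idea concrete, here is a small name-similarity check built on Python's standard `difflib`. It is not one of the heuristics the post lists for dep-doctor and is not taken from that tool; the function name, the 0.75 threshold, and the allowlist are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def closest_known_package(name: str, known: Iterable[str],
                          threshold: float = 0.75) -> Optional[str]:
    """Return the well-known package `name` suspiciously resembles, or None.

    An exact match is treated as the real package and never flagged.
    """
    known_set = set(known)
    if name in known_set:
        return None
    best_name, best_score = None, 0.0
    for candidate in sorted(known_set):  # sorted for deterministic tie-breaking
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


# Usage: flag a dependency that looks like a near miss of a popular name.
popular = ["litellm", "requests", "numpy"]
print(closest_known_package("lightllm", popular))  # flags the near miss of litellm
```

A similarity hit is only a prompt for a human to look closer; legitimate packages with similar names exist, which is why registry metadata checks like the ones described in the post matter.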
W2A: an open protocol for agent sensors, giving local agents real-time perception
Sharing a project that just went public: World2Agent (W2A), an open protocol for the perception side of the loop. Entirely self-hostable, no SaaS, no telemetry, TS SDK, Apache 2.0.

The gap it's filling: every local agent setup I've built ends up with a pile of one-off scripts and cron jobs shoving events into the context window in slightly different shapes. Each one parses a different API, each one emits a different JSON shape, each one breaks when I swap agent frameworks. W2A standardizes that layer.

What I find fascinating: We spent 2024–2025 teaching agents how to understand context (RAG, long context, memory). We spent 2025–2026 teaching them how to act (MCP, skills, tools). W2A is the first serious attempt at the third leg: teaching them to perceive. Skills give agents capabilities. W2A gives them perception. With only two of the three, you get a very smart intern who needs to be told everything. With all three, you get something that actually works autonomously.

The design choice I liked most is that the protocol itself has no routing or priority logic: a sensor just emits, and the consumer (your agent) decides what matters. Keeps sensors simple and reusable. The same signal can feed a Claude Code agent, a Slack bot, and a dashboard with zero changes.

The fastest way to feel W2A is with Claude Code. In an active session, install the world2agent plugin:

    /plugin marketplace add machinepulse-ai/world2agent-plugins
    /plugin install world2agent@world2agent-plugins
    /reload-plugins

Add a sensor, for example Hacker News:

    /world2agent:sensor-add @world2agent/sensor-hackernews

Restart Claude Code with the plugin channel loaded so sensor signals flow into your session:

    claude --dangerously-load-development-channels plugin:world2agent@world2agent-plugins

Pair it with any local agent runtime (Ollama + a small orchestrator, LiteLLM, whatever). I've been running it with a local 70B and it handles the summary-only fast path fine; it only drops to full raw when the summary isn't enough.

Write your own sensor in ~50 lines (defineSensor + createSignal + a setInterval or webhook, emit, done). There's a working Slack sensor in the repo as a reference.

Repo: https://github.com/machinepulse-ai/world2agent#quick-start
License: Apache 2.0
SDK: TypeScript (Python SDK is on the roadmap, PRs welcome)

submitted by /u/Specialist_Dot_2626
Two months of coding with Claude Code
My background started in sales, then moved to product/tech about ten years ago, culminating in my role as chief product officer at a large debt relief company. Today, around 7:30 am, after my fourth all-nighter in a row, I released a product (in stealth, no heavy marketing yet) after two months of deep work with over 1,000 commits and a lot of sleepless nights.

I used VS Code with Claude Code. Mostly Opus, high effort. Lots of CLI, no MCP. Huge win: I read about so many issues with MCP, and it was never a thing.

Built on/with Railway, Supabase, Voyage AI, Pinecone, Resend, Grafana, multi-AI provider with custom fallback (almost used LiteLLM, and chose custom days before their incident), Cloudflare for DNS/R2/Zero Trust, Sentry (incredible tool, a major part of how I shipped as much as I did as quickly as I did), Redis (Upstash), BullMQ, Unsplash, Stripe, huskyCI, Semgrep, and probably a few more I am missing.

- Is it going to sell? I don’t know.
- Is it technically capable and unique? I think so.
- Am I super proud of myself? Hell yes.
- Are there bugs? You tell me. I typically squash them in the staging environment with the help of Sentry, but something may certainly have gotten past me!
- What does it do? Converts web visitors to leads with custom agents, in under 5 minutes.

Roast me, or give me some feedback! www.wengrow.app

Moments that stand out:

- The velocity in general.
- Shipping enterprise-level SSO (Supabase Auth) in a few hours.
- Rapid CRO optimization of the onboarding flow. Having done this work before, leading large engineering and product teams, the work I did in 24 hours would have taken a cross-functional team of 5 weeks at a minimum.
- Cookie consent management. Having previously spent months at a prior job trying to do CCM right with a paid tool, I was able to set up a compliant CCM process on www in hours with c15t, including audit logs sent to my Supabase DB and proper handling of California nuances.
- So much more, but I need to catch up on some sleep.

submitted by /u/berrism
euclid: The open source AI math tutor.
I built an open-source ALEKS alternative that actually proves you understand math. Four AI agents that find what you know, decide what you're ready for, teach through Socratic dialogue, and verify real understanding. Grades 1–12. Runs locally.

What it does:
- Diagnoses what you actually know (Knowledge Space Theory)
- Only teaches what you're ready for
- Uses Socratic dialogue (no answer dumping)
- Verifies real understanding before moving on

How it works:
- 4-agent system (diagnosis, planning, teaching, evaluation)
- Knowledge graph of ~60 math concepts (grade 1 → calculus)
- Tracks progress locally (~/.euclid/state.db)
- No data leaves your machine (except LLM calls)

Built with:
- LangGraph (agent orchestration)
- LiteLLM (plug any model)

Example flow:
User: "I don’t understand fractions"
→ system detects missing prerequisite: division
→ starts guided questions instead of explaining
→ unlocks fractions only after mastery

Looking for feedback:
- Is this actually useful vs ALEKS?
- What would you add/remove?
- Would you use it locally?

GitHub: https://github.com/Tarek-new/euclid
https://preview.redd.it/htmocuminbwg1.png?width=900&format=png&auto=webp&s=8f21d0cb3d26c5749e626b9299f8a1dfcf6e3bbc

submitted by /u/john-fransis
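The example flow in this post (fractions stay locked until division is mastered) amounts to a walk over a prerequisite graph. The sketch below is not euclid's implementation; the three-concept graph and both function names are hypothetical, and only the division-before-fractions edge comes from the post.

```python
from typing import Dict, List, Set

# Hypothetical slice of a knowledge graph: concept -> direct prerequisites.
PREREQS: Dict[str, List[str]] = {
    "multiplication": [],
    "division": ["multiplication"],
    "fractions": ["division"],
}


def ready_to_learn(concept: str, mastered: Set[str],
                   prereqs: Dict[str, List[str]] = PREREQS) -> bool:
    """A concept unlocks only when all of its direct prerequisites are mastered."""
    return all(p in mastered for p in prereqs.get(concept, []))


def missing_prerequisites(concept: str, mastered: Set[str],
                          prereqs: Dict[str, List[str]] = PREREQS) -> List[str]:
    """List unmastered prerequisites, transitively, earliest concepts first."""
    missing: List[str] = []
    seen: Set[str] = set()

    def visit(c: str) -> None:
        for p in prereqs.get(c, []):
            if p in seen:
                continue
            seen.add(p)
            visit(p)  # go deeper first so foundations come out first
            if p not in mastered:
                missing.append(p)

    visit(concept)
    return missing
```

Returning the missing concepts in foundation-first order gives a tutor a natural teaching sequence: start with the first item and re-check readiness after each one is mastered.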
Gemma 4 actually running usable on an Android phone (not llama.cpp)
I wanted a real local assistant on my phone, not a demo. First I tried the usual llama.cpp in Termux: Gemma 4 ran at 2–3 tok/s and the phone was on fire. Then I switched to Google’s LiteRT setup, got Gemma 4 running smoothly, and wired it into an agent stack running in Termux.

Now one Android phone is:
- running the LLM locally
- automating its own apps via ADB
- staying offline if I want

Happy to share details + code and hear what else you’d build on top of this.

https://preview.redd.it/7vkbrlzfryvg1.jpg?width=3024&format=pjpg&auto=webp&s=25455827ddf9715b4159ce64a18deba812cf0f5f

submitted by /u/GeeekyMD
Mercor says it was hit by cyberattack tied to compromise of open-source LiteLLM project
The AI recruiting startup confirmed a security incident after an extortion hacking crew took credit for stealing data from the company's systems.
Popular AI gateway startup LiteLLM ditches controversial startup Delve
LiteLLM had obtained two security compliance certifications via Delve and fell victim to some horrific credential-stealing malware last week.
Repository Audit Available
Deep analysis of BerriAI/litellm — architecture, costs, security, dependencies & more
Yes, LiteLLM offers a free tier. Pricing found: $0, $0
Key features include: Enterprise, Pass-through Endpoints, Logging, Alerting/Monitoring, Authentication, CRUD Endpoints + UI, Control Model Access, Admin UI.
LiteLLM is commonly used for: Providing LLM access to multiple developers, Managing multiple LLM models efficiently, Tracking spend by model and user, Implementing rate limits by key or user, Using virtual keys for authentication, Migrating existing projects to the proxy.
LiteLLM integrates with: OpenAI, Langfuse, Arize Phoenix, Langsmith, OTEL Logging, Slack, Discord, Teams, Email, Webhook.
LiteLLM has a public GitHub repository with 41,659 stars.
Based on user reviews and social mentions, the most common pain points are: llm.
Based on 22 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.