The Groq LPU delivers inference with the speed and cost developers need.
Groq is praised for its fast computing capabilities and cost optimization, making it an attractive choice for projects requiring efficient processing. However, specific user reviews are scarce; the limited mentions highlight its use in varied AI applications but lack detailed insights into user satisfaction or complaints. Pricing sentiment isn't directly addressed, but the focus on cost savings suggests a favorable view. Overall, Groq seems to possess a solid reputation for performance, with potential for further user engagement as more detailed feedback surfaces.
Mentions (30d)
9
Reviews
0
Platforms
3
Sentiment
20%
6 positive
Features
Use Cases
Industry
semiconductors
Employees
350
Funding Stage
Venture (Round not Specified)
Total Funding
$3.3B
Show HN: Beta-Claw – I built an AI agent runtime that cuts token costs by 44%
I built Beta-Claw during a competition and kept pushing it after because I genuinely think the token waste problem in AI agents is underrated.

The core idea: most agent runtimes serialize everything as JSON. JSON is great for humans but terrible for tokens. So I built TOON (Token-Oriented Object Notation): same structure, 28–44% fewer tokens. At scale that's millions of tokens saved per day.

What else it does:
→ Routes across 12 providers (Anthropic, OpenAI, Groq, Ollama, DeepSeek, OpenRouter + more)
→ 4-tier smart model routing: picks the cheapest model that can handle the task
→ Multi-agent DAG: Planner → Research → Execution → Memory → Composer
→ Encrypted vault (AES-256-GCM), never stores secrets in plaintext
→ Prompt injection defense + PII redaction built in
→ 19 hot-swappable skills, < 60ms reload
→ Full benchmark suite included, with 9ms dry-run pipeline latency

It's CLI-first, TypeScript, runs on Linux/Mac/WSL2.

Repo: https://github.com/Rawknee-69/Beta-Claw

Still rough in places but the core is solid. Brutal feedback welcome.
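The post does not show TOON's actual grammar, so the following is only a rough sketch of the general idea it describes: a uniform list of objects can be written as one header plus one row per item instead of repeating every key. The `encode_table` helper and the sample rows are hypothetical, and character counts stand in for token counts here.

```python
import json


def encode_table(rows):
    """Encode same-shaped dicts as one header line plus one comma-separated line per row.

    Illustrative only: this is not the TOON specification.
    """
    if not rows:
        return "[0]{}:"
    keys = list(rows[0])
    header = "[" + str(len(rows)) + "]{" + ",".join(keys) + "}:"
    lines = [",".join(str(row[key]) for key in keys) for row in rows]
    return "\n".join([header] + lines)


# Hypothetical sample data
rows = [
    {"id": 1, "provider": "groq", "tier": "fast"},
    {"id": 2, "provider": "ollama", "tier": "local"},
    {"id": 3, "provider": "openai", "tier": "frontier"},
]

as_json = json.dumps(rows)
as_table = encode_table(rows)
print(as_table)
print(f"JSON: {len(as_json)} chars, table: {len(as_table)} chars")
```

The saving comes from stating the keys once; how that translates into tokens depends on the tokenizer, so real measurements would need the provider's own token counter.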
Pricing found: $0.075, $1, $0.30, $1, $0.075
I built my own GTA 6 (but it's 2d pixelart and 100% AI) with Claude
Working on a fully AI-native online game similar to GTA Online but in Habbo Hotel style, where all content is generated live by AI! Players can create their own characters, weapons, and buildings in the shared universe and raid other players' homes!

About the tech & how Claude helped: I use different AI APIs like OpenAI, Groq, and Gemini to generate the live in-game sprites. For the actual game development, I primarily used Claude and Claude Code (alongside Unity and Cursor). Claude wrote the core C# game logic, helped structure the multiplayer networking, and integrated the various AI APIs into the game engine.

If you are interested, you can join the Discord to try the completely free first demo: https://discord.gg/BFqQZHhkv6

submitted by /u/SneakerHunterDev
I built gta online but in 2d and everything is AI-native
I’ve been building a multiplayer 2D pixel-art sandbox game using Unity + Claude Code. The idea is basically “GTA Online meets Habbo Hotel,” except almost everything in the world is generated dynamically with AI:
- buildings
- characters
- weapons
- animations
- item sprites

Players earn gold in different ways, build bases/businesses, and can raid other players for resources.

Claude Code has been especially useful for:
- generating Unity systems and gameplay scripts
- refactoring networking/game-state logic
- debugging procedural generation issues
- helping structure the AI content pipeline
- rapidly iterating on UI/gameplay ideas

For asset generation I’m currently using APIs from OpenAI, Gemini, and Groq. The game is still early/in-progress, but you can try it here: https://discord.gg/w24aaRpfsV

Would love feedback from other people building AI-assisted games.

submitted by /u/SneakerHunterDev
[P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]
I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can be useful for:
- adaptive language learning systems
- placement testing
- readability estimation
- educational NLP applications

Dataset

The dataset contains 1,785 English texts balanced across:
- 6 CEFR levels
- 10 domains/topics

The samples were synthetically generated using:
- Groq API
- Llama-3.3-70B

Generation constraints were designed to preserve:
- vocabulary complexity
- grammatical progression
- sentence structure variation
- CEFR-specific linguistic patterns

Training Setup

- Base model: Qwen2.5-1.5B
- Fine-tuning method: QLoRA
- 4-bit NF4 quantization
- LoRA adapters

Only ~0.28% of model parameters were trained.

Results

Held-out test set: 179 samples

Metrics:
- Accuracy: 84.9%
- Macro F1: 84.9%

Per-level recall:

Level   Recall
A1      96.6%
A2      90.0%
B1      90.0%
B2      86.7%
C1      86.7%
C2      60.0%

Most errors come from C1/C2 confusion, which is expected due to the subtle linguistic boundary between those levels.

Deployment

I also built:
- a FastAPI inference API
- Docker deployment setup

Example Usage

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    import torch

    # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
    model = AutoModelForSequenceClassification.from_pretrained(
        "yanou16/cefr-english-classifier"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "yanou16/cefr-english-classifier"
    )

    text = "Artificial intelligence is transforming many industries."
    inputs = tokenizer(text, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring class
    pred = outputs.logits.argmax(dim=-1).item()
    print(pred)

Feedback is welcome, especially regarding:
- evaluation methodology
- synthetic data quality
- improving C2 classification performance
- better benchmarking approaches

submitted by /u/Professional-Pie6704
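For readers who want to reproduce metrics of the kind reported above, here is a minimal pure-Python sketch of per-level recall and macro F1. It is not the author's evaluation code; the `LEVELS` order and the toy labels are assumptions made for illustration.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # assumed label set and order


def per_level_recall(y_true, y_pred):
    """Recall per level: correct predictions divided by true samples of that level."""
    recall = {}
    for level in LEVELS:
        total = sum(1 for t in y_true if t == level)
        if total == 0:
            continue  # level absent from this test set
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == level and p == level)
        recall[level] = hits / total
    return recall


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-level F1 over the levels present in y_true."""
    scores = []
    for level in LEVELS:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == level and p == level)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != level and p == level)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == level and p != level)
        if tp + fn == 0:
            continue
        precision = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn)
        f1 = 2 * precision * rec / (precision + rec) if precision + rec else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels (hypothetical), with one C2 text misread as C1
y_true = ["A1", "A1", "C1", "C2", "C2"]
y_pred = ["A1", "A1", "C1", "C1", "C2"]
print(per_level_recall(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Because macro F1 weights every level equally, a weak level such as C2 pulls the average down more than it would under plain accuracy.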
LLM proxy that lets Claude Code talk to any model
I built rosetta-llm, an open-source multi-format LLM proxy that acts as a drop-in Claude Code gateway.

- Works as a Claude Code LLM gateway: set `ANTHROPIC_BASE_URL` and all configured models appear in the `/model` picker
- Translates between formats: Anthropic Messages ↔ OpenAI Chat ↔ OpenAI Responses at the wire level
- Thinking blocks round-trip correctly: this is the hard part and why I built this
- Provider routing: `openai/gpt-5.4`, `anthropic/claude-opus-4-7`, `groq/llama-4` all through one endpoint
- Streaming on everything: passthrough fast path + cross-format translation with proper SSE handling

The thinking-block problem

Most proxies lose reasoning continuity. LiteLLM has had open PRs for thinking block handling for a long time, some dating back months, and they're still not merged. Without proper round-tripping, prompt caching breaks across turns and Claude Code loses context. Rosetta encodes encrypted reasoning into Anthropic's `signature` field and decodes it back, so multi-turn agentic workflows keep their prompt-cache hits.

Zero-setup Hugging Face Space

Literally a two-line Dockerfile:

    FROM ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
    COPY --chown=app:app config.json /app/config.json

Add a config.json file and the above Dockerfile to an HF Space (Docker SDK) and it's running. No clone, no build, no venv. The GHCR image has everything baked in. Make your HF Space private and add API keys in the HF Space secrets. Check the README on GitHub.

Also works with

    # No install (ephemeral)
    uvx rosetta-llm

    # Persistent install
    uv tool install rosetta-llm
    rosetta-llm --config ~/.rosetta-llm/config.json

    # Docker
    docker run -p 7860:7860 \
      -v ~/.rosetta-llm/config.json:/app/config.json \
      ghcr.io/lokesh-chimakurthi/rosetta-llm:main

Why another proxy?

I looked at existing solutions:
- LiteLLM: thinking block round-trip PRs going nowhere, too many abstractions
- OpenRouter: great but closed-source, no self-hosting
- Direct passthrough proxies: don't translate between formats

Nothing gave me lossless cross-format translation with proper reasoning fidelity.

Links

- GitHub: https://github.com/Lokesh-Chimakurthi/rosetta-llm
- PyPI: https://pypi.org/project/rosetta-llm/

Contributions welcome

I built this for myself and it works for my use cases. But there's a lot more it could do: better multimodal handling, embeddings support, rate limiting, an admin UI. If any of this sounds interesting, PRs are absolutely welcome. Happy to answer questions in the comments.

submitted by /u/DataNebula
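The provider-routing idea (model IDs such as `groq/llama-4` resolved through one endpoint) can be sketched roughly as below. This is not rosetta-llm's code or config format; the `UPSTREAMS` table and `resolve` function are hypothetical names, and only the three base URLs are the providers' public API roots.

```python
# Hypothetical routing table: provider prefix -> upstream base URL
UPSTREAMS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com",
    "groq": "https://api.groq.com/openai/v1",
}


def resolve(model_id):
    """Split 'provider/model' and look up where the request should be forwarded."""
    provider, separator, model = model_id.partition("/")
    if not separator or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    if provider not in UPSTREAMS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider, model, UPSTREAMS[provider]


print(resolve("groq/llama-4"))
```

Splitting on the first slash only keeps model names that themselves contain slashes intact, which some hosted model IDs do.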
IDK why the chat-apps don't have this thing!!
I shipped a side project: QuotePin, an AI chat app with inline annotations to reduce "clarification clutter."

The problem: In ChatGPT/Claude-style chats, small follow-ups ("define X", "what does this sentence imply?", "what is Y?") become full messages. After a while, the conversation is 60% main thread and 40% you going "sorry, one more quick question." It's basically a support ticket at that point.

What QuotePin does instead: you select a word or phrase in an AI response, ask your question in a pop-up, and the answer is saved as an annotation attached to the original context. Think Wikipedia-style reading, where the main flow stays readable, and you only expand details where needed, instead of derailing the whole thread because you didn't know what "idempotent" meant.

Features:
- Inline annotate: select text → ask → saved badge on the message
- Optional "reply in chat" for larger follow-ups that actually deserve to exist
- Conversation graph view for overview/sharing
- Bookmarks. This came from a specific pain point: I'd ask the AI to give me a list of questions, reply with my doubts for each one, and by the time I was done, the original question list had scrolled so far up I had to hunt for it every time. Bookmarks let you pin that message and jump back instantly.
- Multi-provider support (OpenAI/Anthropic/Gemini/Groq/Qwen) using your own API key

No paid API key? Groq has a free tier that works great for this. Get started in 30 seconds:
- Go to console.groq.com and grab a free API key
- Open QuotePin and head to Settings
- Select Groq as your provider
- Paste your key and you're good to go

I'm not a product/UX person (I live in the low-level systems part of the brain where there are no users, only registers). So I'd genuinely love feedback, especially on the annotation UX and what would make it useful in real workflows, not just in my head.

Live: https://quotepin.vercel.app/
Repo: https://github.com/aayuxh-vim/QuotePin

submitted by /u/Chessislove
I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost
The observation that started this: most of what people use AI for every day (summarising, drafting, classifying, extracting, etc.) doesn't actually require a frontier model. Any competent 8-70B model handles those just as well. But most people run everything through Claude or ChatGPT out of habit.

I built Followloop (followloop.app) to solve this automatically. It classifies each task by complexity and routes it:
- Simple tasks → Cerebras Llama (2000 TPS, 1M tokens/day free), Groq, Gemini Flash
- Moderate tasks → Groq 70B, SambaNova
- Complex tasks → Claude Haiku as fallback

The dashboard shows your actual cost alongside what you'd have paid running everything on Claude Sonnet. I've been running it on my own AI workflow for two weeks: 9,200 tasks routed, $21.24 saved, $0.1360 actual cost. About 157× cheaper per token than Sonnet on average.

Works with any AI setup via MCP (Model Context Protocol): Claude Desktop, Cursor, Claude Code, or anything MCP-compatible. Also has a library of 1,300+ safety-screened MCP servers as a bonus feature.

$5/month at followloop.app

submitted by /u/QueefLatinahOG
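The 157× figure can be checked from the numbers quoted in the post. The short calculation below assumes that "saved" means the Sonnet baseline minus the actual cost, and that the same tokens are being compared on both sides; neither assumption is stated explicitly by the author.

```python
tasks = 9200
saved_usd = 21.24      # from the post
actual_usd = 0.1360    # from the post

# Assumption: baseline (all-Sonnet) cost = what was saved + what was actually paid
baseline_usd = saved_usd + actual_usd
ratio = baseline_usd / actual_usd
cost_per_task = actual_usd / tasks

print(f"baseline: ${baseline_usd:.4f}")
print(f"baseline / actual: {ratio:.1f}x")
print(f"actual cost per task: ${cost_per_task:.6f}")
```

Under those assumptions the ratio of total costs lands at roughly 157, which matches the per-token claim as long as the token volume is identical in both scenarios.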
Someone just open-sourced a hedge fund
submitted by /u/YogurtWild
Lessons from building a coding agent for 8k context windows: token budgeting, parallel executors, and per-file isolation
Most AI coding tools (Cursor, Aider, Claude Code) assume you have a 200k-token model. If you're running local LLMs through Ollama or LM Studio, or hitting free-tier cloud APIs like Groq or OpenRouter, you've got around 8k tokens to work with. That doesn't fit a whole project, barely fits a single large file. I spent the last few weeks building a CLI coding agent that's designed around the 8k constraint instead of fighting it. Wanted to share what I learned, because some of it surprised me. The core insight: the LLM never needs to see your whole project. Most agents try to stuff as much context as possible into a single call. With 8k tokens that's a non-starter. The approach that worked for me is splitting the work into roles: A planner call that only sees a lightweight project map (Markdown summaries of each folder, ~300-500 tokens for the whole project) plus the user's request, and outputs a task list. Executor calls that each see exactly one file plus one task. Never two files in the same call. An orchestrator that's pure code, absolutely no LLM, building a dependency graph between tasks and deciding what runs in parallel vs sequential. This split means the LLM only ever reasons about a small, bounded amount of code at any one time. The planner doesn't need to see code at all (just file summaries), and the executor only sees one file. Multi-file refactors stop being a context-window problem and become a scheduling problem. Token budgeting has to be enforced in code, not promised in a prompt. Every LLM call goes through a canFit() check that measures: system prompt + reserved output tokens + memory + actual code. If the code doesn't fit, the agent automatically falls back to a per-file line index (generated once for files over ~150 lines) and pulls only the relevant section. 
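A budget check of the kind described can be sketched as follows, using the approximate figures the post gives for an 8192-token window. This is not the litecode implementation: `estimate_tokens` uses a rough 4-characters-per-token heuristic, and the constant and function names are assumptions for illustration.

```python
CONTEXT_WINDOW = 8192
SYSTEM_PROMPT_TOKENS = 1000      # approximate, from the post
RESERVED_OUTPUT_TOKENS = 2000    # approximate, from the post
TOKENS_PER_MEMORY_ENTRY = 90     # post cites roughly 80-90 per entry


def estimate_tokens(text):
    """Crude heuristic: about 4 characters per token, rounded up (an assumption)."""
    return (len(text) + 3) // 4


def code_budget(memory_entries=4):
    """Tokens left for actual code after the fixed costs are subtracted."""
    memory = memory_entries * TOKENS_PER_MEMORY_ENTRY
    return CONTEXT_WINDOW - SYSTEM_PROMPT_TOKENS - RESERVED_OUTPUT_TOKENS - memory


def can_fit(code, memory_entries=4):
    """True if the code fits in what remains of the context window."""
    return estimate_tokens(code) <= code_budget(memory_entries)


print(code_budget())               # tokens available for code with 4 memory entries
print(can_fit("x = 1\n" * 500))    # a small file
print(can_fit("x = 1\n" * 5000))   # a file that needs the line-index fallback
```

Enforcing the check in code means an oversized file triggers the fallback path deterministically, rather than relying on the model to notice truncation.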
Concrete budget math for 8192 tokens:
- System prompt + instructions: ~1000
- Reserved for response: ~2000
- Short-term memory (4 entries): ~360
- Available for actual code: ~4800 (about 140-190 lines)

Parallel execution is the speed multiplier that makes 8k usable. Because each executor sees only one file, independent edits across files can run simultaneously. A 5-file refactor that would be slow if run sequentially completes in roughly the time of the longest single edit. The dependency graph (built in pure code from the planner's task list) decides which tasks have to wait for which.

A few things that tripped me up along the way:
- Question-style requests overwriting files. The first version had no concept of read-only operations, so asking "how many lines does X have?" caused the executor to write the answer into the file. Fixed by adding an action_type: "query" field to the planner's output that routes through a separate code path that never touches disk.
- Stale project maps causing silent misroutes. If the user named a file in their request that wasn't in the context map (because they just renamed it, or hadn't refreshed), the planner would silently route the action to the closest match. Now the orchestrator validates that mentioned file paths actually exist on disk and throws a clear error if they don't.
- Markdown fences in executor output. Even when explicitly told not to, smaller models love wrapping code in triple backticks. Strip them in post-processing rather than fighting the prompt.
- Memory token cost. Initially didn't budget for it; persistent memory is great but it's another ~80-90 tokens per entry that has to come out of the code budget. Now folder context is dropped first when the budget is tight, then memory, before the actual code gets cut.

What I'm still figuring out: Whether the planner/executor split scales cleanly to codebases over 50 files. 
The dependency graph stays manageable, but the project map starts costing real tokens once you have enough folders. Currently dropping folder context first when budget is tight, but that means deeper edits get less context. Curious if anyone else has run into this and how they handle it.

Open-sourced the implementation if anyone wants to dig in: https://github.com/razvanneculai/litecode

submitted by /u/BestSeaworthiness283
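The "scheduling problem" framing can be illustrated with Python's standard `graphlib` module: given which tasks depend on which, it yields batches whose members could run in parallel. This is a sketch under assumed task names, not the orchestrator from the repository.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group tasks into batches; every task in a batch has all its prerequisites done.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that can run right now
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock dependents
    return batches


# Hypothetical task list: two independent edits, then one that needs both
tasks = {
    "edit api.py": set(),
    "edit models.py": set(),
    "edit tests.py": {"edit api.py", "edit models.py"},
}

for number, batch in enumerate(parallel_batches(tasks), start=1):
    print(number, batch)
```

In a real agent each batch would be dispatched to concurrent executor calls, so wall-clock time tracks the number of batches rather than the number of files.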
Built an open-source proxy that saves ~30% on API tokens while keeping response quality — free, looking for beta testers
I've been building **compresh**, an open-source proxy that sits between your app and the OpenAI API. You swap `base_url`, and it optimizes your requests before they hit the API.

**Two layers of optimization:**

1. **Rule-based prompt compression**: strips filler words, verbose phrases, redundant instructions. Sub-millisecond, no ML involved. Works in 6 languages.
2. **Conversation-aware context compression**: for multi-turn chats, it builds a semantic understanding of the conversation and replaces older turns with a compact context block. Instead of sending 50 turns of raw history, your model gets the essential context in a fraction of the tokens.

**Why not just summarize?** Summarization requires an extra LLM call (cost + latency). Compresh's scoring and compression is deterministic and rule-based. The only ML component is a lightweight tag extraction step, and even that runs on a small model. More importantly: summaries lose corrections. If a user corrects themselves mid-conversation, a summary might keep the wrong version. Compresh explicitly tracks these corrections and preserves them through compression.

**Net result:** ~30% token savings on multi-turn conversations, with response quality on par or better than no compression (validated on benchmarks). The model also stays in-context longer because you're using the context window more efficiently.

It works with any OpenAI-compatible endpoint, not just OpenAI. Groq, Mistral, local models, anything.

Free, open source: github/compresh/compresh

Edit: Fixed product name typos.

submitted by /u/talatt
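To make the first layer concrete, here is a toy version of rule-based prompt compression. The phrase list and the `compress` function are invented for illustration and are not compresh's actual rules; a real rule set would also need to handle punctuation and other languages.

```python
import re

# Hypothetical rewrite rules: verbose phrase -> shorter replacement.
# Longer phrases come first so they are matched before their sub-phrases.
RULES = [
    (r"\bi would like you to\b", ""),
    (r"\bin order to\b", "to"),
    (r"\bplease\b", ""),
    (r"\bbasically\b", ""),
]


def compress(prompt):
    """Apply each rule case-insensitively, then collapse leftover whitespace."""
    for pattern, replacement in RULES:
        prompt = re.sub(pattern, replacement, prompt, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", prompt).strip()


original = "Please summarize this text in order to save time."
shorter = compress(original)
print(shorter)
print(len(original), "->", len(shorter), "characters")
```

Because every step is a fixed substitution, the same prompt always compresses the same way, which is the determinism the post contrasts with LLM summarization.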
I built a local-first MCP server that gives Claude Code persistent memory, a knowledge graph, and a consent framework — and Claude is just the first client
I've been building this for a couple of years. It started as "what if my AI assistant actually remembered things," and it became something bigger. The short version: I built a local AI infrastructure layer that runs entirely on my machine. No cloud. No exposed ports. My data stays on my hardware. And this week it's finally at a point where I can share it. --- What it is willow-1.7 is a Model Context Protocol server. Claude Code connects to it at session start via stdio — no HTTP, no ports, no supervisor. A direct pipe. Through that connection, Claude gets 44 tools: - Persistent memory — a Postgres knowledge graph (atoms, entities, edges) that survives sessions - Local storage — SQLite per collection, with a full audit trail and soft-delete - Inference routing — local Ollama first, then Groq / Cerebras / SambaNova as free-tier fallback if Ollama is down - Task queue — Claude submits shell tasks to Kart, a worker that polls Postgres and executes them - SAFE authorization — every agent that wants knowledge graph access must present a GPG-signed manifest. No valid signature = access denied. Revoke an agent by deleting its folder. The filesystem is the ACL. - Session handoffs — structured handoff documents written to disk and indexed in Postgres, so the next session can pick up from where the last one ended --- The authorization model This part is unusual enough that it's worth explaining. Each application that wants to access the knowledge graph has a folder on a separate partition (/media/willow/SAFE/Applications/ /). That fo - safe-app-manifest.json — declares permissions and data streams - safe-app-manifest.json.sig — a GPG detached signature of the manifest On every access attempt, the gate checks: folder exists → manifest present → signature present → gpg --verify passes. All four must pass. Any failure → deny + log. No code changes to revoke access. Delete the folder, and that agent is done. I've been running 17 AI professors through this gate for months. 
Each one has its own signed folder, its own permitted data streams, its own context. None of them can access data outside their declared scope. --- What powers it locally Ollama runs the inference. Currently using qwen2.5:3b as the default. The system routes there first and falls back to free cloud APIs only if Ollama is unavailable. But Claude is just the first client. The MCP server speaks stdio MCP. Any agent that understands the protocol can connect — Gemini, local models, anything. The longer plan: Yggdrasil. A small model trained on the operational patterns this system generates — session handoffs, ratified knowledge atoms, governance logs. When that model is trained, it replaces the cloud fleet entirely. The system becomes fully air-gappable. And after that: an open-source Claude Code equivalent. A terminal AI agent that boots from your local repo, connects to willow via stdio, and has no dependencies you don't control. No telemetry. No cloud account required. Just you and the tools you built. willow-1.7 is the bus everything else rides. The client is just the first thing attached to it. --- Why local-first matters to me I have two daughters. I'm building this so they grow up with tools that help them think instead of thinking for them. That don't own their journals. That don't optimize their attention. That expire when they close the app. The current model is: agree once, we own everything forever. Your notes train our models. Your data lives in our building. Local-first is the other way. Your data lives on your machine. Consent is session-based — the system asks every time, and that permission expires when you're done. If you walk away, it stops. --- The bootstrap There's a separate installer repo, willow-seed, that handles the full setup from scratch — clones the repo, creates the Postgres database, scaffolds the first SAFE agent entry, writes the MCP config. Stdlib only, no dependencies. Consent gates before every action. python seed.py That's it. 
Tested it this week on a fresh partition. It works.

---

Links
- willow-1.7: https://github.com/rudi193-cmd/willow-1.7
- willow-seed: https://github.com/rudi193-cmd/willow-seed
- SAFE spec: https://github.com/rudi193-cmd/SAFE

---

Happy to answer questions. Still building. ΔΣ=42

submitted by /u/BeneficialBig8372
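The four-step gate described in the authorization section (folder exists → manifest present → signature present → `gpg --verify` passes) could look roughly like this in Python. It is a sketch, not the SAFE implementation: `check_access` and its return values are hypothetical, and it assumes the `gpg` binary and the signer's public key are available on the machine.

```python
import subprocess
from pathlib import Path

MANIFEST_NAME = "safe-app-manifest.json"
SIGNATURE_NAME = "safe-app-manifest.json.sig"


def check_access(app_dir):
    """Return (allowed, reason). Every step must pass; any failure denies access."""
    app_dir = Path(app_dir)
    manifest = app_dir / MANIFEST_NAME
    signature = app_dir / SIGNATURE_NAME

    if not app_dir.is_dir():
        return False, "folder missing"
    if not manifest.is_file():
        return False, "manifest missing"
    if not signature.is_file():
        return False, "signature missing"
    try:
        # Verify the detached signature against the manifest it signs
        result = subprocess.run(
            ["gpg", "--verify", str(signature), str(manifest)],
            capture_output=True,
        )
    except FileNotFoundError:
        return False, "gpg not installed"
    if result.returncode != 0:
        return False, "signature invalid"
    return True, "ok"


print(check_access("/nonexistent/agent"))
```

Since the first check is simply whether the folder exists, deleting the folder revokes access without touching code, which is the property the post highlights.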
Sonnet is expensive, so I built a free open-source Sheets agent on Haiku that outperform the same prompt claude/gemini, here is what I learnt.
I live in Google Sheets. Financial models, projections, scenario planning — that's most of my working day. When Claude came out, I was excited. Sonnet genuinely gets financial logic. Growth rates, margin structures, break-even analysis — it's good at this stuff. So I started using it for everything. But the actual workflow was killing me. I'd describe a financial model in Claude.ai. Sonnet would build it in the canvas — with real formulas, which is more than most tools give you. But the canvas is not Google Sheets. You export it, and formulas break on the way over. Formatting disappears. Then you want to change one assumption — say marketing cap from 25% to 20% — and you're back in Claude, re-prompting, re-exporting, checking if everything survived. Each round trip eats Sonnet credits and time. Claude has a Google Sheets extension too. Tried it, hoping it would skip the export pain. It doesn't. The integration doesn't really understand what's in your sheet. It can't build a multi-sheet model step by step, can't coordinate between an Assumptions tab and a Projections tab. It's a chat box sitting in the sidebar. Then I tried Gemini for Sheets. Asked it for a financial plan. Got rough numbers in cells. No formulas. No structure. Just values, like it ran the math once and gave me the answer sheet. So my options were: Sonnet through Claude.ai with the canvas export loop. Claude's Sheets extension that barely integrates. Or Gemini handing me a calculator. I had Claude Code and I'd been watching what Vercel was doing with their AI SDK agent framework. I thought: what if I just build the thing myself, and make it work on Haiku so it doesn't cost a fortune? Here's the part I didn't see coming: Haiku running inside my agent now produces better spreadsheets than the same prompt on Claude.ai with Sonnet. Not because Haiku is smarter. It isn't. But I learned that spreadsheet work is not a text generation problem. It's a stateful execution problem. 
The model needs to know what it already wrote, where it wrote it, what depends on what, and what's still missing. None of the existing tools give it that. What the agent actually built One prompt: "I'm launching FrostBrew — an artisan cold brew coffee subscription at $29/month. 50 subscribers to start, 15% monthly growth. Build me a complete 12-month financial projection with break-even analysis." The agent planned the layout, then ran 101 steps on its own: Assumptions sheet — 9 editable parameters (price, growth rate, COGS %, marketing budget, OpEx, etc.) P&L Projection — 12 months × 10 metrics, all native formulas referencing Assumptions. Subscribers growing at 15% compound, revenue, COGS at 40%, gross profit, marketing spend with caps, OpEx with growth, EBITDA, margins, cumulative EBITDA Break-Even Analysis — fixed costs, contribution margin, break-even subscribers, break-even month Executive Summary — milestone comparisons (Month 1 vs Month 6 vs Month 12), year-over-year growth, profitability status, strategic narrative 5 charts — subscriber growth, revenue trajectory, EBITDA & cumulative profitability, expense breakdown, margin evolution Professional formatting — currency, percentages, conditional highlighting, section styling Total cost: ~$0.18. One formula needed manual correction out of 101 steps. Change one assumption and the entire 4-sheet model recalculates. That's a spreadsheet, not a screenshot of one. The three things that made it work 1. The Cell Map — show the model what it wrote At first I tried prompt engineering: "remember where you placed the data," "use exact cell references." It helped a little, but different models interpreted the instructions differently. The real fix: after every step, the system builds an explicit map of the spreadsheet state and feeds it back to the model. 
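As a rough illustration of feeding spreadsheet state back to the model, the sketch below renders a small in-memory description of a sheet as plain text. The `render_cell_map` function, its input shape, and the sample rows are assumptions for illustration; the post does not show how the agent builds its map.

```python
def render_cell_map(sheet_name, rows):
    """Render {row_number: (label, content)} as a compact text block for the prompt."""
    lines = [f"Sheet: {sheet_name}"]
    for row_number in sorted(rows):
        label, content = rows[row_number]
        lines.append(f"{row_number}| {label} : {content}")
    return "\n".join(lines)


# Hypothetical state recorded after a few agent steps
assumptions = {
    2: ("Subscription Price", "B2=$29"),
    3: ("Starting Subscribers", "B3=50"),
    4: ("Monthly Growth Rate", "B4=0.15"),
}

print(render_cell_map("Assumptions (key-value)", assumptions))
```

Regenerating this block from the recorded state after every step keeps the prompt in sync with the sheet, instead of asking the model to remember where it wrote things.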
Sheet: P&L Projection
Cols: B=Month 1, C=Month 2, ..., M=Month 12
2| Subscribers : B2=50, C2:M2 formula =ROUND(B2*(1+Assumptions!$B$3))
3| Revenue : B3:M3 formula =B2*Assumptions!$B$4
4| COGS : B4:M4 formula =B3*Assumptions!$B$5
5| Gross Profit : B5:M5 formula =B3-B4

Sheet: Assumptions (key-value)
2| Subscription Price : B2=$29
3| Starting Subscribers : B3=50
4| Monthly Growth Rate : B4=0.15
5| COGS % of Revenue : B5=0.40

The model sees exactly what exists, where it is, and what's still missing. When it needs to write "=B3*Assumptions!$B$5", it can check that cell B5 on Assumptions holds the COGS rate. No guessing.

This was the single biggest improvement. And it works across every model I tested (Haiku, GPT-5.4, Qwen) because it's data, not model-specific prompting. Show the model the truth and it makes better decisions.

2. Formula-first: let the spreadsheet do the math

A financial model only needs a few AI judgment calls: starting subscribers, growth rate, price, COGS ratio, base OpEx. Maybe 8 values. Everything else should be native spreadsheet logic. So the agent prefers formulas: '=ROUND(Assumptions!$B$3*(1+Assumptions!$B$4)COLUMN(-2))' for
Curated 550+ free AI tools useful for building projects (LLMs, APIs, local models, RAG, agents)
Over the last few days I've been collecting free or low-cost AI tools that are actually useful if you want to build stuff, not just try random demos. Most lists I saw were either outdated, full of affiliate links, or just generic tools repeated everywhere, so I tried to make something more practical, mainly focused on things developers can actually use.

It includes things like:
- free LLM APIs (OpenRouter, Groq, Gemini, etc.)
- local models (Ollama, Qwen, Llama)
- coding tools (Cursor, Gemini CLI, Qwen Code)
- RAG stack tools (vector DBs, embeddings, frameworks)
- agent workflow tools
- speech, image, and video APIs
- some example stack combinations depending on use case

Right now it's around 550+ tools and models in total. I'm still updating it whenever new models or free tiers appear, so some info might be outdated already. If there are good tools missing I would really appreciate suggestions, especially newer open-weight models or useful infra tools.

Repo link: https://github.com/ShaikhWarsi/free-ai-tools

If you know something useful that should be included, just let me know and I will add it.

submitted by /u/Axintwo
Spill It – I built a local, fast speech-to-text app for my 8GB Mac
I've been using Wispr Flow for a while, but it's gotten glitchy over time. So I started this as a weekend project: build something local that just works. I built it fully on CC.

The constraints shaped the product. I have a 2020 Mac with 8GB RAM, so I was honestly just building this for myself. Whisper V3 was way too slow locally on my hardware. I wanted something fast and snappy, so I went with NVIDIA's Parakeet TDT 0.6B, quantized to 4-bit (about 400MB). It's nearly instant. You release the hotkey and the text is there.

I also made an active choice to skip multilingual and go English-only. That gave me the freedom to do serious rule-based post-processing on the STT output. Multilingual would have added complexity I didn't want.

For post-processing, I tried local LLMs, even Gemma 4, but everything put too much pressure on memory and slowed things down. I settled on GECToR (a BERT-based tagger, about 250MB), which does decent cleanup: commas, full stops, capitalization. It edits rather than rewrites, which is what I wanted.

Context awareness is the part I'm most excited about. The app reads your screen via the accessibility tree (filenames, names, git branches) and adapts formatting to where you're typing. Terminal gets different treatment than email. It's not perfect and it doesn't catch every word in context, but it does a surprisingly good job, especially in the terminal.

Honestly, I've mostly been using this to talk to CC, and the errors don't get in the way of CC's comprehension. A local model with some errors works really well for the CC use case. But for email and messages, you need more polish, so I added an optional cloud LLM layer (bring your own API key). From everything I've tested, Qwen3 on Cerebras and Llama on Groq perform best and are among the fastest. Based on my usage (about 3,000 words a day), I'm spending about $6 to $7 a month on API costs.

A few other things:
- Added Silero VAD, which helps a lot with noisy environments. 
It also helps with the whispering that they keep talking about; personally, I don't get why one would whisper. I've tested it in cafes speaking directly into the laptop. It does well with longer sentences and falters a bit more with short ones.
- There are still occasional hallucinations at sentence boundaries, a stray "yeah" or "okay" that seeps through. Still working on it.

Pricing: The local version is fully free. Unlimited, no login, no credit card, just download and go. The cloud LLM polish layer is a small one-time fee, but you bring your own API key. Ping me and I'll give you a free activation key; my only ask is that you share feedback.

I'd love your feedback, especially on the context-awareness approach and whether the local-first plus optional-cloud model makes sense as a product.

Download from here: https://tryspillit.com. Would love to hear the community's feedback.

submitted by /u/afinasch
Had vibe-coded something like "dispatch" long time back, was too lazy all this while but wanted to OS the code
REPO: GITHUB

Basically the title. I know there are hundreds of "access Claude remotely from Telegram/WhatsApp etc." codebases all over the internet, and some of them are great. My situation was slightly specific: I preferred using the VS Code UI for most things. When I used to commute for work I had a solid 2-2.5 hrs every day to burn, but I didn't want the usual "remote" access; what I wanted was to access my terminal sitting at home. I have been building local servers etc. for a while now and am well versed with Tailscale. I simply vibe-coded the part where my responses are pushed into the terminal at home via a Tailscale pathway.

Anthropic took a while to launch Dispatch: this is something they should have shipped way earlier and way better. The concept of controlling your terminal from your phone is not some groundbreaking idea; people have been doing this with SSH for years.

I tried Dispatch, and I see some issues. One guy on the GitHub issues page said he sat through 10+ minutes of permission prompts on basic read commands. There's also a bug where it always spawns with Sonnet regardless of what model you have configured, and you can't change it from mobile. And the whole thing routes through Anthropic's servers. There's a GitHub issue from a Max subscriber where Dispatch was completely dead for 48 hours, support sent him bot replies, and the issue was marked "resolved" on the status page but was still broken. I think they use relay servers, but mine just keeps working because it's Tailscale: there's no Anthropic server in the middle to go down.

So here's what ping-claude does:
- Claude finishes something at home, you get a notification on your phone with what it last said.
- Claude wants to do something destructive, you get approve/deny buttons on your phone.
- There's also a live activity feed showing every tool call as it happens. Not just "Claude is working": you can see Bash running, Edit completing, Grep searching, in real time on your phone.
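The approve/deny step described above implies some check that decides whether a command is destructive. The post does not say how ping-claude makes that call, so the following is only a minimal sketch of one possible approach: the pattern list and the `needs_approval` name are hypothetical and not taken from the repo.

```python
import re

# Hypothetical deny-list; the real project may classify commands differently.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"^rm\b"),
    re.compile(r"^git\s+(push\s+.*--force|reset\s+--hard|clean\b)"),
    re.compile(r"^(mkfs|dd|shutdown|reboot)\b"),
]


def needs_approval(command: str) -> bool:
    """Return True if any segment of a shell command line looks destructive."""
    # Split on common shell separators so "ls && rm -rf x" is still caught.
    segments = re.split(r"&&|\|\||;|\|", command)
    return any(
        pattern.search(segment.strip())
        for segment in segments
        for pattern in DESTRUCTIVE_PATTERNS
    )


for cmd in ["ls -la", "ls && rm -rf build", "git status"]:
    print(cmd, "->", "ask on phone" if needs_approval(cmd) else "run")
```

A pattern list like this is easy to audit but will miss anything it does not name, so a real gate would more likely default to asking and allow-list the known-safe read commands.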
The voice thing is genuinely the feature I use most. Groq Whisper, free tier, transcription in under a second. I just say "do this that" into my phone.

The whole thing runs on your machine over Tailscale. Nothing goes to any external server except the optional Groq call for voice. Setup is like 5 commands total, then open the IP on your phone and add it to your home screen.

Native push notifications are still under dev; it's a PWA, so the tab needs to be open. An Expo app is on the list. If you want push notifications right now, the Telegram integration works. (Yes, it fully runs on a Telegram bot.)

MIT licensed, been using it for months. Would genuinely love contributors, especially if anyone wants to take a crack at anything else in this workflow. (IDK if it will be useful, but yeah)

REPO: GITHUB

submitted by /u/theRealSachinSpk
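For readers curious what the optional Groq voice call might look like, here is a minimal sketch that builds, but does not send, a transcription request using only the standard library. It assumes Groq's OpenAI-compatible transcription endpoint and the `whisper-large-v3` model name; both should be checked against the current Groq docs. The function name, filename, and key are placeholders, and none of this is taken from the ping-claude repo.

```python
import urllib.request
import uuid

# Assumption: Groq's OpenAI-compatible transcription endpoint.
GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                model: str = "whisper-large-v3") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        GROQ_TRANSCRIBE_URL,
        data=head + audio + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Hypothetical usage: the bytes and key below are placeholders, nothing is sent.
request = build_transcription_request(b"fake-audio", "note.wav", "YOUR_GROQ_API_KEY")
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)` with real audio and a real key. The response handling is left out because the post does not describe it.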
I built an AI content engine that turns one piece of content into posts for 9 platforms, fully automated with n8n
What it does: You give it any input (a blog URL, a YouTube video, raw text, or just a topic) and it generates optimized posts for 9 platforms at once: Instagram, Twitter/X, LinkedIn, Facebook, TikTok, Reddit, Pinterest, Twitter threads, and email newsletters. Each output is tailored to the platform (hashtags for IG, hooks for TikTok, professional tone for LinkedIn, etc.). It also auto-generates images for visual platforms like Instagram, Facebook, and Pinterest, using AI.

Other features:
- Topic Research: scans Google, Reddit, YouTube, and news sources, then uses an LLM to identify trending subtopics before generating content
- Auto-Discover: if you don't even have a topic, it searches what's trending right now (optionally filtered by niche) and picks the hottest one
- Cinematic Ad: upload any photo, pick a style (cinematic, luxury, neon, retro, minimal, natural), and Gemini transforms it into a professional-looking ad
- Multi-LLM support: works with Mistral, Groq, OpenAI, Anthropic, and Gemini
- History: every generation is saved, exportable as CSV

The n8n automation (this is where it gets fun): I connected the whole thing to an n8n workflow so it runs on autopilot:
1. Schedule Trigger: fires daily (or whatever frequency)
2. Google Sheets: reads a row with a topic (or "auto" to let AI pick a trending topic)
3. HTTP Request: hits my /api/auto-generate endpoint, which auto-detects the input type (URL, YouTube link, topic, or "auto") and generates everything
4. Code node: parses the response and extracts each platform's content
5. Google Drive: uploads generated images
6. Update Sheets: marks the row as done with status and links

The API handles niche filtering too, so if my sheet says the topic is "auto" and the niche column says "AI", it'll specifically find trending AI topics instead of random viral stuff.
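The input-type auto-detection in step 3 could be sketched roughly as follows. This is not the author's code: the function name, the return labels, and the YouTube host list are assumptions made for illustration, and the post does not say how the real endpoint tells a topic from raw text.

```python
from urllib.parse import urlparse

# Assumed set of hostnames treated as YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def detect_input_type(value: str) -> str:
    """Classify a sheet cell as 'auto', 'youtube', 'url', or 'topic'."""
    text = value.strip()
    if text.lower() == "auto":
        return "auto"
    parsed = urlparse(text)
    # Only treat the value as a link if it has an http(s) scheme and a host.
    if parsed.scheme in ("http", "https") and parsed.hostname:
        return "youtube" if parsed.hostname in YOUTUBE_HOSTS else "url"
    return "topic"


for cell in ["auto", "https://youtu.be/abc123", "https://example.com/post", "AI agents"]:
    print(repr(cell), "->", detect_input_type(cell))
```

Checking for the literal "auto" first keeps the sheet convention unambiguous, and anything that is not a well-formed http(s) link falls through to being treated as a topic.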
Error handling: HTTP Request has retry on fail (2 retries), error outputs route to a separate branch that marks the sheet row as "failed" with the error message, and a global error workflow emails me if anything breaks.

Tech stack:
- FastAPI backend, vanilla JS frontend
- Hosted on Railway
- Google Gemini for image generation and cinematic ads
- HuggingFace FLUX.1 for platform images
- SerpAPI + Reddit + YouTube + NewsAPI for research
- SQLite for history
- n8n for workflow automation

It's not perfect yet (rate limits on free tiers are real), but it's been saving me hours every week. Happy to answer questions.

https://preview.redd.it/f8d3ogk3nktg1.png?width=888&format=png&auto=webp&s=dcd3d5e90facd54314f40e799b32cab979dae4bf
https://preview.redd.it/j8zl07llmktg1.png?width=946&format=png&auto=webp&s=5c78c12a223d6357cccaed59371e97d5fe4787f5
https://preview.redd.it/5cjas6hkmktg1.png?width=891&format=png&auto=webp&s=288c6964061f531af63fb9717652bececfb63072
https://preview.redd.it/k7e89belmktg1.png?width=1057&format=png&auto=webp&s=8b6cb15cfa267d90a697ba03aed848166976d921
https://preview.redd.it/3w3l70tlmktg1.png?width=1794&format=png&auto=webp&s=6de10434f588b1bf16ae02f542afd770eaa23c3f
https://preview.redd.it/a40rh1canktg1.png?width=1920&format=png&auto=webp&s=1d2414c7e653a5f01f12a21a43e69bd4fb4b99ed

submitted by /u/emprendedorjoven
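The error-handling behaviour described in the post (retry twice, then mark the row as "failed" with the error message) is configured in n8n rather than written as code, but the same logic can be sketched in plain Python. The `run_row` and `flaky_generate` names are hypothetical, and the global error email is not modelled here.

```python
from typing import Any, Callable, Dict

MAX_RETRIES = 2  # matches the "2 retries" mentioned in the post


def run_row(generate: Callable[[], Any], max_retries: int = MAX_RETRIES) -> Dict[str, Any]:
    """Call generate(), retrying on failure, and return a sheet-row status."""
    last_error = ""
    for _attempt in range(1 + max_retries):  # first try plus the retries
        try:
            return {"status": "done", "result": generate()}
        except Exception as exc:  # broad on purpose: any failure marks the row
            last_error = str(exc)
    return {"status": "failed", "error": last_error}


# Hypothetical stand-in for the HTTP call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_generate() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


print(run_row(flaky_generate))
```

With two retries the flaky call succeeds on its third attempt; a call that never succeeds ends up as a "failed" row carrying the last error message, which mirrors the separate error branch in the workflow.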
Yes, Groq offers a free tier. Pricing found: $0.075, $1, $0.30, $1, $0.075
Key features include: javascript, What inference provider are you using or considering using to access models?, Groq Raises $750 Million as Inference Demand Surges, Day Zero Support for OpenAI Open Models, From Speed to Scale: How Groq Is Optimized for MoE Other Large Models, Platform Solutions, Learn, Developers.
Groq is commonly used for: Groq runs the models you care about, Support for LLMs, STT, TTS, and image-to-text models, Popular models on-demand, Industry standard frameworks and integrations, Custom Models, Regional Endpoint Selection.
Groq integrates with: OpenAI, AWS Lambda, Google Cloud, Azure, Kubernetes, Docker, Jupyter Notebooks, GitHub.
Based on user reviews and social mentions, the most common pain points are: token cost, API costs, cost tracking.
2 mentions
Based on 30 social mentions analyzed, 20% of sentiment is positive, 77% neutral, and 3% negative.