Create lifelike speech with our AI voice generator and voice agents platform. Access 5,000+ voices in 70+ languages with secure APIs and SDKs.
ElevenLabs is praised for its advanced AI-driven voice generation capabilities, though users frequently express concerns over its high pricing, with many seeking more affordable alternatives. The social mentions highlight its significant cost of $99 per month, which is seen as steep next to free or cheaper competitors. There is limited user criticism apart from the cost, but the tool appears well-regarded for its functionality, assuming budget constraints are manageable for the consumer. Overall, ElevenLabs has a solid reputation for quality but suffers from negative sentiment regarding its pricing.
Mentions (30d): 0
Reviews: 0
Platforms: 4
Sentiment: 17% (7 positive)
Industry: research
Employees: 880
Funding Stage: Series D
Total Funding: $1.0B
5 Expensive AI Tools... And Their Free Clones (You won’t believe how much you’re overpaying.) 💸 ChatGPT? $200/month 💸 Midjourney? $60/month 💸 ElevenLabs? $99/month 💸 Aiva? $54/month 💸 Tome? $16
5 Expensive AI Tools... And Their Free Clones (You won’t believe how much you’re overpaying.) 💸 ChatGPT? $200/month 💸 Midjourney? $60/month 💸 ElevenLabs? $99/month 💸 Aiva? $54/month 💸 Tome? $16/month But here’s the twist. Their free alternatives do 80–95% of the job. For $0. 🔥 Research: DeepSeek AI 🎨 Image Generation: Leonardo AI 🎙️ Text-to-Speech: Speechma 🎼 Music Generator: Suno AI 📊 Presentation Builder: Gamma Whether you're a content creator, founder, student, or solo builder 👉 You don't need to burn your wallet to build smart. Save this post so you always know where to find powerful free tools. #AITools #ProductivityTools #FreeAI #NoCode #SoloFounder #Bootstrapping #StartupTips --- Would you like a shortened caption version for TikTok/Instagram reels under 220 characters?
Pricing found: $0, $6, $22, $11, $99
Check out this Explainer Video, Made for under $1 with Claude Design + ElevenLabs
Claude Design can make great animations, but getting to a final video is a bit hard. The audio is missing, and even if you use a TTS model, the audio does not align with the animation. Here is the process I used to get the video above:
1. Get Claude to write a good script.
2. Feed the script to a Text to Speech (TTS) model to get the audio.
3. Feed the audio to a Speech to Text (STT) model to get key timestamps.
4. Give the script and the STT output to Claude Design to get a video that's aligned with your audio.
5. Use Claude Video export to put it all together into an MP4 with audio.
The complete breakdown with all prompts is here: https://claudevideoexport.com/blog/how-to-make-professional-explainer-video-under-1-dollar
submitted by /u/gnurpreet_
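The post does not show how the STT output is turned into "key timestamps", so the following is only a minimal sketch of one way step 3 could feed step 4. It assumes the STT model returns word-level entries with start and end times (the `Word` type and the `sentence_cues` helper are hypothetical names, not part of any tool mentioned above) and greedily matches each script sentence to the transcript in order.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing, as an STT model might report it."""
    text: str
    start: float  # seconds
    end: float    # seconds


def normalize(token: str) -> str:
    """Lowercase a token and drop punctuation so script and transcript compare equal."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def sentence_cues(sentences, words):
    """Return (sentence, start, end) tuples using a greedy, in-order word match.

    Assumption: the transcript follows the script closely, so the first word
    of each sentence can be located by scanning forward from the last match.
    Sentences whose first word is never found get (sentence, None, None).
    """
    cues = []
    cursor = 0
    for sentence in sentences:
        tokens = [t for t in (normalize(w) for w in sentence.split()) if t]
        if not tokens:
            continue
        first = next(
            (i for i in range(cursor, len(words))
             if normalize(words[i].text) == tokens[0]),
            None,
        )
        if first is None:
            cues.append((sentence, None, None))
            continue
        # Assume the sentence spans as many transcript words as it has tokens.
        last = min(first + len(tokens) - 1, len(words) - 1)
        cues.append((sentence, words[first].start, words[last].end))
        cursor = last + 1
    return cues
```

The resulting cue list (sentence text plus start and end seconds) is the kind of structure that could be pasted alongside the script when asking for an animation timed to the audio. A real transcript will contain misrecognised words, so a fuzzy matcher would likely be needed in practice.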
I'm Building a Fully-Automated AI-Animated Video Show with Claude
TL;DR: I'm building a pipeline that takes a real prediction market bet from Polymarket or Kalshi (like "Will the U.S. confirm aliens exist?"), writes a script for my two AI characters (who argue about its merits like they're the Siskel and Ebert of prediction markets), generates their voices and talking-head video, creates animated B-roll and text cards, and composites it into an approximately 60-second episode meant for social. All vibecoded with Claude. Cost: ~$2.50 per episode.

Some example outputs:
Will Jesus Christ return by 2027? https://www.youtube.com/shorts/xMep6S5a7z4
Will the US Government confirm aliens exist? https://youtube.com/shorts/FFU20auHijQ
Will Trump buy at least part of Greenland? https://youtube.com/shorts/m8uynMUisF8
Who will be the next James Bond? https://youtube.com/shorts/wmwLvjcz-eI
These are all real money bets, if you can believe that.

The Show
The Sal & Eddie Show. Two characters argue about one prediction market bet per episode. Sal is the handicapper: reads odds like a racing form, names the price, tells you where the smart money is. Eddie is the philosopher and can't believe these markets exist, finds the sublime in the ridiculous. They argue for 60 seconds, vertical format, ready for social. The whole thing runs on my NAS (which is mainly my Plex server) in Docker. 100% automated from choosing the bet to final video output.

What Happens When I Push the Button
Market Pull (Polymarket/Kalshi APIs)
→ Editorial Scoring: is it an interesting market? (Claude Sonnet)
→ Script Generation (5 recursive Claude Opus calls)
→ Emotion Casting to select character images (1 Opus call)
→ Visual Creative Direction of script (3 Opus calls)
→ Dialog recording (5 ElevenLabs calls with word-level timestamps)
→ Talking Head videos (5 Hedra Character-3 calls)
→ Visual Asset creation (GPT Image 2 → Veo 3 Fast, also via Hedra API)
→ Edit Assembly (1 Opus call + Python post-processor)
→ Final Composite: picture, overlays, captions, subtitles (FFmpeg)
Production time: ~15 minutes from pressing the button to final cut, fully automated. Cost: ~$2.50/episode, and 90% of that is Hedra credits for talking heads and animation. The 8+ Claude Opus calls that drive every creative decision cost about 15 cents total. ElevenLabs TTS is a nickel.

What's Working
Recursive script generation. Each "turn" gets its own Opus call with full conversation history. Eddie's reaction to Sal is a "real" reaction, not a pre-planned exchange. Two system prompts with full character bibles for better voice separation.
Emotion casting as a blind pass. After scripts are locked, a separate Opus call reads the dialogue with character names stripped and assigns emotional postures from a constrained menu, which selects the correct "emotional pose" to use for Hedra character generation for each turn.
Sequential visual creative calls. This produces the inset cutaways. Three calls, each seeing previous output: main animation, second animation (sees script + hero), fill-in animation (sees everything). Sequential constraints prevent all three visuals from depicting the same thing.
The split between LLM & Python decisions. This was my biggest recent lesson. I had an Opus prompt for edit assembly (placing overlays on the timeline) that kept failing: dead stretches, stacked animations, missing coverage. Every prompt fix pushed something else out of working memory. The fix: let Opus make creative decisions (what text cards to write, where to anchor visuals) and let Python handle mechanical rules (every turn needs an overlay, no back-to-back video assets). Same constraints, but the mechanical ones are deterministic code, not prompt instructions.

Still WIP
Making the insets funnier. The visual style produces gorgeous editorial illustrations but not always comedy. When the style was more cartoonish, the animations landed as jokes. There's an ongoing tension between visual quality and comedic tone.
Overall episode timing. Some turns still run 8-10 seconds of pure talking head before a visual appears. Getting better but not solved.
Figuring out what to do with this. Maybe it's a daily video show. Maybe it's an app that lets you get Sal and Eddie to argue over anything you want them to. I already have them giving me a daily briefing on what comics I should and shouldn't buy on eBay.

Happy to answer questions about any part of the architecture, but the important thing: I am not a coder at all. This whole thing is vibe-coded with Claude.
Built with Claude Opus 4 (creative), Claude Sonnet 4 (editorial), ElevenLabs (TTS), Hedra Character-3 (talking heads), GPT Image 2 (stills), Veo 3 Fast (animation), Grok Video I2V (cinemagraphs), FFmpeg (assembly). Running on a Synology NAS in Docker.
submitted by /u/Campfire_Steve
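The post names two mechanical rules (every turn needs an overlay, no back-to-back video assets) but does not show the Python post-processor itself. As a rough sketch of what enforcing them in deterministic code might look like: the `Overlay` and `Turn` types, the `enforce_rules` name, and the fallback policy (drop the second video, insert a placeholder text card) are all assumptions made for illustration, not the author's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    kind: str   # "video" or "text_card" (hypothetical labels)
    label: str  # what the overlay shows


@dataclass
class Turn:
    speaker: str
    overlays: list = field(default_factory=list)


def enforce_rules(turns):
    """Return a new list of turns that satisfies both mechanical rules.

    Rule 1: no back-to-back video assets. If the previous turn kept a video,
            videos are dropped from the current turn.
    Rule 2: every turn needs an overlay. A turn left with none gets a
            placeholder text card.
    """
    fixed = []
    previous_had_video = False
    for turn in turns:
        overlays = list(turn.overlays)
        if previous_had_video:
            overlays = [o for o in overlays if o.kind != "video"]
        if not overlays:
            overlays = [Overlay("text_card", f"fallback card for {turn.speaker}")]
        fixed.append(Turn(turn.speaker, overlays))
        previous_had_video = any(o.kind == "video" for o in overlays)
    return fixed
```

Run after the LLM's creative pass, a function like this guarantees the constraints hold no matter what the model proposed, which matches the division of labour the post describes: creative choices in the prompt, invariants in code.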
I ran 100 Claude + Codex sessions in parallel to understand what I'm doing wrong in marketing my open source "Claude Command Center". Here's the playbook they came up with.
A week ago I launched my open-source project (Claude Control Center) on this subreddit. Got 0 upvotes. Dead in 5 hours. :) [The app is awesome: a great way to manage multiple sessions and avoid waiting, on top of Claude + Codex. Try it :) git:amirfish1/ccc]
So I spawned 100 Claude + Codex agents in parallel and asked them to figure out what I did wrong. (I had two hours left on my weekly Claude limit and 20% of it remaining, so I tried to think of a good use :) ) 30 minutes and 100 artifacts later, they handed me back a playbook.
https://reddit.com/link/1tfbxmf/video/0mi1ytksol1h1/player
The headline finding: stars don't come from better code. They come from marketing surface: tagline, demo GIF, founder credential, hosted landing page, multi-shot Hacker News, awesome-list inclusion. The system found that gap on its own. I never told it to study marketing.
5-min video walking through the 7 findings + what the agents drafted (Show HN body, X thread, LinkedIn post, channel plan): https://youtu.be/Tm2svTe_Ed4
The video itself is *ON PURPOSE* 100% built by the AI that created the agents [happy to share the skill that builds it]. I brought:
- Becky (the narrator) is ElevenLabs Jessica (TTS).
- Lip-sync is fal.ai OmniHuman.
- Playwright for screenshots.
- Slides are HTML rendered via Chrome headless.
The whole make_video.py pipeline + the 100-agent spawn script is open if anyone wants it. The interesting thing isn't the video. It's that 100 parallel agents found a non-obvious channel (Anthropic's official plugin registry, which nobody is using) that I would never have spotted myself.
https://preview.redd.it/mwvi8t9arl1h1.png?width=3588&format=png&auto=webp&s=ffd8130b52330ffd1470d59c23d656cc29c24b65
https://preview.redd.it/r0w1rnvgrl1h1.png?width=3588&format=png&auto=webp&s=bf086423552102b82fe4dd5931243329bf1c61d0
https://preview.redd.it/tlyv7bgcsl1h1.png?width=2784&format=png&auto=webp&s=08d5810f14f4b3237825f7116fe965483ef0ffdd
Happy to share any of the prompts, the scripts, or the marketing package that was generated.
submitted by /u/Mediocre-Thing7641
ModelMeter - A free, open source dashboard to track your costs across Anthropic, OpenAI, Grok, and ElevenLabs
https://preview.redd.it/v8jmbgi8gw0h1.png?width=1075&format=png&auto=webp&s=10cd37118815f27705f647dd75de48f577ae8f94
Like most enthusiasts, I use multiple providers. This also means that I'm constantly mashing the usage buttons on their consoles to see how much usage I have left and make sure I'm not burning through my API budget. I built ModelMeter, a simple dashboard application that tracks usage across multiple providers (Claude Code, Anthropic API, and OpenAI API for now). It runs locally, never phones home (EVER), and your API keys never leave your machine. MIT licensed, full source code on GitHub, and it will always be free and always be open source, no exceptions. If you just want to run it, there's a pre-built binary on the releases page that needs no installer and no admin rights. I'd appreciate any feedback you might have. Star the repo if you like it.
GitHub: https://github.com/rupprath/modelmeter
Windows executable: https://github.com/rupprath/modelmeter/releases/latest/download/modelmeter.exe
Here’s the project; here’s how I made it:
- Started with an initial requirements document I did in Claude using Opus as the model
- Lots and lots of revisions to the requirements document (I've been a tech product manager for many years)
- Massive feature reduction to get something I thought could actually get built, saving the other features for later releases
- Initially I wanted to support Gemini and Grok, but found that their APIs don't comply yet
- UI design done in Claude Design
- Coded with Claude Code. I've never coded a single line in Rust before
submitted by /u/OmegaNetRob
I built 9 Claude skills in one session for my solo studio and here is what changed
Spent yesterday building nine skills for the work I do across three SaaS products and a handful of client projects. Sharing what I learned because the leap in productivity surprised me.
What a skill is in case you have not built one yet: a folder with a SKILL.md file containing instructions that teach Claude how to handle a specific type of task. The skill auto-triggers when you describe the task naturally. You do not have to call it by name.
The nine I built:
- Video production (FFmpeg scripts, voiceover prompts, social clip extraction)
- AI visual content (branded graphics, mockups, marketing assets)
- API documentation (OAuth debugging, integration tracking)
- Social media automation (cross-platform posting, voice consistency)
- SEO content strategy (keyword research, content calendars)
- Support ticketing (email templates in my voice)
- Product analytics dashboards (real metrics, real queries)
- Database performance optimization (query rewriting, indexing)
- Financial modeling (MRR forecasting, scenario planning)
The biggest unlock was not the individual skills. It was what happens when they stack. I said "create a demo video for my HR SaaS and show me the analytics impact." Two skills auto-triggered. Got an FFmpeg recording script, an editing manifest, a voiceover draft, AND a dashboard mockup showing what metrics would prove the video drove signups.
The thing that took me longest to figure out: Do not write skills as documentation. Write them as instructions to an experienced colleague who is about to start work for you. Include the specifics. My audio devices by name. My brand colors as hex codes. My customers and what I charge them. The words I refuse to use. The way I close emails. The more specific, the better the output.
A few that pulled their weight immediately:
- The support template skill caught its own slip when it accidentally used a word I had banned, flagged it inline, and offered the corrected version
- The financial model knew my actual MRR, runway, and product roadmap, so the forecast was usable, not generic
- The video skill defaulted to recommending recording without audio so I could layer ElevenLabs voiceover in post, which is what I actually do
Curious if anyone else is using skills heavily yet. What patterns have you found work best for solo or small team work?
submitted by /u/Wise-Cardiologist-31
Who else knew Claude could make MP3 files? News to me.
submitted by /u/mojorisn45
AI voice generation has a workflow problem, not just a quality problem
Most discussion around AI voice tools focuses on model quality. How natural is the voice? How good is cloning? Can it handle emotion? Can it speak multiple languages? Those things matter, but I think the bigger unsolved problem is workflow. Generating one short voice clip is easy now. The hard part starts when someone wants to make something longer:
- a podcast draft
- audiobook chapter
- training module
- video script
- ad variation
- game dialogue scene
- multi-character narration
At that point, the task is no longer just “text to speech.” It becomes orchestration:
- splitting a script into usable blocks
- assigning voices to different speakers
- keeping speaker identity consistent
- regenerating one bad line without redoing everything
- handling pauses, reactions, and emotional tags
- editing timing between lines
- adding music or SFX under dialogue
- exporting stems, transcripts, and markers
- keeping the whole project editable later
This feels similar to what happened with image/video generation. The model output matters, but the real product value comes from the surrounding workflow: control, iteration, structure, editing, and reuse. For AI voice, I think the next step is not only “better ElevenLabs-style voices.” It is moving from:
text box → generated clip
to:
script → speakers → voices → takes → timeline → final audio project
Curious how people here see this. Do you think generative audio becomes a serious production tool only when it has full project/timeline workflows, or will most people keep using simple clip-based TTS tools?
https://murmurtts.com/
submitted by /u/tarunyadav9761
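The "script → speakers → voices → takes → timeline" idea can be made concrete with a small data model. This is only an illustrative sketch: the `Take`, `Line`, and `Project` types and the `synthesize` callback are hypothetical names, and no real TTS API is called. It shows how one bad line could be regenerated while every other line, and the project itself, stays intact and editable.

```python
from dataclasses import dataclass, field


@dataclass
class Take:
    audio_path: str
    duration: float  # seconds


@dataclass
class Line:
    speaker: str
    text: str
    takes: list = field(default_factory=list)
    selected: int = -1  # index into takes; -1 means the newest take

    def current(self) -> Take:
        return self.takes[self.selected]


@dataclass
class Project:
    voices: dict          # speaker name -> voice id, keeps identity consistent
    lines: list           # ordered script lines
    gap: float = 0.3      # pause inserted between lines, in seconds

    def regenerate(self, index, synthesize):
        """Add a new take for one line only; other lines are untouched.

        `synthesize(text, voice_id)` is an assumed callback that returns a Take.
        """
        line = self.lines[index]
        line.takes.append(synthesize(line.text, self.voices[line.speaker]))
        line.selected = len(line.takes) - 1

    def timeline(self):
        """Lay the selected takes end to end: (start time, speaker, audio path)."""
        cursor = 0.0
        placed = []
        for line in self.lines:
            take = line.current()
            placed.append((round(cursor, 3), line.speaker, take.audio_path))
            cursor += take.duration + self.gap
        return placed
```

Because old takes are kept on each line, switching back to an earlier take is just a matter of changing `selected`, which is the kind of non-destructive iteration the post argues clip-based tools lack.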
every night after work I start something and it goes till 5 AM in the morning
it's saturday and i finally finished the thing my friends made me build. i give referrals every now and then. for years now the bottleneck has been the same one every single time. friend asks me to refer them somewhere. i say sure, send me your resume. they say "yeah will do tonight". two weeks pass. by the time the resume shows up the role is filled. some never send it at all. they're not lazy, just allergic to opening Word, fighting with margins and choosing the design on a saturday afternoon. so this weekend i fixed the saturday afternoon problem.
resumex: clone the repo, open Claude Code, run /start. claude picks one of 100+ templates with you, takes your linkedin or a paste of your old resume, and writes a real one from it. then you talk to it. "tighten that bullet". "make a backend-focused variant". "swap to brutalist-redbar for the design role". Cmd+P → Save as PDF when you're happy. lives on your laptop. maybe even push it to a git repo. no signup, no SaaS, no monthly fee. MIT licensed.
it's yet another resume thing, i know. honestly this is open source warmup. promise the next projects are cooler. but if you've been delaying your resume because the existing tools are gross, just clone it and finish your saturday like a normal person.
submitted by /u/karngyan
List of people at big-tech / professors / researchers who've jumped ship to launch their own AI labs for something Frontier/Foundational/AGI/Superintelligence/WorldModel
Note: Gemini deep research -> rearranged/filtered; valuation numbers are likely not accurate, but the big point is how mind-blowing the number of researchers is who now have their own labs valued at >100 million/billion dollars in quite a short time, with a vague pitch and maybe a demo. Skipped Perplexity/Cursor/Hugging Face since they have utility. Left some just for completeness, like Black Forest Labs, Synthesia, and Mistral, since they have tangible products. Skipped labs from China since they've been meaningfully killing it with their open-source releases.
─────────────────────────────────────────────────────────
Safe Superintelligence Inc. (SSI)
Founders: Ilya Sutskever (former OpenAI Chief Scientist), Daniel Gross, Daniel Levy
Location & Founded: Palo Alto, USA & Tel Aviv, Israel | Founded: 2024
Funding / Valuation: $3B raised | Series A
Description: Singularly focused on safely developing superintelligent AI that surpasses human capabilities. Deliberately avoids near-term commercial products to concentrate entirely on the technical challenge of safe superintelligence.
─────────────────────────────────────────────────────────
Thinking Machine Labs
Founders: Mira Murati (former OpenAI CTO), Barrett Zoph et al.
Location & Founded: San Francisco, USA | Founded: 2025
Funding / Valuation: $2B seed | $12B valuation
Description: Advance AI research and products that are customizable, capable, and safe for broad human-AI collaboration. Focused on frontier multimodal models with a strong safety and interpretability research agenda.
─────────────────────────────────────────────────────────
Mistral AI
Founders: Arthur Mensch, Guillaume Lample, Timothée Lacroix (former DeepMind & Meta FAIR)
Location & Founded: Paris, France | Founded: 2023
Funding / Valuation: ~€11.7B valuation | Series C
Description: Develops open-weight and proprietary frontier language and multimodal foundation models. Champions openness and efficiency in AI development, with models like Mistral 7B and Mixtral widely adopted in enterprise and research settings.
─────────────────────────────────────────────────────────
Advanced Machine Intelligence (AMI)
Founders: Yann LeCun (Meta Chief AI Scientist), Alexandre LeBrun, Laurent Solly
Location & Founded: Paris, France | Founded: 2026
Funding / Valuation: $3.5B pre-money valuation | Seed
Description: Aims to build world-model AI systems capable of reasoning, planning, and operating safely in real-world environments, directly inspired by LeCun's 'world model' thesis as an alternative path to AGI beyond current LLM paradigms.
─────────────────────────────────────────────────────────
World Labs
Founders: Fei-Fei Li (Stanford AI Lab), Justin Johnson et al.
Location & Founded: San Francisco, USA | Founded: 2023
Funding / Valuation: $230M raised | Series D
Description: Build AI models that can perceive, generate, reason, and interact with 3D spatial worlds. Focused on large world models (LWMs) that go beyond language and flat images to understand physical space and context.
─────────────────────────────────────────────────────────
Eureka Labs
Founders: Andrej Karpathy (former Tesla AI Director & OpenAI co-founder)
Location & Founded: Tel Aviv, Israel & Kraków, Poland | Founded: 2024
Funding / Valuation: $6.7M seed
Description: Creating an AI-native educational platform integrating AI Teaching Assistants to radically scale personalised learning. Envisions a future where an AI teacher can guide anyone through any subject, starting with deep technical topics like neural networks.
─────────────────────────────────────────────────────────
H Company
Founders: Former DeepMind researchers
Location & Founded: Paris, France | Founded: 2023
Funding / Valuation: €175.5M raised
Description: Develops AI models to boost worker productivity through advanced agentic capabilities, with a long-term vision of achieving AGI. Focuses on models that can take sequences of actions and interact with digital environments.
─────────────────────────────────────────────────────────
Poolside
Founders: Jason Warner, Eiso Kant
Location & Founded: Paris, France | Founded: 2023
Funding / Valuation: $500M | Series B
Description: Building AI agents that autonomously generate production-grade code, framed as a stepping stone toward AGI. Believes that software engineering is a key domain for training and demonstrating general reasoning capabilities.
─────────────────────────────────────────────────────────
CuspAI
Founders: Max Welling (University of Amsterdam / Microsoft Research), Chad Edwards
Location & Founded: Cambridge, UK | Founded: 2024
Funding / Valuation: $130M raised | Series A
Description: Accelerating materials discovery using AI foundation models, aiming to power human progress through AI-driven science. Applies large generative models to the design and prediction of novel materials for energy, medicine, and manufacturing.
─────────────────────────────────────────────────────────
Inception
Founders: Stefano Ermon (Stanford)
Locat
The Beautiful Lie - Teaser
He taught the world to look elsewhere. Then it burned. A celebrated fashion photographer with an eye that shaped how a generation wanted to be seen. His gift became his disguise, turning pain into elegance, shame into style, and ruin into glamour. The Beautiful Lie: what happens when the life behind the image catches fire.
Created with: Claude Opus 4.7 | Luma Agents / UNI-1 | Dreamina Seedance 2.0 | Music by ElevenLabs
@Dreamina_AI #DreaminaCPP
submitted by /u/Salt-Breakfast-4954
I built a hands-free voice AI that sends emails mid-conversation - and that's just one feature. Here's everything AskSary can do.
https://reddit.com/link/1symbsj/video/k2no3zfgq1yg1/player
Been building AskSary solo for a while. Just shipped hands-free voice email: you're mid-conversation with an AI and you say "send an email to john@example.com subject X body Y" and it pre-fills the Gmail modal automatically. One tap sends. Powered by OpenAI Realtime API, works in 22 languages. But that's just the latest feature. Here's the full picture:

Every major model in one place
GPT-5-Nano, GPT-5.2, GPT-5.2 Pro, O1 Reasoning, Claude Sonnet 4.6, Grok 4, Gemini 2.5 Flash, Gemini 3.1 Pro, Gemini Ultra, DeepSeek V3, DeepSeek R1 - with smart auto-routing or manual override.

Pro-Active Personalisation
On every login the AI reads your previous conversations and sends the first message itself, asking if you want to continue or start fresh. Before you type a single word.

Persistent Cross-Model Memory
Start a conversation with Claude on your phone, open your laptop, switch to GPT-5.2 - it already knows what you discussed. No copy-pasting, no summaries. Just works.

Knowledge Base - RAG
Upload docs up to 500MB per file, unlimited uploads, chat with them across any model via OpenAI Vector Store. Your files stay in context forever.

Integrations
Google Drive, Gmail, Google Calendar, Notion - access files, get email and calendar summaries, use them in chat or push them to your Knowledge Base.

Generation Tools
- Image Gen - GPT-Image-1 and Nano Banana Pro
- Flux Image Editor - full editing suite with visual history
- Video Studio - Luma Dream, Veo 3.1, Kling 1.6 / 2.6 / 3, up to 10 second AI videos with audio
- Music Studio - 30 second tracks with custom or AI lyrics via ElevenLabs, visualizer built into chat
- 3D Model Studio - Meshy with STL export (deploying soon)
- Video Analysis - upload up to 500MB or paste a YouTube link

Developer and Builder Tools
- Vision to Code - screenshot any UI, get live editable code
- Web Architect - build full web apps from a single prompt
- Game Engine - build and prototype games with AI
- Code Lab - split screen live coding with SQL Architect, Bug Buster, Git Guru, Regex Generator, Test Genie and more
- Tavily web search across all models

Voice and Audio
- Real-time 2-way voice chat - 8 voices, near-zero latency WebRTC
- Podcast Mode - two AI voices, switchable, near-zero latency, downloadable as MP3
- Voiceover Studio, Voice Notes, Voice Tuner

Productivity and Content
- Slides, Docs and File Tools
- Pro Writer and Content Library
- Social Tools - Hook Generator, Video Script, Hashtag Creator, Idea Spark
- Business Suite - Pitch Deck Builder, Deep Analytics, Legal Eagle, Maths Solver
- Daily Briefing and Market Watch
- CV Creator, Email Polisher, Cover Letter Builder, TL;DR Bot
- Share conversations or snippets with anyone

Platform Extras
- 30+ live interactive wallpapers and themes
- Custom Agents and Personas
- Folder organisation and Smart Search across chat history
- Media Manager Gallery - all your generated content in one place
- Fully customisable UI in 26 languages with full RTL support

The Stack
Frontend: Next.js, Capacitor (iOS + Android), Vanilla JS / React
Backend: Vercel serverless, Firebase / Firestore, Firebase Admin SDK
AI: OpenAI, Anthropic, Google, xAI, DeepSeek
Generation: Luma AI, Kling via Replicate, Veo via Replicate, ElevenLabs, Flux via Replicate, Meshy
Integrations: Google Drive, Notion, Tavily, OpenAI Vector Store, Stripe, CloudConvert, Sentry
Rendering: Mermaid, MathJax
Platforms: Web, iOS, Android, Apple Vision Pro

What you get free just for creating an account (1,000 credits/month, rolling):
- Unlimited chat on GPT-5 Nano, Gemini Flash and DeepSeek V3 - no daily limits, zero credit charge
- 25 image generations via GPT-Image-1 and Nano Banana Pro - 40 credits each
- 8 image edits via Flux Studio - 80 credits each
- 2 song generations via ElevenLabs - 350 credits each
- 2 video generations via Luma Dream and Kling - 350 credits each
- ~70 messages on Claude Sonnet 4.6, GPT-5.2, Grok 4, Gemini 3.1 Pro and DeepSeek R1 - 15 credits each
No credit card required.

Built entirely solo. No CS degree, no team, no funding. Started because I asked an AI to build me a chatbot and it failed, so I built my own. Accepted to LEAP 2026 in Saudi Arabia along the way. Happy to answer anything about the build.
asksary.com
submitted by /u/Beneficial-Cow-7408
I built a solo AI platform from Bahrain with no funding, no team and no ad spend - here's what's inside it after 4 months
https://reddit.com/link/1sxotqx/video/xlaqd9i8guxg1/player I'm a self-taught developer, 39 years old, based in Bahrain. Four months ago I started building AskSary - a multi-model AI platform with a persistent memory layer that sits above all the models. The core idea: the model is not the identity. Most AI tools lose your context the moment you switch models. I built the layer that remembers you across all of them. Here's what's shipped so far: Models & Routing Every major model in one place - GPT-5.2, Claude Sonnet 4.6, Grok 4, Gemini 3.1 Pro, DeepSeek R1, O1 Reasoning, Gemini Ultra and more - with smart auto-routing or manual override. Memory & Context Persistent cross-model memory. Start with Claude on your phone, switch to GPT on your laptop - it already knows what you discussed. Proactive personalisation that messages you first on login before you've typed a word. Integrations Google Drive and Notion - connect once, pull files and pages directly into chat or your RAG Knowledge Base. Unlimited uploads up to 500MB per file via OpenAI Vector Store. Video Analysis - Gemini native video understanding for YouTube URL analysis (no download required, processed natively) and direct file upload up to 500MB. Full breakdown of visuals, audio, dialogue, editing style and key moments. Generation Image generation and editing, video studio across Luma, Veo and Kling, music generation via ElevenLabs, video analysis via upload or YouTube URL. Builder Tools Vision to Code, Web Architect, Game Engine, Code Lab with SQL Architect, Bug Buster, Git Guru and more. Tavily web search across all models. Voice & Audio Real-time 2-way voice chat at near-zero latency, AI podcast mode downloadable as MP3, Voiceover, Voice Notes, Voice Tuner. Platform Custom agents, 30+ live interactive themes, smart search, media gallery, folder organisation, full RTL support across 26 languages, iOS and Android apps, Apple Vision Pro. Where it is now 129 countries. Currently at 40 new signups a day. 
1,080 signups so far after 4 weeks or so. MRR just started. Zero ad spend. All of it built solo, one feature at a time, on a balcony in Bahrain.
The Stack:
Frontend - Next.js, Capacitor (iOS and Android) and Vanilla JS / React
Backend - Vercel serverless functions, Firebase / Firestore (database + auth) and Firebase Admin SDK
AI Models - OpenAI (GPT, GPT-Image-1), Anthropic (Claude), Google (Gemini), xAI (Grok), DeepSeek
Generation APIs - Luma AI (video), Kling via Replicate (video), Veo via Replicate (video), ElevenLabs (music), Flux via Replicate (image editing), Meshy (3D, coming soon)
Integrations - Google Drive (OAuth 2.0), Notion (OAuth 2.0), Tavily (web search), OpenAI Vector Store (RAG), Stripe (payments), CloudConvert (document conversion), Sentry (error tracking), Formidable (file handling)
Rendering - Mermaid (flow charts) and MathJax
Platforms - Web, iOS, Android, Apple Vision Pro (visionOS)
Languages - 26 UI languages with full RTL support
asksary.com
Happy to answer questions on any part of the build - stack, architecture, API cost management, anything.
submitted by /u/Beneficial-Cow-7408
I got tired of the current ticketing systems, so I (Claude ofc) built a better one for everyone - thank you Claude
WARNING: anecdotal rant incoming. Jira requires a PhD to administer properly, and a second one to figure out why a Story is in the wrong sprint. ServiceNow requires the wealth of a cartel drug lord and a procurement team to even get a quote. Freshservice and Zendesk are fine until you need anything custom, then they fall apart. Most of the rest are form-builders with status fields strapped to a queue. Y'all know what I mean. For the better part of my career — 15+ years in IT — auditing tickets for accuracy (ticket triaging) was just taking up too much time. Tickets where the priority was wrong, the category was blank, the subject line three words and a typo. Then writing reports (this is not the focus of the tool, use something else for better reporting, like powerbi / tableau or w.e.) from that data. Manually. Like it was 2010. So I built my own. It's called BITSM. Multi-tenant IT helpdesk with an AI layer called Atlas baked in from day one — not bolted on. Atlas runs a tool-use loop rather than one-shot completions. It searches the knowledge base, looks up ticket history, writes custom fields, and decides when to hand off to a human. The whole point is to handle the grunt work that fills up support queues — tagging, categorizing, routing, drafting responses, flagging when something looks like a known issue — so the people on the queue can focus on the things that actually need a human. Intake channels: web portal, chat widget, inbound email (Cloudflare Email Worker), SMS, WhatsApp, and a voice agent (Twilio + ElevenLabs). Three-tier escalation — Claude Haiku for frontline, Sonnet for harder problems, human for everything else. BYOK for every external service: Anthropic, OpenAI, Voyage, Resend, Twilio, ElevenLabs, Stripe. Stack is Flask 3.x, React 19, PostgreSQL 16 with pgvector, Redis 7, Docker Compose. Running in production at bitsm.io. 
Built solo on weekends over the past year. Full transparency: I pair-programmed a huge amount of this with Claude (Anthropic's). I'm a one-person shop and that collaboration is the only reason it shipped at the scope it did. If you're a solo builder hesitating on AI-assisted dev, stop hesitating.

License note, because someone will ask: Business Source License 1.1, not open source. Self-hosting for your own team is free. If you're building a hosted or managed service on top of it, that requires a commercial license. Converts to Apache 2.0 in four years. Upfront rather than buried.

The repo: https://github.com/NovemberFalls/BITSM

Happy to answer questions about the architecture or the AI design. A lot of the Atlas patterns came out of Ed Donner's agentic LLM courses, which I'd recommend to anyone building in this space.

submitted by /u/Novaworld7
Built a multi-model AI platform with real-time WebRTC voice, persistent cross-model memory, and a full generation suite - free account gets 1 min voice/month
https://reddit.com/link/1sutga7/video/ktd3pxcam7xg1/player

I've been building AskSary for the past few months - a multi-model AI platform - and just shipped real-time 2-way voice chat powered by OpenAI's WebRTC API. The visualization reacts to your voice in real time: 180 radial frequency bars orbit a glowing orb, 280 particles drift across a full-screen canvas, aurora sweeps and ripple waves emit on voice peaks, and the whole thing color-shifts from cool blue (listening) to warm violet (speaking). Near-zero latency, 8 voice options.

Anyone with a free account at asksary.com gets 1 minute of real-time voice every month to try it out - no credit card needed.

The platform also has a lot more built around it if you're curious:

Models - GPT-5-Nano, GPT-5.2, GPT-5.2 Pro, O1 Reasoning, Claude Sonnet 4.6, Gemini 2.5 Flash, Gemini 3.1 Pro, Gemini Ultra, Grok 4, DeepSeek V3, DeepSeek R1 - with smart auto-routing or manual selection
Memory and context - Persistent cross-model memory. Start on mobile with Claude, switch to GPT-5.2 on desktop and it already knows the conversation. Plus proactive personalization: on every login the chatbot reads your previous sessions and opens with a message asking if you want to continue - before you type anything.
RAG - Upload docs up to 500 MB each, unlimited uploads, chat with them across any model via OpenAI Vector Store
Generation - GPT-Image-1, Nano Banana Pro + Flux editor with visual history, Video Studio (Luma, Veo 3.1, Kling), Music Studio with ElevenLabs and in-chat visualizer, 3D Model Studio with STL export (coming soon)
Builder tools - Vision to Code, Web Architect, Game Engine, Code Lab with SQL Architect / Bug Buster / Git Guru and more
Voice and audio - Real-time chat, Podcast Mode (two AI voices, downloadable MP3), Voiceover, Voice Notes, Voice Tuner
Productivity - Slides, Docs, Pro Writer, Social tools, Business Suite, CV Creator, Daily Briefing, Market Watch
Platform - 30+ live wallpapers, Custom Agents, Folder org, Smart search, Media Gallery, 26 languages + RTL, fully customizable UI

Happy to answer questions about the WebRTC implementation or anything else. Would love to hear what you think of the voice visualization.

submitted by /u/Beneficial-Cow-7408
Yes, ElevenLabs offers a free tier. Pricing found: $0, $6, $22, $11, $99
Key features include: AI Voice Generator, Access a library of 10,000+ studio quality AI voices, ElevenCreative, ElevenAgents, All-in-one AI editor, Omnichannel agents, Analytics, Testing.
ElevenLabs is commonly used for: Eleven Multilingual.
ElevenLabs integrates with: OpenAI, AWS Lambda, Slack, Zapier, Google Cloud, Microsoft Azure, Discord, Trello, Notion, Salesforce.
Based on user reviews and social mentions, the most common pain points are: token usage, cost tracking.
Sequoia Capital
VC Firm at Sequoia Capital
2 mentions
Based on 42 social mentions analyzed, 17% of sentiment is positive, 81% neutral, and 2% negative.
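As a quick arithmetic check on the figures above, the rounded percentages can be reproduced from whole-number counts. The counts used here (7 positive, 34 neutral, 1 negative) are an assumption: they are the split of 42 mentions that rounds to 17%, 81%, and 2%, and the page does not publish the neutral and negative counts. The function name `sentiment_shares` is hypothetical.

```python
def sentiment_shares(counts):
    """Return each label's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Assumed counts, inferred from the rounded percentages (not stated on the page).
counts = {"positive": 7, "neutral": 34, "negative": 1}
shares = sentiment_shares(counts)

if __name__ == "__main__":
    print(shares)  # → {'positive': 17, 'neutral': 81, 'negative': 2}
```

Rounding each share independently does not always sum to 100, so the fact that these three do is a property of this particular split rather than of the method.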