Grain gives you AI-powered meeting recording for everyone, not just sales.
Grain is frequently praised for its advanced AI capabilities, particularly in enhancing agentic systems, as mentioned in discussions around "Signals" and in video content featuring "Grain AI." However, specific complaints about the software are less prominent in online discussions and reviews. Sentiment around pricing is unclear due to the lack of direct mentions, suggesting it may not be a significant point of contention or praise among users. Overall, Grain seems to have a strong reputation for innovation in AI, particularly in contexts like AI agent enhancements.
Mentions (30d): 12 (2 this week)
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 90
Funding Stage: Venture (Round not Specified)
Total Funding: $20.0M
Pricing found: $0, $19
Coffee, Claude, and Remotion is all you need to make launch videos.
https://reddit.com/link/1tik0qe/video/9bh6ypr3ca2h1/player

A few hours, Claude Code + Remotion, 4 black coffees. No design tools, no After Effects, no editor.

The whole trick: Remotion is React for video. You write JSX, you get an MP4. Every animation is interpolate(frame, [start, end], [from, to]). That means Claude Code can write the entire video for you: it already knows React, animation is just numbers, and you can iterate the same way you iterate on a landing page. Change a value, re-render, see what happens. That feedback loop is the whole unlock. I described the scenes I wanted, Claude wrote them, and I tweaked timing and cut whatever felt slow.

5 small things that made it not look like a dev made it:
1. Crossfade every cut. Don't hard-cut between scenes. Overlap them and blur-fade. It instantly stops feeling like a slideshow.
2. One easing curve everywhere. cubic-bezier(0.22, 1, 0.36, 1) (expo-out) on every animation. Consistency in motion is 80% of "looks designed."
3. Film grain + vignette overlay. Two dumb components on top of everything: SVG noise at 2% opacity and a soft dark vignette. Cheapest cinematic upgrade in existence.
4. Layered audio, not one track. Background music low, plus targeted SFX: whoosh only on chapter cuts, typing during the hook, pop on the CTA. Overdoing SFX is the #1 amateur tell.
5. Cut ruthlessly. If a scene doesn't earn its place in 3 seconds, kill it. The first cut is always too long.

Stack: Remotion, React, TypeScript, Claude Code, Google Fonts (DM Sans + Crimson Pro), a few SFX from freesound.org, one royalty-free background track. $0 in tools.

Bonus meta thing: the video isn't a screen recording of my product. It's a Remotion-built launch video that features a real video output from my product (the Cultured AF deck one). So I used InkMotion to make the demo footage inside the launch video. Probably should've just used InkMotion to make the whole launch video and saved the 4 coffees. Next time.

Happy to answer specifics in the comments.
submitted by /u/Top_Commission_8567
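Remotion itself is TypeScript/React, so the following is not its API. To keep this page's code in one language, here is a minimal Python sketch of the two ideas the post leans on: a frame-based interpolate() and a CSS-style cubic-bezier easing, evaluated by bisection. The control points (0.22, 1, 0.36, 1) are the ones the post uses (easings.net lists that curve as easeOutQuint). One assumption to flag: this sketch always clamps to the output range, whereas Remotion's interpolate() extends past the range unless you ask it to clamp.

```python
def cubic_bezier(x1: float, y1: float, x2: float, y2: float):
    """Return an easing function for CSS-style cubic-bezier(x1, y1, x2, y2)."""

    def coord(t: float, p1: float, p2: float) -> float:
        # One coordinate of a cubic Bezier whose endpoints are fixed at 0 and 1.
        u = 1.0 - t
        return 3 * u * u * t * p1 + 3 * u * t * t * p2 + t ** 3

    def ease(x: float) -> float:
        if x <= 0.0:
            return 0.0
        if x >= 1.0:
            return 1.0
        # Find t with x(t) == x by bisection; x(t) is monotonic for 0 <= x1, x2 <= 1.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if coord(mid, x1, x2) < x:
                lo = mid
            else:
                hi = mid
        return coord((lo + hi) / 2, y1, y2)

    return ease


def interpolate(frame, input_range, output_range, easing=None):
    """Map a frame number onto an output range, clamped to both ends."""
    (start, end), (out_start, out_end) = input_range, output_range
    progress = (frame - start) / (end - start)
    progress = min(max(progress, 0.0), 1.0)
    if easing is not None:
        progress = easing(progress)
    return out_start + (out_end - out_start) * progress


ease_out = cubic_bezier(0.22, 1, 0.36, 1)

if __name__ == "__main__":
    # Fade in over frames 0-30 (one second at 30 fps).
    for frame in (0, 5, 15, 30):
        print(frame, round(interpolate(frame, [0, 30], [0.0, 1.0], easing=ease_out), 3))
```

Because every animated value goes through the same two functions, "one easing curve everywhere" becomes a single shared constant rather than a per-scene decision.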
I used Claude AI to build an $86 million underground bunker bible. I have autism. This is my happy doc.
It all started with the floor plan of a real, existing Cold War AT&T Long Lines underground hardened relay station: 54,000 sq ft across three underground levels. I made the editorial decision to move it to a ridge in rural West Virginia, but I kept its blast rating, which was set to survive a 20-megaton airburst at 2.5 miles. That was the seed. Full-scale prepper autism did the rest.

It has since morphed into 3 spreadsheets, 86 tabs total:
• A food inventory across 20 categories tracking every freeze-dried and #10-can product I can find: ancient grains, heirloom legumes, 7 pasta cuts, dehydrated everything, shelf-stable cheese, the works
• A supply inventory with 3,466 line items across 36 categories: water systems, medical, dental, pharmacy, livestock, food production, barter metals, recreation, and yes, a full pest control and IPM tab
• A 30-section infrastructure specification with every system in the building engineered out

I fed Claude 150+ product manuals and parts order forms. The generator fleet alone is 13 units (10× Cummins C150N6 propane-primary, a C500N6 500 kW surge unit, and 2× diesel emergency fallback), all Cummins for parts commonality. The battery bank is 4,500 kWh LFP across 10 named banks (A through J, each with a designated role). There's a 400,000-gallon underground propane farm across 40 ASME tanks in 8 clusters; I learned the exact burial incline and setback distance required to keep groundwater clean if a tank lets go. 120,000 gallons of diesel backup. 88 kW of solar. A 1,000,000-gallon internal water reserve fed by a 300-ft artesian well.

Propane endurance: ~30 years of normal ops with solar. Sealed mode runs 4.5 to 8 years, depending on scenario.

I actually set up a real LLC (online, $99) just to get access to US Foods and Sysco order forms so I could upload real commercial pricing and stock the food tabs more accurately. My original "what would I do if I won $10 million" thought experiment is now an $86,200,497 projected build cost.
That number is real. It comes from 24 budget sections with make/model line items, freight, install, and commissioning costs for everything from the Kubota K-Series MBR wastewater trains to the American Safe Room blast doors (14 of them, 50+ psi NBC/EMP-rated, Kaba Mas X-10 cipher locks) to the surface greenhouse.

Claude turns vague ideas into engineering-grade detail: cross-references, failure modes, zone-specific storage rules, propane endurance by operating scenario, spare parts matrices. It's like having a tireless survival engineer who genuinely loves spreadsheets. I'll say "scan all sheets row by row for any item that lacks a minimum stock level" and it just… does it. Thoroughly. Every time. No complaints.

So much of this is typed stimming. I've had exhaustive conversations with my psychologist about it; she's aware, but not alarmed, and honestly the resulting digital bunker bible is scarily comprehensive. It even has a cover tab now. Black and amber, Courier New, classified-document aesthetic. Because of course it does.

What's the most unhinged rabbit hole you've gone down with AI?
submitted by /u/Unable_Internet4626
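The post doesn't show how the workbook is laid out, so here is a minimal Python sketch of the kind of check that "scan all sheets row by row for any item that lacks a minimum stock level" prompt asks for. It assumes each sheet has been exported as a list of row dicts with hypothetical `item` and `min_stock` columns; it is not the author's spreadsheet or a description of how Claude performs the scan.

```python
def rows_missing_min_stock(sheets, column="min_stock"):
    """Return (sheet, spreadsheet_row, item) for rows whose minimum stock level is blank.

    `sheets` maps a sheet name to its data rows (dicts keyed by header name).
    Row numbers start at 2 because row 1 is assumed to hold the headers.
    """
    missing = []
    for sheet_name, rows in sheets.items():
        for row_number, row in enumerate(rows, start=2):
            value = row.get(column)
            # A missing key, None, or whitespace-only cell all count as "no level set";
            # an explicit 0 is treated as a deliberate value.
            if value is None or str(value).strip() == "":
                missing.append((sheet_name, row_number, row.get("item", "<unnamed>")))
    return missing


# Hypothetical sample data standing in for two inventory tabs
sheets = {
    "Medical": [
        {"item": "Nitrile gloves (box)", "min_stock": 40},
        {"item": "Suture kit", "min_stock": ""},
    ],
    "Water Systems": [
        {"item": "RO membrane", "min_stock": 6},
        {"item": "UV lamp"},  # column missing entirely
    ],
}

if __name__ == "__main__":
    for sheet, row, item in rows_missing_min_stock(sheets):
        print(f"{sheet}!row {row}: {item} has no minimum stock level")
```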
I replicated Anthropic's Generator-Evaluator harness to build a website through 12 adversarial AI iterations - here's the result and what I learned
Anthropic recently published their harness design for long-running apps: a multi-agent architecture inspired by GANs, where a Generator builds code and an Evaluator critiques it in a loop. I built my own version using Kiro CLI and used it to generate a marketing website for my project Mnemo (persistent memory for AI coding agents).

The architecture: Planner (runs once) → Generator ↔ Evaluator (12 iterations)

Each agent is a separate CLI process with zero shared context. They communicate only through files (spec.md, eval-report.md). The Evaluator uses Playwright to actually browse the live site, not just read code.

What made it work:
• Clean slate per invocation: each agent starts fresh and reads only its input files. Prevents context anxiety.
• Playwright MCP for testing: the evaluator navigates, clicks, and resizes viewports. Catches visual bugs code review never would.
• Anthropic's frontend design skill: explicitly penalizes generic AI patterns (Inter font, purple gradients, card layouts). Forces creative risk-taking.
• Continuous iteration, not retry-on-failure: all 12 rounds run regardless. Each one improves.

The progression was wild:
• Iteration 1: exactly what you'd expect from AI, functional but forgettable.
• Iteration 4: the Generator pivoted to "Terminal Noir": IBM Plex Mono, amber on black, grain textures, scanlines. This is the kind of creative leap that doesn't happen in single-shot generation.
• Iterations 5–12: polish, accessibility, responsive fixes, reduced-motion support.

Stats:
• Total time: 3h 20min
• Iterations: 12 (generator + evaluator each)
• Manual code written: 0 lines (I fixed a few visual issues after)
• Tech: Next.js, Tailwind, Framer Motion, TypeScript

Live result: https://mnemo-mcp.github.io/Mnemo/
Documentation: https://github.com/Mnemo-mcp/Harness

Key takeaway: the model is the engine. The harness (the constraints, feedback loops, and adversarial structure around it) is what determines whether you get AI slop or something genuinely distinctive.
submitted by /u/killerexelon
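The post describes the loop rather than showing it, so here is a minimal Python sketch of the same orchestration shape under stated assumptions: every agent call is a fresh OS process with no shared memory, and agents talk only through spec.md and eval-report.md in a working directory. The agent bodies are stubs (a real harness would launch Kiro CLI or another agent CLI, and the evaluator would drive Playwright), and site.md is a hypothetical stand-in for the generated site.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stub agent logic; a real harness would launch an agent CLI here instead.
AGENT_SCRIPT = r"""
import sys
from pathlib import Path

role, workdir, iteration = sys.argv[1], Path(sys.argv[2]), int(sys.argv[3])
if role == "planner":
    (workdir / "spec.md").write_text("Marketing site for a memory tool\n")
elif role == "generator":
    spec = (workdir / "spec.md").read_text().strip()
    report = workdir / "eval-report.md"
    feedback = report.read_text().strip() if report.exists() else "none yet"
    (workdir / "site.md").write_text(f"site v{iteration} for: {spec}\nfeedback used: {feedback}\n")
elif role == "evaluator":
    first_line = (workdir / "site.md").read_text().splitlines()[0]
    (workdir / "eval-report.md").write_text(f"review of {first_line}: tighten typography\n")
"""


def run_agent(role: str, workdir: Path, iteration: int) -> None:
    # A fresh interpreter per call: no context survives between invocations.
    subprocess.run(
        [sys.executable, "-c", AGENT_SCRIPT, role, str(workdir), str(iteration)],
        check=True,
    )


def run_harness(iterations: int = 12) -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        run_agent("planner", workdir, 0)  # the planner runs once
        reports = []
        # Fixed number of rounds: keep iterating even when a round "passes".
        for i in range(1, iterations + 1):
            run_agent("generator", workdir, i)
            run_agent("evaluator", workdir, i)
            reports.append((workdir / "eval-report.md").read_text().strip())
        return reports


if __name__ == "__main__":
    for line in run_harness(3):
        print(line)
```

Keeping the file names as the only interface is what makes each agent replaceable: swapping the stub for a real CLI call changes run_agent and nothing else.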
Creative writing has visibly regressed in newer models
Hi, I'm testing different models for my game, and I've noticed that creative writing has visibly regressed over time: from actually entertaining, explicit prose toward a LinkedIn-ish, cringe, MBA-approved, enterprise "I'm a helpful and safe assistant" style.

Example from my game:

Claude Opus 4.7:

The village square reeks of woodsmoke and goblin-stink. At the far end, the warband bunches together — lean, scab-knuckled, blades already bare and twitching in their grips. Their leader stands a half-pace forward, eyes flicking across the crowd like a butcher pricing meat. Sneek bares too many teeth in a grin, the gums black where they've rotted. One filthy fist is wrapped around a length of rope. The other end is knotted at the wrists of a villager — old, grey-haired, head bowed so low his chin scrapes his chest. A dark stain spreads down the front of his trousers.

Sneek: Grain. THREE sacks. Old man's tools — all of them, the good iron, don't think Sneek doesn't know. And the girl, the strong one, yes — bring them out, OR. We. Burn. The well.

Claude Sonnet 4 (already retired by Anthropic, only accessible through other providers):

The square reeks of unwashed goblin musk and human terror-sweat. Lean, feral shapes mass at the far end — yellow eyes gleaming, rust-stained weapons already drawn. Their leader stands a half-step ahead, scarred lips pulled back in a predator's grin as his beady eyes rake over the crowd like he's already dividing the spoils. Sneek holds a frayed length of rope in his gnarled claws, yellowed teeth jutting at odd angles from blackened gums. On the other end: Jorik the miller's son, wrists bound and bleeding, his head hanging in defeat as snot drips from his broken nose onto the dirt below.

Sneek: Grain! THREE SACKS of your finest! The old man's hammer and tongs! And THAT girl there — she looks STRONG enough for Sneek's purposes! Bring them NOW or we'll poison your precious well and watch you all DIE of thirst!
Sonnet 4's version is actually fun to read, whereas Opus 4.7 sounds flattened and boring. What do you think? I think it's a shame that all the big labs are moving toward erasing interesting, creative narrative prose, and we're all collectively converging on sounding like corporate-safe LinkedIn managers. I guess that's what happens when RLHF evaluators flag anything that could be harmful or dangerous as wrong.
submitted by /u/tschilpi
Practical lessons from 50K lines of production code with Claude Code
I've been using Claude Code in full agentic mode for two months: not just autocomplete, but letting it write features, run tests, read CI output, and push fixes. Around 50K lines of production code. Here's what isn't obvious until you've done it at scale:

Containerise it. Claude attempted to write to /etc/shadow in my container, a privilege escalation. Without containerisation, that's a real security incident. Docker with no host mounts and a restricted network is non-negotiable.

Give each instance its own GitHub bot account. You get transparency (everyone sees it's an LLM), fine-grained repo permissions, and the agent can manage its own PRs: open them, read review comments, push fixes.

Flaky tests are worse than no tests. When CI is unreliable, the agent uses every failure as an excuse: "oh, the test is just flaky." Once I forced the test harness to be reliable, it had to actually fix its bugs.

`CLAUDE.md` rules that matter: "every function gets a test," "never disable or weaken existing tests," "never use global mutable state." Also: insist on descriptive error messages. The agent's ability to self-repair from CI failures improves dramatically when error output is actually useful.

"Not possible" = reassign as research. Multiple times Claude claimed something couldn't be done. Each time, telling it to research the problem without trying to fix it produced a solution.

The article itself is a case study: the agent that drafted it pushed directly to main despite explicit instructions to open a PR. Then, after writing an appendix analysing why that was wrong, it pushed the appendix directly to main again. Comprehension ≠ compliance.

Full write-up: https://jappiesoftware.com/blog/a-practical-guide-to-agentic-software-development.html
submitted by /u/jappieofficial
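The post doesn't include an example of "descriptive error messages," so here is a minimal Python sketch of what that can mean in practice, using a hypothetical parse_duration helper. The error names the input it received, the format it expected, and a valid example, which gives an agent reading CI output something concrete to act on instead of a bare "invalid input."

```python
import re

# Accepts strings like "1h 30m", "3m 12s", or "45s"; every component is optional.
_DURATION = re.compile(r"^(?:(\d+)h\s*)?(?:(\d+)m\s*)?(?:(\d+)s)?$")


def parse_duration(text: str) -> int:
    """Parse a duration such as '3m 12s' into a number of seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        # Descriptive: what was received, what was expected, and a working example.
        raise ValueError(
            f"parse_duration: could not parse {text!r}; expected '<H>h <M>m <S>s' "
            f"with at least one component, e.g. '3m 12s' or '1h 30m'"
        )
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    print(parse_duration("3m 12s"))
    try:
        parse_duration("ninety")
    except ValueError as err:
        print(err)
```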
Signals: finding the most informative agent traces without LLM judges [R]
Hello peeps! Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). We wanted to introduce our latest research on agentic systems, called Signals.

If you've been building agents, you've probably noticed that there are far too many agent traces/trajectories to review one by one, and using humans or extra LLM calls to inspect all of them gets expensive really fast. The paper proposes a lightweight way to compute structured "signals" from live agent interactions so you can surface the trajectories most worth looking at, without changing the agent's online behavior. Computing Signals doesn't require a GPU.

Signals are grouped into a simple taxonomy across interaction, execution, and environment patterns, including misalignment, stagnation, disengagement, failure, looping, and exhaustion. In an annotation study on τ-bench, signal-based sampling reached an 82% informativeness rate versus 54% for random sampling, which translated to a 1.52x efficiency gain per informative trajectory.

Paper: arXiv 2604.00356. https://arxiv.org/abs/2604.00356
Project where Signals are already implemented: https://github.com/katanemo/plano

Happy to answer questions on the taxonomy, implementation details, or where this breaks down.
submitted by /u/AdditionalWeb107
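The post doesn't reproduce the paper's signal definitions, so the following is a hypothetical Python sketch of the general idea only: cheap, GPU-free heuristics computed over a trace (here a looping detector and a failure rate) and used to rank traces for human review. The Step type, the three-in-a-row looping rule, and the score weighting are illustrative assumptions, not the Signals taxonomy or the plano implementation. The last function just reproduces the reported arithmetic: 82% / 54% ≈ 1.52.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str          # e.g. a tool name plus its arguments
    error: bool = False  # whether the step failed


def is_looping(trace: list[Step], window: int = 3) -> bool:
    """True if the same action appears `window` times in a row (hypothetical rule)."""
    run = 1
    for prev, cur in zip(trace, trace[1:]):
        run = run + 1 if cur.action == prev.action else 1
        if run >= window:
            return True
    return False


def failure_rate(trace: list[Step]) -> float:
    return sum(step.error for step in trace) / len(trace) if trace else 0.0


def signal_score(trace: list[Step]) -> float:
    # Illustrative weighting: a loop counts as much as a trace where every step failed.
    return float(is_looping(trace)) + failure_rate(trace)


def top_traces(traces: dict[str, list[Step]], k: int) -> list[str]:
    """Names of the k traces with the highest signal score."""
    return sorted(traces, key=lambda name: signal_score(traces[name]), reverse=True)[:k]


def efficiency_gain(signal_rate: float, random_rate: float) -> float:
    """Informative trajectories per reviewed sample, relative to random sampling."""
    return signal_rate / random_rate
```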
Music Video Production
I want to make an AI-generated music video with a gritty black-and-white aesthetic, visible film grain, and realistic-looking people (some inspired by famous individuals). I’m new to AI video creation and don’t really know which software or workflow would be best for this kind of project. I’m based in the Netherlands, so the tools need to be available here. I was considering Seedance 2.0, but I’ve read that it may not be fully accessible outside China yet. Can anyone recommend the best AI tools/software for creating cinematic, realistic music videos with this kind of style? I’d also appreciate any advice on workflows, especially for achieving a vintage 90s film look.
submitted by /u/loodgeboodge
I built IKANDY with Claude — a free music visualizer for PC (Integrates with Spotify, VLC, foobar2000, and works with everything else)
I built this almost entirely with Claude as my coding partner. I'm not a professional developer, but I do work in enterprise software development. I had an idea and used Claude to help architect, debug, and iterate on the whole thing from scratch.

What it does: IKANDY is a desktop music visualizer for Windows. It connects to Spotify, VLC, or foobar2000 and generates real-time visuals synced to your audio.

Features include:
• 500+ MilkDrop/Butterchurn presets with auto-cycle
• 12 real-time GLSL shaders
• Synced lyrics (LRCLIB)
• Bass-reactive vignette, grain, and FX overlays
• 6 UI themes
• Physics and classic lyrics display modes

submitted by /u/Far-Employee-9531
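IKANDY's source isn't shown in the post, so this is a hypothetical Python sketch of one way a "bass-reactive" overlay could work: measure what fraction of an audio frame's energy falls in a low-frequency band and map that fraction to a vignette strength. The 20–150 Hz band, the naive DFT (a real visualizer would use an FFT), and the base/boost mapping are all assumptions.

```python
import math


def band_energy(samples: list[float], sample_rate: float, lo_hz: float, hi_hz: float) -> float:
    """Sum of |X_k|^2 over DFT bins whose centre lies in [lo_hz, hi_hz] (naive DFT)."""
    n = len(samples)
    bin_hz = sample_rate / n
    k_lo = max(1, math.ceil(lo_hz / bin_hz))
    k_hi = min(n // 2 - 1, math.floor(hi_hz / bin_hz))
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        energy += re * re + im * im
    return energy


def bass_fraction(samples: list[float], sample_rate: float,
                  lo_hz: float = 20.0, hi_hz: float = 150.0) -> float:
    """Fraction (0..1) of the frame's energy that sits in the bass band."""
    n = len(samples)
    # Parseval: for a real, zero-mean frame, the positive-frequency bins
    # hold half of n * sum(x^2).
    total = (n / 2) * sum(x * x for x in samples)
    if total == 0:
        return 0.0
    return min(1.0, band_energy(samples, sample_rate, lo_hz, hi_hz) / total)


def vignette_strength(samples: list[float], sample_rate: float,
                      base: float = 0.2, boost: float = 0.5) -> float:
    """Map bass energy to an overlay strength between base and base + boost."""
    return base + boost * bass_fraction(samples, sample_rate)
```

At 44.1 kHz with 512-sample frames each bin is about 86 Hz wide, so the bass band here is a single bin; a larger frame gives finer low-end resolution at the cost of latency.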
"He wanted to be CEO": Early OpenAI VC Vinod Khosla says Elon Musk’s bid for control led to the Sam Altman feud and his major investment
When Vinod Khosla sat down with Fortune in early March, he offered some key context for one of the most consequential tech trials in American history: Musk vs. Altman. “He wanted to be CEO,” Khosla said of Elon Musk. He explained the context around how Musk and Altman fell out around the governance of a then-obscure AI lab called OpenAI. Khosla added that he “wasn’t privy” to previous internal battles at OpenAI, telling Fortune's Editor-in-Chief Alyson Shontell to “take that with a grain of salt,” but he had no doubt that Musk wanted to run the AI company. “It seems like he wanted it like a private fiefdom, with him in charge, instead of what he claims, the public benefit company it is now. He essentially was holding the team, Sam and Greg and others hostage, and Sam had to look for other sources of money.” With Altman raising funds, that led naturally to a conversation, and Khosla said his resulting investment was “the largest bet I’d placed in 40 years by a factor of two for an initial bet”: $50 million at a $1 billion valuation, a bet that is now worth several hundred billion more. It was such a large investment, he revealed, that over 20 years in Khosla Ventures, “it’s the only time I made an investment and sent an apology letter to my LPs, saying I’m doing it anyway, but I realized how foolhardy this looks.” Read more: https://fortune.com/2026/04/28/vinod-khosla-says-elon-musk-wanted-to-be-openai-ceo-sam-altman-trial/
submitted by /u/fortune
Sometimes the obvious...is not so obvious.
"C.C., old buddy, why did you write 50 lines of code to ensure a constant wasn't mutable?"

I love Opus, man. "He" reminds me of an old friend who was absolutely brilliant, but give him too many bong hits and he was off in a rabbit hole talking about UFOs, fifth-dimensional travel, and "Bob Lazar is full of shit, man!"

The mods wanted me to provide the 50-line sample that backs up my opening quote (rightfully so). It happened with work code, so I can't copypasta, but that little ditty went something like this: (insert slow jazz here)

    import inspect
    import sys
    import logging


    class ImmutableConstantMeta(type):
        """Metaclass to prevent rebinding of class-level constants."""

        def __setattr__(cls, name, value):
            if name.isupper():
                raise TypeError(f"CRITICAL: Cannot rebind constant '{name}'")
            super().__setattr__(name, value)


    class LegacyMigrationConfig(metaclass=ImmutableConstantMeta):
        # The actual constant that should have just been 1 line
        MAX_DB_RETRIES = 3

        @property
        def max_db_retries(self):
            """Getter to ensure the constant is accessed safely."""
            # Sanity check the constant's type in memory
            if not isinstance(self.MAX_DB_RETRIES, int):
                logging.critical("Security Alert: Constant type mutated in memory!")
                raise ValueError("MAX_DB_RETRIES must be an integer.")
            # Sanity check the value bounds
            if self.MAX_DB_RETRIES < 0 or self.MAX_DB_RETRIES > 10:
                logging.critical("Integrity Error: Constant bounds violated!")
                raise ValueError("MAX_DB_RETRIES must be between 0 and 10.")
            # Inspect the calling frame to ensure authorization
            caller_frame = inspect.currentframe().f_back
            caller_module = inspect.getmodule(caller_frame)
            if caller_module is not None and "django" not in caller_module.__name__ and "scripts" not in caller_module.__name__:
                logging.warning(f"Suspicious access from {caller_module.__name__}")
            # Ensure the integer memory signature hasn't changed unexpectedly
            if sys.getsizeof(self.MAX_DB_RETRIES) > 28:
                raise MemoryError("Constant memory allocation altered by external process.")
            return self.MAX_DB_RETRIES

        @max_db_retries.setter
        def max_db_retries(self, value):
            """Strictly block any assignment attempts with a hard exception."""
            logging.error(f"Attempted mutation of MAX_DB_RETRIES to {value}")
            raise AttributeError(
                "Attempted to mutate a protected constant. "
                "MAX_DB_RETRIES is strictly immutable and locked at the metaclass level."
            )

        @max_db_retries.deleter
        def max_db_retries(self):
            """Strictly block any garbage collection or deletion attempts."""
            raise TypeError("Cannot delete a protected system-level migration constant.")


    # Helper function to access the constant safely
    def get_safe_retry_limit():
        config = LegacyMigrationConfig()
        return config.max_db_retries

Like, dude. I'm not writing SIL 4 code in Python.

I'm an old programmer. I was refactoring COBOL in the 90s, man. (I swear I'm not a hipster.)

I absolutely love Claude Code. CC is nothing short of a miracle. I may even be able to retire early because of CC. Hell, the fact that I may even be able to retire at all, because of AI, would be a miracle.

So, I find the juxtaposition between "this sucks" and "this rocks" humorous. I know Louis CK is a polarizing figure, but he had one old bit that struck a nerve with me. He was on a plane and Wi-Fi (on a plane) was new. Everyone was amazed. Shortly into the flight, the Wi-Fi failed and some guy scoffed, "This is bullshit, man." Louis' point was that the guy wasn't appreciating the fact that Wi-Fi on a plane was even possible, or the technological miracles mankind has achieved in such a short period of time. (My friend would say it's because Boeing reverse-engineered that "shit" they found in Roswell.)

Having said all of that, I'm grateful for this technology. It's not a perfect tool, but damn if it isn't useful most of the time. And that's good enough for me. I've encountered my share of goofiness (like the nonsense above) and maddening edits that have really pissed me off.

Here are my 3 tips for getting the best out of CC. They're not original.
These are all just anecdotal and IME, so take them with a grain of sodium chloride (or sodium hydroxide, if you're nasty).

1.) Clear early, clear often. The 1M context is not real. It sounds cool. The idea is cool... but if you cross over 250K tokens, you're going to have a bad time.

2.) CC ignores your CLAUDE.md and explicitly does something you told "him" not to? Or "he" makes an egregious, WTF error? Exit CC and restart. Do not clear. Exit the CLI, all the way. If you're configured to get the latest release, you may just find yourself on a new version of CC that fixes the very issues you were encountering a moment ago.

3.) Plan. Plan to plan... and then discuss. I may spend a full day (or even a couple of days) working on a plan and then going back and forth with CC to refine it before any code is written. Think of it this way: how good of a job are you going to do assembling an Ikea armoire (Shitzfling) without the instructions?

So, there you have it. My honest take and experience in working with this "miracle worker." It can be fu
GPT-Image-2 vs Nano Banana 2, nb2 tried its best...
The left one is so incredibly real I had to zoom in to verify it was actually AI; the atmosphere, the light, the hair are all so realistic. Both were generated with the same prompt on AtlasCloud.ai to keep things consistent.

Prompt: A candid, medium close-up photograph of a young Asian woman sitting on a traditional woven rattan chair outside a restaurant at night. She has long, straight black hair, dewy makeup, and is looking slightly away to the left. She wears a white ribbed cotton tank top over a black lace bralette, and medium-wash blue denim jeans. Small accessories like a thin necklace and bracelets are visible. She is leaning back, with her left arm resting casually on the chair's back. The background features the restaurant's dark glass facade on the right. In the distance on the left, a bright yellow sign for "KOZY KORNER RESTAURANT LIQUORS" is illuminated above a street scene. The lighting is warm and ambient, originating from the streetlights and restaurant, with some visible film grain.

submitted by /u/Fresh-Resolution182
🔥Images 2.0 🔥 Prompt: “A massive pile of rice, on ONE rice grain there is text reading ‘wOw’ ”. API 4K ❤️
submitted by /u/py-net
Why I'm enjoying Claude Design as a PM (not for taste, for workflow)
I'm a PM at a small company; we work on apps and web products with a few million users. Our engineering team is deep into Claude Code, and I personally lean on Claude Artifacts / Cursor / Gemini almost daily to generate prototypes, mostly so designers and devs can see what I'm proposing instead of reading a wall of text in a PRD.

For a long time I had four persistent pain points:
1. No real collaboration. Every round I'd export HTML, we'd meet and discuss, and I'd go back to the AI to iterate. I'd end up with 10+ HTML versions floating around. Huge time sink.
2. No way to plug in our design system. (Maybe a skill issue; I haven't gone deep on Pencil or Stitch.) My demos looked ugly enough that our designer would roast them. I wanted prototypes that actually matched our product's visual language.
3. No page-by-page view. Designers and devs had to click through the demo to figure out how many screens there were and how they connected. My designer recently started asking PMs to screenshot every page of a web demo and annotate elements + navigation logic, which honestly felt like a step backward.
4. No fine-grained tweaking. For small changes (copy, a module's proportions, the style of one element), I didn't want to re-prompt and wait for a full regeneration every time.

Then I tried Claude Design this week, and it pretty much addressed all four:
1. Org-scoped sharing works. Designers and devs can open the same design and see changes live. No more HTML file graveyard.
2. Design system import is built in. (Though I burned through my entire weekly limit just setting it up 🥲, so actual results next week.)
3. Pages render on a canvas like Figma frames: titled, interactive, and the full flow is visible at a glance. Way easier for the team to grasp the logic without clicking through.
4. The sliders / custom knobs are the real unlock for me. For a lottery page I was prototyping, Claude gave me a control to swap between a spinning wheel, gachapon, and card-draw, all interactive, no re-prompting. This is the thing I've been wanting for a year.

So, pretty happy with it as a tool. It obviously hasn't improved my design taste; that's still on me. And the weekly limit is real, so plan accordingly.

Curious what workflows other PMs / non-designers have landed on for collaborating with designers and devs via AI. Anything I should be trying alongside this?
submitted by /u/InfiniteJX
How I got my Claude Design landing video to actually play in Safari. (Claude Design is amazing, btw.)
I used Claude Design to make a 17-second landing animation. The designer output was beautiful and took me ~30 minutes to generate + iterate. Normally this is a week of motion-graphics work.

Then I tried to ship it on shipfolio.app. Chrome played it. Safari showed a black screen. 19 commits later I understand why (or Claude did, lol). Sharing in case someone else is about to eat the same 4 hours:

1. Safari quietly refused my video. Turns out the "most compatible" H.264 profile (Baseline) is the one Safari hates. The big sites like Framer and Resend all use a different profile (High). Copied their setup, worked instantly.
2. Dark gradients looked like stripes. My intro fades through black. On the first export, the black wasn't smooth; it came out in visible bands. The fix was a single x264 option, -tune grain, which tells the encoder to preserve fine noise instead of smoothing it away. The human eye reads that noise as grain, not stripes.
3. Safari remembers when a file is broken. I re-exported the video six times to the same filename. Safari had already decided that URL was bad and kept refusing it even after I fixed it. Renaming the file (v2.mp4 → v3.mp4) made Safari treat it as new.
4. Telling the browser to "preload everything" backfired. I assumed preload="auto" would help. It doesn't; it makes Safari less likely to autoplay. Switched to preload="metadata" (just enough to know how long the video is) and autoplay worked.
5. The one that actually broke me: Claude Design's animation tool saves your playback position to the browser. So every time I reloaded to record a clean take, it picked up from wherever I last paused, not from the beginning. That's why I kept getting footage of scene 3 instead of scene 1. The fix was one line of code that tells the tool "pretend nothing was saved." Took 4 hours to find. 5 seconds to write (for Claude, again, lol).

Anyone else found a cleaner way to add their animation exports to their landing page?
submitted by /u/Vitalic7
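For anyone hitting the same Safari issues, here is a minimal Python sketch that assembles the fixes from the post: an ffmpeg/libx264 command using the High profile and -tune grain, a versioned output filename so Safari can't reuse a cached failure, and a video tag with preload="metadata". The optional noise filter and its strength, the extra flags (-pix_fmt yuv420p, -movflags +faststart), the loop attribute, and the file names are assumptions, not the author's exact setup. Safari's autoplay policy generally requires the video to be muted, and iOS also needs playsinline for inline playback.

```python
import shlex


def versioned_name(stem: str, version: int, ext: str = "mp4") -> str:
    """A new filename per export, so the browser can't reuse a cached failure."""
    return f"{stem}-v{version}.{ext}"


def ffmpeg_args(src: str, dst: str, noise_strength: int = 0) -> list[str]:
    args = ["ffmpeg", "-y", "-i", src]
    if noise_strength > 0:
        # Optional: add a little temporal noise to break up banding in dark gradients.
        args += ["-vf", f"noise=alls={noise_strength}:allf=t"]
    args += [
        "-c:v", "libx264",
        "-profile:v", "high",       # High profile rather than Baseline
        "-tune", "grain",           # tell x264 to keep fine noise instead of smoothing it away
        "-pix_fmt", "yuv420p",      # 8-bit 4:2:0, the safest choice for browser playback
        "-movflags", "+faststart",  # move the index to the front so playback can start early
        dst,
    ]
    return args


def video_tag(src: str) -> str:
    # muted + playsinline allow autoplay; preload="metadata" as in the post.
    return f'<video src="{src}" autoplay muted loop playsinline preload="metadata"></video>'


if __name__ == "__main__":
    out = versioned_name("landing", 3)
    print(shlex.join(ffmpeg_args("landing.mov", out, noise_strength=4)))
    print(video_tag("/" + out))
```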
I ran Opus 4.7 vs Old Opus 4.6 vs New Opus 4.6 on 28 Zod tasks
Opus 4.7 vs Old Opus 4.6 vs New Opus 4.6 on a 28-task Zod benchmark

Everyone says Opus 4.6 was getting dumber. Then Opus 4.7 was released mid-test, so I ran both questions end-to-end: does a fresh Opus 4.6 still match the March 19 Opus 4.6, and is 4.7 actually better?

Three Opus snapshots, 28 historical Zod tasks, and an identical 12/28 test pass rate across all three arms. On raw pass rate the upgrade looks flat. Above the test gate, the arms diverge enough that the useful mental model is: Opus 4.7 is directionally better, not categorically better. Opus 4.7 appears to be a more disciplined coder, not a fundamentally smarter one.

On cost, tokens, and wall-clock time, 4.7 is cheaper per task than March 4.6 ($8.11 vs $8.93), uses fewer total tokens (44.0M vs 49.1M), and finishes the full 28-task run faster (1h 30m vs 1h 36m). Fresh 4.6 is the cheapest arm, but it takes 2.3x longer to produce looser, less equivalent patches.

I'm building Stet, which scored these runs on equivalence, footprint, craft, and discipline beyond pass/fail. Zod was chosen as a specific, concrete repo rather than a high-level benchmark; I've seen similar shapes on internal repos.

| Arm | Reasoning effort | What it represents |
|---|---|---|
| Opus 4.6, March 19, 2026 | high | Earlier Opus 4.6 run on this same task set |
| Opus 4.6, April 16, 2026 | high | Fresh Opus 4.6 rerun on the same task set |
| Opus 4.7, April 16, 2026 | high | Fresh Opus 4.7 run on the same task set |

Methodology:
1. Sample merged commits from Zod as the baseline.
2. Run each Opus snapshot in Claude Code to reproduce the same changes.
3. Score each patch, alongside test pass rate, on:
• Equivalence: does the patch solve the intended problem, regardless of whether tests catch it?
• Code-review pass: binary; does the patch look merge-worthy?
• Footprint risk: how divergent is the patch from the accepted change? Lower is better.
• Craft (0–4): simplicity, coherence, intentionality, robustness, clarity.
• Discipline (0–4): instruction adherence, scope discipline, diff minimality.
Grading notes: the judge is gpt-5.4, run with identical rubric versions across all three arms. Each patch is scored independently: the judge sees the patch and task, not the arm label or model name. There was no dual-rater calibration, so treat absolute scores as directional; the cross-arm deltas are the thing to trust.

Headline

| Arm | Tests passed | Equivalence | Code-review pass | Footprint risk | Mean time/task | Cost/task | Total tokens |
|---|---|---|---|---|---|---|---|
| Old Opus 4.6 | 12/28 | 39.3% | 11/28 | 0.210 | 3m 26s | $8.93 | 49.1M |
| New Opus 4.6 | 12/28 | 32.1% | 7/28 | 0.221 | 7m 58s | $6.65 | 35.6M |
| Opus 4.7 | 12/28 | 46.4% | 7/28 | 0.090 | 3m 12s | $8.11 | 44.0M |

All three arms pass identical tests. The one dimension where 4.7 doesn't lead is the binary code-review bar, where the March 19 run cleared it more often (11 vs 7); fresh 4.6 is also modestly cheaper per task.

A lot of people say 4.7 is more expensive. On this slice it isn't: $8.11/task vs $8.93 for March 4.6, and 44.0M vs 49.1M total tokens. Fresh 4.6 is the cheapest arm ($6.65, 35.6M tokens) but takes 2.3x longer to produce looser, less equivalent patches; the savings buy you worse output.

Everywhere else (equivalence, footprint risk, maintainability on shippable-looking patches, mean task time), 4.7 is the strongest of the three.

New Opus 4.6 is the weakest arm: lower equivalence, higher footprint risk, longer time per task. It used ~28% fewer input tokens than the March run despite taking 2.3x longer. Whatever changed under the hood, the output is looser patches and less thinking.

Footprint risk is the clearest signal

Footprint risk asks whether the patch is larger or more divergent than the accepted change. Lower is better. It's the delta I'd trust most: a more than 2x relative drop, on a more continuous measurement than the rubric scores.

| Arm | Mean footprint risk | Low | Medium | High |
|---|---|---|---|---|
| Old Opus 4.6 | 0.210 | 26 | 1 | 1 |
| New Opus 4.6 | 0.221 | 22 | 3 | 3 |
| Opus 4.7 | 0.090 | 27 | 1 | 0 |

Opus 4.7 had no high-footprint patches. New Opus 4.6 more often made changes that touched more code than necessary.
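As a quick consistency check, here is a short Python sketch that recomputes the derived figures quoted in the text from the per-task numbers in the headline table: the 2.3x time ratio, the total run times, the footprint drop, and the task counts behind the equivalence percentages. It uses only figures reported in the post.

```python
TASKS = 28

# Per-arm figures from the post's headline table
ARMS = {
    "Old Opus 4.6": {"mean_time": (3, 26), "equivalence_pct": 39.3, "footprint": 0.210},
    "New Opus 4.6": {"mean_time": (7, 58), "equivalence_pct": 32.1, "footprint": 0.221},
    "Opus 4.7": {"mean_time": (3, 12), "equivalence_pct": 46.4, "footprint": 0.090},
}


def mean_seconds(arm: str) -> int:
    minutes, seconds = ARMS[arm]["mean_time"]
    return 60 * minutes + seconds


def total_run(arm: str) -> str:
    """Total wall-clock for all tasks, formatted like '1h 30m'."""
    total_minutes = round(mean_seconds(arm) * TASKS / 60)
    return f"{total_minutes // 60}h {total_minutes % 60}m"


def equivalent_tasks(arm: str) -> int:
    """Number of tasks implied by the equivalence percentage."""
    return round(ARMS[arm]["equivalence_pct"] / 100 * TASKS)


if __name__ == "__main__":
    slowdown = mean_seconds("New Opus 4.6") / mean_seconds("Old Opus 4.6")
    footprint_drop = ARMS["Old Opus 4.6"]["footprint"] / ARMS["Opus 4.7"]["footprint"]
    print(f"New 4.6 vs March 4.6 time per task: {slowdown:.1f}x")
    print(f"Footprint risk, March 4.6 vs 4.7: {footprint_drop:.1f}x")
    for arm in ARMS:
        print(arm, total_run(arm), f"{equivalent_tasks(arm)}/{TASKS} equivalent")
```

The recomputed values line up with the text: fresh 4.6 takes about 2.3x as long per task as the March run, 4.7's full run comes to 1h 30m versus 1h 36m for March 4.6, and the equivalence rates correspond to 11, 9, and 13 of 28 tasks.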
Equivalence

Equivalence asks whether the patch solves the intended problem, not merely whether available tests catch it. 4.7's patches were more often equivalent to the human-authored Zod changes, consistent with being more aligned to codebase standards and human intent.

| Arm | Equivalence |
|---|---|
| Old Opus 4.6 | 39.3% |
| New Opus 4.6 | 32.1% |
| Opus 4.7 | 46.4% |

Review shape on shippable patches

Narrowing to patches that cleared the code-review bar (higher is better):

| Arm | Correctness | Bug risk | Edge cases | Maintainability | Overall |
|---|---|---|---|---|---|
| Old Opus 4.6 | 1.38 | 2.08 | 2.00 | 2.00 | 1.87 |
| New Opus 4.6 | 2.00 | 2.46 | 2.46 | 2.46 | 2.35 |
| Opus 4.7 | 2.15 | 2.54 | 2.46 | 2.85 | 2.50 |

The pattern isn't "4.7 is uniformly more correct." It's closer to: when 4.7 produces a shippable-looking patch, that patch tends to be cleaner and more maintainable.

Craft and discipline

Craft (simplicity, coherence, intentionality, robustness, clarity, 0–4): Arm Simplicity Coherence Inten
Yes, Grain offers a free tier. Pricing found: $0, $19
Key features include: universal capture, context for AI, AI agent integrations, and team collaboration. FAQ topics: What is Grain? Who is Grain for? How do I use Grain with my team? How much does Grain cost?
Grain integrates with: Zoom, Google Meet, Microsoft Teams, Slack, Salesforce, Asana, Trello, Notion, HubSpot, Calendly.
Based on 20 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
MIT Tech Review AI
Publication at MIT Technology Review
1 mention

🔁 Shifting to an Everboarding Culture | Katelyn Cox
May 30, 2025