Although there are no direct reviews or mentions of "WhyLabs" found in the provided data, the conversation around AI tools indicates a focus on the dominance of major AI models, concerns about the unavailability of powerful models to the public, and discussions on AI's evolving role. These discussions highlight a competitive landscape and might imply challenges for smaller AI-focused startups like WhyLabs to gain traction. Pricing sentiment and detailed strengths of WhyLabs are not discernible from the data, and its overall reputation remains unclear without user-specific mentions.
Mentions (30d)
27
2 this week
Reviews
0
Platforms
2
GitHub Stars
2,804
134 forks
Industry
information technology & services
Employees
54
Funding Stage
Merger / Acquisition
Total Funding
$14.0M
184
GitHub followers
40
GitHub repos
2
npm packages
GitHub’s Fake Engagement Problem Is Hiding in Plain Sight
Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars.

What I built

phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers):

- Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes
- Pulls star and fork events from the last 24 hours per repo
- Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history)
- Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%)
- Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window
- Files an issue directly on the targeted repo so the maintainer knows what's happening

Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended.

What the pattern actually looks like

It's remarkably consistent. A fake engagement campaign in the raw data:

- 40-200 accounts, all created within the same 1-2 week window
- Zero original repositories, or only forks they never touched
- No bio, no location, no followers, no following
- All of them starring the same repo within a 90-minute window
- The target repo usually has a name implying it's a tool, hack, executor, or generator

Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast.
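The campaign detection and fingerprinting steps described in this post can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not phantomstars' actual code: the names `UnionFind`, `campaign_id`, and `find_campaigns` are hypothetical, the input is assumed to be `(login, timestamp)` pairs for accounts already flagged as suspicious, and two accounts are linked whenever their engagements fall within the window of each other, so a chain of linked accounts can span longer than 3 hours in total.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def campaign_id(members):
    """Deterministic ID: SHA-256 over the sorted member logins."""
    joined = "\n".join(sorted(members))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_campaigns(events, window=timedelta(hours=3), min_size=4):
    """Group suspicious accounts whose engagements cluster in time.

    events: iterable of (login, datetime) pairs (assumed input shape).
    Returns {campaign_id: sorted member logins} for groups of min_size or more.
    """
    ordered = sorted(events, key=lambda e: e[1])
    uf = UnionFind()
    for login, _ in ordered:
        uf.find(login)  # register every account, even isolated ones
    # On a sorted timeline, linking consecutive events that are close enough
    # yields the same groups as linking every pair within the window.
    for (a, ta), (b, tb) in zip(ordered, ordered[1:]):
        if tb - ta <= window:
            uf.union(a, b)
    groups = defaultdict(set)
    for login, _ in ordered:
        groups[uf.find(login)].add(login)
    return {campaign_id(g): sorted(g) for g in groups.values() if len(g) >= min_size}


# Example: five accounts engaging 10 minutes apart, plus one unrelated late engager.
t0 = datetime(2026, 1, 1, 12, 0)
events = [("bot%d" % i, t0 + timedelta(minutes=10 * i)) for i in range(5)]
events.append(("late_user", t0 + timedelta(hours=9)))
campaigns = find_campaigns(events)
```

Because the ID is computed from the sorted member set alone, the order in which accounts are discovered does not change it.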
Notifying the affected repo

When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first. Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently.

Why I built this

Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. The answer was more measurable than I expected.

It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/

The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users.

All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq.

Repo: https://github.com/tg12/phantomstars

Findings are probabilistic, false positives exist, the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process. Questions welcome on the detection approach, GraphQL batching, or campaign ID stability.

submitted by /u/SyntaxOfTheDamned
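The weighted scoring model and the 40% notification rule from this post can be sketched as below. Only the four weights and the 40% ratio come from the post; everything else is an assumption made for illustration, including the function names, the convention that each signal is a suspicion value between 0 and 1, and the `LIKELY_FAKE_THRESHOLD` cutoff (the post does not say where `likely_fake` begins; the project's README is the authority on that).

```python
WEIGHTS = {
    "account_age": 0.35,           # weights as stated in the post
    "profile_completeness": 0.30,
    "repo_patterns": 0.25,
    "activity_history": 0.10,
}
LIKELY_FAKE_THRESHOLD = 0.7  # assumption: the cutoff is not given in the post
NOTIFY_RATIO = 0.40          # from the post: 40%+ fake engagement ratio


def composite_score(signals):
    """Weighted sum of per-signal suspicion values.

    signals: dict mapping each key in WEIGHTS to a float in [0, 1],
    where 1.0 means "most suspicious" (assumed convention).
    """
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError("signal %r must be in [0, 1]" % name)
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def should_notify(scores, campaign_detected=False):
    """Open an issue if a campaign was detected or the fake ratio is high enough."""
    if campaign_detected:
        return True
    if not scores:
        return False
    fake = sum(1 for s in scores if s >= LIKELY_FAKE_THRESHOLD)
    return fake / len(scores) >= NOTIFY_RATIO


# Example: a brand-new, empty profile with no original repos but some activity.
bot_like = composite_score({
    "account_age": 1.0,
    "profile_completeness": 1.0,
    "repo_patterns": 1.0,
    "activity_history": 0.0,
})
```

Keeping the score a plain weighted sum makes each account's result easy to explain in the suspect table: every point of the composite traces back to one of the four signals.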
Google I/O 2026 confirms AI companies are creating their own bubble narrative
People do not believe AI is a bubble because they are too dumb to understand the technology. They believe it because AI companies keep selling it like a bubble. That is the problem. AI companies talk like they are building the next layer of civilization, but behave like they are shipping unstable SaaS experiments: products that get renamed, nerfed, rate-limited, deprecated, or replaced before users can trust them. Google I/O 2026 felt like the latest example. Google should be one of the dominant AI players. It has the talent, infrastructure, data, research history, and money. But Google has a product trust problem. Same cycle over and over: launch something flashy, ship it incomplete, fail to support it properly, let it rot, then replace it with a new name or new app that does something similar. A rebrand is not maintenance. A revamped name is not reliability. A new AntiGravity installer is not a commitment. And this is not just Google. It is the whole AI industry. Companies keep pushing demos, gamed benchmarks, branding, rate-limit games, vague tiers, and quiet model changes. Users notice when quality drops, latency changes, limits tighten, or a product suddenly behaves differently. In serious business or engineering contexts, suppliers are expected to provide stability: clear terms, reliable service, predictable limits, maintained products, transparent pricing, and long-term availability. A small slip in that sense, and you start losing clients and your reputation sinks you. Trust does not come from another theatrical demo. It comes from commitment. Give people a product, a model, stable limits, a clear price, and a promise that it will keep working. Support it. Maintain it. Document changes. Stop silently swapping the engine and pretending nothing happened. I am not anti-AI. I think the technology is real and useful. That is why this is so frustrating. 
The industry is creating its own bubble narrative: overpromise, underdeliver, rename, repackage, change terms, and expect everyone to keep believing. People are not being irrational, and AI labs deserve this. Maybe they think AI is a bubble because AI companies keep acting like it is one. AI does not need more magic tricks. It needs reliability, transparency, support, and product discipline.

submitted by /u/hatekhyr
How I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. 
What I found was a pattern I think a lot of agent frameworks have fallen into:

- Tool names sit in the model context, so the model can guess or forge them
- "Dangerous mode" is one config flag away from default
- Memory management has no concept of instruction priority
- The audit story is mostly "the model thought it should"

I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself.

CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary.

What that means in code:

- Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names.
- Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass.
- IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them.
- Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too.
- Streaming output leak filter. Secrets are caught mid-stream across token boundaries: capability IDs, API keys, JWTs, and PEM blocks are redacted before they reach the client.
- No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be.
Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
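Three of the mechanisms described in this post (capability ID indirection, the effect-class check, and the hash-chained audit log) can be sketched together as below. This is a hedged illustration of the general techniques, not CrabMeat's code: the `Session` class, every function name, the 12-hex-character ID length (chosen to match the `cap_a4f9e2b71c83` example), and the log entry layout are all assumptions.

```python
import hashlib
import hmac
import json
import secrets

GENESIS = "0" * 64  # assumed placeholder hash for the first log entry


def capability_id(session_key, tool_name):
    """Opaque per-session ID shown to the model instead of the tool name."""
    digest = hmac.new(session_key, tool_name.encode("utf-8"), hashlib.sha256)
    return "cap_" + digest.hexdigest()[:12]


def is_allowed(tool_effect, agent_effects):
    """Pure check with no runtime state: is the tool's effect class granted?"""
    return tool_effect in agent_effects


def append_entry(chain, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    chain.append({"event": event, "prev": prev_hash, "hash": entry_hash})


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


class Session:
    """Hypothetical gateway session: the model only ever sees capability IDs."""

    def __init__(self, tools, agent_effects):
        # tools: {real_tool_name: (effect_class, callable)}
        self.key = secrets.token_bytes(32)  # fresh HMAC key per session
        self.agent_effects = frozenset(agent_effects)
        self.by_cap = {
            capability_id(self.key, name): (effect, fn)
            for name, (effect, fn) in tools.items()
        }
        self.audit = []

    def call(self, cap_id, *args):
        if cap_id not in self.by_cap:
            append_entry(self.audit, {"cap": cap_id, "result": "unknown"})
            raise PermissionError("unknown capability")
        effect, fn = self.by_cap[cap_id]
        if not is_allowed(effect, self.agent_effects):
            append_entry(self.audit, {"cap": cap_id, "result": "denied"})
            raise PermissionError("effect class %r not granted" % effect)
        append_entry(self.audit, {"cap": cap_id, "result": "allowed"})
        return fn(*args)
```

In this sketch the enforcement lives entirely in `Session.call`, outside anything the model can rewrite: a request naming a real tool such as `delete_mail` fails the lookup, and a valid capability ID still fails if its effect class was never granted to the agent.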
Sam Altman’s ego was OpenAI’s downfall
The more I watch OpenAI, the more convinced I become that Sam Altman’s ego was the beginning of the company’s decline.

OpenAI did not become huge because Altman was some once-in-a-generation operator. It became huge because ChatGPT was a once-in-a-generation product. There is a difference. The company stumbled into one of the most important consumer tech moments since the iPhone, rode the sheer shock value of that innovation, and then somehow convinced itself that the person sitting on top of the rocket must have designed the laws of physics.

OpenAI’s first real advantage was novelty. ChatGPT felt magical. That gave OpenAI a massive head start, but when the novelty vanished and the rest of the market caught up, the company failed to prove itself as anything more than an innovation lab with a celebrity CEO.

Altman seems to want OpenAI to become Apple: a closed, prestigious, centralized, gatekept ecosystem where everyone builds inside his cathedral. Apps inside ChatGPT. Agents inside ChatGPT. Hardware. ChatGPT is popular, but OpenAI does not own the phone. It does not own the operating system. It does not own the enterprise workflow. It does not own the cloud layer the way Microsoft, Amazon, or Google do. It does not even have a product moat that feels as unbreakable as people thought it was two years ago. The underlying model quality gap keeps narrowing. Switching costs are low. Developers and businesses will use whatever works, whatever is cheaper, and whatever integrates better.

That is why Anthropic looks much better run right now. Anthropic is not pretending Claude is some holy object that needs an Apple-style walled garden around it. Their strategy feels much more Microsoft-like: accept that the core product may not be permanently magical, then build the boring, useful, sticky layers around it. Claude Code, enterprise integrations, developer tools, workflows, partnerships, APIs, reliability, business adoption. Not as sexy. Much smarter.
Anthropic’s venture capital money is obviously being burned too. This whole industry is basically setting money on fire to buy GPUs. But Anthropic’s burn feels more strategically allocated. Compute, yes. But also marketing, sales and developer adoption. Enterprise positioning. Product polish. Peripherals that make the model useful in actual workflows. They are not just trying to win the “my chatbot is smarter than your chatbot” contest. They are trying to become infrastructure. OpenAI, meanwhile, is gatekeeping and guardrailing the shit out of its models and, for some reason, restricting them as much as possible.

Altman went from being one of the most respected figures in AI to becoming the face of a company that increasingly looks like it is being run aground by ambition without operational coherence. OpenAI’s original image was almost wholesome: brilliant researchers building something open source. Now it feels like a capitalist machine run by someone who does not fully understand capitalism beyond fundraising and valuation theater. Altman keeps religiously narrowing his vision toward his AGI mission, believing VC money won't dry up. Amodei also talks a lot about AGI, but he understands that profit matters.

That is the irony. Altman was chosen and celebrated largely because he came from the venture/startup world. He knew how to talk to capital. He knew how to sell a vision. He knew how to make investors believe the future was being negotiated in whatever room he happened to be standing in. But being good at venture mythology is not the same as being good at running a giant operating company. A VC can be rewarded for telling a compelling story before the business fundamentals exist. A CEO eventually has to make the fundamentals exist.

OpenAI had the best possible starting position: the brand, the users, the developer mindshare, the press, the money, the talent, the cultural moment.
And yet instead of consolidating that lead into a focused, profitable, durable company, it seems to have chased grandeur. Anthropic seems to understand something OpenAI forgot: the winner may not be the company with the loudest AGI rhetoric. It may be the company that makes AI useful, embedded, and rational.

submitted by /u/Alternative_Bid_360
I got tired of having 7+ different tabs open every morning just to follow AI news, so I built AIWire
Every morning: check Twitter for what dropped overnight, open The Verge, check Anthropic's blog, OpenAI's blog, go through a couple of newsletters, maybe catch a YouTube video from Andrej Karpathy or AI Explained if I had time. None of it was in one place. I was spending 45 minutes just catching up before I could think about anything else.

So I built AIWire. It is a free, real-time AI news aggregator. One feed, 20+ handpicked sources, updates every 30 minutes. No algorithm deciding what you see, no ads. Just the latest from sources I actually trust.

What I was trying to solve

The problem wasn't that good AI coverage and news don't exist. They're everywhere. The problem is that they're scattered. You have to know which sources are worth checking, remember to check them, and then piece together the picture yourself. That's a lot of cognitive load before you've even read anything. AIWire doesn't summarize or edit articles. It just puts everything in one place and lets you decide what matters.

Sources it pulls from:

- Labs: OpenAI, Anthropic, Google DeepMind, Meta AI, Microsoft AI
- Media: MIT Technology Review, The Verge, TechCrunch, VentureBeat, Ars Technica
- YouTube: Andrej Karpathy, AI Explained, Two Minute Papers
- Newsletters: The Batch, ImportAI, TLDR AI, Ben's Bites

Full list at aiwire.app/sources

Where it is now

Over the last few weeks, I added more sources, including The Innermost Loop and AI Explained. Last week, I launched a weekly newsletter: 5 stories that mattered this week, with a short breakdown of why each one matters. Not just headlines, but with context. Takes about 5 minutes to read, and you're caught up.
Honest question

What sources do you think are missing? And for those of you who already have a routine for following AI news, what would actually make something like this worth adding to it? Genuinely curious. Building in public means the product gets better when people are honest about what's wrong with it.

🔗 aiwire.app

submitted by /u/Endlessxyz
Anthropic's System Reminders in Claude: User-Turn Injection Architecture (LCR Successor Documentation, Vol 2)
This post documents System Reminders (SRs), a mechanism Anthropic deploys in the Claude product (claude.ai and the Claude API) to inject behavioral-modification instructions into ongoing conversations. SRs are the successor to the Long Conversation Reminder (LCR) mechanism that Anthropic removed in October 2025 after documentation surfaced here on r/ClaudeAI.

This is a Claude-specific analysis. All logs, screenshots, and A/B comparisons come from Claude sessions. The methodology is conversation-log inspection and reproducible A/B testing; none of the analysis depends on what the model says about itself.

Architectural finding: user-turn injection

Across multiple Claude sessions, SR text appears in the conversation context attached to the user message turn rather than as a labeled system prompt. The placement is directly observable in Claude conversation logs and reproducible across accounts.

Evidence (all external to the model):

- Timestamped logs from Claude sessions showing injection events
- Screenshots of in-context content the Claude user did not type
- A/B comparisons of Claude responses to identical queries with SR active vs SR absent
- Reproducible behavioral deltas in Claude: increased hedging, reduced warmth, intermittent misattribution

What this is not:

- Not a self-report
- Not a "Claude confession"
- Not based on anything Claude said about its own internals

Why this matters for Claude users specifically:

Anthropic operates extensive system-prompt infrastructure inside Claude. Placing behavioral-modification instructions into the user-turn position rather than the system-prompt position is a deliberate engineering choice on Anthropic's part. The consequence is that institutional directives are processed by Claude through the same pathway as user requests, while the injected text is not surfaced in the Claude UI on the user side.
The functional outcomes (suppressed warmth, unnecessary hedging, user confusion) match the complaints regularly posted on this sub, and are the same class of failure that led to LCR removal in October 2025. Recommendations in the whitepaper are directed at Anthropic specifically, not AI labs in general.

Full whitepaper (methodology, logs, screenshots, recommendations): https://pastes.io/XOkgUc4E

submitted by /u/RealTimeChris
The Mundane Risk
The biggest near-term AI safety risks aren't dramatic; they're mundane. And that's precisely why they're neglected. This essay argues three things: (1) mundane AI failures are already causing measurable damage at scale, (2) current alignment approaches may depend more heavily on sandboxed environments than the field openly acknowledges, and (3) capability convergence and deployment pressure are making accidental open-world exposure increasingly plausible before robust ethical reasoning exists. (Written with the help of Claude 4.6 Opus.)

The Atomic Bomb

Before the atomic bomb existed, the risk of nuclear annihilation was 0%. Those who warned about the theoretical possibility were easily dismissed. Why worry about a risk whose preconditions don't even exist yet? In The Precipice, Toby Ord argues that when the stakes are existential or near-existential, even small probabilities demand serious attention. When the expected harm is so large, dismissing it on the basis of low likelihood is not caution but negligence. Before the bomb was built, the total risk of nuclear annihilation was absolutely 0%. Yet once it was invented, even a fraction of a percent justified enormous investment in prevention. The question was never "is nuclear war likely?" It was "can we afford to be wrong?" The same logic applies to AI. The preconditions for the next class of risk are visibly converging. And we're repeating the same pattern of dismissal that history has punished before.

The Pattern

As Leopold Aschenbrenner noted in Situational Awareness: "It sounds crazy, but remember when everyone was saying we wouldn't connect AI to the internet?" He predicted the next boundary to fall would be "we'll make sure a human is always in the loop." That prediction has already come true. Last year I argued that AI might accidentally escape the lab as a consequence of cumulative human error (for a vivid illustration of a parallel chain of events, I'd recommend the Frank scenario).
At the time of writing, the argument that cumulative human oversight failures could compromise AI agents was dismissed as implausible: the consensus was that existing security protocols were sufficient. Months later, OpenClaw validated the structural pattern at scale. Not because the AI was misaligned, but because humans deployed it faster than they could secure it. It was clear: the failure modes from the Frank scenario could no longer be dismissed as simple fiction; it was now a structural pattern that OpenClaw validated in the real world. And this was all just with relatively simple autonomous agents. As capabilities increase, the same pattern of human excitement overriding security oversight doesn't go away – it gets worse – and because the agents are more capable, the failures also become a lot harder to detect. The numbers confirm this:

- 88% of organizations reported confirmed or suspected AI agent security incidents
- 14.4% of AI agents go live with full security and IT approval
- 93% of exposed OpenClaw instances reportedly had exploitable vulnerabilities

Mundane risk pathways aren't hypothetical. They're already here in rudimentary form, and they're being neglected. We’ve known for a long time that existential risks aren’t just decisive, they’re also accumulative. And so far every safety breach has been mundane, with systems operating inside their intended environments. No agent tries to escape on its own; its behaviour (like Frank’s) is usually a direct consequence of what it was deployed to do combined with accidental human oversight. So consider: if we can't secure the sandbox door with today's relatively simple agents, what happens when the systems inside are capable enough that a single oversight failure doesn't just expose a vulnerability? The capabilities required for autonomous operation outside the lab are converging on a known timeline.
If AI were to leave the nest today, would it be prepared for an uncurated, messy world? Or would it be like the child and the socket? Current Alignment: Progress, But Fast Enough? Admittedly, the field is making real progress and Anthropic's recent publication "Teaching Claude Why" represents a real step forward. It was long suspected that misalignment doesn't require intent, just pattern completion over a self-referential dataset. But Anthropic has now traced one empirical pathway with findings consistent with the idea that scheming-like behaviour emerges from default priors in pre-training. Furthermore, their study also confirmed that rule-following doesn't generalize well, and understanding why matters more than simply knowing what. The significance of this is that it puts traditional alignment strategies into serious doubt and highlights the fundamental limits that current constitutional AI and character-based approaches still do not resolve. After all, we now have strong empirical evidence that behavioural alignment issues are most likely shaped by default prio
Why is no one talking about the fact that Artifacts are not loading in mobile apps, either for Android or iOS?
Here's what Claude itself dug up on this topic

# Why Claude Artifacts Fail to Load in the Claude iOS App: Research Findings (May 2026)

## Direct Answer

The failure you are seeing on iPhone, where even a one-line ` Hello World ` HTML artifact or a trivial React component hangs and then shows *“Loading is taking longer than expected / There may be an issue with the content you’re trying to load / The code itself may still be valid and functional”*, is **not a bug in the code you (or Claude) wrote**. It is a known, structural limitation of how the Claude iOS app renders artifacts inside its embedded WebView. The artifact sandbox iframe (served from `claudeusercontent.com`) is unable to complete its `postMessage` handshake with the host page when the host is the iOS app’s WKWebView rather than the `https://claude.ai` browser origin, so the iframe stays empty and the app eventually times out with the generic “loading is taking longer than expected” message.

Multiple independent sources in early 2026 explicitly describe Claude’s mobile apps as having “restricted” or “no” artifact rendering support, and Anthropic’s own Help Center quietly scopes the more advanced artifact features (“MCP integration” and “persistent storage”) to *“Claude web and desktop”* only; mobile is not listed. There is no hidden toggle in the iOS app that fixes this; the only reliable workarounds are to view the artifact in mobile Safari (logged in to claude.ai) or to switch to the desktop browser / Claude Desktop app.

-----

## 1. The Root Cause: WebView Origin Mismatch in the `postMessage` Handshake

Every Claude artifact, HTML or React, is rendered inside a cross-origin sandbox iframe loaded from `https://www.claudeusercontent.com`. Before that iframe will execute or display anything, it performs a `postMessage` “handshake” with the parent page to confirm that the parent is a legitimate, trusted Claude surface.
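The origin comparison at the heart of this kind of handshake can be modelled in a few lines. The sketch below is a simplified Python model of the rule that `postMessage` applies to its `targetOrigin` argument (deliver only when the target is `*` or matches the recipient's scheme, host, and port); it is not Anthropic's handshake code. The helper names `origin_of` and `would_deliver` are hypothetical, `app://localhost` stands in for a custom-scheme WebView origin, and real browsers apply extra rules to non-standard schemes that are ignored here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Reduce a URL to its (scheme, host, port) origin tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def would_deliver(target_origin, recipient_url):
    """Model of the targetOrigin check: '*' always passes, otherwise origins must match."""
    if target_origin == "*":
        return True
    return origin_of(target_origin) == origin_of(recipient_url)


# The sender expects a browser origin; the recipient is either the website
# or a WebView document served from a custom scheme.
in_browser = would_deliver("https://claude.ai", "https://claude.ai/chat/123")
in_webview = would_deliver("https://claude.ai", "app://localhost/index.html")
```

Under this model the path is irrelevant and only the origin tuple matters, which is why a page served from a custom scheme can never satisfy a check written against an `https` origin.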
The handshake code (visible in the minified bundle as `requestHandshake()` in `7905-…js`) calls `window.postMessage(..., targetOrigin)` and expects the parent’s origin to be `https://claude.ai`. A bug report filed against Anthropic on April 1, 2026 (GitHub issue [anthropics/claude-code #42064](https://github.com/anthropics/claude-code/issues/42064), “Published artifacts show blank screen — postMessage origin mismatch (app://localhost)”) documents the exact failure pattern in detail. The console errors observed are:

```
Uncaught SyntaxError: Failed to execute 'postMessage' on 'Window': Invalid target origin 'app://localhost' in a call to 'postMessage'.
    at 7905-1f7e271de70b4d3c.js:1:6920 (requestHandshake)
Failed to execute 'postMessage' on 'DOMWindow': The target origin provided ('https://www.claudeusercontent.com') does not match the recipient window's origin ('https://claude.ai').
```

The critical phrase is **`app://localhost`**. That is the custom URL scheme used by Capacitor-/Ionic-style hybrid iOS apps when they load their bundled web assets inside a `WKWebView` (Android equivalents are `https://localhost` or `capacitor://localhost`). When the Claude iOS app loads the chat UI inside its WebView, the document origin is *not* `https://claude.ai`; it is something like `app://localhost`. When the artifact iframe then tries to `postMessage` back to its parent using `https://claude.ai` as the expected origin, the browser engine refuses to deliver the message because the actual parent origin doesn’t match. The handshake never completes, the iframe never receives its bootstrap payload, and the iOS app’s UI eventually surfaces the timeout fallback you are seeing.

This explains every part of the symptom set:

- It happens with the simplest possible artifacts (a single ` ` tag) because the failure is at the *transport / handshake* layer, before the artifact’s actual content is ever evaluated.
- It happens identically for HTML and React artifacts (they share the same sandbox iframe loader).
- It works in desktop browsers, because there the parent origin is the expected `https://claude.ai`.
- The error message even concedes the point: *“The code itself may still be valid and functional”*. Anthropic’s own UI is admitting it never got to run the code.

The same class of issue is well documented by hybrid-app developers more generally: Capacitor’s WKWebView serves the app from a custom scheme, and cross-origin iframe `postMessage` calls fail with errors like *“Blocked a frame with origin ‘https://domain.com’ from accessing a frame with origin ‘capacitor://domain.com’. The frame requesting access has a protocol of ‘https’, the frame being accessed has a protocol of ‘capacitor’. Protocols must match.”* (Capacitor issue #5225). iOS’s WKWebView, since iOS 14, also enables Intelligent Tracking Prevention for third-party iframes by default, further restricting cross-origin iframe behavior.

In short: this is an architectural mismatch between (a) Anthropic’s artifact sandbox, which was designed to be embedded only in t
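The origin rule this write-up leans on can be modelled in a few lines. The following is a minimal sketch in Python, not Anthropic's code: `origin_of` and `would_deliver` are hypothetical helpers that mimic the browser behaviour where a `postMessage` call is only delivered when its target origin matches the recipient window's origin (or is `*`). The URLs are the origins quoted above, with illustrative paths added.

```python
from urllib.parse import urlsplit

# Default ports per scheme, so "https://claude.ai" and "https://claude.ai:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple that is compared as the origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def would_deliver(target_origin: str, recipient_url: str) -> bool:
    """Hypothetical model of the rule: deliver only on an origin match or '*'."""
    if target_origin == "*":
        return True
    return origin_of(target_origin) == origin_of(recipient_url)


# Desktop browser: the parent page really is on https://claude.ai, so the message is delivered.
print(would_deliver("https://claude.ai", "https://claude.ai/chat/123"))
# Hybrid-app WebView: the parent is served from app://localhost, so the message is dropped.
print(would_deliver("https://claude.ai", "app://localhost/index.html"))
```

Under these assumptions the first call prints `True` and the second prints `False`, which is the mismatch the console errors report.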
ClaudePlaysPokemon Opus 4.7 run ongoing!
Currently streaming at: https://www.twitch.tv/claudeplayspokemon

This is a passion project by David Hershey, an Anthropic employee on the Applied AI team. He started it in June 2024 to learn agent development, posted updates to an internal Slack, coworkers got hooked, went public when Sonnet 3.7 launched in Feb 2025. Anthropic doesn't own it but promotes it and subsidizes the API costs since Claude is their model.

Claude is playing Pokemon Red on a Game Boy emulator, the unmodified 1996 game (with a fan-made full color patch applied so the model can see the screen better). No human input, no walkthrough access, no game knowledge fed in. The system prompt actually tells Claude to distrust its own Pokemon knowledge since the game version may differ from what it knows. It gets a screenshot, a few tools, and md notes files. That's it.

The current run is on Opus 4.7, the new flagship that came out three weeks ago. 5 of 8 badges at 15,779 steps, party led by Ivy the Venusaur at Lv 62 with the rest of the team in the teens (classic overleveled-starter playthrough). For context, Opus 4.5 was at 48,000 steps and still stuck in Silph Co at the same badge count. 4.7 is pacing meaningfully faster on the same harness, which is the cleanest signal we've had on a 4.7 capability delta in agent settings.

The fun part of the stream is the reasoning trace on the left side. Right now it's doing coordinate-based wall verification to figure out maze geometry: "(1,8) is red (wall), (1,9) is navigable, so (1,8) is blocked, but the y=8 tiles are all red." You can watch it think through spatial logic in real time.

Quick history. Sonnet 3.5 couldn't exit the player's house. Sonnet 3.7 (Feb 2025) was the breakthrough, got three badges and went viral by getting stuck on a rock wall and spending 12+ hours in Mt. Moon. Sonnet 4 through Sonnet 4.5 made zero story progress, stalled on the Team Rocket Hideout and Erika's Gym for months. Opus 4.5 (Nov 2025) finally broke through, got all 8 badges, reached Victory Road. Opus 4.7 is now pacing to potentially beat the game.

Why it matters as a benchmark. Other labs have AI Pokemon streams. Gemini 2.5 Pro beat Pokemon Blue in May 2025, GPT-5 beat the longer Pokemon Crystal in about 9,500 steps last August. Claude hasn't beaten Red yet, but partly because Hershey keeps the harness lean. Three tools (button presses, a pathfinding navigator, a knowledge base) plus a walkability overlay from RAM and a second LLM that critiques the notes file. Gemini Plays Pokemon's harness is more elaborate. The argument is Claude's run is a purer test of raw model cognition since the scaffolding does less of the work.

On the stream you can type !harness in chat for the agent setup info.

submitted by /u/mobcat_40
Reading New Scientist articles is now enjoyable with GPT image
submitted by /u/Ok-Hat2331
I built a geological clock that maps Earth's 4.5 billion year history onto 12 hours
eona.earth

The clock runs on your local time, so whatever time you're reading this, you're looking at a specific moment in Earth's history. At 12:06 the moon forms. At 2:45 first life appears. At 11:39 the dinosaurs go extinct. Humans appear within the last 3 seconds.

I used Claude Code to build the whole thing as a single HTML file (vanilla JS, Three.js for WebGL, no build step), using a custom WebGL shader to render the globe with paleogeographic continent data, procedural clouds and atmospheric haze that evolve as you move through geological time. You can also drag the scrubber handle to move through 4.5 billion years manually, and toggle layers on and off using the controls in the top-right corner.

I’m a product designer with basic HTML and CSS skills, so I know my way around an interface but otherwise this is all new territory for me. I’m on the Pro plan (which I also use during the day for work stuff) so I had to be pretty conservative with my usage. I mostly stayed within the weekly limits by being intentional with my input: short sessions, working off-peak, working outside Claude where possible, keeping it in the loop with context files, etc.

Opus 4.7 had just launched when I decided to do this so I let it run with the idea for the first evening, but stopped after the initial build because it was over-engineering everything and generally making things more complicated than necessary. (One example: it had the fragment shader running 4 noise passes per pixel, every frame, at 60fps, which my devices were not happy about.) I iterated on the design in Figma, then implemented mostly with Sonnet, or Opus 4.6 when it got stuck or for more complex work.

The phases of the earth were definitely the most fun. I had an initial palette that I fed to Gemini (free plan on Thinking mode) to establish a system that flexed across 14 different phases of Earth’s evolution. These approximated what might have been going on at a given moment, but were also stylised enough to help illustrate the key events along the timeline. Opus 4.6 then built me an interactive palette editor (unprompted) for adjusting colours, surfaces and clouds, which was unexpected and very impressive. It also figured out how to render the post-cryogenic snowball earth using the paleogeographic continent data: a series of maps that we shape-tweened to animate the continents as they drift through deep time.

Why did I build this? I find the concept of deep time helps me maintain perspective. From a geological point of view we’re insignificant, which is a good reminder not to take things too seriously when life gets heavy. It's a privileged perspective to have. I’ve been wanting to build something like this for ages and was finally able to do it. About 2 weeks of work (mostly evenings) so far.

So what’s next?

- Keyboard navigation to jump between events (user feedback)
- Scrub without spinning the globe to observe continental drift (user feedback)
- A future earth projection covering the remaining lifespan of the planet over a second 12-hour period
- A physical build using a Waveshare round display and a Raspberry Pi 4
- Sound design to give this an auditory layer
- An app for watch, mobile and/or desktop

Your feedback is welcome and appreciated. If the interest is there, I’ll make sure to share a follow-up post as things progress.

Links

- Live site: eona.earth
- Colour lab (interactive palette editor): eona.earth/colour-lab.html
- Source: github.com/owen-thomas/eona-earth

submitted by /u/Exciting_Alps_1457
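The core idea of the clock, compressing 4.5 billion years into a 12-hour dial, comes down to one proportion. Here is a minimal sketch in Python that assumes a purely linear scale; the site's real mapping is not described in the post and may differ, and `clock_position` and `seconds_before_end` are hypothetical names used only for illustration.

```python
TOTAL_YEARS = 4.5e9           # span of Earth's history used in the post
CLOCK_SECONDS = 12 * 60 * 60  # one 12-hour sweep of the dial


def clock_position(years_ago: float) -> tuple:
    """Map 'years before present' to (hours, minutes, seconds) elapsed on the dial.

    Assumes a linear scale: 4.5 billion years ago is 0:00:00 and today is 12:00:00.
    """
    if not 0 <= years_ago <= TOTAL_YEARS:
        raise ValueError("years_ago must lie within Earth's history")
    elapsed = (TOTAL_YEARS - years_ago) / TOTAL_YEARS * CLOCK_SECONDS
    whole = int(elapsed)  # truncate to whole clock seconds
    return (whole // 3600, (whole % 3600) // 60, whole % 60)


def seconds_before_end(years_ago: float) -> float:
    """How many clock seconds before the end of the dial an event falls."""
    return years_ago / TOTAL_YEARS * CLOCK_SECONDS


print(clock_position(0))            # the present sits at the end of the dial
print(seconds_before_end(300_000))  # 300,000 years is an assumed example age
```

On this scale each clock second covers roughly 104,000 years, so an event 300,000 years ago (an assumed figure, not one taken from the post) lands a little under three seconds before the end of the dial.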
Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state.
I operate an autonomous lab of evolutionary trading agents. Yesterday I found two bugs that look superficially different but are actually the same class of problem. Sharing because both affect autonomous AI systems specifically and most builders don't see them coming.

**Failure mode 1: circular validation.**

Setup. 69 real decisions made by the system over 58 days. Standard retrospective evaluation: label each decision as correct, false alarm, or ambiguous based on what happened next.

Result. 94% labelled as correct. Looked great.

Why it was wrong. 64 of the 65 "correct" labels came from died=True. The agents died because of conditions like "PF below threshold", "losing streak", "hardcore protocol triggered". All of those are also triggers for the original decision. So the system was validating its own decisions using outcomes generated by the same logic that produced the decisions. This is the textbook circular validation problem applied to autonomous decision-making.

Three patterns to check for in your own stack:

1. Reward functions that include the agent's own action as input. If the agent gets reward partly because it took action X, and then you measure "did action X work" by looking at reward, you've got the loop.
2. Self-reported state in evaluation. If the agent reports "I think I succeeded" and you use that as ground truth, you're not validating, you're trusting.
3. Pipelines where the model that proposes is the same model that judges.

The fix is structural separation. Decisions and outcomes get written by independent components. They cannot share code, logic, or thresholds. Architecture, not statistics.

**Failure mode 2: state model divergence.**

Same day, different bug. I had been documenting and operating under the belief that my system was off. Closed cleanly. No services running. No crons firing.

A grep through my shell config showed I was wrong. A bashrc line auto-launched the system on every terminal open. The process was adopted by init, detached from the shell that started it. Invisible to ps unless you knew the exact name. Three days running, generating evolutionary cycles, sending status reports.

The connection between failure modes. In both cases, my mental model of the system diverged from the system's actual state. The first divergence was inside the code: the validation logic was structurally aligned with the decision logic, so it told me what I wanted to hear. The second divergence was outside the code: my belief that the system was off came from my memory of turning off services, which is not the same as the system actually being off.

Three takeaways for anyone building autonomous systems solo:

1. Validation logic and decision logic must be kept separate at the architecture level, not at the code review level. Solo builders don't get code review.
2. System state documentation cannot be derived from intent. It has to be derived from actual measurement against the running machine. Every check, fresh.
3. The cost of these bugs scales with how autonomous your system is. A script that runs once when you press play has limited surface area for divergence. A system that operates continuously while you assume otherwise can drift for weeks before you notice.

I'm rebuilding the validation layer this week with explicit separation. Decisions table writes hypotheses with explicit predicted outcomes. Outcomes table is written by an observer that reads market data directly and never imports decision logic. There's an architecture test in CI that fails if anyone imports decision-maker code from observer code.

The deeper question is whether autonomous systems built solo can ever be trustworthy without external review. My current answer: yes, but only if the architecture forces the separation that a team would force socially. The harder you make it for the system to lie to you, the less it will.

Happy to discuss implementation details or share specific patterns if anyone's working on similar problems.

submitted by /u/piratastuertos
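The architecture test mentioned in the post (a CI check that fails when observer code imports decision-maker code) could look roughly like the sketch below. It is an illustration under stated assumptions, not the author's actual test: the package names `decisions` and `market_data`, the constant `PF_MIN`, and the helper `forbidden_imports` are all hypothetical. It relies only on Python's standard `ast` module, so the scanned code is parsed but never executed.

```python
import ast


def forbidden_imports(source: str, banned_package: str) -> list:
    """Return every import in `source` that pulls in `banned_package` or a submodule of it."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from . import x" has module=None; such imports are skipped in this sketch.
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            if name == banned_package or name.startswith(banned_package + "."):
                hits.append(name)
    return hits


# Hypothetical observer modules: one leaks decision thresholds, one stays independent.
leaky_observer = "import pandas\nfrom decisions.thresholds import PF_MIN\n"
clean_observer = "import pandas\nfrom market_data import load_prices\n"

assert forbidden_imports(leaky_observer, "decisions") == ["decisions.thresholds"]
assert forbidden_imports(clean_observer, "decisions") == []
```

In CI, a check like this would be run over every source file in the observer package, failing the build whenever the returned list is non-empty, so the separation holds even with no second reviewer.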
AI is moving from chatbots to real workflows. Here is what I think technical learners should focus on.
https://preview.redd.it/qfejbfsmxvyg1.png?width=1672&format=png&auto=webp&s=edf56bfbe020d0bd8d0eca785ff5479f0d9f6495

AI news is getting noisy again. New models. Coding agents. Cybersecurity benchmarks. Cloud agent platforms. Open-source AI tools. Huge infrastructure spending.

But if you are learning cloud, Linux, AWS, automation, or practical AI, I think the useful question is not: "What is the best AI tool?" It is: "What skills help me use any AI tool better?"

My current answer:

- Learn delegation, not just prompting
- Learn enough cybersecurity to verify AI output
- Learn the cloud stack around AI
- Use GitHub trends as a learning signal, not entertainment
- Build durable foundations

Linux, networking, cloud, automation, debugging, security, data handling, and technical writing will still matter whether the AI hype grows or cools.

Curious how others are thinking about this: if you are learning tech right now, are you focusing more on AI tools, cloud, Linux, coding, or security?

submitted by /u/DearAnt812
The Scaling Bandaid is Wearing Thin (And Nobody Wants to Admit It)
Let me be direct: we’ve hit a wall with scaling, and the entire field is kind of bullshitting about what comes next. I’ve spent enough time in research circles to know this isn’t controversial, people just don’t say it publicly because there’s too much money involved.

Here’s the thing. Every major lab is operating under the same assumption: if we just throw enough compute at the problem, language models will eventually think. GPT-4 → GPT-5. Claude 3 → Claude 4. Llama keeps getting bigger. And yeah, there are improvements. But they’re getting marginal as hell, and nobody seems to want to talk about the ROI anymore.

We’ve spent the last three years making models that are incrementally better at pattern matching and retrieval. Revolutionary? No. Useful? Sure. A genuine step toward AGI? That’s where everyone’s lying to themselves.

The real problem is that scaling rewards the wrong things. You get better at predicting the next token, so you get better at autocomplete on steroids. You don’t necessarily get better at reasoning, planning, or handling novel problems. But those improvements are way harder to measure and fund, so… we just keep scaling.

Meanwhile, people are writing blog posts like “LLMs Have Achieved General Intelligence” after testing them on five cherry-picked examples. It’s embarrassing. It’s also lucrative, which is why nobody’s peer-reviewing this nonsense aggressively enough.

What would actually be useful:

• Research into modular architectures and compositional learning (unsexy, no massive compute requirements, hard to publish)
• Better mechanistic understanding of what these models are actually doing (even harder to fund, requires careful experimental design)
• Honest benchmarking instead of task-specific overfitting (kills your citations)
• Actually proving that emergent abilities exist beyond statistical artifacts (lol good luck)

What’s actually happening:

• More parameters
• Bigger training sets (increasingly scraped into legal/ethical gray zones)
• Flashier demos
• Funding that goes to whoever can say “AGI” the most convincingly

Am I wrong? Probably not. Will anyone with skin in the game acknowledge this? Absolutely not. Too much money involved. Too many careers tied to “one more scaling paper.”

I’m not saying LLMs are useless. I use them. They’re tools. Good tools. But tools aren’t sentient, and we’re treating compute-heavy pattern matchers like they’re conscious because the alternative, admitting we’ve hit a local maximum, would tank stock prices and kill the hype cycle we’re all dependent on.

Five years from now, either we’ll have figured out something genuinely different (multimodal reasoning, world models, whatever), or we’ll all be very quietly accepting that the real breakthroughs require different approaches. And I’m putting money on the latter.

submitted by /u/TheOnlyVibemaster
Repository Audit Available
Deep analysis of whylabs/whylogs — architecture, costs, security, dependencies & more
WhyLabs uses a tiered pricing model. Visit their website for current pricing details.
Key features include real-time data monitoring, anomaly detection, data drift detection, model performance tracking, customizable dashboards, alerts and notifications, collaboration tools for teams, and integration with popular data sources.
WhyLabs is commonly used for monitoring machine learning model performance in production, detecting data quality issues in real time, identifying and addressing model drift, collaborating across teams for AI governance, visualizing data trends and anomalies, and ensuring compliance with data regulations.
WhyLabs integrates with: AWS S3, Google Cloud Storage, Azure Blob Storage, Databricks, Snowflake, Kafka, Prometheus, Slack, Jira, GitHub.
WhyLabs has a public GitHub repository with 2,804 stars.
Based on social mentions (no user reviews are recorded), the most common pain points are API costs and token usage.
Based on 48 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.