Your daily dose of AI research from AK
Papers with Code receives praise for its extensive catalog of machine learning research papers coupled with code implementations, making it a valuable resource for both learning and project development. Users appreciate the integration of code, which aids in practical understanding and application of theoretical work. However, a few users note that some papers lack comprehensive code examples or have discrepancies between reported and reproduced results. While it is generally seen as a free and indispensable tool for researchers and developers, there are mentions of resource constraints potentially limiting its expansiveness.
Mentions (30d): 42 (15 this week)
Reviews: 0
Platforms: 2
Sentiment: 21% (23 positive)
Industry: research
Employees: 3
GitHub followers: 5,748
GitHub repos: 13
npm packages: 2
HuggingFace models: 4
TIL you can ship a Claude Code skill inside a GitHub repo so anyone who clones it gets architectural guardrails baked in
I've been building a local AI ops platform and wanted Claude to be able to extend it without ever accidentally touching core files. So I added a .claude/skills/ directory to the repo with a plain Markdown file that gives Claude:
- the architecture contract ("every feature is a worker, the core is off-limits")
- a decision tree for scaffolding (what files to create, in what order)
- hard rules that Claude has to surface as an explicit gap rather than paper over with a silent core edit

When anyone opens the repo in Claude Code, the skill loads automatically. Ask it "create a new worker" and it follows the contract without being told any of this upfront.

The interesting part: the skill is just Markdown. No Claude-specific syntax. Which means you can copy it into an AGENTS.md for Codex, or paste it into any assistant's system prompt, and it works the same way.

If you're building something others will extend with AI assistance, shipping the architectural contract as a skill seems like a cleaner pattern than hoping contributors read the docs.

PS: as a reader suggested, if it isn't done automatically, also include the main guidelines in CLAUDE.md so that the directives remain effective when the context gets very big (the skill sometimes gets ignored in such conditions).

Repo if you want to see how the skill is structured: https://github.com/ccascio/BFrost

submitted by /u/EmoticonGuess
Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]
Wanted to see how close a fully bio-plausible agent could get to PPO on Pong.

Setup
- Custom Pong environment (pygame, no gym)
- PPO baseline: paper-faithful, from scratch
- Hebbian agent: PPO policy replaced with Hebbian value estimation with engineered features → 61%
- BioAgent: Predictive Coding for feature learning + distributional Hebbian plasticity for value (Dabney et al. 2020) → 57%

Zero backprop anywhere in the pipeline.

Key observations
- The 2% gap is real but small. The bottleneck wasn't the lack of backprop; it was catastrophic forgetting under non-stationary opponent dynamics during self-play.
- Distributional value encoding (à la Dabney) helped stability vs. a scalar Hebbian baseline, but not enough to match PPO under self-play.
- Self-play exposed the plasticity–stability dilemma hard: Hebbian rules that adapt fast forget fast. This is the real wall for bio-plausible RL in non-stationary settings.

Not claiming novelty in the architecture; this is a from-scratch exploration of whether bio-plausible rules can handle a real RL task. Short answer: yes, mostly, with one clear failure mode.

Code: github.com/nilsleut/Biologically-Plausible-RL-Plays-Pong

Happy to answer questions about the PC implementation, the Hebbian value estimator, or the self-play setup.

submitted by /u/ConfusionSpiritual19
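For readers unfamiliar with the distributional value idea referenced in the post, here is a minimal sketch. It is not taken from the linked repo. It assumes the common reading of Dabney et al. (2020): a population of value units, each scaling positive and negative prediction errors differently, so the population spreads across the reward distribution instead of collapsing to a single mean. The names `update_value_population`, `train`, and the `taus` asymmetry parameters are hypothetical.

```python
import random


def update_value_population(values, taus, reward, lr=0.05):
    """Apply one local update to every value unit (no gradients are propagated)."""
    updated = []
    for v, tau in zip(values, taus):
        delta = reward - v  # prediction error local to this unit
        # Assumption: optimistic units (tau > 0.5) weight positive errors more,
        # pessimistic units (tau < 0.5) weight negative errors more.
        scale = tau if delta > 0 else (1.0 - tau)
        updated.append(v + lr * scale * delta)
    return updated


def train(num_steps=20000, seed=0):
    """Expose the population to a toy reward that is +1 or -1 with equal odds."""
    rng = random.Random(seed)
    taus = [0.1, 0.3, 0.5, 0.7, 0.9]
    values = [0.0] * len(taus)
    for _ in range(num_steps):
        reward = 1.0 if rng.random() < 0.5 else -1.0
        values = update_value_population(values, taus, reward)
    return taus, values


if __name__ == "__main__":
    for tau, value in zip(*train()):
        print(f"tau={tau:.1f} value={value:+.2f}")
```

With a shared reward stream the units stay ordered by `tau`, so the population as a whole encodes a spread of outcomes instead of one scalar estimate. A scalar baseline corresponds to keeping only the `tau = 0.5` unit.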
I built ContextAtlas: a new take on context carry-over that helps Claude pick up new sessions where it left off, within the scope of your previous design decisions, while saving tokens by avoiding rediscovery
When the "Build with Opus 4.7" hackathon was announced, I had been obsessing over the tokenomics of agents and how to make sessions go further without burning context on rediscovery work. We all have probably hit a session limit and wondered how it went so fast. I applied with that thesis, didn't get in, but I built it anyway over the last four weeks. I am proud to share that v1.0 ships today.

Note up front: this is specifically a tool for developers. If you're using claude.ai web or Projects, ContextAtlas won't plug in directly. But if Claude Code is your main workflow or you use the Anthropic API, this tool was made for you.

The pain: Claude Code learns your codebase fresh every session. "Where is OrderProcessor?" triggers a flurry of greps. "What depends on AuthMiddleware?" is another round of file reads. On a mid-sized codebase, an architectural question can burn 40+ tool calls and a lot of tokens before Claude has enough context to reason well. And the architectural rules in your ADRs and design docs? Claude has no path to those, so it confidently suggests changes that break constraints you may have documented elsewhere in your repo.

What I built: ContextAtlas is an MCP server that pre-computes a curated atlas of your codebase (symbols, ADR-extracted architectural intent, git history, test coverage) and serves it to Claude Code in one call at query time, in a compact, token-saving shape, via a few lightweight MCP tools. Initial indexing happens once; querying is local and free.

Example of what comes back when Claude calls get_symbol_context("OrderProcessor"):

    SYM OrderProcessor@src/orders/processor.ts:42 class
    SIG class OrderProcessor extends BaseProcessor
    INTENT ADR-07 hard "must be idempotent"
    RATIONALE "All order processing must be safely retryable."
    REFS 23 [billing:14 admin:9]
    GIT hot last=2026-03-14
    TESTS src/orders/processor.test.ts (+11)

Claude sees the idempotency constraint before proposing changes, not after a review catches the violation.

https://i.redd.it/0ons3o28t32h1.gif

Numbers: 45-72% token reduction on architectural prompts across three benchmark repos (TypeScript, Python, Go), with zero quality regression on measured axes. Full methodology and paired-t confidence intervals in the linked write-up. I wanted measurements, not vibes.

Honest limits: single-judge model at v1.0 (cross-vendor panel is post-launch work). Quantitative claims bounded to three benchmark repos. Tie-bucket and trick-bucket prompts routinely show ContextAtlas net-negative; that's reported inline rather than buried.

Install (two ways):
- In Claude Code: /index-atlas and /generate-adrs skills. No API key needed; runs under your subscription.
- Via CLI: uses Anthropic API for indexing.

    npm install -g contextatlas
    contextatlas init && contextatlas index
    # then add the MCP server entry to your Claude Code config (snippet in the README)

Both produce structurally identical atlases.

Supported languages at v1.0: TypeScript (tsserver), Python (Pyright), Go (gopls), Ruby (ruby-lsp). Rust, Java, and C# are next on the roadmap; the adapter interface is small enough that they're realistic community contributions.

What's next: the v1.1 thesis is shaping up around developer onboarding flows and quality-validation work that was deferred from v0.8, plus integrating external documentation of your codebase into the pre-indexing workflow.

Full write-up: https://www.contextatlas.io/blog/v1.0.0
Repo: https://github.com/traviswye/ContextAtlas

Also launching on DevHunt today: https://devhunt.org/tool/contextatlas; votes are much appreciated if you find ContextAtlas useful or an interesting approach.

Built solo, hackathon-shaped scope, not pretending it's a full-blown research paper, but I did attempt to treat the methodology just as seriously. Happy to answer anything in the comments. Star the repo if you want to follow along, file an issue if it breaks for you on your codebase, and please be honest; this only gets better with feedback from people running it on real repos.

submitted by /u/Kitchen-Leg8500
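To make the compact record in the post concrete, here is a small sketch of how a client might read it. This is not ContextAtlas code. It assumes one field per line, with the upper-case keyword as the key, and it assumes that the word "hard" in the INTENT field marks a non-negotiable constraint. The helpers `parse_symbol_record` and `has_hard_constraint` are hypothetical.

```python
def parse_symbol_record(text):
    """Split each line into its leading keyword and the rest of the line."""
    record = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        record[key] = value
    return record


def has_hard_constraint(record):
    """True when the INTENT field carries the word 'hard' (assumed convention)."""
    return "hard" in record.get("INTENT", "").split()


# Record copied from the example in the post, one field per line (assumed layout).
example = """
SYM OrderProcessor@src/orders/processor.ts:42 class
SIG class OrderProcessor extends BaseProcessor
INTENT ADR-07 hard "must be idempotent"
RATIONALE "All order processing must be safely retryable."
REFS 23 [billing:14 admin:9]
GIT hot last=2026-03-14
TESTS src/orders/processor.test.ts (+11)
"""

record = parse_symbol_record(example)

if __name__ == "__main__":
    print(record["INTENT"])
    print(has_hard_constraint(record))
```

The point of a flat keyword-per-line shape is that an assistant (or a script like this) can pick out the constraint without reading source files first.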
Anthropic just bought the company that generates most production MCP servers
Anthropic acquired Stainless on Monday for a reported $300M+. Most coverage is framing this as a developer tools acquisition. Stainless is best known for generating the official Python and Node SDKs that ship with OpenAI, Google, Meta, Cloudflare, and Anthropic. The SDK story is real. The MCP side is the part that matters here. Stainless was one of the first vendors to extend their compiler to produce MCP servers from the same OpenAPI specs that produce their SDKs. MCP hit ~97M monthly SDK downloads by December 2025 and around 10,000 production servers by early 2026. A lot of that production code was Stainless-generated. Anthropic now owns the dominant MCP server generator. What actually changed hands on Monday: The engineering team. Roughly 40-50 people including founder Alex Rattray, who previously built Stripe's patented SDK generation system. Now reporting to Katelyn Lesse in Anthropic's Platform Engineering org. The technology. The generator, the templates, the language-specific runtimes, the OpenAPI extensions Stainless invented for SDK-specific edge cases. The hosted product is winding down. New signups stopped Monday. New SDK and MCP server generations stopped Monday. Existing customers keep what they've already generated but the pipeline is closed. My read: this is closer to what Google did with Kubernetes than to a normal acquisition. Anthropic created MCP. Anthropic donated MCP to the Linux Foundation last December. Anthropic now owns the dominant implementation toolchain. The protocol is vendor-neutral on paper. The implementation toolchain isn't. Six months of Anthropic M&A starts looking less coincidental: December 2025: Bun, the JS runtime, pulled into Claude Code February 2026: Vercept, computer-use AI April 2026: Coefficient Bio, ~$400M healthcare AI May 2026: Stainless, SDK and MCP plumbing They're not buying training infrastructure or GPU clusters. They're buying the integration layers around the model. 
The bet seems to be that frontier models are converging faster than anyone expected, so the moat is everywhere except the model. If you're building on MCP today, tooling quality probably improves. Stainless's generator was already the cleanest in the space and the team that built it is now at Anthropic. Patterns will standardize faster as Stainless-derived templates become the de facto reference. The flip side is concentration risk. Cloudflare's MCP server framework, Pulse MCP, and the open-source generators Stainless released during the transition all become strategically important if you want any diversity in your stack. Sources: Anthropic announcement Why Anthropic actually did this, and migration math Curious whether Stainless ending up inside Anthropic reads as good news (better tooling) or concentration risk (one company owns the standard and the reference implementation) from your seat. submitted by /u/Ok-Constant6488 [link] [comments]
Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P]
World models learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space.

The flaw: real environment dynamics live on low-dimensional manifolds, so a global high-dimensional Gaussian is an overly rigid prior, mismatched to the task geometry. LeWM itself struggles most on low-intrinsic-dimension tasks like Two-Room.

Our fix (Sub-JEPA): apply the Gaussian regularization inside multiple frozen random orthogonal subspaces instead. This relaxes the global constraint while keeping the anti-collapse benefit. No new hyperparameters, same two-term objective.

Sub-JEPA consistently outperforms LeWM across all four benchmarks, with up to +10.7 pp on Two-Room. We also observe straighter latent trajectories and better physical state decodability as emergent benefits.

🌐 Project: https://kaizhao.net/sub-jepa
💻 Code: https://github.com/intcomp/sub-jepa
📄 Paper: https://arxiv.org/pdf/2605.09241

submitted by /u/kai-zhao
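The subspace idea can be illustrated with a rough sketch. This is not the authors' implementation. It assumes the subspaces come from splitting one frozen random orthonormal basis into equal blocks, and it swaps in a simple moment-matching penalty (sample mean toward 0, sample covariance toward the identity) as a stand-in for whatever Gaussian regularizer the paper actually uses. All function names are hypothetical.

```python
import random


def random_orthogonal_basis(dim, rng):
    """Gram-Schmidt on Gaussian vectors; returns `dim` orthonormal rows."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-8:
            basis.append([x / norm for x in v])
    return basis


def gaussian_penalty(points):
    """Stand-in regularizer: squared gap of the mean from 0 and covariance from I."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    penalty = sum(m * m for m in mean)
    for i in range(d):
        for j in range(d):
            cov = sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            target = 1.0 if i == j else 0.0
            penalty += (cov - target) ** 2
    return penalty


def subspace_penalty(latents, basis, num_subspaces):
    """Average the penalty over projections onto blocks of the frozen basis."""
    size = len(basis) // num_subspaces
    total = 0.0
    for s in range(num_subspaces):
        rows = basis[s * size:(s + 1) * size]
        projected = [[sum(x * r for x, r in zip(z, row)) for row in rows]
                     for z in latents]
        total += gaussian_penalty(projected)
    return total / num_subspaces
```

Each block only constrains the latent distribution inside a low-dimensional projection, which is the sense in which the global constraint is relaxed. Fully collapsed latents are still penalized in every block, so the anti-collapse pressure remains.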
Reviving PapersWithCode (by Hugging Face) [P]
Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it.

I obviously use AI agents to parse papers at scale and automatically generate leaderboards (for now I'm the one verifying results). So far, I've only parsed high-impact papers for which I know they're SOTA, like Qwen 3.5 and 3.6, RF-DETR for object detection, DINOv3, SOTA embedding models from the MTEB leaderboard, the Open ASR Leaderboard for automatic speech recognition models, etc.

For now, it includes the following:
- trending papers by default based on Github star velocity
- categorization by domain, e.g., OCR
- methods, which PwC used to have, e.g., RLVR
- eval results for high-impact papers, see e.g., Qwen 3.5 at the bottom
- leaderboards for each domain, e.g., MMTEB or COCO val 2017
- support for citation counts (you can also see the most cited papers by domain!)
- automated linked Github, project page URLs, and artifacts (+ multiple repos are supported on a paper page)
- support for external papers beyond Arxiv, see e.g., DeepSeek v4
- Harness reports for coding agent benchmarks, e.g., Terminal Bench

"Sign in with HF" and Storage Buckets are used to store thumbnails, paper PDFs, and overall data backups.

I'm curious about your feedback + feature requests! Try it at paperswithcode.co

https://preview.redd.it/whwji560fw1h1.png?width=3452&format=png&auto=webp&s=55bb7a30c1be58d140f7efcb07a31c6dac5693c7

See e.g. the SOTA leaderboard for Terminal Bench 2.0:

https://preview.redd.it/98w9pi89fw1h1.png?width=3456&format=png&auto=webp&s=408fb64b0ba85ba24f55daa81d547d7c68e73951

A paper page looks like this: https://paperswithcode.co/paper/2602.15763

https://preview.redd.it/fiizit6dfw1h1.png?width=3450&format=png&auto=webp&s=9ea05a77ca5583a2fb395dccc95ba52c433362c5

submitted by /u/NielsRogge
Scaling LLMs horizontally: hidden-state coupling without weight modification [R]
Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges form a feedback loop that stabilizes both streams without altering base weights. This architecture establishes a two-step paradigm where base models function as memorizers, while lightweight linear bridges handle cross-domain generalization. Constraining the bridges to purely linear maps prevents overfitting because they can only map existing geometric relationships between the frozen representation spaces. As the bridges are optimized against ground-truth target data, they have no incentive to map ungrounded features such as individual models' hallucinations. Keeping the base weights completely frozen eliminates catastrophic forgetting. The system maintains operational closure, transforming inputs through its existing structure rather than changing to accommodate them. Evaluating bilateral RC against Mixture-of-Experts (MoE) routing across the same frozen models shows these results: Medical (3-model): Reduces perplexity to 11.02, compared to 56.80 for MoE and 57.08 for the frozen baseline. This represents an 80.7% reduction. TruthfulQA Health (MC1): Improves accuracy by 9.1 percentage points over the baseline. Independent models have uncorrelated hallucinations, allowing the bridge gates to amplify consistent cross-model updates while suppressing individual errors. Coding Test: CodeGPT-small-py and GPT-2 use different tokenizers, causing a 7-million baseline perplexity on mismatched text. MoE reaches 878, but RC achieves 5.91 by reading hidden states before the output projection collapses. This framework introduces a horizontal scaling axis for multi-model systems, moving beyond vertical scaling via larger monolithic models. 
Latency remains bounded by the slowest single model. Specialists can be added or removed without retraining the remaining system. In some scenarios, this architecture could replace multi-turn text prompting in agentic workflows with a single parallel forward pass, allowing models and/or bridges to run on separate nodes or edge devices without a central bottleneck. By decoupling memorization from relational alignment, RC bridges provide a framework for scaling multi-model systems and offer a path toward native multi-modal integration. Paper: https://ssrn.com/abstract=6746521 Code: https://github.com/pfekin/residual-coupling/ submitted by /u/kertara [link] [comments]
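The bridge mechanism described in the post can be sketched in a few lines. This is not the code from the linked repo. It assumes the bridge is a plain matrix applied to the source model's hidden state and that the gate is a single scalar. The names `bridge_update`, `couple`, and `relative_reduction` are hypothetical. The last helper only re-checks the 80.7% figure from the reported perplexities.

```python
def bridge_update(h_source, weight, gate):
    """Linear projection of the source hidden state, scaled by a scalar gate."""
    return [gate * sum(w * x for w, x in zip(row, h_source)) for row in weight]


def couple(h_target, h_source, weight, gate):
    """Add the bridge output to the target residual stream.

    Neither model's own weights are touched; only `weight` and `gate`
    would be trained in this sketch.
    """
    update = bridge_update(h_source, weight, gate)
    return [t + u for t, u in zip(h_target, update)]


def relative_reduction(baseline, improved):
    """Fractional drop from `baseline` to `improved`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    h_a = [1.0, 0.0, 0.0]              # hidden state read from model A (width 3)
    h_b = [1.0, 2.0]                   # residual stream of model B (width 2)
    w = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(couple(h_b, h_a, w, gate=1.0))
    # Reported perplexities: 57.08 (frozen baseline) vs. 11.02 (RC).
    print(round(relative_reduction(57.08, 11.02) * 100, 1))
```

Because the update is additive, setting the gate to zero recovers the frozen model exactly, which matches the post's claim that specialists can be added or removed without retraining the rest of the system.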
5 Claude patterns that helped non-technical users get better results
Over the past six months I’ve been helping non-technical users get more out of Claude, while making plenty of mistakes myself. These are the patterns that consistently gave the biggest quality lift. 1. Ask Claude to plan first, then execute Instead of: Write me a sales email Try: Before writing, list the 4 things this email needs to do well. Then write it. Same model, better scaffolding. 2. Paste examples, not adjectives “Write in a friendly tone” is vague. Pasting 2–3 paragraphs you’ve written yourself and saying “match this voice” works much better. Examples teach Claude implicitly. Adjectives make it guess. 3. State what not to do Claude often defaults toward average internet/business language: “unlock”, “revolutionize”, “in today’s fast-paced world”, etc. Tell it directly: Avoid these words and phrases: [paste list] Negative instructions often improve voice more than positive ones. 4. Use Projects or persistent context If you keep re-explaining your job, company, audience, product, or codebase every time, you’re wasting the best part of Claude. Use Claude Projects, or AGENTS.md / CLAUDE.md if you use Claude Code, so every conversation starts with the right context. 5. When Claude invents things, add source material If you ask: Find me a study on X you may get hallucinated citations. If you say: Here is the paper. Based only on this source, answer X. you get a much better result. A lot of “hallucination” problems are really “no source material was provided” problems. Bonus: ask Claude to disagree with you Claude can be overly agreeable. Try: Critique this plan. What would have to be true for it to fail in six months? That single instruction often makes the answer much more useful. I also built a free AI index over the past few months using Claude Code. It includes prompts, plain-English glossary entries, beginner guides, tool comparisons, and practical workflows across writing, research, sales, marketing, HR, dev, and productivity. 
Posting here because I think beginners/non-technical users are probably the exact people who would benefit most from it. I'll put the links in the comments in case anyone wants to check it out. Hope it comes in handy. submitted by /u/Annual-Ad-2495
Is ChatGPT Plus overkill for what I need? Would Go suffice?
Hello all, I am a consultant and my current role requires me to sift through dense pages of writing, and lots of pages at that (each proposal, brief, or project will be at least 70 pages). I've been using the free ChatGPT version, but the lower usage limits are really what bug me (it'll usually run a bunch of prompts while analyzing the projects or papers I've uploaded to it). At the same time, I also feel like some of the responses it gives me are not as in-depth, or are not really analyzing the text well. Would this be fixed by the "expanded memory across chats", "advanced models," "projects", and "expanded deep research" under the Plus version?

For more specifics, I need it to analyze and answer a series of client questions and concerns while considering the mission/objectives of their company, and provide the most tailored response. However, most of these tailored responses, or at least the building blocks of them, should come from the papers that I am uploading into the "projects" to pull from.

So, do y'all think the Plus version would really help me with this task? Or would Go suffice? I also know this is a forum specifically for ChatGPT, but in my case, does anyone think maybe Claude would be better for me? Thanks!

PS: I do not need to code. My job does not require coding at all.

submitted by /u/ze_best23
Built an agentic RAG over my Obsidian vault so Claude could read engineering books I never have time for. Then I built the eval harness to check Claude wasn't lying to me.
For context, I posted on Medium a while back about burning through Claude Code's weekly limit in 3 days. The token bleed problem from that post is what kicked off this project. Short version of the workflow: Convert engineering PDFs to markdown, drop them in an Obsidian vault Cheap agent (Kimi K2.5) does BM25 retrieval over the vault Claude only sees the relevant chunks, not the whole book Token cost per question dropped from ~50k to ~5k That part worked. The new problem: the agent was sometimes confidently wrong, and I couldn't tell. Saying things like "Marcus Aurelius wrote about death in Book IX section 3" when the canonical passage was actually in Book IV section 5. Plausible enough that I wouldn't catch it unless I went and verified manually. So I built an eval harness. Most of the work ended up being on the LLM judge. I used Claude Sonnet 4.6 as the judge, deliberately a different model family from the Kimi agent so the judge isn't grading its own output. First rubric had four discrete buckets including a 0.7 "thin but not wrong." On hand-grading, my human grader (me, blind, on a different day) also collapsed everything borderline into 0.7. Judge and human were both reaching for the same wrong bucket. The agreement number looked respectable but was actually measuring shared bias. Four rubric iterations later, the version that worked collapsed the middle bucket entirely and added a 0.9 bucket for one specific case: "right answer, wrong chunk." This is when retrieval missed the canonical source but the agent answered correctly from an equivalent passage. Before that bucket, this case was either a false positive (1.0 papering over a retrieval miss) or a false negative (0.4 punishing a correct answer). The split is what fixed it. Under the new rubric, judge agreement with human on 18 rows went from 7/18 (39%) to 17/18 (94%). Caveats so I'm honest about it: 18 rows is a small sample. Adversarial slice is the next round of work. Single grader. 
Inter-grader reliability not established. BM25 isn't novel. I picked it because in technical and literary corpora, query/document vocabulary overlap is high enough that embeddings don't add much. I also have one negative result that surprised me: the same chunking technique that lifted one corpus by 33pp regressed another by 17pp on the same eval. The harness caught it on the first run. Wrote up why. Full writeup with the four-iteration rubric story, the calibration worksheet showing per-row shifts, and the negative-result note (GitHub repo is linked at the bottom of the post): https://medium.com/@kunalbhardwaj598/i-gave-claude-full-engineering-books-to-read-then-built-the-eval-harness-to-check-it-wasnt-lying-e9354bf6fa96 Specifically curious about: anyone else here using Claude Sonnet as their judge for their own RAG/agent setups, what rubric you landed on, and how you're handling the inter-grader reliability problem with a single human in the loop. submitted by /u/More-Hunter-3457 [link] [comments]
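For anyone who hasn't used BM25, the retrieval step in the workflow above can be sketched as follows. This is the textbook Okapi BM25 formula with commonly used defaults (`k1=1.5`, `b=0.75`) and whitespace tokenization, not the author's harness. The function names and the sample chunks are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def top_k(query, documents, k=2):
    """Indices of the k best-scoring documents, best first."""
    scores = bm25_scores(query, documents)
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "retry logic should use exponential backoff",
        "the cache evicts the least recently used entry",
        "idempotent handlers make retry safe",
    ]
    print(top_k("retry backoff", chunks))
```

Only the chunks returned by `top_k` would be passed on to the more expensive model, which is where the per-question token saving described in the post comes from.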
Claude Desktop to rule them ALL! Share your Claude exploration!
For quite some time I was using all the different AIs for “vibe-coding” (actually, tbh, being the beta tester for AI 🤓🤣) and I tried them all, from Qwen CLI to ChatGPT and Gemini and everything in between, whatever I could get my hands on, omnivore style! And somehow I was always going around Claude, don’t ask me why.

Last week I learned about Claude Desktop, did my research and decided to go with it. Since day one I was impressed by how easily it handled every single task I gave it while testing and experimenting with its capabilities, so after a few days I decided to try a big one that I had been struggling with for more than a few weeks with other AIs.

Essentially, I’m playing a basketball strategy game which doesn’t display enough statistics in-game. To be exact, you can see them all, but not in one place, and if you’re using pen and paper you will most probably still end up with nothing 🤦🏼‍♂️🤣 All of those interesting statistics are scattered around the app, hidden behind hundreds of clicks, so I decided to use good old http-toolkit to get all of those API endpoints and round them up.

Finally I had all the necessary information, and last night I gave all that “pure gold information” to Claude with a pretty large prompt explaining how the API scraper should work, how the frontend should look, how parsing should be handled, and every little detail I could think of (it took me more than an hour to write it all down). And finally I clicked SEND!

Claude proposed a plan, for which I had one or two small corrections, and after 20 minutes or so the app was up and running live on my GitHub Actions and GitHub Pages!!! I was more than impressed and overwhelmed! I only needed something like 3 iterations just to fix some aesthetics and cosmetics and that was all!!! (See the screenshots ☝🏻)

I think last night was my best ever experience with AI. It was smooth and easy and it was really enjoyable watching it work, although I was scared because of my previous experiences with AI messing up everything, especially when it has to handle so many different tasks at once. It was always sooo messy that I would either drop the project or start it all over again from scratch.

If you had enough patience and stayed long enough to read all of this, I would love it if you could share some of your rough project ideas or success stories! Let’s show everyone (especially newcomers like me) what Claude is truly capable of building and expand our ideas even further! 🚀

submitted by /u/No-Performer-1408
Adaptive Markdown
I’ve been working on an open-source document format / viewer idea I’m calling Adaptive Markdown. The basic idea is: instead of a document being static text, it's controlled by coding agents. You interact with the document more like a live workspace. This has different implications depending on what you are doing.

I made a short video demo here: https://youtu.be/xf6jxf-hyP4

The thing I’m most excited about is academic / technical reading. In a few years I don’t think people will just read papers passively. I think they’ll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean when possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is trivial to do inside a browser with a coding agent that has access to JS, CSS, etc.

Some possible use cases I’m thinking about:
- Any document is just a starting point! You can project it however you want.
- Turning articles and books into personalized learning objects
- lecture notes with automatically maintained structure
- documents with embedded code, tables, consoles, images, audio, or video
- Incorporate Adaptive Markdown into automated workflows
- eventually, things like automatically recording audio in lectures and taking a picture of a blackboard and turning it into LaTeX notes inside the document

It’s very early, but the workflow already feels surprisingly useful to me.

GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown

Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it. So far it's only configured for the Anthropic coding-agent SDK and Codex. The goal is to have this run entirely locally someday.

submitted by /u/IDefendWaffles
From Marine Biology to Accidental Developer: Don’t know how to feel about it
A bit of background: I did my bachelor’s and master’s in marine biology. After a while working in the field, I started noticing a lot of inefficiencies in my day-to-day work — the endless paper sheets, the lack of centralised data, the manual everything. So one day, out of boredom (and frustration), I decided to build a management app for our lab. We work with fish, and the goal was simple: ditch the paper, get everything documented digitally, with a proper dashboard and live graphics. It’s still in the final stages of development, but it’s nearly there. Then something unexpected happened. While I was still building my own app, someone in the field reached out and offered me a job — they needed someone to build them an app, and they wanted a person with Python experience and domain knowledge in the area. Knowing I could pull it off, I applied. And I got the job. The main reason? My marine biology background. The technical skills mattered, but it was the combination — understanding the science and being able to build the tool — that sealed it. They also mentioned the potential for a long-term relationship on future products, which is exciting. Here’s where it gets weird. The client expected the project to take about a month. I finished it in 5 hours max, using Claude Code. The app is built. It’s in the bug-fixing stage now. And I’ve been deliberately slowing things down because I was moving so fast it started to look suspicious. I genuinely don’t know how to feel about this. Part of me wants to just deliver fast, own the efficiency, and use it as a competitive advantage. The other part wonders if I’m undervaluing my work by moving too quickly — or if the client will feel like they overpaid for something that “only took a few hours.” So my actual questions for this community: • How do you handle the delivery timing? Do you go fast and own it, or do you pace yourself? 
• And how do you price and position yourself when AI is doing a significant chunk of the heavy lifting? submitted by /u/Nithien0
arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]
From Thomas G. Dietterich (arXiv moderator for cs.LG) on 𝕏 (thread): https://x.com/tdietterich/status/2055000956144935055 https://xcancel.com/tdietterich/status/2055000956144935055 "Attention arXiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper. The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments")." submitted by /u/Nunki08 [link] [comments]
Adaptive Markdown
I’ve been working on an open-source document format / viewer idea I’m calling Adaptive Markdown. The basic idea is: instead of a document being static text, it's controlled by coding agents. You interact with the document more like a live workspace. This has different implications depending on what you are doing.

I made a short video demo here: https://youtu.be/H4MnFs8irm8

The thing I’m most excited about is academic / technical reading. In a few years I don’t think people will just read papers passively. I think they’ll translate passages, ask questions, generate examples, explore alternate proofs, run code, attach notes, convert math to Lean when possible, and keep all of that inside the document instead of scattered across chats and notebooks. This is trivial to do inside a browser with a coding agent that has access to JS, CSS, etc.

Some possible use cases I’m thinking about:
- Turning articles and books into personalized learning objects
- lecture notes with automatically maintained structure
- documents with embedded code, tables, consoles, images, audio, or video
- AI-generated alt text and descriptions
- Incorporate Adaptive Markdown into automated workflows
- eventually, things like automatically recording audio in lectures and taking a picture of a blackboard and turning it into LaTeX notes inside the document

It’s very early, but the workflow already feels surprisingly useful to me.

GitHub: https://github.com/SemiSimpleMath/Adaptive-Markdown

Curious whether this seems useful to anyone else, or whether I’m just overexcited because I built it. So far it's only configured for the Anthropic coding-agent SDK, but in a couple of days we will have it running on Codex as well.

submitted by /u/IDefendWaffles
Papers with Code uses a subscription + tiered pricing model. Visit their website for current pricing details.
Key features include: Daily email updates with trending papers, Searchable database of research papers, Code implementations linked to papers, Benchmark datasets associated with research, User-friendly interface for easy navigation, Filtering options by categories and tags, Collaboration tools for researchers, Citation tracking for papers.
Papers with Code is commonly used for: Staying updated on the latest AI research, Finding code implementations for academic papers, Identifying benchmark datasets for experiments, Collaborating with peers on research projects, Conducting literature reviews efficiently, Exploring trending topics in AI research.
Papers with Code integrates with: GitHub for code repositories, Google Scholar for citation tracking, Mendeley for reference management, Slack for team notifications, Twitter for sharing trending papers, ResearchGate for academic networking, Zotero for bibliographic management, Medium for publishing summaries of papers.
Based on user reviews and social mentions, the most common pain points are: token usage, API costs.
Based on 108 social mentions analyzed, 21% of sentiment is positive, 74% neutral, and 5% negative.