深度求索(DeepSeek),成立于2023年,专注于研究世界领先的通用人工智能底层模型与技术,挑战人工智能前沿性难题。基于自研训练框架、自建智算集群和万卡算力等资源,深度求索团队仅用半年时间便已发布并开源多个百亿级参数大模型,如DeepSeek-LLM通用大语言模型、DeepSeek-Coder代
DeepSeek Coder appears to be praised for its robust AI coding capabilities, though details on its specific strengths are sparse from the available social mentions. There are no significant complaints directly associated with DeepSeek Coder, though unrelated AI limit and performance issues suggest a broader context of user frustrations with AI tools. The pricing sentiment is unclear, as there are no explicit references to costs in the mentions. Overall, while DeepSeek Coder has some visibility, it lacks detailed feedback on its reputation compared to other more frequently discussed AI tools.
Mentions (30d)
1
1 this week
Reviews
0
Platforms
2
GitHub Stars
22,960
2,747 forks
DeepSeek Coder appears to be praised for its robust AI coding capabilities, though details on its specific strengths are sparse from the available social mentions. There are no significant complaints directly associated with DeepSeek Coder, though unrelated AI limit and performance issues suggest a broader context of user frustrations with AI tools. The pricing sentiment is unclear, as there are no explicit references to costs in the mentions. Overall, while DeepSeek Coder has some visibility, it lacks detailed feedback on its reputation compared to other more frequently discussed AI tools.
Features
Use Cases
Industry
information technology & services
Employees
170
87,547
GitHub followers
32
GitHub repos
22,960
GitHub stars
20
npm packages
40
HuggingFace models
[R] Which LLMs are actually best for bleeding-edge Linux/ML debugging workflows in 2026? [R]
I’m trying to optimize an AI workflow for bleeding-edge Linux/ML debugging (Arch/CachyOS, CUDA, Python, unsloth, etc.). Current stack: - Claude = deep reasoning/mastermind - Gemini 3.1 Pro = execution/logistics - Perplexity = retrieval Main problem: Gemini often gives high-friction or impractical fixes and degrades badly in long troubleshooting sessions. Example: suggested a long Podman workflow for an unsloth/Python issue where micromamba solved it much faster. I also have access to hosted open models: - Qwen 3 Coder 30B - Qwen 3.5 122B - Mistral Large 675B - DeepSeek R1 Distill 70B etc. Question: For people doing real-world Linux/ML/debugging workflows (not benchmarks), what currently works best as the “execution/logistics” model with strong web/recent-ecosystem awareness? I care more about: - practical fixes - low friction - stable long sessions - debugging quality than benchmark scores. submitted by /u/minaco5mko [link] [comments]
View originalClaude Limit Extender
Ok so I know people are complaining about the limit reductions. These aren't going away, no matter who unsubscribes or complains. The influx of consumer subs after the GPT exodus killed their compute capacity. They have to keep things running for the enterprise and API-only customers. Mythos is live. They don't make money off of subs. They most likely over-quantized Opus recently to save on compute as well. Here's what I do to conserve usage (I'm only on a pro account and i never run out): The biggest thing is use other models to build out the bulk of the codebase. Openrouter is great. You have access to not only Claude API but also GPT and Grok and many many others. You can run other models through Claude Code's official harness on VSCode, Antigravity, etc. it just takes a couple of changes to your settings.json in .claude/ I use Chinese models to take care of most of it. Deepseek is pretty much the gold standard in terms of quality and uptime. Minimax 2.7, Kimi K2.5, GLM-5 (4.7 is fast and pretty capable as well), Qwen 3.6, Kat Coder Pro. You can use their API, or through openrouter. If you use OpenCode you don't even have to edit settings.json you just add keys (including Openrouter, Anthropic, OpenAI, etc). Openrouter is pretty no frills so in order to boost up agents and mcp and hooks you have to read docs but you have to read docs for anything nowadays. Furthermore, Deepseek, Qwen, Kimi, Minimax, GLM all have free chat interfaces on their websites with access to their bigger models. You just can't do agentic work. Kimi has some basic agentic but it's not what you want for beefy stuff. Mistral and Llama... They are fine but I do not recommend them over Chinese models. Claude is your finisher. I actually stopped using Opus, and stick with Sonnet for 90% of my ending pass. You can also take your codebase and stick it into Claude Projects. It can take in a ton of files and uses RAG. Claude desktop with Filesystem also works well. You do lose access to agents. If you need agents, Claude Code in VSCode harness, run whatever model you need. If you add $10 to your openrouter account you get 1000 daily requests to free models as well and there are a few really spicy free models. Just know uptime is a concern on those. You will get prioritized last and potentially just kicked out. Paid models remain the same on priority. Chinese models are CHEAP, guys. Like pennies per project., Deepseek 3.2/Speciale with reasoning and agents will chew up tokens but even then you're still looking at sub-dollar projects. It's slower than Opus but it's not terrible. Most models nowadays are more than capable. Use Claude as the finisher to sand the edges and get those kinks (if any) worked out. I also run multiple instances of different models like Deepseek, Qwen, Minimax, and GLM for the same spec sheet and see what things look like at the end and compare. This is something *I* do. It's intensive but I like seeing how they make decisions differently. You get really cool approaches from one model that the others might miss. Your limits aren't coming back, at least not anytime soon. Adapt or remain Old Man Yells At Cloud. Openrouter even has very-recent-but-older models. It has Claude and GPT (like Opus 4.5 and pretty much every freaking GPT including some Codex). Grok 4.20 has a 2m token window. There are options. If you only want to use subscription Claude... your limits are gone. One note about Chinese models... if you're worried about safety (ie you don't want Chinese servers looking at your info or your employer won't allow it...) go with other American models on Openrouter. Llama and Mistral (French) are light work alternatives. Change your keys regularly (even daily, like I do). Do with this what you will. submitted by /u/zeezytopp [link] [comments]
View originalI run 3 experiments to test whether AI can learn and become "world class" at something
I will write this by hand because I am tried of using AI for everything and bc reddit rules TL,DR: Can AI somehow learn like a human to produce "world-class" outputs for specific domains? I spent about $5 and 100s of LLM calls. I tested 3 domains w following observations / conclusions: A) code debugging: AI are already world-class at debugging and trying to guide them results in worse performance. Dead end B) Landing page copy: routing strategy depending on visitor type won over one-size-fits-all prompting strategy. Promising results C) UI design: Producing "world-class" UI design seems required defining a design system first, it seems like can't be one-shotted. One shotting designs defaults to generic "tailwindy" UI because that is the design system the model knows. Might work but needs more testing with design system I have spent the last days running some experiments more or less compulsively and curiosity driven. The question I was asking myself first is: can AI learn to be a "world-class" somewhat like a human would? Gathering knowledge, processing, producing, analyzing, removing what is wrong, learning from experience etc. But compressed in hours (aka "I know Kung Fu"). To be clear I am talking about context engineering, not finetuning (I dont have the resources or the patience for that) I will mention world-class a handful of times. You can replace it be "expert" or "master" if that seems confusing. Ultimately, the ability of generating "world-class" output. I was asking myself that because I figure AI output out of the box kinda sucks at some tasks, for example, writing landing copy. I started talking with claude, and I designed and run experiments in 3 domains, one by one: code debugging, landing copy writing, UI design I relied on different models available in OpenRouter: Gemini Flash 2.0, DeepSeek R1, Qwen3 Coder, Claude Sonnet 4.5 I am not going to describe the experiments in detail because everyone would go to sleep, I will summarize and then provide my observations EXPERIMENT 1: CODE DEBUGGING I picked debugging because of zero downtime for testing. The result is either wrong or right and can be checked programmatically in seconds so I can perform many tests and iterations quickly. I started with the assumption that a prewritten knowledge base (KB) could improve debugging. I asked claude (opus 4.6) to design 8 realistic tests of different complexity then I run: bare model (zero shot, no instructions, "fix the bug"): 92% KB only: 85% KB + Multi-agent pipeline (diagnoser - critic -resolver: 93% What this shows is kinda suprising to me: context engineering (or, to be more precise, the context engineering in these experiments) at best it is a waste of tokens. And at worst it lowers output quality. Current models, not even SOTA like Opus 4.6 but current low-budget best models like gemini flash or qwen3 coder, are already world-class at debugging. And giving them context engineered to "behave as an expert", basically giving them instructions on how to debug, harms the result. This effect is stronger the smarter the model is. What this suggests? That if a model is already an expert at something, a human expert trying to nudge the model based on their opinionated experience might hurt more than it helps (plus consuming more tokens). And funny (or scary) enough a domain agnostic person might be getting better results than an expert because they are letting the model act without biasing it. This might be true as long as the model has the world-class expertise encoded in the weights. So if this is the case, you are likely better off if you don't tell the model how to do things. If this trend continues, if AI continues getting better at everything, we might reach a point where human expertise might be irrelevant or a liability. I am not saying I want that or don't want that. I just say this is a possibility. EXPERIMENT 2: LANDING COPY Here, since I can't and dont have the resources to run actual A/B testing experiments with a real audience, what I did was: Scrape documented landing copy conversion cases with real numbers: Moz, Crazy Egg, GoHenry, Smart Insights, Sunshine.co.uk, Course Hero Deconstructed the product or target of the page into a raw and plain description (no copy no sales) As claude oppus 4.6 to build a judge that scores the outputs in different dimensions Then I run landing copy geneation pipelines with different patterns (raw zero shot, question first, mechanism first...). I'll spare the details, ask if you really need to know. I'll jump into the observations: Context engineering helps writing landing copy of higher quality but it is not linear. The domain is not as deterministic as debugging (it fails or it breaks). It is much more depending on the context. Or one may say that in debugging all the context is self-contained in the problem itself whereas in landing writing you have to provide it. No single config won across all products. Instead, the
View originalIs there something I can do about my prompts? [Long read, I’m sorry]
Hello everyone, this will be a bit of a long read, i have a lot of context to provide so i can paint the full picture of what I’m asking, but i’ll be as concise as possible. i want to start this off by saying that I’m not an AI coder or engineer, or technician, whatever you call yourselves, point is I’m don’t use AI for work or coding or pretty much anything I’ve seen in the couple of subreddits I’ve been scrolling through so far today. Idk anything about LLMs or any of the other technical terms and jargon that i seen get thrown around a lot, but i feel like i could get insight from asking you all about this. So i use DeepSeek primarily, and i use all the other apps (ChatGPT, Gemini, Grok, CoPilot, Claude, Perplexity) for prompt enhancement, and just to see what other results i could get for my prompts. Okay so pretty much the rest here is the extensive context part until i get to my question. So i have this Marvel OC superhero i created. It’s all just 3 documents (i have all 3 saved as both a .pdf and a .txt file). A Profile Doc (about 56 KB-gives names, powers, weaknesses, teams and more), A Comics Doc (about 130 KB-details his 21 comics that I’ve written for him with info like their plots as well as main cover and variant cover concepts. 18 issue series, and 3 separate “one-shot” comics), and a Timeline Document (about 20 KB-Timline starting from the time his powers awakens, establishes the release year of his comics and what other comic runs he’s in [like Avengers, X-Men, other character solo series he appears in], and it maps out information like when his powers develop, when he meets this person, join this team, etc.). Everything in all 3 docs are perfect laid out. Literally everything is organized and numbered or bulleted in some way, so it’s all easy to read. It’s not like these are big run on sentences just slapped together. So i use these 3 documents for 2 prompts. Well, i say 2 but…let me explain. There are 2, but they’re more like, the foundation to a series of prompts. So the first prompt, the whole reason i even made this hero in the first place mind you, is that i upload the 3 docs, and i ask “How would the events of Avengers Vol. 5 #1-3 or Uncanny X-Men #450 play out with this person in the story?” For a little further clarity, the timeline lists issues, some individually and some grouped together, so I’m not literally asking “_ comic or _ comic”, anyways that starting question is the main question, the overarching task if you will. The prompt breaks down into 3 sections. The first section is an intro basically. It’s a 15-30 sentence long breakdown of my hero at the start of the story, “as of the opening page of x” as i put it. It goes over his age, powers, teams, relationships, stage of development, and a couple other things. The point of doing this is so the AI basically states the corrects facts to itself initially, and not mess things up during the second section. For Section 2, i send the AI’s a summary that I’ve written of the comics. It’s to repeat that verbatim, then give me the integration. Section 3 is kind of a recap. It’s just a breakdown of the differences between the 616 (Main Marvel continuity for those who don’t know) story and the integration. It also goes over how the events of the story affects his relationships. Now for the “foundations” part. So, the way the hero’s story is set up, his first 18 issues happen, and after those is when he joins other teams and is in other people comics. So basically, the first of these prompts starts with the first X-Men issue he joins in 2003, then i have a list of these that go though the timeline. It’s the same prompt, just different comic names and plot details, so I’m feeding the AIs these prompts back to back. Now the problem I’m having is really only in Section 1. It’ll get things wrong like his age, what powers he has at different points, what teams is he on. Stuff like that, when it all it has to do is read the timeline doc up the given comic, because everything needed for Section 1 is provided in that one document. Now the second prompt is the bigger one. So i still use the 3 docs, but here’s a differentiator. For this prompt, i use a different Comics Doc. It has all the same info, but also adds a lot more. So i created this fictional backstory about how and why Marvel created the character and a whole bunch of release logistics because i have it set up to where Issue #1 releases as a surprise release. And to be consistent (idek if this info is important or not), this version of the Comics Doc comes out to about 163 KB vs the originals 130. So im asking the AIs “What would it be like if on Saturday, June 1st, 2001 [Comic Name Here] Vol. 1 #1 was released as a real 616 comic?” And it goes through a whopping 6 sections. Section 1 is a reception of the issue and seasonal and cultural context breakdown, Section 2 goes over the comic plot page by page and give real time fan reactions as they’re reading it for the first time. Se
View originalHawkeye - open-source flight recorder & guardrails for AI agents, with drift detection and mobile monitoring
I built with Caode-code an observability tool for AI coding agents that runs 100% locally. The problem: AI agents (Claude Code, Aider, AutoGPT...) can silently drift off-task, burn tokens, or touch sensitive files. You only notice when it's too late. Hawkeye records every action and evaluates drift in real-time: - Heuristic scorer (zero-cost, always on) — detects dangerous commands, suspicious paths, error loops, token burn without progress - LLM scorer (optional) — uses your local Ollama model (llama3.2, mistral, deepseek-coder, phi3...) to check if actions match the objective. No data leaves your machine - Guardrails — file protection, command blocking, cost limits, directory scoping, network restrictions - Auto-pause when drift goes critical - Web dashboard, session replay, MCP server for agent self-awareness One thing I'm stuck on: the cost/token tracking is unreliable. When agents like Claude Code don't expose token counts in their hooks, I'm left estimating from input/output text length. Anyone dealt with this? How do you track actual token usage across different agents/providers? No cloud dependency. SQLite storage. Everything stays local. npm install -g hawkeye-ai Npm: https://www.npmjs.com/package/hawkeye-ai?activeTab=readme GitHub: github.com/MLaminekane/hawkeye submitted by /u/Ok-Idea9032 [link] [comments]
View original🚨BREAKING: GPT-5.4 has the worst score on the SM-Bench among OpenAI’s models, ranking ahead only of GPT-5.2
submitted by /u/cloudinasty [link] [comments]
View originalRepository Audit Available
Deep analysis of deepseek-ai/DeepSeek-Coder — architecture, costs, security, dependencies & more
Key features include: Supports multiple programming languages, Advanced code completion, Context-aware suggestions, Integration with popular IDEs, Customizable coding styles, Real-time collaboration tools, Built-in debugging assistance, API access for integration with other tools.
DeepSeek Coder is commonly used for: Automating repetitive coding tasks, Enhancing productivity for software developers, Assisting in code reviews, Generating boilerplate code, Facilitating learning for new programmers, Improving code quality through suggestions.
DeepSeek Coder integrates with: Visual Studio Code, JetBrains IDEs (e.g., IntelliJ IDEA, PyCharm), GitHub, GitLab, Bitbucket, Slack, JIRA, Trello, CircleCI, Docker.
DeepSeek Coder has a public GitHub repository with 22,960 stars.
Based on user reviews and social mentions, the most common pain points are: token usage.
Based on 11 social mentions analyzed, 9% of sentiment is positive, 73% neutral, and 18% negative.