ML/AI research engineer. Ex-stats professor.
Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & "Build a Reasoning Model (From Scratch)" (https://t.co/5TueQKx2Fk)
While waiting for DeepSeek V4, we got two very strong open-weight LLMs from India yesterday.
They come in two sizes, Sarvam 30B and Sarvam 105B (both reasoning models).
Interestingly, the two models use different attention mechanisms.
Recent Posts
news · LLMs 2026: Expert Insights with Sebastian Raschka on AI ... (llms, ai insights)
news · LLM researcher Sebastian Raschka: OpenClaw is a ... (ai safety, autonomous ai)
web · Sebastian Raschka (@rasbt) / Posts / X (ml research, ai)
web · Sebastian Raschka (deep learning, training)
web · Sebastian Raschka rasbt (ai research, llms)
web · Sebastian Raschka (llms, research)
web · Sebastian Raschka, LLM Research Engineer | Sebastian ... (ai articles, resources)
web · Sebastian Raschka, PhD - ML/AI research engineer. ... (ai experience, research)
web · Sebastian Raschka, PhD (llm expertise, ai)
news · He says he wouldn't install the autonomous AI assistant on ... (book release, machine learning)
news · Sebastian Raschka, LLM Research Engineer | Sebastian ... (personal website, ai resources)
news · Here's my conversation all about AI in 2026, ... (ai community, machine learning)
substack
A Visual Guide to Attention Variants in Modern LLMs
3/22/2026 (attention, llms)
twitter
@karpathy Was just reading your program.md (aka SKILLS.md) and was surprised the agent doesn't just brute-force it by making the architecture bigger.
Really captures the spirit of a researcher here.
58 · 3/9/2026 (research, architecture)
twitter
@karpathy This is great. The era of graduate student descent is over. Grad students can focus on the actual science again (versus babysitting runs)!
94 · 3/8/2026 (graduate students, science)
twitter
This is for illustration purposes, so I am only focused on math tasks. E.g., consider the MATH dataset with 12,500 math problems. If the 12,000 samples that are not in MATH-500 (which is the test set)
3/8/2026 (math tasks, dataset)
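The tweet above is cut off, but the split it describes is straightforward to sketch. The example below is not from the original thread: it assumes the Hugging Face datasets library, the public HuggingFaceH4/MATH-500 test split, and that the full 12,500-problem MATH dataset is loadable under the (assumed) ID hendrycks/competition_math. The remaining roughly 12,000 problems are kept as a training pool by excluding anything whose problem text appears in MATH-500.

# Minimal sketch (assumptions noted above) of separating the MATH-500
# test problems from the rest of the MATH dataset.
from datasets import load_dataset

# 500-problem test split commonly used for evaluation.
math500 = load_dataset("HuggingFaceH4/MATH-500", split="test")
test_problems = set(math500["problem"])

# Full MATH dataset (7,500 train + 5,000 test = 12,500 problems); dataset ID is an assumption.
full_math = load_dataset("hendrycks/competition_math", split="train+test")

# Keep only the ~12,000 problems that are NOT in MATH-500, so they can be
# used for training or fine-tuning without leaking test problems.
train_pool = full_math.filter(lambda ex: ex["problem"] not in test_problems)
print(f"{len(train_pool)} problems left for training")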
twitter
@BobbyKuzma @_xpn_ wow, that's cool to hear! 😊
1 · 3/8/2026 (appreciation, excitement)
twitter
Hope you are enjoying it! Re distillation, Chapter 7 on supervised fine-tuning (SFT) is essentially DeepSeek-style distillation.
Coincidentally, I am also currently wrapping up the distillation chapter for the sequel.
71 · 3/8/2026 (distillation, supervised sft)
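For readers wondering what "DeepSeek-style distillation" via plain SFT amounts to, here is a minimal sketch, not taken from the book or the tweet: the student model is fine-tuned with ordinary next-token cross-entropy on responses generated by a stronger teacher. The student model name, the tiny inline dataset, and the hyperparameters are placeholders.

# Minimal sketch: distillation as plain supervised fine-tuning (SFT) on
# teacher-generated responses. The student never sees teacher logits; it is
# simply trained on the teacher's text with next-token cross-entropy.
# Model name, data, and hyperparameters are illustrative placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-0.5B"  # placeholder student model
tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

# Assumed format: prompts paired with the teacher's full reasoning + answer.
teacher_data = [
    {"prompt": "What is 3 * 7?",
     "response": "3 * 7 = 21. The answer is 21."},
]

def collate(batch):
    # Concatenate prompt and teacher response; train on the whole sequence
    # (a real setup would usually mask the prompt tokens in the labels).
    texts = [ex["prompt"] + "\n" + ex["response"] + tokenizer.eos_token
             for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    enc["labels"] = enc["input_ids"].clone()
    return enc

loader = DataLoader(teacher_data, batch_size=1, collate_fn=collate)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

student.train()
for batch in loader:
    loss = student(**batch).loss  # standard causal-LM cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()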
twitter
@sriramk @steipete Interesting! Is (1) the Mini using a model on the Spark via an API call, or (2) are you running two separate agents?
If (1), why not run openclaw on the Spark directly?
34 · 3/7/2026 (technology, ai)
twitter
@steipete @openclaw This is the true replacement for SWE-Bench Verified & Tau2 😆