DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
[2025/12] DeepSpeed Core API updates: PyTorch-style backward and low-precision master states
[2025/10] SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
[2025/10] Study of ZenFlow and ZeRO offload performance with DeepSpeed CPU core binding
[2025/08] ZenFlow: Stall-Free Offloading Engine for LLM Training
[2025/06] Arctic Long Sequence Training (ALST) with DeepSpeed: Scalable And Efficient Training For Multi-Million Token Sequences

DeepSpeed has been used to train many different large-scale models. If you’d like your model included in the project's list of examples, please submit a PR.

DeepSpeed has been integrated with several popular open-source DL frameworks.

DeepSpeed is an integral part of Microsoft’s AI at Scale initiative to enable next-generation AI capabilities at scale.

DeepSpeed welcomes your contributions! Please see our contributing guide for more details on formatting, testing, etc.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Developer Certificate of Origin (DCO, https://wiki.linuxfoundation.org/dco), stating that you agree to the terms published at https://developercertificate.org for that particular contribution. DCOs are per-commit, so each commit needs to be signed off. A commit can be signed off by adding the -s flag when committing. The DCO can also be signed off in the PR itself by clicking on the DCO enforcement check.

Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang. (2024). Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training. arXiv:2406.18820
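Because DCO sign-off is per-commit, it can help to check a branch locally before opening a PR. The snippet below is a minimal sketch, not part of DeepSpeed or of the DCO tooling: it assumes the commit messages have already been collected into a dictionary, and it relies only on the fact that `git commit -s` appends a `Signed-off-by: Name <email>` trailer. The function names and the sample commit ids are hypothetical.

```python
import re

# Trailer that `git commit -s` appends to the commit message.
# The pattern is deliberately loose: a name, then an address in angle brackets.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by trailer line."""
    return SIGNOFF_RE.search(message) is not None


def unsigned_commits(messages: dict[str, str]) -> list[str]:
    """Return the ids of commits lacking a sign-off, in insertion order.

    `messages` maps a commit id to its full commit message. How the
    messages are gathered (for example from `git log`) is left to the caller.
    """
    return [cid for cid, msg in messages.items() if not has_signoff(msg)]


if __name__ == "__main__":
    # Hypothetical commit ids and messages, for illustration only.
    sample = {
        "a1b2c3d": "Fix typo in docs\n\nSigned-off-by: Jane Doe <jane@example.com>\n",
        "e4f5a6b": "Add unit test",
    }
    for cid in unsigned_commits(sample):
        print(f"{cid}: missing Signed-off-by trailer")
```

A commit flagged this way can usually be fixed by amending it with the -s flag; whether the project's DCO check accepts other trailer variants is not something this sketch tries to model.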
Vast.ai
Real-Time GPU Pricing
Vast.ai is a GPU compute marketplace founded on one idea: whoever controls compute controls AI. We exist to make sure that power stays distributed.

Christian Horne, a fellow thinker and builder who also published on LessWrong, shared Jake's view that the compute scaling thesis had profound implications, not just for AI development, but for who would control it. Both saw the same thing: if whoever controlled the most compute controlled the most powerful AI, then the future of artificial general intelligence would be determined by who had the deepest pockets, not who had the best ideas.

On June 28, 2016, they incorporated Vast.ai. The founding thesis fit on a napkin: the world was full of underutilized GPU hardware (in gaming rigs, mining farms, research labs, and small data centers), and the people who needed that compute most couldn't afford the hyperscaler rates.

But the motivation was never purely commercial. A world where compute flows freely to thousands of independent researchers is a fundamentally different world than one where it is locked behind the pricing walls of a few incumbents.

“A world where compute flows freely to thousands of independent researchers is a fundamentally different world than one where it is locked behind the pricing walls of AWS, GCP, and Azure.”

What Jake predicted. What the team built. How the field caught up.

Jake Cannell publishes a series of essays on LessWrong arguing that intelligence is fundamentally a function of compute, not clever algorithms or hand-engineered modules.

Christian Horne (lahwran), a fellow LessWrong contributor, shares the same conviction. The two become collaborators.

AlexNet breaks ImageNet benchmarks by scaling a known neural network architecture on GPUs, exactly as the scaling hypothesis predicted. The deep learning revolution begins.

Jake publishes his landmark essay arguing that the human brain is a single, general-purpose learning algorithm, not a zoo of specialized circuits. He predicts AlphaGo two years before it happens and forecasts human-level vision (~2024±3) and language via scaled deep learning.

Jake Cannell and Christian Horne incorporate Vast.ai as a Delaware C Corporation. The founding thesis: the world is full of underutilized GPU hardware, and the people who need that compute most can’t afford hyperscaler rates. The market needs a two-sided platform.

For two years, Jake and Christian build the marketplace platform end-to-end: host onboarding, search interface, pricing engine, Docker-based instance management, all engineered to work across heterogeneous hardware and wildly different network conditions.

Vast.ai launches, not with a press release, but the way honest products launch: to friends, family, and a post on Hacker News. GPU compute 3–5x cheaper than AWS, available in seconds, no enterprise contract required.

Early independent hosts join the platform. The marketplace concept is validated: developers get cheaper GPUs, hosts monetize idle har
Pricing found: $3.75 /hr, $2.81, $9.06/hr, $0.37 /hr, $0.02
Only in DeepSpeed (1)
Only in Vast.ai (10)