Is Unsloth open source?

Unsloth has a public GitHub repository with 63,241 stars.

What is the overall sentiment around Unsloth?

Based on 12 analyzed social mentions, 8% are positive, 92% neutral, and 0% negative.

Unsloth

mlops, fine-tuning, tiered

Unsloth is an open-source, no-code web UI for training, running, and exporting open models locally from a single interface.

Reviews and social mentions of Unsloth suggest that its main strengths are its integrations and its user-friendly interface, which account for the positive feedback it receives. However, there are few explicit user complaints or discussions of the software, which may point to limited awareness of it or limited critical engagement among its users. Because users say little about pricing, the financial side is hard to assess. Overall, Unsloth appears to have a neutral-to-positive reputation, shaped largely by the small number of high-profile mentions it has received.

GitHub Stars: 63,241 (5,534 forks)

Pain Score: 3/100, 15 integrations, 8 features, Seed

Voices Discussing Unsloth

Ollama

Project at Ollama

3 mentions

Hugging Face

Company at Hugging Face

3 mentions

Jason Liu

Creator at Instructor (structured outputs)

2 mentions

Features & Use Cases

Features

* No-code web UI for easy model training and management
* Support for running Google's Gemma 4 models
* Ability to train and run Qwen3.5 Small and Medium LLMs
* Support for NVIDIA's 4B and 120B models
* MoE LLM training up to 12x faster with reduced VRAM usage
* Local hardware utilization for enhanced performance and privacy
* Customizable training parameters for tailored model performance
* Multi-GPU support for scalable training solutions

Use Cases

* Training custom AI models for specific business needs
* Fine-tuning pre-trained models for niche applications
* Running large language models for natural language processing tasks
* Developing AI-driven applications without extensive coding
* Experimenting with different model architectures locally
* Optimizing model performance for resource-constrained environments

Company Intel

Industry: information technology & services

Funding Stage: Seed

Total Funding: $0.6M

Developer Ecosystem

GitHub stars: 63,241

Top Mention

Reddit post by @retarded_77030, 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

Following up on something I posted a few days back about fine-tuning for multi-task reasoning. I've read a lot since then and moved past the dense 3B vs 7B question; I'm landing on Nemotron 3 Nano (the 30B-A3B hybrid Mamba-Attention-MoE model NVIDIA released recently) instead. Its architecture maps onto the multi-task structure I'm trying to train better than a dense base does. The problem is that I've only ever read about dense transformer fine-tuning, so I don't know what the hybrid Mamba+MoE architecture actually breaks in the standard LoRA recipe. I'm still self-taught with no formal ML background, and I've been working with LLMs via API for about a year. This is my first time actually fine-tuning anything end-to-end.

**Why Nemotron 3 Nano specifically (in case the choice itself is the mistake):**

* 23 Mamba-2 + 23 sparse MoE + 6 GQA attention layers, 128 experts per MoE layer with top-6 routing
* 30B total / ~3.6B active parameters: capacity without a per-token compute blowup
* Mamba-2 layers seemed like the right structural fit for state-aware reasoning across longer context
* Open weights under the NVIDIA Open Model License, clean for what I want to do

**What I'm trying to fine-tune for (LoRA, distilling reasoning traces from a stronger teacher):**

1. Reading what's structurally happening in a situation vs. what's being stated on the surface
2. Holding multiple legitimate perspectives without collapsing to one too early
3. Surfacing the load-bearing thread when the input has multiple tangled problems
4. Conditioning output on a small set of numeric input features describing context state

I'm planning 40-80k examples, generated by Sonnet 4.6 with selective Opus 4.7 on the hardest 20%. This is ORCA-style explanation tuning, not just I/O pairs.

**Hardware:** I'm dropping the M4 Mac plan from my last post, since Nemotron 3 Nano needs more memory than 24 GB of unified memory can hold, even just for the weights. I'm renting an H100 80GB on RunPod for training, with a ~$120 budget across 5-6 iterations.
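The hardware note above rests on simple weight-memory arithmetic. A minimal Python sketch of it, assuming the post's "30B total" parameter count and counting weights only (no activations, LoRA adapters, optimizer state, or cache), so real training needs more than these figures:

```python
# Weight-only memory estimate for a model with ~30B total parameters.
# Assumption: the parameter count is the post's "30B total"; real checkpoints
# differ slightly, and training adds activations, adapters and optimizer state.

TOTAL_PARAMS = 30e9

# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, bpp in BYTES_PER_PARAM.items():
        gb = weight_gb(TOTAL_PARAMS, bpp)
        print(f"{fmt:>5}: {gb:5.1f} GB  fits 24 GB: {gb < 24}  fits 80 GB: {gb < 80}")
```

Under these assumptions, 16-bit and 8-bit weights alone exceed 24 GB, while 4-bit weights fit on paper only before any training overhead is counted; 16-bit weights fit on an 80 GB H100.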
**What I'm specifically worried about (because the hybrid architecture isn't covered in any standard fine-tuning tutorial I've found):**

* **Router under LoRA.** Can you LoRA the MoE router weights safely, or do you freeze the router and only LoRA the expert FFNs + attention? If you freeze it, does multi-task specialization still emerge, or does everything pile into the same experts?
* **Mamba-2 layers under low-rank adaptation.** Standard LoRA tutorials assume pure attention. Mamba-2 has selective SSM state and a different projection structure. Does standard LoRA on the input/output projections work cleanly, or are there gotchas (state init, recurrence stability under low-rank perturbation) that vanilla guides don't cover?
* **Load-balancing loss + multi-task imbalance.** If my 4 capabilities have different example counts, does the auxiliary load-balancing loss fight task-specific gradients? Are there known failure modes here?
* **Catastrophic forgetting on a 30B sparse base.** With LoRA adapters on the experts, does base reasoning degrade the way it does for dense fine-tunes, or does sparse routing structurally protect more of it?
* **Eval granularity under expert specialization.** If different experts handle different tasks, a single capability could quietly degrade while aggregate metrics look fine. What's the right held-out eval design for sparse MoE under multi-task training?

**Stack:** I'm planning to use Unsloth (their Nemotron 3 Nano support shipped recently), with per-capability held-out eval sets built and frozen before Batch 1, and batch API + prompt caching on the teacher side to keep dataset cost in check.
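To make the load-balancing question concrete, here is a minimal pure-Python sketch of top-k routing with a Switch Transformer-style auxiliary loss, generalized to top-k. It is an assumption that Nemotron 3 Nano is trained with this exact loss: MoE implementations normalize it differently, and some use other balancing schemes entirely. The toy router logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], k: int) -> List[int]:
    """Indices of the k experts with the highest router probability."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def load_balancing_loss(router_logits: List[List[float]], k: int) -> float:
    """Switch Transformer-style auxiliary loss, generalized to top-k routing.

    f_i: fraction of all routing slots (tokens x k) sent to expert i
    P_i: mean router probability assigned to expert i across tokens
    loss = num_experts * sum_i f_i * P_i, which is 1.0 for uniform router probs
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    slot_counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        for i in top_k(probs, k):
            slot_counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / (num_tokens * k) for c in slot_counts]
    P = [s / num_tokens for s in prob_sums]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


if __name__ == "__main__":
    # Hypothetical toy setup: 4 tokens, 8 experts, top-2 routing.
    uniform = [[0.0] * 8 for _ in range(4)]
    # Every token strongly prefers experts 0 and 1, as might happen if one
    # dominant task pulled all of its tokens into the same experts.
    collapsed = [[5.0, 5.0] + [0.0] * 6 for _ in range(4)]
    print(load_balancing_loss(uniform, k=2))    # 1.0
    print(load_balancing_loss(collapsed, k=2))  # well above 1.0
```

In this formulation the loss is task-agnostic: it only sees how tokens and probability mass spread across experts, so it pushes back on concentration whether or not that concentration reflects useful task specialization.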
**Not looking for:**

* "Just try it and see": the first run is already going to be wrong; I want to know which dimensions are most likely to surprise me
* "Use a smaller dense model first": already weighed; the hybrid architecture is specifically why I want this one
* Generic LoRA tutorials: I'm comfortable with the dense-transformer LoRA literature; the gap is Mamba+MoE specifics

**Looking for:**

* War stories from anyone who's actually fine-tuned Mamba+MoE hybrids (Nemotron, Jamba, Mixtral if relevant) and can tell me where it went sideways
* Papers I might be missing on multi-task LoRA on sparse MoE specifically; most of the multi-task literature I've found assumes dense models
* Pitfalls around router gradients under low-rank adaptation
* Whether the standard LoRA rank sweet spots (8-32) still hold, or whether MoE+Mamba shifts what works

Happy to write up what I find. First-time projects produce useful negative results even when they fail, and there's basically no public writeup yet on solo-developer-scale Nemotron 3 fine-tuning.
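On the rank question, LoRA's trainable-parameter count scales linearly with rank, and on an MoE model it also scales with how many expert matrices you adapt. A minimal sketch of that arithmetic follows. The layer widths and the two-projections-per-expert layout are hypothetical, not Nemotron 3 Nano's real dimensions; only the 23 MoE layers and 128 experts per layer come from the post. It says nothing about which rank will work best, only what each choice costs.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns W + B @ A with A of shape (rank x d_in) and B of shape
    (d_out x rank), so it adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


# Hypothetical sizes for illustration only; NOT Nemotron 3 Nano's real dims.
D_MODEL = 2048
D_EXPERT_FF = 1024
# These two counts come from the post's architecture description.
NUM_MOE_LAYERS = 23
EXPERTS_PER_LAYER = 128


def expert_adapter_params(rank: int) -> int:
    """LoRA parameters if every expert's up- and down-projection is adapted.

    Assumes (hypothetically) exactly two projection matrices per expert.
    """
    up = lora_params(D_MODEL, D_EXPERT_FF, rank)    # d_model -> d_ff
    down = lora_params(D_EXPERT_FF, D_MODEL, rank)  # d_ff -> d_model
    return (up + down) * EXPERTS_PER_LAYER * NUM_MOE_LAYERS


if __name__ == "__main__":
    for rank in (8, 16, 32):
        print(f"rank {rank:>2}: {expert_adapter_params(rank):,} expert LoRA params")
```

Because every adapted expert gets its own A and B matrices, adapting all experts multiplies the adapter size by the number of experts times the number of MoE layers, on top of the linear effect of rank.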