Last updated: April 9, 2026
Smelt is a Python CLI and cloud platform that takes any open-source large language model, dense or Mixture-of-Experts, from 7 billion to 1 trillion parameters, and produces optimized smaller versions that fit your target hardware. You specify your constraints (available RAM, target task, quality threshold) and Smelt handles the rest: quantization to any GGUF format, MoE expert surgery to prune inactive experts, depth and width pruning, and an AI route selector that picks the optimal compression recipe for your model and hardware.
The independent LLM compression market has been gutted by acquisitions: Neural Magic, OctoAI, Deci AI, and Predibase have all been absorbed. MoE models now power over 60% of frontier architectures, yet no production compression tools exist for them, and more than 50% of production deployments still run uncompressed because the tooling is too fragmented. Smelt closes that gap with a single command: smelt run.
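To make the available-RAM constraint concrete, here is a rough back-of-the-envelope sketch of how a weight footprint can be estimated from parameter count and bits per weight. It is not Smelt's route selector and does not use any Smelt API. The function names and the 1 GiB runtime overhead are hypothetical placeholders. The bits-per-weight values follow from the GGUF block layouts for F16, Q8_0, and Q4_0.

```python
# Illustrative estimate only; NOT Smelt's actual selection logic.
# Bits per weight, including per-block scales:
#   Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights = 8.5 bpw
#   Q4_0: 32 4-bit weights + one fp16 scale = 18 bytes per 32 weights = 4.5 bpw
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimated_weight_gib(n_params: float, quant: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30


def formats_that_fit(n_params: float, ram_gib: float, overhead_gib: float = 1.0) -> list[str]:
    """Return the formats whose weights plus a fixed overhead fit in RAM.

    overhead_gib is an assumed allowance for KV cache and runtime buffers;
    real usage depends on context length and the runtime.
    """
    return [
        quant
        for quant in BITS_PER_WEIGHT
        if estimated_weight_gib(n_params, quant) + overhead_gib <= ram_gib
    ]


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"7B at {quant}: {estimated_weight_gib(7e9, quant):.2f} GiB")
    print("Fits in 8 GiB:", formats_that_fit(7e9, ram_gib=8))
```

Under these assumptions, a 7-billion-parameter model fits an 8 GiB budget at Q8_0 or Q4_0 but not at F16. A real recipe would also have to account for the quality threshold and the target task, which this estimate ignores.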
Any open-source LLM on Hugging Face — Llama, Qwen, DeepSeek, Mistral, Gemma, Command R, DBRX, Mixtral, and more. Dense and Mixture-of-Experts architectures are both fully supported.
Smelt outputs standard GGUF files compatible with Ollama, llama.cpp, LM Studio, vLLM, and any other GGUF-compatible runtime. No lock-in, no proprietary formats.
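Because the output is a standard GGUF file, a quick sanity check is possible without any Smelt tooling. A GGUF file begins with the 4-byte magic GGUF followed by a 32-bit version number. The sketch below reads only those two fields. The function name and the example file path are hypothetical, and little-endian byte order is assumed.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF version from a file header, or raise ValueError.

    Reads only the first 8 bytes: the magic b"GGUF" and a uint32 version.
    Assumes a little-endian file.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    import sys

    # Example: python check_gguf.py model.gguf  (the path is a placeholder)
    print("GGUF version:", read_gguf_version(sys.argv[1]))
```

This confirms only that the header is well-formed. It does not validate the tensors or metadata that follow.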