FAQ

What models does Smelt support?

Smelt supports open-source LLMs hosted on Hugging Face, covering both dense and Mixture-of-Experts (MoE) architectures.

What is MoE expert surgery?

MoE models contain many specialist sub-networks (experts), from a handful to hundreds per layer, but a router activates only a fraction of them for each token. Smelt removes the least-used experts to reduce model size while preserving quality.
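To make "least-used" concrete, here is a minimal sketch of usage-based expert selection. It is not Smelt's actual algorithm: the function name, the routing log format, and the toy numbers are all assumptions made for illustration. It counts how often a router picked each expert over a sample of tokens and marks the least-activated ones for removal.

```python
from collections import Counter

def least_used_experts(routing_log, num_experts, num_to_remove):
    """Return the ids of the `num_to_remove` least-activated experts.

    `routing_log` is a hypothetical record: one tuple of expert ids per token,
    listing the experts the router selected for that token.
    """
    counts = Counter()
    for chosen in routing_log:
        counts.update(chosen)
    # Rank by activation count; ties are broken by expert id so the result is deterministic.
    ranked = sorted(range(num_experts), key=lambda e: (counts[e], e))
    return ranked[:num_to_remove]

# Toy example: 4 experts, top-2 routing, 5 tokens (made-up data).
routing_log = [(0, 1), (0, 2), (0, 1), (1, 2), (0, 1)]
to_remove = least_used_experts(routing_log, num_experts=4, num_to_remove=2)
kept = [e for e in range(4) if e not in to_remove]
print(to_remove, kept)  # [3, 2] [0, 1]
```

A real pipeline would gather routing statistics per layer on representative data and re-check quality after removal. Raw activation counts are only one possible ranking signal.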

Do I need a GPU?

No. Quantization and pruning run on CPU, and cloud jobs run on Smelt infrastructure, so no GPU is required on your end.

How much quality is lost?

Compressed models typically retain 85-95% of the original model's benchmark scores. Smelt generates a quality report after every compression.
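One way to read a retention figure is as the compressed model's score divided by the original model's score on the same benchmark. The sketch below assumes that definition. The benchmark names and scores are invented for illustration and are not Smelt results, and the format of Smelt's own quality report is not specified here.

```python
def retention_pct(original, compressed):
    """Per-benchmark retention: compressed score as a percentage of the original score."""
    return {name: 100.0 * compressed[name] / original[name] for name in original}

# Illustrative numbers only (hypothetical benchmarks, not measured results).
original = {"bench_a": 80.0, "bench_b": 50.0}
compressed = {"bench_a": 72.0, "bench_b": 44.0}

report = retention_pct(original, compressed)
mean_retention = sum(report.values()) / len(report)

for name, pct in report.items():
    print(f"{name}: {pct:.1f}% retained")
print(f"mean: {mean_retention:.1f}%")
```

Averaging across benchmarks can hide a large drop on a single task, so the per-benchmark numbers are worth checking individually.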

What runtimes are compatible?

Smelt outputs standard GGUF files, which are compatible with Ollama, llama.cpp, LM Studio, vLLM, and any other GGUF runtime.
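As a quick sanity check before loading a file into a runtime, you can confirm it starts with a GGUF header. A GGUF file begins with the four ASCII bytes "GGUF" followed by a 32-bit version number. The sketch below reads only those eight bytes and assumes the common little-endian layout. The function name and the example path are hypothetical, and this is not a full parser or validator.

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_version(path):
    """Return the GGUF version number if the file starts with a GGUF header, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Assumption: little-endian file, so the version is an unsigned 32-bit LE integer.
    (version,) = struct.unpack("<I", header[4:8])
    return version

# Usage (hypothetical path):
#   version = read_gguf_version("model.gguf")
#   print("not a GGUF file" if version is None else f"GGUF v{version}")
```

Passing this check only shows that the header looks right. Whether a given runtime supports the model's architecture and quantization type is a separate question.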

What happens to my models?

Cloud jobs process your model and deliver the result. We do not retain, train on, or redistribute your models. With the local CLI, your models never leave your machine.

How is Smelt different from llama.cpp?

llama.cpp handles quantization only. Smelt combines quantization with MoE expert surgery, pruning, and an AI agent that picks the optimal compression recipe.