Distillation — Training a smaller model to mimic a larger one. The large ‘teacher’ model generates outputs that the small ‘student’ model learns from. This is how the DeepSeek R1 distilled models were created: the 671B-parameter R1 teacher generated reasoning traces that were used to fine-tune students as small as 7B parameters, which retain much of the teacher’s reasoning ability. Distillation makes frontier-level capability accessible on much smaller hardware.
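A minimal sketch of the classic logit-distillation recipe, assuming PyTorch; the toy model sizes, temperature T, and mixing weight alpha below are illustrative placeholders, not values from any particular paper. The student is trained to match the teacher’s softened output distribution, optionally blended with an ordinary supervised loss on ground-truth labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in practice the teacher is a large pretrained model
# (frozen) and the student is a much smaller one being trained.
teacher = nn.Linear(32, 10)
student = nn.Linear(32, 10)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T, alpha = 2.0, 0.5  # softmax temperature and distill/supervised mix (placeholders)

for _ in range(100):
    x = torch.randn(64, 32)               # a batch of inputs
    labels = torch.randint(0, 10, (64,))   # ground-truth labels, if available

    with torch.no_grad():
        teacher_logits = teacher(x)        # teacher's "soft" targets
    student_logits = student(x)

    # KL divergence between temperature-softened teacher and student distributions
    distill = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Optional supervised term on hard labels
    ce = F.cross_entropy(student_logits, labels)

    loss = alpha * distill + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For large language models the teacher’s “outputs” are often just text rather than logits: DeepSeek’s R1 distilled models were produced by fine-tuning smaller open models on reasoning traces generated by R1, using an ordinary language-modeling loss instead of matching output distributions.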
Why It Matters
Understanding distillation is critical for developers and decision-makers working with AI systems: it is the main way frontier capabilities reach cheaper, smaller deployments, and knowing what is kept and what is lost in the process separates informed model choices from costly mistakes.
