Through a series of underlying technology breakthroughs, MiMo-V2-Flash achieves inference efficiency surpassing comparable closed-source models while keeping costs low.
Uses a 5:1 alternating structure of sliding-window and global attention layers. This cuts the KV cache by 6x and mitigates the quadratic cost of attention over long contexts.
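As a back-of-the-envelope illustration of where the savings come from, the sketch below compares cache size for a pure global-attention stack against a 5:1 sliding-window/global layout. The layer count, context length, and window size are assumed for illustration, not MiMo-V2-Flash's actual configuration.

```python
# Back-of-the-envelope KV-cache comparison: pure global attention vs. a
# 5:1 sliding-window/global layout. Layer count, context length, and
# window size are assumed values, not the model's real configuration.

def kv_cache_tokens(num_layers: int, seq_len: int, window: int, swa_ratio: int) -> int:
    """Total tokens cached across layers when `swa_ratio` sliding-window
    layers alternate with one global-attention layer."""
    if swa_ratio == 0:                          # pure global attention
        return num_layers * seq_len
    group = swa_ratio + 1                       # e.g. 5 SWA layers + 1 global
    swa_layers = num_layers * swa_ratio // group
    global_layers = num_layers - swa_layers
    # SWA layers keep only the last `window` tokens; global layers keep all.
    return swa_layers * min(window, seq_len) + global_layers * seq_len

layers, seq, win = 48, 1_000_000, 4_096         # illustrative assumptions
dense = kv_cache_tokens(layers, seq, window=seq, swa_ratio=0)
hybrid = kv_cache_tokens(layers, seq, window=win, swa_ratio=5)
print(f"KV cache reduced by {dense / hybrid:.1f}x")  # approaches 6x as seq grows
```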
A built-in lightweight multi-token prediction (MTP) module generates multiple draft tokens in a single forward pass, with no extra I/O, and is optimized for the H200.
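At a high level, the draft-and-verify loop behind this works like standard speculative decoding: the draft head proposes k tokens, and the main model checks them all in one batched forward pass. The sketch below is a greedy-decoding illustration only; `model` and `draft_head` are hypothetical stand-ins, not the actual module API.

```python
# Illustrative speculative-decoding step with a built-in draft head.
# `model` and `draft_head` are hypothetical stand-ins; greedy acceptance
# is assumed for simplicity.
import torch

@torch.no_grad()
def speculative_step(model, draft_head, ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1) The lightweight head drafts k tokens from the last hidden state (cheap).
    hidden = model(ids, output_hidden_states=True).hidden_states[-1][:, -1]
    draft = draft_head(hidden, num_tokens=k)            # shape [1, k]
    # 2) The main model scores all drafted tokens in ONE forward pass.
    logits = model(torch.cat([ids, draft], dim=-1)).logits
    preds = logits[:, -k - 1:].argmax(-1)               # model's pick at k+1 slots
    # 3) Accept the longest prefix where draft and model agree, plus one
    #    "free" token the main model produced itself.
    agree = (preds[:, :k] == draft).long().cumprod(dim=-1)
    n_ok = int(agree.sum())
    return torch.cat([ids, draft[:, :n_ok], preds[:, n_ok:n_ok + 1]], dim=-1)
```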
Multi-teacher on-policy distillation matches teacher performance with 1/50 of the compute and supports a "Step-by-Step" thinking mode.
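In spirit, the objective pulls the student's token distribution toward a mixture of teacher distributions over student-generated text. The sketch below shows a minimal single-step loss; the teacher weights and temperature are illustrative assumptions, not the training recipe.

```python
# Minimal sketch of a multi-teacher distillation loss: the student's token
# distribution is matched to a weighted mixture of teacher distributions.
# Teacher weights and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_teacher_kl(student_logits: torch.Tensor,
                     teacher_logits: list[torch.Tensor],
                     weights: list[float],
                     tau: float = 1.0) -> torch.Tensor:
    """KL(teacher_mix || student); all logits are [batch, seq, vocab]."""
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    # Mix the teachers in probability space, then match the student to the mix.
    mix = sum(w * F.softmax(t / tau, dim=-1)
              for w, t in zip(weights, teacher_logits))
    return F.kl_div(log_p_student, mix, reduction="batchmean")
```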
Performs competitively with GPT-5 (High) and Claude Sonnet 4.5 on coding, math, reasoning, and agent tasks:
| Benchmark | MiMo-V2-Flash | Claude Sonnet 4.5 | DeepSeek-V3.2 | GPT-5 (High) |
|---|---|---|---|---|
| SWE-Bench Verified (Coding) | 73.4% | 77.2% | 73.1% | 74.9% |
| AIME 2025 (Math) | 94.1% | 87.0% | 93.1% | 94.6% |
| GPQA-Diamond (Reasoning) | 83.7% | 83.4% | 82.4% | 85.7% |
| LongBench V2 (Long Context, score) | 60.6 | 61.8 | 58.4 | - |
We recommend SGLang for inference to get the full benefit of MTP and SWA acceleration.
```bash
# Install SGLang
pip install sglang

# Launch the server
python -m sglang.launch_server --model-path XiaomiMiMo/MiMo-V2-Flash --port 30000
```
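Once the server is running, it exposes SGLang's OpenAI-compatible API, so any OpenAI client can talk to it. A minimal example (the prompt is a placeholder):

```python
# Query the local SGLang server through its OpenAI-compatible endpoint.
# Port 30000 matches the launch command above; the prompt is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="XiaomiMiMo/MiMo-V2-Flash",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```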