Ling-mini-2.0
Ant Group / InclusionAI
License: MIT
Tags: MoE, sigmoid-routing, aux-loss-free, MTP
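The MoE, sigmoid-routing and aux-loss-free tags describe how tokens are assigned to experts. Below is a minimal pure-Python sketch of one common form of this scheme. It assumes the DeepSeek-V3-style formulation:

1. Each expert gets a sigmoid affinity score.
2. A per-expert bias is added only when choosing the top-k experts.
3. The gate weights are the unbiased scores, renormalized over the chosen experts.
4. After each batch, the bias is lowered for overloaded experts and raised for underloaded ones. This replaces an auxiliary balancing loss.

Two things are assumptions for illustration only: that Ling-mini-2.0 uses exactly this update rule, and the names `route`, `update_bias` and `gamma`.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: List[float], bias: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) pairs for one token (illustrative sketch)."""
    scores = [sigmoid(z) for z in logits]
    # The bias only affects which experts are selected, never the gate values.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    # Renormalize the unbiased sigmoid scores over the selected experts.
    return [(i, scores[i] / total) for i in chosen]


def update_bias(bias: List[float], loads: List[int], gamma: float) -> List[float]:
    """Aux-loss-free balancing step: no extra loss term, just a bias nudge per expert."""
    mean_load = sum(loads) / len(loads)
    new_bias = []
    for b, load in zip(bias, loads):
        if load > mean_load:
            b -= gamma  # overloaded: make this expert slightly less likely to be picked
        elif load < mean_load:
            b += gamma  # underloaded: make it slightly more likely
        new_bias.append(b)
    return new_bias


if __name__ == "__main__":
    # Toy batch: two tokens, four experts, top-2 routing (made-up logits).
    batch = [[2.0, 1.0, 0.5, -1.0], [1.5, 1.2, 0.0, -0.5]]
    bias = [0.0] * 4
    loads = [0] * 4
    for logits in batch:
        for expert, _weight in route(logits, bias, k=2):
            loads[expert] += 1
    bias = update_bias(bias, loads, gamma=0.01)
    print(loads, bias)
```

In the toy batch, both tokens go to experts 0 and 1. Their biases therefore drop by `gamma`, while experts 2 and 3 gain `gamma`, which makes those two more likely to be selected on the next batch. The gate weights themselves are never biased, so balancing only changes which experts a token visits, not how their outputs are mixed.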
Benchmarks
| Benchmark | Category | Measured | Claimed | Setup |
|---|---|---|---|---|
| GPQA Diamond | reasoning | 37.88 | — | 0-shot |
| GSM8K | math | 80.89 | — | 5-shot |
| MMLU-Pro | knowledge | 53.34 | — | 5-shot |
| HumanEval+ | code | 72.56 | — | 0-shot |
| AIME 2024 | math | 16.70 | — | 0-shot |
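These results are the first independent benchmark of this model. The model uses a custom bailing_moe architecture, which requires transformers==4.57.0 and vllm==0.10.0 with the bailing_moe_v2 patch applied. The developers claim it matches the performance of 7B to 8B dense models while activating only 1.4B parameters per token.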
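Because the architecture depends on exact library versions, it can help to check the environment before running an evaluation. The sketch below assumes that exact string equality with the pinned versions is what matters. Under that assumption, a local build tag such as `+cu128` would be reported as a mismatch. The helper name `check_pins` is hypothetical. The bailing_moe_v2 patch itself cannot be detected this way and is not checked.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Pins taken from the note above; the bailing_moe_v2 patch is not verified here.
REQUIRED_PINS: Dict[str, str] = {
    "transformers": "4.57.0",
    "vllm": "0.10.0",
}


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return readable problems; an empty list means every pin matches exactly."""
    problems: List[str] = []
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (need =={wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: found {installed}, need =={wanted}")
    return problems


if __name__ == "__main__":
    issues = check_pins(REQUIRED_PINS)
    for line in issues:
        print(line)
    print("environment OK" if not issues else f"{len(issues)} pin mismatch(es)")
```

The `get_version` parameter defaults to `importlib.metadata.version`, but it can be swapped for a stub, which keeps the check testable without installing either library.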