Ling-mini-2.0

Ant Group / InclusionAI

MIT MoE sigmoid-routing aux-loss-free MTP

Type: MoE
Total params: 16.0B
Active params: 1.4B
Sparsity: 1/32
Context: 32,768 tokens
Train tokens: 20.0T
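The card does not say how "Sparsity" is defined. As a quick arithmetic check, the sketch below compares the listed 1/32 with the share of parameters that are active per token, using only the figures above. The two differ (about 8.75% versus about 3.1%), so the 1/32 figure presumably describes something other than parameter share, for example the fraction of routed experts selected per token. That reading is an assumption, not something stated on this page.

```python
from fractions import Fraction

# Figures copied from the card above (parameter counts in billions).
total_params_b = 16.0
active_params_b = 1.4
listed_sparsity = Fraction(1, 32)

# Share of all parameters used for a single token.
active_fraction = active_params_b / total_params_b

print(f"active / total parameters: {active_fraction:.3%}")
print(f"listed sparsity (1/32):    {float(listed_sparsity):.3%}")

# Assumption: components that run for every token (attention, embeddings,
# any shared experts) would push the parameter share above the expert ratio.
print(f"parameter share is {active_fraction / float(listed_sparsity):.1f}x the listed ratio")
```

A gap like this is expected in MoE models if the ratio counts only routed experts, since the always-on layers contribute to active parameters regardless of routing.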

Benchmarks

Benchmark	Category	Measured	Claimed	Setup
GPQA Diamond	reasoning	37.88	—	0-shot
GSM8K	math	80.89	—	5-shot
MMLU-Pro	knowledge	53.34	—	5-shot
HumanEval+	code	72.56	—	0-shot
AIME 2024	math	16.70	—	0-shot

This is the first independent benchmark of this model. The custom bailing_moe architecture requires transformers==4.57.0 and vllm==0.10.0 with the bailing_moe_v2 patch. The model's developers claim it matches the performance of 7-8B dense models with only 1.4B active params.
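Because the setup depends on exact version pins, it can help to verify the environment before loading the model. The following is a minimal sketch using the standard library's importlib.metadata; the helper name check_pins and the choice to ignore local build suffixes (such as "+cu128") are illustrative assumptions. It does not detect whether the bailing_moe_v2 patch has been applied.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the note above; whether other versions also work is not stated there.
REQUIRED = {"transformers": "4.57.0", "vllm": "0.10.0"}


def check_pins(required, get_version=version):
    """Return {package: problem description}; an empty dict means every pin matches."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (need {wanted})"
            continue
        # Assumption: a local build suffix such as "+cu128" does not matter here.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, need {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(REQUIRED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("all pinned versions match")
```

Passing get_version as a parameter keeps the check easy to exercise without installing either package.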