Overview
The paper provides clear ablations and multiple baselines showing consistent trends, but experiments stop at 14B parameters and some datasets and code links are not yet publicly available.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 70%
Novelty: 50%
Why It Matters For Business
You can substantially improve African-language quality and document translation by continuing to pretrain a strong open base model on a curated data mix, rather than training from scratch.
Who Should Care
Summary TLDR
This paper builds AfriqueLLM, a suite of open models produced by continued pretraining (CPT) of 5 base LLMs on 26B tokens to adapt them to 20 African languages. The core finding is that what you train on matters more than model size. Mixing monolingual African text with code, math, and high-quality synthetic translations (the CMS mix) consistently improves accuracy and reasoning. Qwen 3 bases showed the largest relative gains after CPT (up to +78.8% relative). CPT also improved long-context document translation (e.g., +12.4 d-chrF over an SFT baseline). Models and configs will be released on Hugging Face.
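The d-chrF gain above refers to chrF computed at the document level. The sketch below assumes d-chrF means "score each whole document as one string" rather than averaging sentence scores. It follows the original chrF formulation: average character n-gram precision and recall over orders 1 to 6, ignore whitespace, and combine with β = 2. This is a simplified illustration, not the paper's scorer. It will not match sacrebleu's CHRF exactly, so real evaluations should use a standard implementation.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace, as chrF does by default."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF (0-100): mean char n-gram precision/recall, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # order longer than one of the strings: skip it
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def doc_chrf(hyp_sentences: List[str], ref_sentences: List[str]) -> float:
    """Assumed d-chrF: score the concatenated document as a single string."""
    return chrf(" ".join(hyp_sentences), " ".join(ref_sentences))


print(chrf("abc", "abc"))  # → 100.0
```

Scoring whole documents means a translation gets credit for correct content even when its sentence boundaries do not line up with the reference. That matters for the long-context translation setting.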
Problem Statement
Open LLMs lag on African languages because pretraining corpora lack domain coverage (math, code, curated topical content). Continued pretraining can help, but it often degrades reasoning or high-resource language (HRL) performance when the data is imbalanced or noisy. The paper asks which data mixes and base-model choices yield the best CPT outcomes for African languages.
Main Contribution
AfriqueLLM: CPT-adapted models for 20 African languages using a 26B-token corpus.
Systematic CPT ablation across five base models from the Gemma 3, Llama 3.1, and Qwen 3 families and multiple data mixtures.
Key Findings
CPT data composition is the single strongest driver of gains.
Adding math and code recovers and improves reasoning degraded by raw web text.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| AfroBench overall (combined tasks) | AfriqueQwen-14B 63.79 | Qwen3-14B base 39.88 | +23.91 abs (+60.0% rel) | AfroBench-Lite (see Table 3) | Table 3: overall scores and ∆ % | Table 3 |
| AfriMGSM (math) | AfriqueQwen-14B 45.01 | Qwen3-14B base 16.6 | +28.41 abs | AfriMGSM (8-shot CoT) | Table 3 AfriMGSM column | Table 3 |
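As a quick consistency check, the deltas in the table can be recomputed from the reported scores. The sketch assumes relative delta is (model − base) / base × 100. That formula reproduces the +60.0% figure listed for AfroBench overall.

```python
from typing import Tuple


def deltas(model: float, base: float) -> Tuple[float, float]:
    """Return (absolute delta, relative delta in %) of a model score over its base."""
    absolute = model - base
    relative = 100 * absolute / base
    return round(absolute, 2), round(relative, 1)


print(deltas(63.79, 39.88))  # AfroBench overall → (23.91, 60.0)
print(deltas(45.01, 16.6))   # AfriMGSM → (28.41, 171.1)
```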
What To Try In 7 Days
Run a short CPT pass on your base model using a CMS (code, math, synthetic) mix: monolingual African text plus ~1B tokens each of code and math, plus filtered synthetic translations.
Cap high-resource languages at ≈1B tokens with UniMax-like sampling so that English and French do not dominate the mix.
Use a 16k-token context window if you need document-level capabilities, and evaluate with d-chrF or SSA-COMET on representative documents.
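UniMax allocates a token budget as uniformly as possible across languages, capping each language at a maximum number of epochs over its corpus. The sketch below is a UniMax-style allocator with an extra hard token cap for designated high-resource languages, matching the ≈1B-token HRL limit above. Two parts are assumptions, not the paper's code:

* Processing languages in order of their effective cap, rather than raw corpus size, so budget a capped language cannot use flows to the others.
* The parameter names `max_epochs`, `capped_langs`, and `hard_cap`.

The corpus sizes and language codes are hypothetical placeholders.

```python
from typing import Dict, Iterable, Optional


def unimax_budgets(
    corpus_tokens: Dict[str, float],
    total_budget: float,
    max_epochs: float = 4.0,
    capped_langs: Iterable[str] = (),
    hard_cap: Optional[float] = None,
) -> Dict[str, float]:
    """UniMax-style allocation: spread the budget as evenly as possible, never
    exceeding max_epochs passes over a language's corpus and, for languages in
    capped_langs (hypothetical HRL list), never exceeding hard_cap tokens."""
    capped = set(capped_langs)

    def cap(lang: str) -> float:
        limit = max_epochs * corpus_tokens[lang]
        if hard_cap is not None and lang in capped:
            limit = min(limit, hard_cap)
        return limit

    budgets: Dict[str, float] = {}
    remaining = total_budget
    # Tightest caps first, so leftover budget is redistributed to later languages.
    order = sorted(corpus_tokens, key=cap)
    for i, lang in enumerate(order):
        fair_share = remaining / (len(order) - i)
        budgets[lang] = min(fair_share, cap(lang))
        remaining -= budgets[lang]
    return budgets


# Hypothetical corpus sizes in billions of tokens (illustrative, not from the paper).
corpus = {"yor": 0.5, "hau": 1.5, "swa": 3.0, "fra": 200.0, "eng": 500.0}
budgets = unimax_budgets(corpus, total_budget=10.0, capped_langs={"eng", "fra"}, hard_cap=1.0)
total = sum(budgets.values())
weights = {lang: tokens / total for lang, tokens in budgets.items()}  # sampling probabilities
print(budgets)  # → {'fra': 1.0, 'eng': 1.0, 'yor': 2.0, 'hau': 3.0, 'swa': 3.0}
```

With only epoch caps, sorting by cap is the same as UniMax's ascending-size order. The hard cap simply joins the same water-filling scheme. If every language hits its cap, part of the budget stays unspent. In that case you must either add data or lower the total.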
Optimization Features
Token Efficiency
Infra Optimization
Model Optimization
System Optimization
Training Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
Covers 20 African languages; many languages remain unsupported.
Model sizes limited to ≤14B; dynamics may change at 30B+.
When Not To Use
If your target language is not in the 20 covered languages (limited transfer to unseen languages).
When instruction-following behavior is required immediately: these are base CPT checkpoints, not instruction-tuned models.
Failure Modes
Catastrophic forgetting on high-resource languages if HRLs are excluded or uncapped.
Quality-sensitive: noisy parallel corpora can harm larger models (12B+).
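The summary says the synthetic translations were filtered but does not give the criteria. As a rough illustration of guarding against noisy parallel data, the sketch applies four common heuristics:

* Drop empty pairs.
* Drop untranslated copies.
* Drop pairs with an extreme length ratio.
* Drop exact duplicates, then apply a quality-estimation threshold.

The `qe_score` field, `min_qe=0.7`, and `max_len_ratio=2.0` are hypothetical. Treat this as a sketch, not the paper's pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    source: str
    target: str
    qe_score: float  # hypothetical: score from an external quality-estimation model, 0..1


def keep_pair(pair: Pair, min_qe: float = 0.7, max_len_ratio: float = 2.0) -> bool:
    """Heuristic checks for one pair; thresholds are illustrative assumptions."""
    src, tgt = pair.source.strip(), pair.target.strip()
    if not src or not tgt:
        return False  # one side is empty
    if src == tgt:
        return False  # untranslated copy of the source
    if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_len_ratio:
        return False  # likely truncated or hallucinated output
    return pair.qe_score >= min_qe


def filter_pairs(pairs: List[Pair], **kwargs) -> List[Pair]:
    """Drop exact duplicates, then keep pairs that pass keep_pair."""
    seen = set()
    kept = []
    for pair in pairs:
        key = (pair.source.strip(), pair.target.strip())
        if key in seen:
            continue
        seen.add(key)
        if keep_pair(pair, **kwargs):
            kept.append(pair)
    return kept


pairs = [
    Pair("Good morning", "Habari za asubuhi", 0.9),   # kept
    Pair("Good morning", "Good morning", 0.95),       # copy: dropped
    Pair("Thank you very much for your help today", "Asante", 0.8),  # length ratio: dropped
    Pair("Good morning", "Habari za asubuhi", 0.9),   # duplicate: dropped
    Pair("Where is the market?", "Soko liko wapi?", 0.4),  # low QE score: dropped
]
print(len(filter_pairs(pairs)))  # → 1
```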

