Overview
Production Readiness
0.4
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
1
Why It Matters For Business
EthioLLM and EthioBenchmark make practical NLP for major Ethiopian languages possible with open models and data, lowering development time for local products like moderation, news categorization, and information extraction.
Summary TLDR
This paper releases EthioLLM, a family of encoder-only and encoder-decoder LLMs (small/base/large) trained to support five Ethiopian languages (Amharic, Ge'ez, Afaan Oromo, Somali, Tigrinya) plus English, and EthioBenchmark, a set of merged datasets for news classification, MT, hate speech, sentiment, NER and POS tagging. Models were trained from XLM-R and mT5 checkpoints, with focused data cleaning and long training runs. On classification, NER and hate-speech tasks, EthioLLM models are competitive with or exceed Afro-centric baselines. Machine translation lags behind larger SOTA MT models. All models, tokenizers and benchmark data are open-sourced.
Problem Statement
Ethiopian languages are underrepresented in large language models and benchmarks. There are many spoken languages but few pre-trained models and cross-language datasets. This gap blocks practical NLP for Ethiopian languages and slows local research and products.
Main Contribution
Released EthioLLM models (small/base/large) covering five Ethiopian languages and English.
Built EthioBenchmark by merging existing datasets into task-specific collections: EthioNEWS, EthioMT, EthioHate, EthioSenti, EthioNER, EthioPOS.
Evaluated models across news classification, MT, hate speech, sentiment, NER, and POS and reported baselines.
Open-sourced models, tokenizers, training corpus and benchmark datasets on GitHub/HuggingFace.
Key Findings
EthioLLM-large achieves competitive or better results on news classification for Amharic.
EthioLLM-large outperforms prior models on Amharic NER.
Hate speech detection improved noticeably with EthioLLM-large.
Part-of-speech tagging for Amharic is strong with the large model.
Machine translation quality is lower than large SOTA MT models.
Zero-shot transfer to Ge'ez shows promising results despite a tiny Ge'ez corpus.
Results
weighted F1 (news, MasakhaNEWS, amh)
weighted F1 (NER, amh)
weighted F1 (hate speech, amh)
weighted F1 (hate speech, orm)
weighted F1 (POS, amh)
weighted F1 (sentiment, tir)
sacreBLEU (eng->amh)
weighted F1 (NER zero-shot, gez from amh)
Who Should Care
What To Try In 7 Days
Install EthioLLM-small from the repo and test Amharic news classification on your data.
Run EthioLLM-large for Amharic NER to see entity extraction improvements vs your current pipeline.
Evaluate EthioLLM-large for Amharic hate-speech detection in a moderation pilot and compare F1 to your baseline.
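Comparing a candidate model against your baseline needs the same metric the summary reports, weighted F1. The following is a minimal sketch of that metric in plain Python, assuming you already have gold labels and predictions as parallel lists. The function name `weighted_f1` and the toy labels are illustrative assumptions, not part of the EthioLLM release.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's
    support (its number of gold examples)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty evaluation set")

    support = Counter(y_true)        # gold examples per class
    predicted = Counter(y_pred)      # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n_gold in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_gold
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_gold / total
    return score


# Hypothetical toy labels for a moderation pilot.
gold = ["hate", "hate", "normal", "normal"]
baseline_preds = ["normal", "normal", "normal", "normal"]
candidate_preds = ["hate", "normal", "normal", "normal"]

baseline_score = weighted_f1(gold, baseline_preds)
candidate_score = weighted_f1(gold, candidate_preds)
print(f"baseline={baseline_score:.3f} candidate={candidate_score:.3f}")
```

Scoring both systems on the same held-out set with the same function keeps the comparison fair. Classes that appear only in the predictions carry zero weight, which should match the usual support-weighted definition.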
Optimization Features
Token Efficiency
- Accuracy
Model Optimization
- vocabulary tuning (70K and 250K tokenizers)
- model variants: small/base/large sizes
Training Optimization
- language-adaptive fine-tuning (LAFT)
- long training runs (up to 1M steps for seq2seq)
Reproducibility
Code Urls
Data Urls
- https://github.com/EthioNLP/EthioLLM
- EthioNLP HuggingFace repository (paper)
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Covers only 5 of >85 Ethiopian languages due to scarce corpora.
- Ge'ez training data is tiny (≈1M tokens), limiting model strength for that language.
- Machine translation quality is clearly below large SOTA MT models.
- Benchmark combines existing datasets; heterogeneity and source overlap may bias results despite cleaning.
When Not To Use
- As a production-grade MT system where high BLEU is required.
- For languages not included in the five-language set.
- When legally auditable model provenance and full license details are required.
Failure Modes
- Poor MT quality compared to NLLB/M2M100 for many language pairs (large BLEU gaps).
- Lower performance on languages with little training data (Ge'ez); zero-shot may still fail on domain shifts.
- Potential dataset leakage if downstream data overlaps with model pretraining data; the authors state they verified this, but caution remains.
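The leakage risk in the last item can be screened for on your own data before trusting evaluation numbers. The sketch below flags evaluation sentences that also appear in a pretraining corpus after light normalization. It is an assumption-laden illustration: the summary does not describe how the authors performed their check, the helper names are hypothetical, and exact-match fingerprinting will miss paraphrases and near-duplicates.

```python
import hashlib
import unicodedata


def normalize(text):
    """Unicode-normalize, lowercase, and collapse whitespace so trivially
    different copies of the same sentence compare equal."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def fingerprint(text):
    """Stable hash of the normalized sentence."""
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def find_overlap(pretraining_lines, eval_lines):
    """Return the evaluation sentences whose fingerprint also occurs in
    the pretraining corpus (exact match after normalization only)."""
    seen = {fingerprint(line) for line in pretraining_lines}
    return [line for line in eval_lines if fingerprint(line) in seen]


# Hypothetical toy data; real corpora would be streamed from files.
pretraining = ["ሰላም  ዓለም", "Hello World"]
evaluation = ["ሰላም ዓለም", "hello world", "an unseen sentence"]

leaked = find_overlap(pretraining, evaluation)
print(f"{len(leaked)} of {len(evaluation)} evaluation sentences overlap")
```

Any flagged sentences can be dropped from the evaluation set or reported separately. A stricter screen would compare n-gram shingles instead of whole sentences, at a higher memory cost.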
Core Entities
Models
- EthioLLM-small
- EthioLLM-base
- EthioLLM-large
- EthioMT5-small
- XLM-R
- AfroXLMR
- AfroLM
- AfriTeVa
- AfriMT5
- M2M100
- NLLB
Metrics
- weighted F1
- sacreBLEU
Datasets
- EthioBenchmark
- EthioNEWS
- EthioMT
- EthioHate
- EthioSenti
- EthioNER
- EthioPOS
- MasakhaNEWS
- MasakhaNER
- AfriSenti
- Flores-200
- HornMT
Benchmarks
- MasakhaNEWS
- MasakhaNER
- AfriSenti
- EthioBenchmark

