EthioLLM: open multilingual LLMs and a new EthioBenchmark for five Ethiopian languages plus English

March 20, 2024 · 7 min read

Overview

Decision Snapshot: Needs Validation

Models and datasets are open-sourced and show competitive results on many classification tasks, but language coverage is limited to five languages and MT quality lags SOTA, so treat them as strong research and pilot assets rather than turnkey production systems.

Citations: 1

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 6/6

Findings with evidence refs: 6/6

Results with explicit delta: 6/8

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 50%

Authors

Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

Links

Abstract / PDF / Code / Data

Why It Matters For Business

EthioLLM and EthioBenchmark make practical NLP for major Ethiopian languages possible with open models and data, lowering development time for local products like moderation, news categorization, and information extraction.

Summary TLDR

This paper releases EthioLLM, a family of encoder-only and encoder-decoder LLMs (small/base/large) trained to support five Ethiopian languages (Amharic, Ge'ez, Afaan Oromo, Somali, Tigrinya) plus English, and EthioBenchmark, a set of merged datasets for news classification, MT, hate speech, sentiment, NER and POS tagging. Models were trained from XLM-R and mT5 checkpoints, with focused data cleaning and long training runs. On classification, NER and hate-speech tasks, EthioLLM models are competitive with or exceed Afro-centric baselines. Machine translation lags behind larger SOTA MT models. All models, tokenizers and benchmark data are open-sourced.

Problem Statement

Ethiopian languages are underrepresented in large language models and benchmarks. There are many spoken languages but few pre-trained models and cross-language datasets. This gap blocks practical NLP for Ethiopian languages and slows local research and products.

Main Contribution

Released EthioLLM models (small/base/large) covering five Ethiopian languages and English.

Built EthioBenchmark by merging existing datasets into task-specific collections: EthioNEWS, EthioMT, EthioHate, EthioSenti, EthioNER, EthioPOS.

Key Findings

EthioLLM-large is competitive on Amharic news classification: it beats XLM-R and trails AfroXLMR-large by 0.22 weighted F1.

Numbers: MasakhaNEWS Amharic weighted F1: EthioLLM-large 94.18 vs XLM-R 93.1

Practical Use: Use EthioLLM-large as a drop-in model for Amharic news classification when you need a compact local model competitive with general multilingual LMs.

Evidence Ref: Table 3

EthioLLM-large outperforms prior models on Amharic NER.

Numbers: MasakhaNER Amharic F1: EthioLLM-large 79.42 vs AfroXLMR 78.0

Practical Use: Prefer EthioLLM-large for Amharic NER tasks where named-entity extraction quality matters.

Evidence Ref: Table 8
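To compare NER output from two models on your own data, a span-level F1 can be computed directly from BIO tag sequences. The sketch below is only an illustration: the source does not say how the reported NER F1 is scored, so exact-match span scoring over BIO tags is an assumption here, and the function names and toy tags are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) entity spans in a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = start is not None and tag == f"I-{label}"
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_sents, pred_sents):
    """Micro-averaged exact-match span F1 over a list of tagged sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)  # a span counts only if boundaries and label both match
        n_gold += len(g)
        n_pred += len(p)
    total = n_gold + n_pred
    return 2 * tp / total if total else 0.0


# Toy example (hypothetical tags, not from any benchmark).
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(f"span F1: {span_f1(gold, pred):.4f}")
```

Running the same function on predictions from your current pipeline and from EthioLLM-large gives two directly comparable numbers on identical data.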

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| weighted F1 (news, MasakhaNEWS, amh) | 94.18 (EthioLLM-large) | 94.4 (AfroXLMR-large) | -0.22 | MasakhaNEWS | Table 3 - EthioLLM-large 94.18; AfroXLMR-l 94.4 | Table 3 |
| weighted F1 (NER, amh) | 79.42 (EthioLLM-large) | 78.0 (AfroXLMR-large) | +1.42 | MasakhaNER | Table 8 - EthioLLM-large 79.42; AfroXLMR-l 78 | Table 8 |
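The Delta values in these results are the model score minus the baseline score. A short check of that arithmetic, using only the numbers reported above (the helper name `delta` is hypothetical):

```python
def delta(value, baseline, ndigits=2):
    """Model score minus baseline score, rounded to the reported precision."""
    return round(value - baseline, ndigits)


# (metric, EthioLLM-large score, AfroXLMR-large baseline) as listed in the results.
rows = [
    ("weighted F1 (news, MasakhaNEWS, amh)", 94.18, 94.4),
    ("weighted F1 (NER, amh)", 79.42, 78.0),
]

for metric, value, baseline in rows:
    print(f"{metric}: {delta(value, baseline):+.2f}")
```

A negative delta means the baseline scored higher, so the news result is a near tie in the baseline's favor while the NER result favors EthioLLM-large.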

What To Try In 7 Days

Install EthioLLM-small from the repo and test Amharic news classification on your data.

Run EthioLLM-large for Amharic NER to see entity extraction improvements vs your current pipeline.

Evaluate EthioLLM-large for Amharic hate-speech detection in a moderation pilot and compare F1 to your baseline.
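For the classification pilots above, comparing against your baseline comes down to scoring both systems' predictions on the same labeled set. The following is a minimal sketch of weighted F1 (per-class F1 averaged by class support) in plain Python. It does not load any EthioLLM model; the labels and predictions are made-up placeholders, and you would substitute the outputs of your own baseline and of the model under test.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Per-class F1 averaged with weights proportional to each class's gold support."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


# Placeholder data: replace with your labeled examples and real model outputs.
gold = ["sports", "sports", "politics", "health"]
baseline_pred = ["sports", "politics", "politics", "health"]
candidate_pred = ["sports", "sports", "politics", "politics"]

print(f"baseline  weighted F1: {weighted_f1(gold, baseline_pred):.4f}")
print(f"candidate weighted F1: {weighted_f1(gold, candidate_pred):.4f}")
```

Weighting by support keeps frequent classes from being drowned out by rare ones, which matters when a moderation or news dataset is heavily imbalanced.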

Optimization Features

Token Efficiency
Accuracy
Model Optimization
vocabulary tuning (70K and 250K tokenizers)
model variants: small/base/large sizes
Training Optimization
language-adaptive fine-tuning (LAFT)
long training runs (up to 1M steps for seq2seq)

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Yes
License: Unknown

Data URLs

https://github.com/EthioNLP/EthioLLM

EthioNLP HuggingFace repository (paper)

Risks & Boundaries

Limitations

Covers only 5 of >85 Ethiopian languages due to scarce corpora.

Ge'ez training data is tiny (≈1M tokens), limiting model strength for that language.

When Not To Use

As a production-grade MT system where high BLEU is required.

For languages not included in the five-language set.

Failure Modes

Poor MT quality compared to NLLB/M2M100 for many language pairs (large BLEU gaps).

Lower performance on languages with little training data (Ge'ez); zero-shot may still fail on domain shifts.

Core Entities

Models

EthioLLM-small, EthioLLM-base, EthioLLM-large, EthioMT5-small, XLM-R, AfroXLMR, AfroLM, AfriTeVa, AfriMT5, M2M100, NLLB

Metrics

weighted F1, sacreBLEU

Datasets

EthioBenchmark, EthioNEWS, EthioMT, EthioHate, EthioSenti, EthioNER, EthioPOS, MasakhaNEWS, MasakhaNER, AfriSenti, Flores-200, HornMT

Benchmarks

MasakhaNEWS, MasakhaNER, AfriSenti, EthioBenchmark