Overview
Demonstrates practical gains from modest continual pretraining plus LoRA on in-domain Urdu translation. Results are limited by the small pretraining budget (128M tokens), narrow test scope, and the absence of detoxification; larger or more diverse data could change the outcomes.
Citations: 0
Evidence Strength: 0.65
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/4
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 40%
Novelty: 50%
Why It Matters For Business
Targeted continual pretraining plus LoRA fine-tuning can give large in-domain translation gains with modest compute, enabling localized Urdu services without training from scratch.
Summary TLDR
This paper builds UrduLLaMA 1.0 by continually pretraining Llama-3.1-8B-Instruct on 128M curated Urdu tokens, then fine-tuning with LoRA on 41k Urdu instructions plus ~50k English–Urdu sentence pairs. On three translation test sets, UrduLLaMA improves BLEU over the base Llama-3.1-8B-Instruct model, most of all in-domain, though a large multilingual translation model (seamless-m4t-v2-large) still leads on some general datasets. The work shows practical gains from targeted adaptation with limited compute, but it is constrained by the small token budget, narrow evaluation, and the lack of detoxification.
Problem Statement
Open LLMs underperform on low-resource languages like Urdu because training corpora lack sufficient, clean Urdu data and language-specific preprocessing. The paper asks whether modest continual pretraining plus targeted fine-tuning (using LoRA) can improve Urdu translation and instruction following with limited compute.
Main Contribution
Curated and preprocessed a 1.14B-token Urdu dataset (after filtering/deduplication) and used 128M tokens for continual pretraining.
Continually pretrained Llama-3.1-8B-Instruct on 128M Urdu tokens, then applied LoRA-based instruction tuning (41k instructions) and MT fine-tuning (~50k en-ur pairs).
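To illustrate why LoRA suits a limited compute budget, the sketch below counts the adapter parameters LoRA would add to an 8B Llama-style model. It is a minimal sketch, not the paper's configuration: the rank, the choice of target projections, and the layer shapes are all assumptions made for illustration, and the summary above does not report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """Shape of one frozen linear layer that receives a LoRA adapter."""
    name: str
    in_features: int
    out_features: int


def lora_params(proj: Projection, rank: int) -> int:
    # LoRA adds two low-rank matrices per layer:
    # A with shape (rank, in_features) and B with shape (out_features, rank).
    return rank * (proj.in_features + proj.out_features)


def total_lora_params(projections, rank: int, num_layers: int) -> int:
    # Every transformer layer gets one adapter per targeted projection.
    return num_layers * sum(lora_params(p, rank) for p in projections)


# Assumed shapes for an 8B Llama-style model (hidden size 4096, a 1024-wide
# value projection from grouped-query attention, 32 layers). Illustrative only.
TARGETS = [
    Projection("q_proj", 4096, 4096),
    Projection("v_proj", 4096, 1024),
]
RANK = 16        # hypothetical rank, not taken from the paper
NUM_LAYERS = 32  # assumed layer count

trainable = total_lora_params(TARGETS, RANK, NUM_LAYERS)
print(f"{trainable:,} trainable adapter parameters")  # prints 6,815,744
```

Under these assumptions the adapters hold under 0.1% of an 8B-parameter model's weights, which is why the fine-tuning stages can run on modest hardware while the base weights stay frozen.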
Key Findings
UrduLLaMA 1.0 raises in-house MT BLEU from 10.87 to 28.01.
On general-domain test sets the gains are smaller, and a large multilingual model can still win: on TICO-19, BLEU rises from 10.04 to 13.12 while seamless-m4t-v2-large reaches 19.22.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| BLEU | 28.01 | Llama-3.1-8B-Instruct 10.87 | +17.14 | In-house MT test split | UrduLLaMA 28.01 vs Llama-3.1-8B-Instruct 10.87 (Table 6) | Table 6 |
| BLEU | 13.12 | Llama-3.1-8B-Instruct 10.04 | +3.08 | TICO-19 | UrduLLaMA 13.12 vs Llama-3.1-8B-Instruct 10.04; seamless-m4t-v2-large 19.22 (Table 6) | Table 6 |
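The Delta column can be re-derived from the reported scores. The short sketch below recomputes each absolute gain and adds a relative gain; the relative figures and the gap to seamless-m4t-v2-large are derived arithmetic, not numbers reported in the table, and the function name is a hypothetical helper.

```python
# (dataset, UrduLLaMA BLEU, Llama-3.1-8B-Instruct BLEU), copied from the table
ROWS = [
    ("In-house MT test split", 28.01, 10.87),
    ("TICO-19", 13.12, 10.04),
]


def bleu_gain(system: float, baseline: float) -> tuple[float, float]:
    """Return (absolute BLEU gain, relative gain in percent)."""
    absolute = round(system - baseline, 2)
    relative_pct = round(100 * (system - baseline) / baseline, 1)
    return absolute, relative_pct


for dataset, system, baseline in ROWS:
    absolute, relative_pct = bleu_gain(system, baseline)
    print(f"{dataset}: +{absolute:.2f} BLEU ({relative_pct:.1f}% over baseline)")

# Remaining gap to seamless-m4t-v2-large on TICO-19 (19.22 in the table).
gap_to_seamless = round(19.22 - 13.12, 2)
print(f"TICO-19 gap to seamless-m4t-v2-large: {gap_to_seamless:.2f} BLEU")
```

The in-domain gain is roughly five times the TICO-19 gain in absolute terms, which matches the summary's point that the adaptation helps most where the fine-tuning data resembles the test data.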
What To Try In 7 Days
Collect a small, domain-focused Urdu corpus and run the paper's preprocessing (language filtering, normalization, dedup).
Apply LoRA to an open LLaMA-style 7–8B checkpoint for instruction tuning using ~10k–50k translated/task examples.
Fine-tune on a modest in-domain parallel set and evaluate with BLEU plus a 100-sentence blind human check.
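The preprocessing step in the first item (language filtering, normalization, dedup) can be sketched with the standard library alone. This is a rough approximation under stated assumptions, not the paper's pipeline: the script-ratio filter, the 0.7 threshold, NFKC normalization, and exact-match hashing are all illustrative choices. Note also that the script filter cannot tell Urdu apart from Arabic or Persian, so a real pipeline would need a proper language identifier.

```python
import hashlib
import re
import unicodedata

# Urdu is written in Arabic script; this range is the basic Arabic Unicode block.
ARABIC_BLOCK = re.compile(r"[\u0600-\u06FF]")


def normalize(text: str) -> str:
    # NFKC folds compatibility forms (e.g. Arabic presentation forms) into
    # their base letters; whitespace runs are then collapsed to one space.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def urdu_ratio(text: str) -> float:
    # Share of non-whitespace characters that fall in the Arabic block.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if ARABIC_BLOCK.match(c)) / len(chars)


def clean_corpus(docs, min_ratio: float = 0.7):
    """Normalize, drop documents that are mostly non-Arabic-script, dedup exactly."""
    seen = set()
    kept = []
    for doc in docs:
        text = normalize(doc)
        if urdu_ratio(text) < min_ratio:  # hypothetical threshold
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate after normalization
            continue
        seen.add(digest)
        kept.append(text)
    return kept


urdu = "یہ ایک مثال ہے"
sample = [
    urdu.replace(" ", "  ", 1),  # same sentence with a doubled space
    urdu,                        # duplicate once whitespace is normalized
    "This is English text",      # removed by the script filter
]
cleaned = clean_corpus(sample)
print(len(cleaned), "document(s) kept")
```

Exact hashing only catches verbatim duplicates; near-duplicate detection (for example MinHash) would be a natural next step for web-scraped text.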
Risks & Boundaries
Limitations
Continual pretraining used only 128M tokens due to compute limits; coverage is incomplete.
Detoxification was not applied, so the model can produce harmful or offensive outputs.
When Not To Use
For safety-critical or moderated deployments without detox controls.
As a drop-in replacement for general-purpose multilingual translation where broad domain coverage is needed.
Failure Modes
Generates offensive or harmful content because detox was not applied.
Underperforms on out-of-domain or culturally nuanced Urdu content due to limited pretraining coverage.

