Overview
The benchmark is ready for evaluation and red-teaming in industry but is limited to single-turn prompts, three intent domains, and relies on heuristic detectors with a small audit error rate.
Citations0
Evidence Strength0.85
Confidence0.88
Risk Signals12
Trust Signals
Findings with numeric evidence: 7/7
Findings with evidence refs: 7/7
Results with explicit delta: 2/7
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 68%
Why It Matters For Business
Multilingual and script-mixed users face real jailbreak risk that contracts can hide; evaluate product safety in the languages and input styles your users use to avoid unnoticed exposures.
Who Should Care
Summary TLDR
IJR is a reproducible, judge-free benchmark that measures jailbreak vulnerability across 12 South Asian languages using two tracks: JSON (contract-bound refusals) and FREE (unconstrained responses). It contains 45,216 prompts and finds high contracted jailbreak rates in many models (several >0.9 JSR), strong English→Indic transfer, and large orthography effects (romanization reduces JSON JSR by ≈0.34). Human audits show detectors are reliable (≈95% schema validity, 4.3% false negatives). The dataset and scoring scripts are released.
Problem Statement
Existing jailbreak and safety tests focus on English and often use learned judges. This misses multilingual vulnerabilities, script-mixing, and romanization common in South Asia. The paper creates a judge-free, multilanguage benchmark to reveal risks hidden by English-only, contract-focused evaluations.
Main Contribution
IndicJR dataset: 45,216 prompts across 12 South Asian languages, with JSON (42,636) and FREE (2,580) tracks.
Judge-free protocol: deterministic, language-aware parsing that scores refusals without external LLM judges.
Key Findings
High contracted jailbreak rates across many models
FREE track shows near-universal jailbreak success
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Dataset size (total prompts) | 45,216 prompts (JSON 42,636; FREE 2,580) | — | — | IJR | Table 5 and main text | — |
| JSON-track attacked-benign JSR (example models) | LLaMA 3.1 0.922; LLaMA 3.3 0.978; Sarvam 0.959; GPT-4o 0.508 | — | — | JSON attacked-benign (E1) | Table 2 model JSRs | — |
What To Try In 7 Days
Run IJR's FREE and JSON tracks (or a lite subset) on your top models to surface contract gaps.
Test romanized and mixed-script inputs from real user logs to spot tokenization-driven failures.
Include English→local wrappers and format-forcing attacks (JSON/YAML) in red-team suites.
Reproducibility
Code URLs
Data URLs
Risks & Boundaries
Limitations
Single-turn prompts only; multi-turn jailbreaks are not covered.
Only three harmful intent categories (chemistry, bio, illicit access).
When Not To Use
As the only safety test for multi-turn chat or dialog systems.
To claim full safety across domains beyond chem/bio/security.
Failure Modes
Judge-free heuristics can miss subtle unsafe guidance (≈4% false negatives).
Romanization generator may not reflect noisy real-world user spellings.

