Overview
The approach is practical: modular LoRA adapters and small plugins let teams get domain improvements without heavy compute. However, the reported gains are benchmark scores only, and the system does not match closed high-end models such as GPT-4.
Citations: 4
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 4/4
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
You can get domain gains cheaply by training small LoRA adapters and plugins instead of re-training big models; this yields better finance answers, more reliable calculations, and modular deployment.
Summary TLDR
The authors build DISC-FinLLM, a Chinese financial LLM that combines a 246k-example financial instruction dataset (DISC-FIN-SFT) with a Multiple Experts Fine-tuning Framework (MEFF). They train four task-specific LoRA adapters (consulting, NLP tasks, computing, retrieval) on Baichuan-13B and add simple tool plugins (calculator, equation solver, counter, probability table) and a retrieval plugin. LoRA experts give consistent gains across benchmarks (2–9 points avg.), improve calculation accuracy and retrieval-based answers, and let you swap capabilities without re-training the full model. Code is available; some training data and a proprietary knowledge base are not fully public.
Problem Statement
General LLMs lack specialized Chinese financial knowledge, robust numeric computation, multi-turn finance dialogs, and reliable retrieval in finance. Training huge closed models is costly; a compact, modular method is needed to adapt a base LLM to multiple financial tasks efficiently.
Main Contribution
DISC-FIN-SFT: a 246k-example Chinese financial instruction-tuning dataset covering consulting, NLP tasks, computing, and retrieval-enhanced instructions.
Multiple Experts Fine-tuning Framework (MEFF): train four separate LoRA adapters for distinct finance skills and load them modularly at runtime.
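The modular loading idea behind MEFF can be illustrated with a minimal pure-Python sketch. It assumes the standard LoRA update W' = W + (alpha / r) * B * A applied to a frozen weight matrix. The class `ExpertModel`, the adapter names, and the toy 2x2 weights are hypothetical and are not taken from the DISC-FinLLM code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (a is n x k, b is k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoRAAdapter:
    """Low-rank update: delta_W = (alpha / r) * B @ A, with B (d x r) and A (r x d)."""
    A: Matrix
    B: Matrix
    alpha: float

    def delta(self) -> Matrix:
        r = len(self.A)  # rank = number of rows in A
        scale = self.alpha / r
        return [[scale * v for v in row] for row in matmul(self.B, self.A)]


class ExpertModel:
    """Frozen base weight plus a registry of swappable adapters (hypothetical)."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight  # never modified after construction
        self.adapters: Dict[str, LoRAAdapter] = {}
        self.active: Optional[str] = None

    def register(self, name: str, adapter: LoRAAdapter) -> None:
        self.adapters[name] = adapter

    def use(self, name: Optional[str]) -> None:
        """Select an adapter by name, or None to run the plain base model."""
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def effective_weight(self) -> Matrix:
        if self.active is None:
            return [row[:] for row in self.base_weight]
        d = self.adapters[self.active].delta()
        return [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(self.base_weight, d)]

    def forward(self, x: List[float]) -> List[float]:
        w = self.effective_weight()
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Toy example: 2x2 identity base weight and two rank-1 "experts".
model = ExpertModel([[1.0, 0.0], [0.0, 1.0]])
model.register("computing", LoRAAdapter(A=[[1.0, 0.0]], B=[[0.5], [0.0]], alpha=2.0))
model.register("retrieval", LoRAAdapter(A=[[0.0, 1.0]], B=[[0.0], [0.5]], alpha=2.0))

model.use("computing")
y_computing = model.forward([1.0, 1.0])
model.use("retrieval")
y_retrieval = model.forward([1.0, 1.0])
model.use(None)
y_base = model.forward([1.0, 1.0])
```

Switching experts only changes which low-rank delta is added at inference time; the base weights stay untouched, which is what makes the adapters cheap to train and swap.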
Key Findings
Task-specific LoRA adapters raise average financial NLP benchmark (FinCUGE) scores by several points over the base model, for example from 31 to 40 for Baichuan-13B-Chat (Table 3).
Computation LoRA plus calculator plugin substantially improves formula creation and numeric answers.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| FinCUGE (avg. over 6 tasks) | Baichuan-13B-Chat + LoRA: 40 (one example model) | Baichuan-13B-Chat without LoRA training: 31 | +9 | BBT-FIN / FinCUGE subset | LoRA training raised the Baichuan-13B-Chat average from 31 to 40 | Table 3 |
| FinEval (avg.) | DISC-FinLLM variants: ~50.6–51.6 (for reference, GPT-4: 68.6; ChatGPT: 55.0) | Baichuan-13B-Chat: 49.4 | +1.2 to +2.2 | FinEval | The consulting, task, retrieval, and computing variants score ~50.6–51.6 vs. 49.4 for the base model | Table 4 |
What To Try In 7 Days
Fork the repo and run the Baichuan-13B base with a single LoRA adapter on your finance prompts.
Build a small calculator plugin and add tool-call tokens for arithmetic-heavy queries.
Create a 1–5k instruction seed (consulting + retrieval) from your internal docs and fine-tune a LoRA for retrieval.
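For the calculator step in the list above, a minimal sketch of a tool-call post-processor might look like the following. The `[Calculator(...)]` marker syntax, the function names, and the example sentence are assumptions for illustration; the actual token format used by DISC-FinLLM is not given in this summary. Expressions are evaluated by walking a restricted AST rather than by calling `eval`.

```python
import ast
import operator
import re

# Whitelisted arithmetic operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Hypothetical tool-call marker emitted by the model, e.g. [Calculator(12000*0.035)]
_TOOL_CALL = re.compile(r"\[Calculator\(([^\[\]]*)\)\]")


def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression without using eval()."""

    def _eval(node: ast.AST) -> float:
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression.strip(), mode="eval").body)


def format_number(value: float) -> str:
    """Render with up to 4 decimals, dropping trailing zeros."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def run_calculator_calls(text: str) -> str:
    """Replace each [Calculator(expr)] marker with its numeric result."""

    def _replace(match: re.Match) -> str:
        try:
            return format_number(safe_eval(match.group(1)))
        except (ValueError, SyntaxError, ZeroDivisionError):
            return match.group(0)  # leave the marker so the failure stays visible

    return _TOOL_CALL.sub(_replace, text)


draft = "Annual interest is [Calculator(12000*0.035)] yuan."
final = run_calculator_calls(draft)
```

Leaving a failed marker in place, rather than substituting a guess, keeps bad tool calls easy to spot in logs and evaluation runs.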
Risks & Boundaries
Limitations
Training data partly generated by ChatGPT, which can inject hallucinated or stylized answers.
Proprietary retrieval knowledge base is not fully public, limiting reproducibility.
When Not To Use
High-stakes automated trading or compliance decisions requiring certified correctness.
Tasks needing live, real-time market feeds not covered by the static KB.
Failure Modes
Hallucinated financial facts from ChatGPT-generated training content.
Wrong numeric answers if the model fails to call the compute plugin.
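One way to catch the second failure mode is a post-generation check that flags answers stating new numbers without any tool call. The sketch below is a heuristic under stated assumptions: the `[Calculator(` marker, the function name, and the rule itself (any number in the answer that does not appear in the question) are hypothetical and are not part of DISC-FinLLM.

```python
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_TOOL_MARKER = "[Calculator("  # hypothetical tool-call marker


def needs_compute_review(question: str, answer: str) -> bool:
    """Return True when the raw answer states numbers absent from the question
    and contains no tool-call marker."""
    if _TOOL_MARKER in answer:
        return False
    given = set(_NUMBER.findall(question))
    stated = set(_NUMBER.findall(answer))
    return bool(stated - given)
```

The check is meant to run on the raw model output, before any tool-call markers are replaced by results. It will produce false positives (for example, dates or article numbers), so it is better suited to routing answers for review than to blocking them.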

