Overview
Production Readiness
0.6
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
4
Why It Matters For Business
You can get domain-specific gains cheaply by training small LoRA adapters and adding plugins instead of re-training large models. The result is better finance answers, more reliable calculations, and modular deployment.
Summary TLDR
The authors build DISC-FinLLM, a Chinese financial LLM that combines a 246k-example financial instruction dataset (DISC-FIN-SFT) with a Multiple Experts Fine-tuning Framework (MEFF). They train four task-specific LoRA adapters (consulting, NLP tasks, computing, retrieval) on Baichuan-13B and add simple tool plugins (calculator, equation solver, counter, probability table) plus a retrieval plugin. The LoRA experts give consistent gains across benchmarks (2–9 points on average), improve calculation accuracy and retrieval-based answers, and let you swap capabilities without re-training the full model. Code is available; some training data and a proprietary knowledge base are not fully public.
Problem Statement
General LLMs lack specialized Chinese financial knowledge, robust numeric computation, multi-turn financial dialogue ability, and reliable retrieval over financial sources. Training huge closed models is costly, so a compact, modular method is needed to adapt a base LLM to multiple financial tasks efficiently.
Main Contribution
DISC-FIN-SFT: a 246k-example Chinese financial instruction-tuning dataset covering consulting, NLP tasks, computing, and retrieval-enhanced instructions.
Multiple Experts Fine-tuning Framework (MEFF): train four separate LoRA adapters for distinct finance skills and load them modularly at runtime.
Calculation and retrieval plugins: four small computation tools and a retrieval pipeline integrated via instruction data and tool-call tokens.
Comprehensive evaluation: experiments on FinCUGE/FinEval-like benchmarks, a manually built calculation set, and a current-affairs retrieval test show measurable gains over base models.
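The modular loading idea behind MEFF can be illustrated with a minimal sketch of the standard LoRA update, W' = W + (alpha / r) * B * A, applied to one frozen base weight with several named adapters. This is not the authors' code: the `ExpertSwitcher` class, its method names, and the toy 2x2 weights are hypothetical, and a real setup would use a fine-tuning library on Baichuan-13B rather than plain Python lists.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ExpertSwitcher:
    """Hypothetical holder for one frozen base weight and named low-rank adapters."""

    def __init__(self, base_weight: Matrix, alpha: float = 1.0) -> None:
        self.base = base_weight          # frozen; never modified in place
        self.alpha = alpha               # LoRA scaling numerator (assumed value)
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def register(self, name: str, B: Matrix, A: Matrix) -> None:
        """Store an adapter as its two low-rank factors: B is d x r, A is r x k."""
        self.adapters[name] = (B, A)

    def effective_weight(self, name: Optional[str] = None) -> Matrix:
        """Return W for no adapter, or W + (alpha / r) * B @ A for the named one."""
        if name is None:
            return [row[:] for row in self.base]
        B, A = self.adapters[name]
        scale = self.alpha / len(A)      # r is the number of rows in A
        delta = matmul(B, A)
        return [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.base, delta)
        ]


# Toy usage: one rank-1 "computing" expert on a 2x2 identity weight.
switcher = ExpertSwitcher([[1.0, 0.0], [0.0, 1.0]], alpha=1.0)
switcher.register("computing", B=[[1.0], [2.0]], A=[[3.0, 4.0]])
print(switcher.effective_weight("computing"))
print(switcher.effective_weight())  # base weight is untouched
```

Because the base weight is never mutated, switching experts only changes which small pair of factors is applied, which is what makes swapping capabilities cheap compared with re-training the full model.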
Key Findings
Task-specific LoRA adapters raise average benchmark performance, including on FinNLP tasks, by roughly 2–9 points versus the base model.
Computation LoRA plus calculator plugin substantially improves formula creation and numeric answers.
The retrieval-enhanced adapter slightly improves human-judged usefulness and linguistic quality over the base chat model.
Results
- FinCUGE (average over 6 tasks)
- FinEval average
- Accuracy
- Retrieval human-judged scores
What To Try In 7 Days
Fork the repo and run the Baichuan-13B base with a single LoRA adapter on your finance prompts.
Build a small calculator plugin and add tool-call tokens for arithmetic-heavy queries.
Create a seed set of 1–5k instructions (consulting + retrieval) from your internal docs and fine-tune a LoRA adapter for retrieval.
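For the calculator step, a minimal sketch of Toolformer-style post-processing might look like the following. The `[Calculator(...)]` marker format and the function names are assumptions for illustration, not the token format used by DISC-FinLLM. The evaluator walks the parsed expression tree and accepts only numbers and basic arithmetic, so arbitrary code in a model output is rejected rather than executed.

```python
import ast
import operator
import re

# Whitelisted arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def safe_eval(expr: str) -> float:
    """Evaluate a pure arithmetic expression without calling eval()."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


# Hypothetical marker format: [Calculator(<expression>)]
TOOL_CALL = re.compile(r"\[Calculator\(([^\[\]]*)\)\]")


def fill_tool_calls(text: str) -> str:
    """Replace each calculator marker with the marker plus its computed result."""

    def run(match: re.Match) -> str:
        expr = match.group(1)
        return f"[Calculator({expr})->{safe_eval(expr):g}]"

    return TOOL_CALL.sub(run, text)


# Toy usage: compound growth of 1000 at 5% over two periods.
print(fill_tool_calls("Future value: [Calculator(1000*(1+0.05)**2)]"))
```

A production version would also want limits on exponent size and expression length, since the power operator can be made arbitrarily expensive.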
Agent Features
Tool Use
- Expression calculator
- Equation solver
- Counter
- Probability table
- Retrieval plugin
Frameworks
- LoRA
- Toolformer-style invocation
- Chain-of-Thought
- Chain-of-Retrieval
Architectures
- LoRA
- plugin-enabled model (tool calls)
Optimization Features
Infra Optimization
- LoRA
Model Optimization
- LoRA
Training Optimization
- Task-specific adapter training to avoid full fine-tuning
Reproducibility
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Training data partly generated by ChatGPT, which can inject hallucinated or stylized answers.
- Proprietary retrieval knowledge base is not fully public, limiting reproducibility.
- Evaluations show modest gains; DISC-FinLLM still trails GPT-4 on benchmarks.
When Not To Use
- High-stakes automated trading or compliance decisions requiring certified correctness.
- Tasks needing live, real-time market feeds not covered by the static KB.
- When you must match or exceed GPT-4-level performance.
Failure Modes
- Hallucinated financial facts from ChatGPT-generated training content.
- Wrong numeric answers if the model fails to call the compute plugin.
- Relevant documents missed by retrieval and not surfaced in answers.
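The retrieval failure mode can be made visible with a relevance threshold, so that a query with no good match is flagged instead of being answered from weak context. The sketch below is an assumption-heavy illustration, not the paper's retrieval pipeline: it uses bag-of-words cosine similarity, a hypothetical `min_score` cutoff, and a `\w+` tokenizer that suits space-delimited text only (Chinese text would need a word segmenter).

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; adequate for space-delimited languages only."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    query: str, docs: List[str], k: int = 2, min_score: float = 0.2
) -> List[Tuple[float, str]]:
    """Return up to k (score, document) pairs scoring at least min_score."""
    q = Counter(tokenize(query))
    scored = sorted(
        ((cosine(q, Counter(tokenize(d))), d) for d in docs), reverse=True
    )
    return [(s, d) for s, d in scored[:k] if s >= min_score]


def build_prompt(query: str, docs: List[str]) -> str:
    """Prepend retrieved context, or an explicit flag when nothing is relevant."""
    hits = retrieve(query, docs)
    if not hits:
        return f"[no supporting documents found]\nQuestion: {query}"
    context = "\n".join(f"- {d}" for _, d in hits)
    return f"Context:\n{context}\nQuestion: {query}"


# Toy knowledge base with one relevant and one irrelevant document.
kb = ["The central bank raised interest rates", "Apple released a new phone"]
print(build_prompt("interest rates central bank", kb))
print(build_prompt("weather tomorrow", kb))
```

Logging the flagged queries gives a simple way to measure how often retrieval comes back empty before blaming the adapter for a poor answer.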
Core Entities
Models
- Baichuan-13B
- ChatGLM
- ChatGLM2
- GPT-3.5
- GPT-4
- FinGPT-v3
- BloombergGPT
- LLaMA-2-Chat-70B
Metrics
- Accuracy
- F1
- ROUGE
- usefulness
- linguistic quality
- reflectiveness
Datasets
- DISC-FIN-SFT
- FiQA
- FPB
- FNSC
- Wealth-alpaca
- SmoothNLP
- FinCUGE
- FinEval
- FinFE
- FinQA
- FinCQA
- FinNA
- FinRE
- FinESE
Benchmarks
- FinCUGE (subset used)
- FinEval
- BBT-FIN (as shown in paper tables)

