Overview
Production Readiness: 0.5
Novelty Score: 0.45
Cost Impact Score: 0.6
Citation Count: 1
Why It Matters For Business
Domain pre-training plus instruction tuning yields measurable accuracy gains on finance QA and exam tasks, and adding a calculator reduces numeric errors. This is useful for advisory, research automation, and computation-heavy workflows.
Summary TLDR
SNFinLLM is a Chinese financial assistant built by (1) continuous domain pre-training on ~100B unsupervised tokens (25B of them finance), (2) full-parameter fine-tuning on 550k supervised instruction examples, and (3) a Direct Preference Optimization (DPO) step. The authors also train the model to emit Python-executable calculator expressions for numeric tasks. On standard benchmarks (FinEval, FinanceIQ) and internal finance test sets, SNFinLLM variants beat an open-source baseline and several Chinese financial LLMs on many tasks. Calculator integration raises finance-computation accuracy; DPO improves some exam-style metrics but can hurt computation and machine reading comprehension (MRC). Code and data availability are not stated.
Problem Statement
Generic LLMs and existing finance LLMs suffer from hallucinations, weak calculation accuracy, and limited instruction-following on Chinese finance tasks. The paper aims to build a Chinese financial LLM that (a) learns domain knowledge, (b) follows finance-style instructions, and (c) performs reliable numeric computations.
Main Contribution
A three-stage training pipeline: continuous domain pre-training, full-parameter supervised fine-tuning (550k instructions), then DPO alignment.
Curated financial corpora: 25B finance tokens plus general data, totaling ~100B unsupervised tokens, and a vocabulary extended with 7,689 new domain tokens.
Computation instruction data in a Python-executable [Calculator(expression)->result] format, so numeric results come from executing the expression rather than from the model's own arithmetic.
Empirical evaluation on FinEval, FinanceIQ, and five internal finance datasets showing gains over an open-source baseline and other Chinese finance LLMs on many tasks.
Ablation studies showing benefits of domain pre-training and calculator tool, and mixed effects from DPO alignment.
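The third training stage uses Direct Preference Optimization. As a rough illustration, the standard DPO objective (Rafailov et al., 2023) for one preference pair can be computed from summed response log-probabilities under the trainable policy and a frozen reference model (typically the SFT checkpoint). This is a minimal sketch, not the authors' code: the function name and the `beta=0.1` default are assumptions, since the paper's DPO hyperparameters are not given in this summary.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the paper's beta is not reported here
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response given the
    prompt, under either the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy prefers each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Loss = -log(sigmoid(margin)) = softplus(-margin), split by sign to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Identical log-probabilities give margin 0, so the loss is log(2).
print(round(dpo_loss(-12.0, -12.0, -12.0, -12.0), 4))  # 0.6931
# The policy shifts probability toward the chosen response relative to the reference: loss drops.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0) < math.log(2))  # True
```

Averaging this loss over a batch and backpropagating through the policy log-probabilities gives the update; the reference model stays fixed throughout.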
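The pre-training corpus mixes about 25B finance tokens into roughly 100B tokens overall, so finance data is about a quarter of the mix. A minimal sketch of turning token counts into sampling weights and drawing the source of each training sequence follows; the 75B general-token figure is only the remainder implied by those totals, and the function names are hypothetical.

```python
import random
from collections import Counter


def mixture_weights(token_counts: dict[str, float]) -> dict[str, float]:
    """Sampling weight per data source, proportional to its token count."""
    total = sum(token_counts.values())
    return {name: count / total for name, count in token_counts.items()}


def sample_sources(weights: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Choose which source each of the next n training sequences is drawn from."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


# ~25B finance tokens out of ~100B total; the general count is the implied remainder.
weights = mixture_weights({"finance": 25e9, "general": 75e9})
print(weights)  # {'finance': 0.25, 'general': 0.75}
print(Counter(sample_sources(weights, 10_000)))
```

Raising the finance weight trades general ability for domain coverage; the failure modes in this summary flag a poorly chosen domain/general ratio as an overfitting risk.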
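Computation instructions use a `[Calculator(expression)->result]` format whose expression is executable Python. One way to keep such training targets correct is to re-execute every expression and overwrite whatever result was written, as sketched below. This assumes expressions are plain arithmetic; the regex, the restricted `ast`-based evaluator (used instead of `eval` for safety), and the 4-digit rounding are illustrative choices, not details from the paper.

```python
import ast
import operator
import re

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod,
    ast.Pow: operator.pow,  # e.g. compound interest; production code should bound exponents
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Non-greedy match up to the first ")->", so nested parentheses in the expression survive.
CALL_RE = re.compile(r"\[Calculator\((.+?)\)->([^\]]*)\]")


def safe_eval(expr: str):
    """Evaluate numeric literals and arithmetic operators only; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval").body)


def fill_calculator_results(text: str, ndigits: int = 4) -> str:
    """Recompute every [Calculator(expr)->result] span and overwrite its result."""
    def _replace(m: re.Match) -> str:
        value = safe_eval(m.group(1))
        result = round(value, ndigits) if isinstance(value, float) else value
        return f"[Calculator({m.group(1)})->{result}]"
    return CALL_RE.sub(_replace, text)


print(fill_calculator_results("Growth rate = [Calculator((120-100)/100)->0.25], so revenue rose 20%."))
# Growth rate = [Calculator((120-100)/100)->0.2], so revenue rose 20%.
```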
Key Findings
Continuous domain pre-training raises benchmark accuracy.
Instruction fine-tuning produces a stronger instruction-following assistant and improves exam-style tasks compared with peer models.
Adding a calculator tool improves finance computation accuracy.
DPO alignment helps some exam metrics but can reduce computation and MRC scores.
Pretraining on domain data matters; skipping it degrades performance.
Results
[Figure: accuracy on Finance computing (FinC) and Qualification Exam QA (qEQA)]
Who Should Care
What To Try In 7 Days
Collect a targeted finance corpus (news, reports, papers) and extend a SentencePiece tokenizer with domain-specific tokens.
Fine-tune a base LLM with a curated 10k–100k instruction-style dataset to test instruction-following gains.
Prototype a calculator integration (emit Python expressions and execute) for numeric QA to avoid arithmetic hallucinations.
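For the tokenizer step, the paper adds 7,689 new domain tokens. One common route is to train a SentencePiece model on the domain corpus and merge its pieces into the base vocabulary; the library-free sketch below only illustrates the selection and ID-assignment logic, using a hypothetical glossary of candidate terms and a frequency threshold. The function name, threshold, and example terms are assumptions.

```python
from collections import Counter


def extend_vocab(
    base_vocab: dict[str, int],
    corpus: list[str],
    candidates: list[str],
    min_count: int = 2,  # assumed frequency threshold
) -> dict[str, int]:
    """Append frequent domain terms missing from the base vocabulary, with fresh IDs."""
    counts = Counter()
    for doc in corpus:
        for term in candidates:
            counts[term] += doc.count(term)  # non-overlapping occurrences
    vocab = dict(base_vocab)
    next_id = max(vocab.values(), default=-1) + 1
    for term, count in counts.most_common():
        if count >= min_count and term not in vocab:
            vocab[term] = next_id
            next_id += 1
    return vocab


base = {"<unk>": 0, "市": 1, "盈": 2, "率": 3}
corpus = ["市盈率下降，但市盈率仍高于行业均值", "公司发行可转债"]
candidates = ["市盈率", "可转债", "净资产收益率"]  # P/E ratio, convertible bond, ROE
vocab = extend_vocab(base, corpus, candidates)
print({t: i for t, i in vocab.items() if t not in base})  # {'市盈率': 4}
```

Each added token needs a new row in the model's embedding matrix (and, if untied, the output matrix), initialized before continued pre-training.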
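For the instruction-tuning experiment, a common setup is to concatenate prompt and response and compute loss only on response tokens. The sketch below assumes a hypothetical prompt template and any tokenizer exposed as a `str -> list[int]` function; `-100` is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The paper's own instruction template is not described in this summary.

```python
from typing import Callable

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Hypothetical template; the paper's instruction format is not specified here.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n### Response:\n"


def build_sft_example(
    instruction: str,
    response: str,
    tokenize: Callable[[str], list[int]],
    eos_id: int,
    max_len: int = 2048,
) -> dict[str, list[int]]:
    """Token IDs plus labels that mask the prompt, so only the response is learned."""
    prompt_ids = tokenize(PROMPT_TEMPLATE.format(instruction=instruction))
    response_ids = tokenize(response) + [eos_id]
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}


def toy_tokenize(text: str) -> list[int]:
    """Toy character-level tokenizer so the example runs without external libraries."""
    return [ord(ch) for ch in text]


ex = build_sft_example("What does ROE stand for?", "Return on equity.", toy_tokenize, eos_id=0)
n_prompt = sum(1 for label in ex["labels"] if label == IGNORE_INDEX)
print(len(ex["input_ids"]) == len(ex["labels"]), ex["labels"][-1])  # True 0
```

Labels stay aligned with `input_ids` because many causal-LM training stacks (Hugging Face causal-LM models, for example) shift labels by one position internally.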
Agent Features
Tool Use
- calculator integration (Python executable)
Optimization Features
Training Optimization
- SFT (full-parameter supervised fine-tuning)
- cosine LR decay with warmup
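The training recipe lists cosine learning-rate decay with warmup. A minimal sketch of that schedule, assuming linear warmup and decay to a floor `min_lr`; the paper's peak rate, step counts, and floor are not given here, so the values below are placeholders.

```python
import math


def lr_at(step: int, max_lr: float, total_steps: int, warmup_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at a 0-indexed optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped so steps past the end stay at min_lr.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder values, not the paper's settings.
for s in (0, 99, 100, 550, 1000):
    print(s, f"{lr_at(s, max_lr=2e-5, total_steps=1000, warmup_steps=100, min_lr=2e-6):.2e}")
```

Computing the rate from the step index, rather than decaying it in place, keeps the schedule correct when training resumes from a checkpoint.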
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- No public release of code or datasets reported in the paper.
- Some key improvements are modest (single-digit percentage points) and task-dependent.
- DPO alignment can hurt computation and MRC performance; tuning required per use case.
- Complex MRC (cMRC) still lags and needs further research.
When Not To Use
- If you need out-of-the-box, reproducible models (code and data are not provided).
- For tasks with complex multi-document reasoning where cMRC performance is critical.
- When you cannot afford full-parameter fine-tuning or large-domain pretraining costs.
Failure Modes
- Arithmetic hallucinations if calculator integration is disabled or misused.
- Reduced factual/computational accuracy after DPO if not validated on target tasks.
- Overfitting to domain patterns if domain/general data ratio is poorly chosen.
Core Entities
Models
- SNFinLLM
- SNFinLLM-base
- SNFinLLM-chat
- SNFinLLM-dpo
- SNFinLLM-cal
- opensource-base
- opensource-refine
- Tongyi-Finance-14B
- XuanYuan-13B
Metrics
- Accuracy
Datasets
- FinEval
- FinanceIQ
- qEQA
- FinC
- KQA
- MRC
- cMRC
Benchmarks
- FinEval
- FinanceIQ

