Overview
The paper provides broad empirical evidence across many datasets and clear training recipes, but claims of beating closed-source models are limited to selected tasks and 8B models; practitioners should validate on their own data before production use.
Citations: 4
Evidence Strength: 0.85
Confidence: 0.85
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Yes
License: MIT
At A Glance
Cost impact: 60%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
Models trained on large finance-specific corpora and then tuned on multimodal data handle practical tasks (report parsing, numeric QA, and chart/table extraction) better out of the box, for both analysts and automation pipelines.
Summary TLDR
This paper introduces Open-FinLLMs: FinLLaMA (continually pre-trained on a 52B-token finance corpus), FinLLaMA-Instruct (fine-tuned on 573K financial instructions), and FinLLaVA (multimodally tuned on 1.43M image/table/chart pairs). The authors open-source the code, data, and models, and report broad gains on 14 financial task types (30 datasets) and 4 multimodal tasks. Key wins: stronger zero-/few-shot financial NLP, better numeric reasoning than general LLMs, and state-of-the-art open-source chart/table understanding (72.4 on TableBench). All models are 8B-parameter LLaMA3 derivatives; limitations include English-only evaluation and a model size capped at 8B.
Problem Statement
General LLMs lack deep financial knowledge and handle non-text financial data (tables, time series, charts) poorly. Prior financial models were trained on small domain corpora or remained text-only, leaving zero-/few-shot performance, multimodal reasoning, and decision-making underexplored. Open-FinLLMs aims to fill that gap by combining large-scale continual pretraining, instruction tuning, and multimodal alignment for finance.
Main Contribution
FinLLaMA: continual pretraining of LLaMA3-8B on a 52B-token finance-focused corpus (text, tables, time series).
FinLLaMA-Instruct: instruction finetuning with 573K curated financial instructions to boost domain task performance.
Key Findings
Large finance-focused continual pretraining improves zero/few-shot task performance.
Instruction tuning with a large math/finance instruction mix improves numeric understanding.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Sentiment F1 (zero-shot) | 81 (FinLLaMA on TSA) | 75 (LLaMA3-8B) | +6 | TSA | Table 5 reports FinLLaMA TSA=81, LLaMA3-8B=75 | Table 5 |
| Accuracy | 0.69 (FinLLaMA-Instruct) | 0.63 (GPT-4) | +0.06 | Number understanding (ConvFinQA/FinQA aggregated) | Table 7 lists NU: FinLLaMA-Instruct=0.69, GPT-4=0.63 | Table 7 |
What To Try In 7 Days
Download FinLLaMA-Instruct and test it on your financial QA and numeric-extraction prompts.
Feed a few example tables/charts to FinLLaVA to validate OCR + table extraction on your reports.
Run a quick few-shot comparison: your current model vs FinLLaMA on 5 representative tasks (sentiment, NER, numeric QA).
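The few-shot comparison suggested above can be sketched as a small harness. This is a minimal sketch, not the paper's evaluation code: it assumes each model is wrapped as a plain function from prompt string to answer string, and the names `Model`, `build_few_shot_prompt`, `accuracy`, and `compare` are hypothetical. Exact-match scoring is also an assumption that suits label tasks such as sentiment; numeric QA would likely need a tolerance-based comparison instead.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a model is any function mapping a prompt to an answer string.
Model = Callable[[str], str]
Example = Tuple[str, str]  # (input text, expected label)


def build_few_shot_prompt(shots: List[Example], query: str) -> str:
    """Concatenate labeled demonstrations, then the unlabeled query."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in shots]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def accuracy(model: Model, shots: List[Example], test_set: List[Example]) -> float:
    """Fraction of test examples whose normalized prediction equals the gold label."""
    if not test_set:
        return 0.0
    correct = 0
    for text, gold in test_set:
        prediction = model(build_few_shot_prompt(shots, text))
        if prediction.strip().lower() == gold.strip().lower():
            correct += 1
    return correct / len(test_set)


def compare(
    models: Dict[str, Model],
    tasks: Dict[str, Tuple[List[Example], List[Example]]],
) -> Dict[str, Dict[str, float]]:
    """Return {task: {model_name: accuracy}} for every task/model pair."""
    return {
        task: {name: accuracy(model, shots, test) for name, model in models.items()}
        for task, (shots, test) in tasks.items()
    }
```

To use it, wrap your current model and FinLLaMA in functions with the `Model` signature, supply a handful of demonstrations and held-out examples per task, and read the per-task accuracies side by side. Keeping the same shots and test set for every model is what makes the comparison fair.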
Risks & Boundaries
Limitations
Models are only 8B parameters; larger-scale behavior is untested.
Evaluations are English-only; multilingual performance unknown.
When Not To Use
For automated high-stakes financial advice without human oversight.
When multilingual or non-English coverage is required.
Failure Modes
Hallucinations in numeric or regulatory claims when source data is absent.
OCR or table-parsing errors on low-resolution images.
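One way to catch the numeric failure mode is to flag any number in a model answer that does not appear in the source document. The sketch below is an illustrative guard, not something described in the paper: `extract_numbers` and `ungrounded_numbers` are hypothetical helpers, the regular expression is a simplification that ignores signs, units, and spelled-out numbers, and a flagged value is a prompt for human review rather than proof of an error.

```python
import re
from typing import List

# Simplified pattern: digits with optional thousands separators and decimals.
# Signs, units, and spelled-out numbers are deliberately ignored in this sketch.
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> List[str]:
    """Return numeric tokens in order of appearance, thousands separators removed."""
    return [match.group().replace(",", "") for match in NUMBER_PATTERN.finditer(text)]


def ungrounded_numbers(answer: str, source: str) -> List[str]:
    """Return numbers in the answer whose value never occurs in the source text."""
    source_values = {float(token) for token in extract_numbers(source)}
    return [
        token for token in extract_numbers(answer)
        if float(token) not in source_values
    ]
```

Values the model derives correctly (a growth rate computed from two reported figures, for example) will also be flagged, because they do not appear verbatim in the source. That is intentional here: derived figures are exactly the ones worth rechecking by hand.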

