Overview
The paper shows clear benchmark gains and human preference evidence, but expert judgments are limited (6 experts, 30 items) and some evaluation datasets are proprietary.
Citations: 24
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 7/7
Reproducibility
Status: Partial assets available
Open source: Partial
License: Adopts LLaMA license for released model parameters
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 40%
Why It Matters For Business
A small, high-quality instruction set can turn an open foundation model into a capable finance assistant, offering a lower-cost, open alternative to closed commercial finance LLMs while enabling on-premise control and inspection.
Summary TLDR
InvestLM is a financial-domain LLM built by instruction-tuning LLaMA-65B on a small, manually curated set of 1,335 finance-focused instructions (sources: CFA, SEC filings, textbooks, StackExchange, journals, etc.). Using LoRA and context extension to 8,192 tokens, the authors show that InvestLM outperforms the untuned LLaMA on 8 of 9 financial NLP tasks. Domain instruction tuning yields large gains for 7B models (average +138.4%) and moderate gains for 65B models (average +28.2%). Six finance experts judge InvestLM comparable to or better than GPT-3.5 and GPT-4, although it trails Claude-2 in some comparisons. The model parameters are released under LLaMA license terms.
Problem Statement
Closed commercial finance LLMs (e.g., BloombergGPT) block open research. Smaller public finance-tuned models generalize poorly. Can a small, high-quality instruction set turn a strong foundation model into a useful open financial assistant?
Main Contribution
Build InvestLM by instruction-tuning LLaMA-65B on a manually curated 1,335-example finance instruction set covering CFA, SEC filings, textbooks, journals, StackExchange, and crafted investment Q&A.
Use LoRA (rank = 16) and linear RoPE scaling to extend the context window to 8,192 tokens, enabling long-document finance tasks.
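The two techniques named above can be sketched in plain Python. This is an illustration only, not the authors' training code. The native context length of 2,048 tokens, the hidden size of 8,192, and the head dimension of 128 are assumptions based on the original LLaMA-65B configuration; the summary does not state them, nor does it say which weight matrices received LoRA adapters. All function names are hypothetical.

```python
import math

NATIVE_CONTEXT = 2048   # assumption: original LLaMA training window (not in the summary)
TARGET_CONTEXT = 8192   # from the summary
SCALE = TARGET_CONTEXT / NATIVE_CONTEXT  # linear scaling factor, 4.0 under these assumptions


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one position; linear scaling divides the position by `scale`."""
    scaled_position = position / scale
    return [scaled_position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs of `vec` by the RoPE angles.

    Real implementations differ in how they pair dimensions; this uses (0,1), (2,3), ...
    """
    out = []
    for i, angle in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 8192  # assumption: LLaMA-65B hidden size
extra = lora_param_count(HIDDEN, HIDDEN, rank=16)  # rank 16 is from the summary
full = HIDDEN * HIDDEN

if __name__ == "__main__":
    # With scaling, the last position of the long window stays inside the native range.
    print((TARGET_CONTEXT - 1) / SCALE < NATIVE_CONTEXT)
    print(f"LoRA adds {extra} parameters to a {full}-parameter projection")
```

Under these assumptions, every position in the 8,192-token window is mapped back into the range the base model saw during pre-training, and a rank-16 adapter on one square projection trains well under 1% of that matrix's parameters.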
Key Findings
Instruction-tuning LLaMA-65B with ~1,300 curated finance instructions improves performance on most finance tasks (8 of 9 benchmarks).
Smaller models benefit more from domain instruction tuning than larger models.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| FinSent Micro-F1 | InvestLM 0.79 | LLaMA-65B 0.71 | +0.08 | FinSent | Table 3: InvestLM vs LLaMA | Table 3 |
| FPB Micro-F1 | InvestLM 0.71 | LLaMA-65B 0.38 | +0.33 | Financial PhraseBank (FPB) | Table 3: InvestLM vs LLaMA | Table 3 |
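The Delta column reports absolute differences in Micro-F1, whereas the average gains quoted in the summary are percentages and appear to be relative improvements. A minimal sketch of both calculations for the two rows above follows; the dictionary layout and the function name `gains` are illustrative choices, not part of the paper.

```python
# Scores copied from the two table rows above (Table 3 in the paper).
RESULTS = {
    "FinSent": {"investlm": 0.79, "llama_65b": 0.71},
    "FPB": {"investlm": 0.71, "llama_65b": 0.38},
}


def gains(tuned, baseline):
    """Return (absolute delta, relative gain in percent) for one metric."""
    absolute = tuned - baseline
    relative_pct = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative_pct, 1)


if __name__ == "__main__":
    for name, row in RESULTS.items():
        absolute, relative_pct = gains(row["investlm"], row["llama_65b"])
        print(f"{name}: +{absolute:.2f} absolute, +{relative_pct:.1f}% relative")
    # FinSent: +0.08 absolute, +11.3% relative
    # FPB: +0.33 absolute, +86.8% relative
```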
What To Try In 7 Days
Run InvestLM on a sample of your firm's SEC filings and compare summaries to analyst notes.
Fine-tune a 7B model with a few hundred curated domain instructions and measure micro-F1 on a key classification task.
Do not assume that adding a large generic instruction set helps: tune one model on domain-only instructions and one on a mixed set, then compare results on the same task.
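For the last two experiments, micro-F1 can be computed without extra dependencies. The sketch below assumes a single-label classification task such as sentiment; the labels and predictions are made-up toy data, and `micro_f1` is a hypothetical helper rather than anything from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented for illustration): gold labels and two hypothetical model outputs.
gold = ["pos", "neg", "neu", "pos", "neg"]
domain_only_preds = ["pos", "neg", "neu", "neg", "neg"]
mixed_preds = ["pos", "neu", "neu", "neg", "neg"]

if __name__ == "__main__":
    print("domain-only:", micro_f1(gold, domain_only_preds))
    print("mixed:", micro_f1(gold, mixed_preds))
```

In the single-label setting, micro-F1 equals plain accuracy, so a hand count of correct predictions on a small sample is a quick sanity check.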
Risks & Boundaries
Limitations
Expert evaluation is small-scale (six experts, 30 questions) and subjective.
Some evaluation datasets are proprietary, limiting external verification.
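To give a feel for the first limitation, the sketch below computes a 95% Wilson score interval for a preference rate measured on 30 comparisons. The count of 15 wins is hypothetical and is not a result from the paper, and treating the 30 items as independent trials is a simplifying assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical: the model is preferred on 15 of 30 questions.
    low, high = wilson_interval(15, 30)
    print(f"95% interval: {low:.2f} to {high:.2f}")
```

With only 30 items, an observed 50% preference rate is compatible with a true rate anywhere from about one-third to two-thirds, so small differences between models in the expert study should be read cautiously.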
When Not To Use
Do not rely on InvestLM for automated trading decisions without human oversight.
Avoid using the model as a sole source of investment advice or legal compliance guidance.
Failure Modes
May still produce incorrect or risky investment suggestions; requires human validation.
Performance can degrade if generic instructions are mixed into a small domain-tuning set.

