Overview
The system shows strong backtest gains on the presented test split and clear user-preference wins, but the results come from a specific Chinese-focused dataset and rely on synthetic data steps; it needs further validation before live deployment.
Citations: 7
Evidence Strength: 0.60
Confidence: 0.78
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 50%
Novelty: 40%
Why It Matters For Business
Combining a domain-tuned LLM with retrieval of up-to-date reports and news can improve decision-support outputs and backtested portfolio returns compared to off-the-shelf models on this dataset.
Summary TLDR
This paper releases AlphaFin, a multi-part financial dataset (reports, news, StockQA, research data), and presents Stock-Chain, a two-stage system that fine-tunes an LLM (StockGPT) with LoRA and augments it with a vector-DB-based RAG pipeline for stock trend prediction and financial Q&A. On an out-of-sample AlphaFin test set, Stock-Chain is reported to achieve a higher annualized rate of return (30.8% ARR) and better human and GPT-4 preference scores than several baselines. The work focuses on Chinese financial sources, uses ChatGPT for data augmentation and summaries, and emphasizes reducing hallucinations via retrieval. Code and data are linked on the project GitHub.
Problem Statement
Current stock models either predict price movement from time-series data (ML/DL) without explanations or use LLMs that lack real-time facts and hallucinate. The field lacks high-quality financial training data and a practical pipeline that combines reasoning, real-time knowledge, and explainable predictions for investors.
Main Contribution
AlphaFin dataset suite combining research datasets, StockQA (prices + Q&A), financial news, financial reports, and 200 hand-written chain-of-thought (CoT) examples.
Stock-Chain system: two-stage pipeline (StockGPT fine-tuned on AlphaFin; RAG-powered vector DB retrieval for real-time knowledge) for stock trend prediction and conversational financial Q&A.
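The retrieval half of the pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the document texts, and the helper names `cosine`, `retrieve`, and `build_prompt` are all hypothetical, and a real system would obtain embeddings from a model and store them in a vector database rather than a Python list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(question, hits):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hand-made toy embeddings (assumption: stand-ins for real model output).
STORE = [
    ("Q3 report: revenue up 12% year over year", [0.9, 0.1, 0.0]),
    ("News: regulator opens inquiry into supplier", [0.1, 0.9, 0.1]),
    ("Q3 report: gross margin narrowed", [0.8, 0.2, 0.1]),
]

hits = retrieve([1.0, 0.0, 0.0], STORE, k=2)
prompt = build_prompt("How did revenue trend last quarter?", hits)
```

The prompt built this way would then be passed to the fine-tuned model, so that its answer can draw on the retrieved passages rather than on parametric memory alone.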
Key Findings
Stock-Chain achieved a substantially higher backtested annualized return than baselines, e.g. 30.8% ARR versus 17.5% for FinGPT (+13.3 pp; Table 2).
Fine-tuning with AlphaFin data raises LLM trading performance over vanilla models.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Annualized Rate of Return (ARR) | 30.8% | FinGPT 17.5% | +13.3 pp | AlphaFin-Test (financial report subset) | Table 2 shows ARR values for models | Table 2 |
| Accuracy | 55.7% | XGBoost 55.9% | -0.2 pp | AlphaFin-Test | Table 2 accuracy column | Table 2 |
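The deltas in the table are differences in percentage points, and ARR is an annualized figure. The sketch below recomputes the two deltas and shows one common way to annualize a series of monthly returns. The compounding formula is an assumption: the summary does not state how the paper computes ARR, and the function names are hypothetical.

```python
def annualize(monthly_returns):
    """Compound monthly returns, then scale to a 12-month rate.

    Assumption: geometric annualization; the paper may define ARR differently.
    """
    if not monthly_returns:
        raise ValueError("need at least one monthly return")
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12 / len(monthly_returns)) - 1.0

def delta_pp(value_pct, baseline_pct):
    """Difference between two percentages, in percentage points."""
    return round(value_pct - baseline_pct, 1)

arr_delta = delta_pp(30.8, 17.5)     # Stock-Chain vs FinGPT, ARR
acc_delta = delta_pp(55.7, 55.9)     # Stock-Chain vs XGBoost, accuracy
flat_year = annualize([0.01] * 12)   # 1% per month for a year
```

Read together, the two rows indicate that the ARR gain does not come with an accuracy gain over XGBoost on this test set.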
What To Try In 7 Days
Build a small vector DB of company reports and news; add semantic embeddings (e.g., BGE) and cosine retrieval.
Fine-tune an existing instruction-tuned LLM with a handful of report-based Q&A pairs and a few CoT examples using LoRA.
Run a simple monthly backtest: buy the stocks the model predicts 'up', weight them by market cap, and compare the resulting ARR against an index.
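The backtest step can be sketched as follows. Everything here is illustrative: the tickers, market caps, returns, and index figures are made up, the helper names are hypothetical, and holding cash (0% return) in a month with no 'up' picks is an assumption rather than something the paper specifies.

```python
def cap_weights(picks, market_cap):
    """Weight each picked stock by its share of total picked market cap."""
    total = sum(market_cap[s] for s in picks)
    return {s: market_cap[s] / total for s in picks}

def portfolio_return(predictions, market_cap, realized):
    """One month's return for a cap-weighted basket of 'up' predictions."""
    picks = [s for s, label in predictions.items() if label == "up"]
    if not picks:
        return 0.0  # assumption: stay in cash when nothing is predicted 'up'
    weights = cap_weights(picks, market_cap)
    return sum(w * realized[s] for s, w in weights.items())

def cumulative(returns):
    """Compound a list of periodic returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

# Made-up data for two months (tickers and numbers are placeholders).
MONTHS = [
    {"pred": {"AAA": "up", "BBB": "down", "CCC": "up"},
     "cap": {"AAA": 300.0, "BBB": 500.0, "CCC": 100.0},
     "ret": {"AAA": 0.04, "BBB": -0.02, "CCC": 0.08},
     "index": 0.01},
    {"pred": {"AAA": "down", "BBB": "down", "CCC": "down"},
     "cap": {"AAA": 310.0, "BBB": 490.0, "CCC": 105.0},
     "ret": {"AAA": -0.01, "BBB": -0.03, "CCC": -0.02},
     "index": -0.03},
]

strategy = [portfolio_return(m["pred"], m["cap"], m["ret"]) for m in MONTHS]
strategy_total = cumulative(strategy)
index_total = cumulative([m["index"] for m in MONTHS])
```

A fuller test would also account for transaction costs and annualize both series before comparing them, which this sketch leaves out.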
Risks & Boundaries
Limitations
Data and evaluation focus on Chinese markets and Chinese text sources, limiting geographic generality.
Some training data (StockQA, summaries) were generated or augmented with ChatGPT, which can introduce bias or leakage.
When Not To Use
As a sole automated trading engine without rigorous live testing and risk controls.
For high-frequency or intraday trading, since the method is monthly and uses reports/news.
Failure Modes
Hallucinations when relevant documents are missing or retrieval fails.
Outdated knowledge if the vector DB is not continuously updated.
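One possible mitigation for both failure modes is to filter retrieved passages by similarity and age, and to abstain when nothing passes. The sketch below is an assumption about how such a guard might look, not something described in the paper; the thresholds (`min_score=0.5`, `max_age_days=90`), the hit fields, and the function names are hypothetical placeholders.

```python
from datetime import date

def usable_hits(hits, today, min_score=0.5, max_age_days=90):
    """Keep hits that are both similar enough and recent enough."""
    kept = []
    for hit in hits:
        age_days = (today - hit["published"]).days
        if hit["score"] >= min_score and age_days <= max_age_days:
            kept.append(hit)
    return kept

def answer_or_abstain(question, hits, today):
    """Build a grounded request, or decline when no hit is usable."""
    context = usable_hits(hits, today)
    if not context:
        return "No sufficiently relevant, recent source found; declining to answer."
    sources = "; ".join(h["text"] for h in context)
    return f"Answer '{question}' using only: {sources}"

# Made-up retrieval results for illustration.
HITS = [
    {"text": "May earnings note", "score": 0.8, "published": date(2024, 5, 1)},
    {"text": "Unrelated sector news", "score": 0.3, "published": date(2024, 5, 20)},
    {"text": "Old annual report", "score": 0.9, "published": date(2023, 1, 1)},
]

today = date(2024, 6, 1)
kept = usable_hits(HITS, today)
grounded = answer_or_abstain("What changed in May?", HITS, today)
declined = answer_or_abstain("What changed in May?", HITS[1:], today)
```

Suitable threshold values would depend on the embedding model and on how quickly the underlying reports and news go stale.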

