AlphaFin dataset + Stock-Chain: a RAG-enabled LLM system for stock prediction and financial Q&A

March 19, 2024 | 7 min

Overview

Decision Snapshot: Needs Validation

The system shows strong backtest gains on the presented test split and clear user-preference wins, but the results come from a single Chinese-market dataset that was partly built with ChatGPT-generated data, so it needs further validation before live deployment.

Citations: 7

Evidence Strength: 0.60

Confidence: 0.78

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 4/4

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 40%

Authors

Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin


Why It Matters For Business

Combining a domain-tuned LLM with retrieval of up-to-date reports and news can improve decision-support outputs and backtested portfolio returns compared to off-the-shelf models on this dataset.

Who Should Care

Summary TLDR

This paper releases AlphaFin, a multi-part financial dataset (reports, news, StockQA, research data) and presents Stock-Chain: a two-stage system that fine-tunes an LLM (StockGPT) with LoRA and augments it with a vector DB-based RAG pipeline for stock trend prediction and financial Q&A. On an out-of-sample AlphaFin test set, Stock-Chain reported higher annualized returns (30.8% ARR) and better human/GPT-4 preference scores than several baselines. The work focuses on Chinese financial sources, uses ChatGPT for data augmentation and summaries, and emphasizes reducing hallucinations via retrieval. Code and data are linked on the project GitHub.

Problem Statement

Current stock models either predict price movement from time-series data (ML/DL) without explanations or use LLMs that lack real-time facts and hallucinate. The field lacks high-quality financial training data and a practical pipeline that combines reasoning, real-time knowledge, and explainable predictions for investors.

Main Contribution

AlphaFin dataset suite combining research datasets, StockQA (prices + Q&A), financial news, financial reports, and 200 hand-written chain-of-thought (CoT) examples.

Stock-Chain system: two-stage pipeline (StockGPT fine-tuned on AlphaFin; RAG-powered vector DB retrieval for real-time knowledge) for stock trend prediction and conversational financial Q&A.
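The retrieval half of such a pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors stand in for real sentence embeddings, and `cosine_top_k`, the document titles, and all numbers are hypothetical.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
docs = ["Q3 earnings report", "Sector news digest", "Unrelated press release"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.6, 0.4, 0.1],
    [0.0, 0.1, 0.9],
])
query_vec = np.array([1.0, 0.2, 0.0])

hits = cosine_top_k(query_vec, doc_vecs, k=2)
context = [docs[i] for i, _ in hits]    # text that would be placed in the LLM prompt
```

In a real deployment the vectors would come from an embedding model and the search would run inside a vector database; the ranking logic stays the same.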

Key Findings

Stock-Chain achieved substantially higher backtested annualized return than baselines.

Numbers: ARR 30.8% for Stock-Chain vs 17.5% for FinGPT

Practical Use: Integrate retrieval and domain fine-tuning to materially improve medium-term backtested returns versus off-the-shelf FinLLMs on this test set.

Evidence Ref: Table 2

Fine-tuning with AlphaFin data raises LLM trading performance over vanilla models.

Numbers: ChatGLM2: ARR 8.1% → w/raw_data 15.8% → Stock-Chain 30.8%

Practical Use: Add domain-specific reports and simple Q&A examples to LLM fine-tuning before deploying financial prediction models.

Evidence Ref: Table 3

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Annualized Rate of Return (ARR) | 30.8% | FinGPT 17.5% | +13.3 pp | AlphaFin-Test (financial report subset) | Table 2 shows ARR values for models | Table 2
Accuracy | 55.7% | XGBoost 55.9% | -0.2 pp | AlphaFin-Test | Table 2 accuracy column | Table 2
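
The deltas above are differences in percentage points (pp), which is not the same as relative improvement. The short sketch below reproduces both from the reported figures; the helper names `delta_pp` and `relative_change` are illustrative and do not come from the paper.

```python
def delta_pp(value_pct, baseline_pct):
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(value_pct - baseline_pct, 1)

def relative_change(value_pct, baseline_pct):
    """Relative change versus the baseline, as a fraction."""
    return (value_pct - baseline_pct) / baseline_pct

arr_delta = delta_pp(30.8, 17.5)            # ARR: Stock-Chain vs FinGPT
acc_delta = delta_pp(55.7, 55.9)            # accuracy: Stock-Chain vs XGBoost
arr_relative = relative_change(30.8, 17.5)  # ARR gain relative to FinGPT
```

On these numbers the ARR gain is 13.3 pp, or roughly 76% relative to FinGPT, while accuracy is essentially tied with XGBoost. The return advantage therefore does not come from a higher hit rate on the up/down call.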

What To Try In 7 Days

Build a small vector DB of company reports and news; add semantic embeddings (e.g., BGE) and cosine retrieval.

Fine-tune an existing instruction-tuned LLM with a handful of report-based Q&A pairs and a few CoT examples using LoRA.

Run a simple monthly backtest: pick stocks the model predicts 'up' and weight by market cap to compare ARR against an index.
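
The backtest step can be sketched as below. This is a toy version under explicit assumptions: the tickers and returns are made up, the function names are hypothetical, and ARR is computed by compounding monthly returns and scaling to 12 months, which may differ from the exact definition used in the paper.

```python
def cap_weights(market_caps):
    """Market-cap weights that sum to 1."""
    total = sum(market_caps.values())
    return {t: cap / total for t, cap in market_caps.items()}

def monthly_portfolio_return(picks, market_caps, next_month_returns):
    """Cap-weighted return of the stocks predicted 'up'; 0.0 if nothing is picked."""
    if not picks:
        return 0.0
    weights = cap_weights({t: market_caps[t] for t in picks})
    return sum(w * next_month_returns[t] for t, w in weights.items())

def annualized_return(monthly_returns):
    """Compound monthly returns and scale the result to a 12-month rate."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12 / len(monthly_returns)) - 1.0

# Toy inputs (made-up tickers and numbers, for illustration only).
caps = {"AAA": 300.0, "BBB": 100.0, "CCC": 50.0}
month1 = monthly_portfolio_return(
    ["AAA", "BBB"], caps, {"AAA": 0.02, "BBB": 0.06, "CCC": -0.01}
)
month2 = monthly_portfolio_return(
    ["CCC"], caps, {"AAA": 0.01, "BBB": 0.00, "CCC": 0.04}
)
arr = annualized_return([month1, month2])
```

Running the same `annualized_return` on the index's monthly returns gives the comparison figure. Transaction costs and slippage are ignored here and would lower the result in practice.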

Agent Features

Memory
retrieval memory (vector DB, continuously updated)
Tool Use
vector DB retrieval, sentence embedding (BGE)
Frameworks
RAG, LoRA, RefGPT
Architectures
two-stage (predict + conversational) pipeline, RAG with vector DB plus LLM

Optimization Features

Infra Optimization
single A800 80GB reported for training
Model Optimization
LoRA
Training Optimization
staged fine-tuning (reports then CoT examples), bf16 training
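
For readers unfamiliar with LoRA, the idea is to freeze the pretrained weight matrix and train only a low-rank update. The NumPy sketch below shows the forward pass and the parameter saving; the layer sizes, rank, and scaling value are illustrative assumptions, not the settings used for StockGPT.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank, alpha = 64, 64, 4, 8    # illustrative sizes, not the paper's

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, rank))                # trainable, zero init so the update starts at 0

def lora_forward(x):
    """y = W x + (alpha / rank) * B (A x); only A and B would receive gradients."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)

full_params = W.size              # parameters updated by full fine-tuning
lora_params = A.size + B.size     # parameters updated by LoRA
```

With these sizes the adapter trains 512 parameters instead of 4,096 for this one layer, which is the kind of saving that makes single-GPU fine-tuning practical.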

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Data and evaluation focus on Chinese markets and Chinese text sources, limiting geographic generality.

Some training data (StockQA, summaries) were generated or augmented with ChatGPT, which can introduce bias or leakage.

When Not To Use

As a sole automated trading engine without rigorous live testing and risk controls.

For high-frequency or intraday trading, since the method is monthly and uses reports/news.

Failure Modes

Hallucinations when relevant documents are missing or retrieval fails.

Outdated knowledge if vector DB is not continuously updated.
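
One way to guard against both failure modes is to decline to answer when no retrieved document is both relevant and recent. The sketch below is a hypothetical guard, not part of Stock-Chain: the function names, the similarity threshold of 0.5, and the 30-day freshness window are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

def usable_hits(hits, today, min_score=0.5, max_age_days=30):
    """Keep hits that are both similar enough and recent enough."""
    cutoff = today - timedelta(days=max_age_days)
    return [h for h in hits if h["score"] >= min_score and h["published"] >= cutoff]

def build_prompt(question, hits, today):
    """Return a grounded prompt, or None to signal that the system should decline."""
    kept = usable_hits(hits, today)
    if not kept:
        return None
    context = "\n".join(h["text"] for h in kept)
    return f"Context:\n{context}\n\nQuestion: {question}"

today = date(2024, 3, 19)
hits = [
    {"text": "Fresh, relevant report", "score": 0.82, "published": date(2024, 3, 10)},
    {"text": "Relevant but stale report", "score": 0.90, "published": date(2023, 11, 1)},
    {"text": "Fresh but off-topic news", "score": 0.20, "published": date(2024, 3, 18)},
]
prompt = build_prompt("What is the outlook?", hits, today)
declined = build_prompt("What is the outlook?", hits[1:], today)  # None: nothing usable
```

Suitable thresholds depend on the embedding model and on how quickly the underlying information goes out of date, so they should be tuned on held-out queries.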

Core Entities

Models

Stock-Chain, StockGPT, FinGPT, FinMA, ChatGPT, ChatGLM2, LSTM, GRU, XGBoost, Random Forest

Metrics

ARR, ACC, AERR, ANVOL, Sharpe Ratio, Maximum Drawdown, Calmar Ratio, MDD, ROUGE-1, ROUGE-2, ROUGE-L

Datasets

AlphaFin, AlphaFin-Test, FPB, FinQA, ConvFinQA, Headline, StockQA, Financial News, Financial Reports, DataYes, Tushare, AKshare

Benchmarks

AlphaFin-Test