Overview
Production Readiness
0.45
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
11
Why It Matters For Business
FINCON shows that structuring LLMs like a small investment team plus two-tiered risk controls can raise backtested returns and Sharpe ratios while reducing chatter. This suggests a practical path for building LLM-based decision pipelines for small active portfolios and research prototypes.
Summary TLDR
FINCON is a multi-agent system that organizes specialist LLM analysts under a single manager. It adds a dual-level risk-control mechanism: a daily CVaR alert that forces risk-averse actions within an episode, and Conceptual Verbal Reinforcement (CVRF), which updates manager and analyst prompts across episodes using overlap-based textual gradient steps. On backtests (Jan 2022–Jun 2023, GPT-4-Turbo backbone), FINCON outperforms several LLM and DRL baselines on cumulative return and Sharpe ratio for single-stock trading and small-portfolio tasks. The system reduces peer-to-peer chatter through selective belief propagation and uses procedural and episodic memory for retrieval.
Problem Statement
LLM agents can trade, but they struggle to (1) integrate many data sources cheaply, (2) control risk over time, and (3) refine policies across episodes using natural-language updates. Existing systems either overload a single agent or require heavy inter-agent communication, and they lack long-term belief updates.
Main Contribution
A Manager–Analyst hierarchy that assigns uni-modal analyst agents (news, filings, audio, tabular) to distill signals and a single manager that makes trades.
A dual-level risk-control mechanism: within-episode CVaR alerts for same-day risk aversion, and over-episode Conceptual Verbal Reinforcement (CVRF) that updates prompts using episode comparisons and an overlap-based learning rate.
A modular agent design with working/procedural/episodic memory and selective propagation of conceptual beliefs to limit unnecessary agent-to-agent communication.
Empirical evaluation on multi-modal backtests showing higher cumulative returns and Sharpe ratios versus LLM and DRL baselines for single-stock and small portfolio tasks.
Key Findings
FINCON produces much higher cumulative returns on tested stocks than baselines.
FINCON yields strong risk-adjusted returns on portfolios in tests.
Within-episode CVaR alerts improve stability and returns.
Over-episode belief updates (CVRF) drive faster learning across episodes.
The system is more robust than the baselines in high-volatility tests.
Results
TSLA Cumulative Return (CR%)
Portfolio1 Cumulative Return (CR%)
Portfolio1 Sharpe Ratio (SR)
Within-episode CVaR impact (Portfolio CR%)
Belief-update overlap increase
Who Should Care
What To Try In 7 Days
Build a manager–analyst prompt layout: assign one LLM agent per data source (news, filings, prices, audio) and a manager to consolidate outputs.
Add a daily CVaR check (the average of the worst 1% of PnL outcomes). If CVaR drops, force conservative manager actions that day.
Implement simple episodic belief updates: compare two recent trajectories, distill top winning/losing reasons, and edit manager prompt by a small overlap-based step.
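The manager–analyst layout from the first step can be sketched as below. This is a minimal illustration and not the paper's implementation: `LLM` is a hypothetical callable (prompt in, text out) standing in for whichever model client is used, the prompt wording is invented, and the fallback to "hold" on an unparseable reply is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out
ACTIONS = {"buy", "sell", "hold"}


@dataclass
class Analyst:
    """One analyst per data source; it only distills its own modality."""
    source: str
    llm: LLM

    def distill(self, raw: str) -> str:
        prompt = (
            f"You analyze {self.source} only. "
            f"Summarize the trading signal:\n{raw}"
        )
        return self.llm(prompt)


@dataclass
class Manager:
    """Sole decision maker; consolidates analyst notes into one action."""
    llm: LLM
    beliefs: str = ""

    def decide(self, notes: Dict[str, str], risk_alert: bool = False) -> str:
        lines = [f"[{src}] {note}" for src, note in notes.items()]
        prompt = "You are the sole decision maker.\n"
        if self.beliefs:
            prompt += f"Current beliefs: {self.beliefs}\n"
        if risk_alert:
            # Assumed wording; the source only says the alert forces
            # risk-averse behavior for the day.
            prompt += "RISK ALERT: act risk-averse today.\n"
        prompt += "\n".join(lines)
        prompt += "\nAnswer with one word: buy, sell, or hold."
        reply = self.llm(prompt).strip().lower()
        # Assumption: fall back to the most conservative action on bad output.
        return reply if reply in ACTIONS else "hold"
```

In use, each analyst's `distill` output is collected into a dict keyed by source and passed to `Manager.decide`, so analysts never message each other directly.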
Agent Features
Memory
- Working memory (short-term summarization)
- Procedural memory (per-step records, decay by agent)
- Episodic memory (manager-level trajectories and beliefs)
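One way to picture "decay by agent" in procedural memory is a recency-weighted ranking where each agent has its own decay rate. The sketch below assumes exponential decay with a per-agent half-life and a score of relevance times decay; the class name, fields, and scoring rule are all illustrative assumptions, since the source does not specify the ranking formula.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProceduralMemory:
    """Per-agent store of (day, relevance, text) records with recency decay."""
    half_life_days: float  # assumed: each agent picks its own half-life
    records: List[Tuple[int, float, str]] = field(default_factory=list)

    def add(self, day: int, relevance: float, text: str) -> None:
        self.records.append((day, relevance, text))

    def top_k(self, today: int, k: int = 3) -> List[str]:
        """Return the k records with the highest decayed relevance."""
        def score(rec: Tuple[int, float, str]) -> float:
            day, relevance, _ = rec
            age = today - day
            return relevance * 0.5 ** (age / self.half_life_days)

        ranked = sorted(self.records, key=score, reverse=True)
        return [text for _, _, text in ranked[:k]]
```

With a short half-life (for example a news agent) recent records dominate, while a long half-life (for example a filings agent) lets an older but more relevant record stay on top.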
Planning
- Manager consolidates analyst outputs into sequential trading actions
- Episode-level self-reflection (manager)
Tool Use
- Audio transcription (Whisper)
- External convex optimizer for portfolio weights
- Memory retrieval and Guardrails AI
Frameworks
- Textual gradient-descent style prompt optimizer (CVRF)
- CVaR-based within-episode risk control
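A minimal sketch of the CVaR-based check, assuming CVaR is estimated empirically as the mean of the worst 1% of daily PnL values seen so far in the episode, and that an alert fires when the estimate through today is lower than the estimate through yesterday. The function names and the comparison rule are assumptions of this sketch, not the paper's exact definition.

```python
import math
from typing import Sequence


def cvar(pnl: Sequence[float], alpha: float = 0.01) -> float:
    """Mean of the worst alpha-fraction of PnL observations (at least one)."""
    if not pnl:
        raise ValueError("pnl must be non-empty")
    k = max(1, math.ceil(alpha * len(pnl)))
    worst = sorted(pnl)[:k]
    return sum(worst) / k


def risk_alert(pnl_history: Sequence[float], alpha: float = 0.01) -> bool:
    """True when CVaR through today is lower than CVaR through yesterday."""
    if len(pnl_history) < 2:
        return False
    return cvar(pnl_history, alpha) < cvar(pnl_history[:-1], alpha)
```

With short histories the 1% tail collapses to the single worst day, so the alert effectively fires whenever a new worst-ever PnL is recorded; a longer window or a larger `alpha` gives a smoother estimate.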
Is Agentic
true
Architectures
- Manager-Analyst hierarchy
- Uni-modal specialist analyst agents
Collaboration
- Hierarchical synthesis (manager as sole decision maker)
- Selective back-propagation of conceptual beliefs to relevant analysts
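Selective propagation can be pictured as a routing step: each updated belief is delivered only to the analysts it concerns, instead of being broadcast to every agent. The sketch below assumes each belief arrives with a list of source tags; how relevance is actually determined is not stated in this summary, so the tagging scheme and the `route_beliefs` name are hypothetical.

```python
from typing import Dict, List


def route_beliefs(
    beliefs: Dict[str, List[str]], analysts: List[str]
) -> Dict[str, List[str]]:
    """Map each analyst to the beliefs tagged with its source.

    `beliefs` maps belief text to the source tags it concerns (assumed input).
    Analysts with no matching belief receive an empty list, i.e. no message.
    """
    inbox: Dict[str, List[str]] = {name: [] for name in analysts}
    for text, tags in beliefs.items():
        for tag in tags:
            if tag in inbox:
                inbox[tag].append(text)
    return inbox
```

Compared with broadcasting every belief to every analyst, the number of messages drops from beliefs times analysts to the total number of tags.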
Optimization Features
Token Efficiency
- Reduced peer-to-peer communication to save tokens and lower latency
System Optimization
- Selective propagation of conceptual beliefs to limit messages
Training Optimization
- Textual prompt updates using overlap-based learning rate
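The overlap-based step can be illustrated with a toy version that works on lists of belief strings rather than LLM-edited prompts. It assumes overlap is the fraction of time steps on which two episodes chose the same action, and that the step size is `1 - overlap` (more divergent episodes trigger a larger edit). Both the mapping and the function names are assumptions of this sketch; the source states only that the learning rate is overlap-based.

```python
import math
from typing import List, Sequence


def action_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of time steps on which two equal-length action sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def update_beliefs(
    current: List[str], insights: Sequence[str], overlap: float
) -> List[str]:
    """Append a share of the new insights to the current belief list.

    Assumption: share = 1 - overlap, so identical episodes change nothing.
    """
    step = 1.0 - overlap
    n_new = math.ceil(step * len(insights))
    fresh = [s for s in insights if s not in current][:n_new]
    return current + fresh
```

In a full system the insights would come from an LLM comparing a better and a worse trajectory, and the edit would be applied to the manager's prompt text rather than to a list.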
Reproducibility
Code Urls
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Occasional hallucinations (e.g., non-existent memory indices) noted in portfolio tasks.
- Demonstrated only on small portfolios (3 assets); scaling to tens of assets not validated.
- Relies on GPT-4-Turbo (cost and latency concerns for production).
- Backtests confined to one historical window; live-market behavior uncertain.
When Not To Use
- As-is for large-scale portfolios (tens of assets) without redesign for context limits.
- Direct live deployment with real capital before careful out-of-sample testing and paper trading.
- Settings that require low-cost, low-latency inference at scale, which a heavy LLM backbone makes hard to achieve.
Failure Modes
- Hallucinated factual outputs leading to erroneous trades in long-context, multi-asset cases.
- Outdated or misleading memory retrieval if procedural memory ranking is wrong.
- Overfitting to the training window causing poor generalization to unseen regimes.
- High cost or latency from repeated GPT-4-Turbo calls causing throttling or stale decisions.
Core Entities
Models
- GPT-4-Turbo (backbone for all LLM agents)
- Whisper API (audio transcription used by ECC agent)
Metrics
- Cumulative Return (CR%)
- Sharpe Ratio (SR)
- Max Drawdown (MDD%)
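For reference, the three metrics can be computed from a series of daily returns as sketched below. The conventions here are assumptions and may differ from the paper's: simple (not log) returns, compounded cumulative return, a sample standard deviation, a zero risk-free rate by default, and 252 trading days per year for annualization.

```python
import math
from typing import Sequence


def cumulative_return_pct(returns: Sequence[float]) -> float:
    """Compounded return over the whole period, in percent."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return (growth - 1.0) * 100.0


def sharpe_ratio(
    returns: Sequence[float],
    risk_free_daily: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe ratio using the sample standard deviation."""
    excess = [r - risk_free_daily for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    if var == 0.0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown_pct(returns: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the equity curve, in percent."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst * 100.0
```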
Datasets
- Multi-modal dataset Jan 3 2022 – Jun 10 2023 (Yahoo Finance prices, Refinitiv/Alpaca news, SEC 10-Q/
- Training period: Jan 3 2022 – Oct 4 2022; Testing: Oct 5 2022 – Jun 10 2023
Benchmarks
- Buy-and-Hold
- GENERATIVE AGENT (GA)
- FINGPT
- FINMEM
- FINAGENT
- A2C
- PPO
- DQN
- MarkowitzMV
- FinRL-A2C
- Equal-Weighted ETF

