Overview
Backtests show large gains on several tickers and portfolios, but the evaluation covers a single historical window and relies on GPT-4-Turbo, which is costly. Scaling to many assets and live trading remain untested.
Citations: 11
Evidence Strength: 0.70
Confidence: 0.70
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 45%
Novelty: 60%
Why It Matters For Business
FINCON shows that structuring LLMs like a small investment team, combined with two-tiered risk controls, can raise backtested returns and Sharpe ratios while reducing inter-agent communication. This suggests a practical path for building LLM-based decision pipelines for small active portfolios and research prototypes.
Who Should Care
Summary TLDR
FINCON is a multi-agent system that organizes specialist LLM analysts under a single manager. It adds dual-level risk control: a daily CVaR alert that forces risk-averse actions within an episode, and Conceptual Verbal Reinforcement (CVRF), which updates manager and analyst prompts across episodes using overlap-based textual gradient steps. On backtests (Jan 2022–Jun 2023, GPT-4-Turbo backbone), FINCON outperforms several LLM and DRL baselines on cumulative return and Sharpe ratio for single-stock trading and small-portfolio tasks. The system reduces peer-to-peer communication through selective belief propagation and uses procedural and episodic memory for retrieval.
Problem Statement
LLM agents can trade but they struggle to (1) integrate many data sources cheaply, (2) control risk over time, and (3) refine policies across episodes using natural-language updates. Existing systems either overload one agent or require high communication and lack long-term belief updates.
Main Contribution
A Manager–Analyst hierarchy that assigns uni-modal analyst agents (news, filings, audio, tabular) to distill signals and a single manager that makes trades.
A dual-level risk-control mechanism: within-episode CVaR alerts that trigger same-day risk aversion, and over-episode Conceptual Verbal Reinforcement (CVRF) that updates prompts using episode comparisons and an overlap-based learning rate.
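The overlap-based learning rate can be illustrated with a minimal sketch. It assumes the overlap is the fraction of days on which two episodes took the same trading action; this summary does not define the measure, so treat that as an assumption. The function names `action_overlap` and `build_update_prompt` and the prompt wording are hypothetical, and how the overlap maps to the size of a prompt edit is left to the LLM rather than fixed here.

```python
def action_overlap(prev_actions, curr_actions):
    """Fraction of time steps where two episodes chose the same action.

    Assumption: both episodes cover the same trading days in the same order.
    """
    if not prev_actions or len(prev_actions) != len(curr_actions):
        raise ValueError("episodes must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(prev_actions, curr_actions))
    return same / len(prev_actions)


def build_update_prompt(current_prompt, insights, overlap):
    """Assemble a (hypothetical) meta-prompt asking an LLM to revise the manager prompt."""
    bullets = "\n".join(f"- {item}" for item in insights)
    return (
        f"Current manager prompt:\n{current_prompt}\n\n"
        f"Insights from comparing the last two episodes:\n{bullets}\n\n"
        f"Action overlap between the two episodes: {overlap:.0%}. "
        "Use this as the step-size signal when revising the prompt."
    )


prev_episode = ["buy", "hold", "sell", "buy"]
curr_episode = ["buy", "sell", "sell", "hold"]
overlap = action_overlap(prev_episode, curr_episode)  # 2 of 4 days match
prompt = build_update_prompt(
    "Trade TSLA conservatively.",
    ["Winning episode weighted filings more heavily."],
    overlap,
)
```

The sketch only computes the overlap and passes it along as text; the actual prompt revision would be performed by an LLM call that is not shown.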
Key Findings
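FINCON produces much higher cumulative returns on tested stocks than baselines, for example 82.871% on TSLA versus 6.425% for Buy-and-Hold (Table 2).
FINCON yields strong risk-adjusted returns on the tested portfolios; for Portfolio1 (TSLA, MSFT, PFE), cumulative return was 113.836% versus 12.636% for Markowitz MV (Table 3).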
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| TSLA Cumulative Return (CR%) | 82.871% | Buy-and-Hold 6.425% | +76.446 pp | Single-stock test (TSLA), test period Oct 5 2022 – Jun 10 2023 | Table 2 single-stock results | Table 2 |
| Portfolio1 Cumulative Return (CR%) | 113.836% | Markowitz MV 12.636% | +101.200 pp | Portfolio management (TSLA, MSFT, PFE) | Table 3 portfolio results | Table 3 |
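The Delta column is a difference in percentage points (pp), not a relative change. A short check of the arithmetic for the two rows above, using a hypothetical helper `delta_pp`:

```python
def delta_pp(value_pct, baseline_pct):
    """Difference between two percentages, in percentage points, rounded to 3 decimals."""
    return round(value_pct - baseline_pct, 3)


tsla_delta = delta_pp(82.871, 6.425)          # TSLA vs Buy-and-Hold
portfolio_delta = delta_pp(113.836, 12.636)   # Portfolio1 vs Markowitz MV
```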
What To Try In 7 Days
Build a manager–analyst prompt layout: assign one LLM agent per data source (news, filings, prices, audio) and a manager to consolidate outputs.
Add a daily CVaR check (the average of the worst 1% of PnL outcomes). If CVaR drops relative to the previous day, force conservative manager actions that day.
Implement simple episodic belief updates: compare two recent trajectories, distill top winning/losing reasons, and edit manager prompt by a small overlap-based step.
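The daily CVaR check in the steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `cvar` and `risk_alert`, the sample PnL values, and the choice to compare today's CVaR with yesterday's are all assumptions. With short histories the worst 1% rounds up to a single observation.

```python
import math


def cvar(pnl, alpha=0.01):
    """Average of the worst alpha-fraction of PnL observations (at least one)."""
    if not pnl:
        raise ValueError("pnl must be non-empty")
    k = max(1, math.ceil(alpha * len(pnl)))  # number of tail observations
    worst = sorted(pnl)[:k]                  # most negative values first
    return sum(worst) / k


def risk_alert(prev_cvar, curr_cvar):
    """Return True when CVaR has dropped compared with the previous day."""
    return curr_cvar < prev_cvar


# Illustrative daily PnL history (made-up numbers).
history = [0.004, -0.012, 0.007, -0.031, 0.002, 0.010, -0.005, 0.001]
prev = cvar(history)
curr = cvar(history + [-0.060])  # today's loss enters the tail

# The manager prompt would be told to act conservatively when the alert fires.
stance = "risk-averse" if risk_alert(prev, curr) else "neutral"
```

In a prototype, `stance` would be injected into the manager's prompt for that day, so the alert constrains the decision without changing the analysts' outputs.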
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
System Optimization
Training Optimization
Risks & Boundaries
Limitations
Occasional hallucinations (e.g., non-existent memory indices) noted in portfolio tasks.
Demonstrated only on small portfolios (3 assets); scaling to tens of assets not validated.
When Not To Use
As-is for large-scale portfolios (tens of assets) without redesign for context limits.
Direct live deployment with real capital before careful out-of-sample and paper trading.
Failure Modes
Hallucinated factual outputs leading to erroneous trades in long-context, multi-asset cases.
Outdated or misleading memory retrieval if procedural memory ranking is wrong.

