A manager–analyst LLM multi-agent that uses verbalized, episode-level belief updates (CVRF) plus daily CVaR alerts to improve trading and …

Overview

Decision Snapshot: Needs Validation

Backtests show large gains on several tickers and portfolios, but evaluation is limited to one historical window, uses GPT-4-Turbo (costly), and scaling to many assets or live trading remains untested.

Citations: 11

Evidence Strength: 0.70

Confidence: 0.70

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 45%

Novelty: 60%

Authors

Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Zhaozhuo Xu, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie

Links

Abstract / PDF / Code

Why It Matters For Business

FINCON shows that structuring LLMs like a small investment team, with two-tiered risk controls, can raise backtested returns and Sharpe ratios while reducing inter-agent communication. This suggests a practical path for building LLM-based decision pipelines for small active portfolios and research prototypes.

Who Should Care

Product Manager, ML Engineer, Founder, Data Scientist

Summary TLDR

FINCON is a multi-agent system that organizes specialist LLM analysts under a single manager. It adds dual-level risk control: a daily CVaR alert that forces risk-averse actions within episodes, and Conceptual Verbal Reinforcement (CVRF), which updates manager and analyst prompts across episodes using overlap-based textual gradient steps. On backtests (data from Jan 2022–Jun 2023, tested on Oct 5 2022 – Jun 10 2023, GPT-4-Turbo backbone), FINCON outperforms several LLM and DRL baselines on cumulative return and Sharpe ratio for single-stock trading and small portfolio tasks. The system reduces peer-to-peer chatter through selective belief propagation and retrieves context from procedural and episodic memory.

Problem Statement

LLM agents can trade but they struggle to (1) integrate many data sources cheaply, (2) control risk over time, and (3) refine policies across episodes using natural-language updates. Existing systems either overload one agent or require high communication and lack long-term belief updates.

Main Contribution

A Manager–Analyst hierarchy that assigns uni-modal analyst agents (news, filings, audio, tabular) to distill signals and a single manager that makes trades.

Dual-level risk control: within-episode CVaR alerts for same-day risk aversion, and over-episode Conceptual Verbal Reinforcement (CVRF) that updates prompts using episode comparisons and an overlap-based learning rate.
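The within-episode check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cvar` and `risk_alert` are hypothetical, the 1% tail level follows the "worst 1% PnL" wording used later in this summary, and the rule "raise an alert when CVaR through today is lower than CVaR through yesterday" is one plausible reading of "CVaR drops".

```python
import math


def cvar(pnl, alpha=0.01):
    """Average of the worst alpha-fraction of PnL observations (at least one)."""
    if not pnl:
        raise ValueError("pnl must be non-empty")
    k = max(1, math.ceil(alpha * len(pnl)))  # number of tail observations
    worst = sorted(pnl)[:k]                  # most negative values first
    return sum(worst) / k


def risk_alert(pnl_history, alpha=0.01):
    """True when CVaR computed through today is below CVaR through yesterday."""
    if len(pnl_history) < 2:
        return False  # not enough history to compare
    today = cvar(pnl_history, alpha)
    yesterday = cvar(pnl_history[:-1], alpha)
    return today < yesterday
```

With a short history such as `[0.01, 0.02, -0.10]`, the newest loss becomes the tail, so `risk_alert` returns `True` and the manager would be steered toward a conservative action for that day.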

Key Findings

FINCON produces much higher cumulative returns on tested stocks than baselines.

Numbers: TSLA CR 82.871% vs buy-and-hold 6.425% (Table 2)

Practical Use: For small-scale, backtested trading, using a manager+specialist analyst architecture with CVaR and CVRF can materially raise returns versus simple LLM agents or buy-and-hold on these sample tickers.

Evidence Ref: Table 2 (single-stock results)

FINCON yields strong risk-adjusted returns on portfolios in tests.

Numbers: Portfolio1 CR 113.836%, SR 3.269 vs Markowitz CR 12.636%, SR 0.614 (Table 3)

Practical Use: For small portfolios (3 assets here), FINCON's combined distillation + episodic belief updates produced much higher profit and Sharpe; try this architecture for small active portfolios before scaling.

Evidence Ref: Table 3 (portfolio results)

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
TSLA Cumulative Return (CR%)	82.871%	Buy-and-Hold 6.425%	+76.446 pp	Single-stock test (TSLA), test period Oct 5 2022 – Jun 10 2023	Table 2 single-stock results	Table 2
Portfolio1 Cumulative Return (CR%)	113.836%	Markowitz MV 12.636%	+101.200 pp	Portfolio management (TSLA, MSFT, PFE)	Table 3 portfolio results	Table 3

What To Try In 7 Days

Build a manager–analyst prompt layout: assign one LLM agent per data source (news, filings, prices, audio) and a manager to consolidate outputs.

Add a daily CVaR check (tail-average of worst 1% PnL). If CVaR drops, force conservative manager actions that day.

Implement simple episodic belief updates: compare two recent trajectories, distill top winning/losing reasons, and edit manager prompt by a small overlap-based step.
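The episodic update in the last step could be prototyped along these lines. This is a sketch under assumptions rather than the CVRF procedure itself: the `Episode` fields, the helper names, the choice of `1 - overlap` as the step size, and the prompt wording are all hypothetical. The only idea taken from the summary is that the overlap between two trajectories controls how large the prompt edit is.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    actions: list   # one action per trading day, e.g. "buy", "hold", "sell"
    pnl: float      # episode-level profit and loss
    reasons: list   # distilled rationale notes (hypothetical field)


def action_overlap(a, b):
    """Fraction of aligned steps on which two action sequences agree."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def belief_update_prompt(prev, curr):
    """Build a meta-prompt asking an LLM to revise the manager's beliefs."""
    overlap = action_overlap(prev.actions, curr.actions)
    step = 1.0 - overlap  # assumption: less overlap permits a larger edit
    winner, loser = (curr, prev) if curr.pnl >= prev.pnl else (prev, curr)
    return (
        f"Action overlap between episodes: {overlap:.2f}.\n"
        f"Revise at most a fraction {step:.2f} of the manager's beliefs.\n"
        f"Keep what worked: {'; '.join(winner.reasons)}.\n"
        f"Avoid what failed: {'; '.join(loser.reasons)}."
    )
```

The returned string would be sent to the LLM together with the current manager prompt; the actual call is omitted because it depends on the client library in use.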

Agent Features

Memory

Working memory (short-term summarization)

Procedural memory (per-step records, decay by agent)

Episodic memory (manager-level trajectories and beliefs)
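To make "decay by agent" concrete, the sketch below ranks procedural-memory records by relevance discounted with an agent-specific exponential decay. The scoring formula, the `MemoryRecord` fields, and the decay rates in `DECAY_RATE` are assumptions chosen for illustration; the summary does not specify how FINCON scores or decays records.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    relevance: float   # similarity to the current query, in [0, 1]
    age_days: float    # days since the record was written


# Hypothetical per-agent decay rates: news goes stale faster than filings.
DECAY_RATE = {"news": 0.5, "filings": 0.05}


def rank_memories(records, agent, top_k=2):
    """Return the top_k records by relevance * exp(-rate * age)."""
    rate = DECAY_RATE[agent]
    scored = [(r.relevance * math.exp(-rate * r.age_days), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]
```

With the same two records, a fast-decaying agent prefers the recent note while a slow-decaying agent can still surface an older, more relevant one, which is why a badly tuned rate can return outdated memories.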

Planning

Manager consolidates analyst outputs into sequential trading actions

Episode-level self-reflection (manager)

Tool Use

Audio transcription (Whisper)

External convex optimizer for portfolio weights

Memory retrieval and Guardrails AI

Frameworks

Textual gradient-descent style prompt optimizer (CVRF)

CVaR-based within-episode risk control

Is Agentic

Yes

Architectures

Manager-Analyst hierarchy

Uni-modal specialist analyst agents

Collaboration

Hierarchical synthesis (manager as sole decision maker)

Selective back-propagation of conceptual beliefs to relevant analysts
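The collaboration pattern above can be illustrated with a small skeleton. Everything here is a placeholder under stated assumptions: the analyst functions stand in for LLM calls, the majority-vote rule in `manager_decide` is an invented stand-in for the manager's reasoning, and `propagate_beliefs` simply routes each belief to the analyst whose name matches its key.

```python
def news_analyst(ticker):
    return f"{ticker}: news sentiment positive"    # placeholder for an LLM call


def filings_analyst(ticker):
    return f"{ticker}: filings flag rising costs"  # placeholder for an LLM call


def manager_decide(ticker, analysts, risk_alert=False):
    """Collect analyst signals; the manager alone chooses the action."""
    signals = {name: fn(ticker) for name, fn in analysts.items()}
    if risk_alert:
        action = "hold"  # assumption: conservative action under an alert
    else:
        positive = sum("positive" in s for s in signals.values())
        action = "buy" if positive > len(signals) / 2 else "hold"
    return action, signals


def propagate_beliefs(beliefs, analysts):
    """Send each belief only to the analyst whose name matches its key."""
    return {name: beliefs[name] for name in analysts if name in beliefs}
```

Analysts never message each other in this layout: signals flow up to the manager and belief updates flow down only to the analysts they concern, which is the source of the token savings described in this summary.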

Optimization Features

Token Efficiency

Reduced peer-to-peer communication to save token and latency cost

System Optimization

Selective propagation of conceptual beliefs to limit messages

Training Optimization

Textual prompt updates using overlap-based learning rate

Reproducibility

Code Available: Yes

Data Available: No

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/The-FinAI/FinCon

Risks & Boundaries

Limitations

Occasional hallucinations (e.g., non-existent memory indices) noted in portfolio tasks.

Demonstrated only on small portfolios (3 assets); scaling to tens of assets not validated.

When Not To Use

As-is for large-scale portfolios (tens of assets) without redesign for context limits.

Direct live deployment with real capital before careful out-of-sample and paper trading.

Failure Modes

Hallucinated factual outputs leading to erroneous trades in long-context, multi-asset cases.

Outdated or misleading memory retrieval if procedural memory ranking is wrong.

Core Entities

Models

GPT-4-Turbo (backbone for all LLM agents)

Whisper API (audio transcription used by ECC agent)

Metrics

Cumulative Return (CR%)

Sharpe Ratio (SR)

Max Drawdown (MDD%)
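For readers reproducing these metrics, the sketch below computes all three from a list of daily returns. The definitions are common conventions and are assumptions here: cumulative return uses compounded simple returns, the Sharpe ratio is annualized with 252 trading days and a zero risk-free rate by default, and drawdown is measured against the running peak of the equity curve. The paper's exact formulas may differ.

```python
import math
import statistics


def cumulative_return(daily_returns):
    """Compounded return over the period, in percent."""
    equity = 1.0
    for r in daily_returns:
        equity *= 1.0 + r
    return (equity - 1.0) * 100.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualized mean excess return over its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)  # requires at least two observations
    if sd == 0:
        return 0.0
    return statistics.mean(excess) / sd * math.sqrt(periods)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the equity curve, in percent."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst * 100.0
```

For example, daily returns of +10%, -50%, +20% compound to an equity of 0.66, so the cumulative return is about -34% and the maximum drawdown is about 50%.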

Datasets

Multi-modal dataset Jan 3 2022 – Jun 10 2023 (Yahoo Finance prices, Refinitiv/Alpaca news, SEC 10-Q/…)

Training period: Jan 3 2022 – Oct 4 2022; Testing: Oct 5 2022 – Jun 10 2023

Benchmarks

Buy-and-Hold

GENERATIVE AGENT (GA)

FINGPT

FINMEM

FINAGENT

A2C

PPO

DQN

Markowitz MV

FinRL-A2C

Equal-Weighted ETF

