A manager–analyst LLM multi-agent system that uses verbalized, episode-level belief updates (CVRF) plus daily CVaR alerts to improve trading and …

July 9, 2024 · 9 min

Overview

Decision Snapshot: Needs Validation

Backtests show large gains on several tickers and portfolios, but evaluation is limited to one historical window, uses GPT-4-Turbo (costly), and scaling to many assets or live trading remains untested.

Citations: 11

Evidence Strength: 0.70

Confidence: 0.70

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 45%

Novelty: 60%

Authors

Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Zhaozhuo Xu, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie

Why It Matters For Business

FINCON shows that structuring LLMs like a small investment team plus two-tiered risk controls can raise backtested returns and Sharpe ratios while reducing chatter. This suggests a practical path for building LLM-based decision pipelines for small active portfolios and research prototypes.

Summary TLDR

FINCON is a multi-agent system that organizes specialist LLM analysts under a single manager. It adds dual-level risk control: a daily CVaR alert that forces risk-averse actions within an episode, and Conceptual Verbal Reinforcement (CVRF), which updates manager and analyst prompts across episodes using overlap-based textual gradient steps. In backtests (Jan 2022–Jun 2023, GPT-4-Turbo backbone), FINCON outperforms several LLM and DRL baselines on cumulative return and Sharpe ratio for single-stock trading and small-portfolio tasks. The system reduces peer-to-peer chatter through selective belief propagation and uses procedural and episodic memory for retrieval.

Problem Statement

LLM agents can trade but they struggle to (1) integrate many data sources cheaply, (2) control risk over time, and (3) refine policies across episodes using natural-language updates. Existing systems either overload one agent or require high communication and lack long-term belief updates.

Main Contribution

A Manager–Analyst hierarchy that assigns uni-modal analyst agents (news, filings, audio, tabular) to distill signals and a single manager that makes trades.

A dual-level risk-control: within-episode CVaR alerts for same-day risk aversion and over-episode Conceptual Verbal Reinforcement (CVRF) that updates prompts using episode comparisons and an overlap-based learning rate.

Key Findings

FINCON produces much higher cumulative returns on tested stocks than baselines.

Numbers: TSLA CR 82.871% vs buy-and-hold 6.425% (Table 2)

Practical Use: For small-scale, backtested trading, using a manager+specialist analyst architecture with CVaR and CVRF can materially raise returns versus simple LLM agents or buy-and-hold on these sample tickers.

Evidence Ref: Table 2 (single-stock results)

FINCON yields strong risk-adjusted returns on portfolios in tests.

Numbers: Portfolio1 CR 113.836%, SR 3.269 vs Markowitz CR 12.636%, SR 0.614 (Table 3)

Practical Use: For small portfolios (3 assets here), FINCON's combined distillation + episodic belief updates produced much higher profit and Sharpe; try this architecture for small active portfolios before scaling.

Evidence Ref: Table 3 (portfolio results)

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
TSLA Cumulative Return (CR%) | 82.871% | Buy-and-Hold 6.425% | +76.446 pp | Single-stock test (TSLA), test period Oct 5 2022 – Jun 10 2023 | Table 2 single-stock results | Table 2
Portfolio1 Cumulative Return (CR%) | 113.836% | Markowitz MV 12.636% | +101.200 pp | Portfolio management (TSLA, MSFT, PFE) | Table 3 portfolio results | Table 3

What To Try In 7 Days

Build a manager–analyst prompt layout: assign one LLM agent per data source (news, filings, prices, audio) and a manager to consolidate outputs.

Add a daily CVaR check (tail-average of worst 1% PnL). If CVaR drops, force conservative manager actions that day.

Implement simple episodic belief updates: compare two recent trajectories, distill top winning/losing reasons, and edit manager prompt by a small overlap-based step.
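
The daily CVaR check from the second step could look roughly like the sketch below. It assumes CVaR is the mean of the worst 1% of daily PnL values seen so far and that an alert fires when today's PnL lowers that value. The function names `cvar` and `risk_alert` are illustrative, not taken from the FINCON code.

```python
import math


def cvar(pnl_history, alpha=0.01):
    """Mean of the worst `alpha` fraction of daily PnL values (at least one value)."""
    if not pnl_history:
        raise ValueError("pnl_history must be non-empty")
    ordered = sorted(pnl_history)                  # worst days first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


def risk_alert(pnl_history, today_pnl, alpha=0.01):
    """Assumed rule: alert when including today's PnL lowers CVaR."""
    if not pnl_history:
        return False                               # nothing to compare against yet
    previous = cvar(pnl_history, alpha)
    current = cvar(pnl_history + [today_pnl], alpha)
    return current < previous


history = [0.010, -0.020, 0.005, 0.012]
print(risk_alert(history, today_pnl=-0.050))  # True: a new worst day lowers CVaR
```

With a short history and a 1% tail, the tail holds a single day, so the check reduces to "is today the worst day so far". A longer PnL history makes the tail average more meaningful.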

Agent Features

Memory
Working memory (short-term summarization); Procedural memory (per-step records, decay by agent); Episodic memory (manager-level trajectories and beliefs)
Planning
Manager consolidates analyst outputs into sequential trading actions; Episode-level self-reflection (manager)
Tool Use
Audio transcription (Whisper); External convex optimizer for portfolio weights; Memory retrieval and Guardrails AI
Frameworks
Textual gradient-descent style prompt optimizer (CVRF); CVaR-based within-episode risk control
Is Agentic

Yes

Architectures
Manager-Analyst hierarchy; Uni-modal specialist analyst agents
Collaboration
Hierarchical synthesis (manager as sole decision maker); Selective back-propagation of conceptual beliefs to relevant analysts
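
The hierarchy and collaboration pattern above can be illustrated with a minimal sketch. Everything here is hypothetical: the `Analyst` and `Manager` classes, the keyword vote that stands in for the manager's LLM call, the `buy`/`sell`/`hold` action names, and the rule that a risk alert downgrades `buy` to `hold`. It only shows the message flow: analysts report to the manager, never to each other, and belief updates go only to the analysts they concern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Analyst:
    """One specialist per data source; `distill` stands in for an LLM call."""
    name: str
    distill: Callable[[str], str]
    beliefs: List[str] = field(default_factory=list)


@dataclass
class Manager:
    """Sole decision maker; analysts never message each other."""
    analysts: List[Analyst]

    def decide(self, raw_inputs: Dict[str, str], risk_alert: bool = False) -> str:
        reports = [a.distill(raw_inputs[a.name])
                   for a in self.analysts if a.name in raw_inputs]
        # Placeholder vote; a real manager would be an LLM prompt over the reports.
        score = sum(r.count("bullish") - r.count("bearish") for r in reports)
        action = "buy" if score > 0 else "sell" if score < 0 else "hold"
        # Assumed conservative override when the within-episode alert fires.
        if risk_alert and action == "buy":
            action = "hold"
        return action

    def propagate(self, belief_updates: Dict[str, str]) -> None:
        """Send each updated belief only to the analyst it concerns."""
        for analyst in self.analysts:
            if analyst.name in belief_updates:
                analyst.beliefs.append(belief_updates[analyst.name])


news = Analyst("news", lambda text: "bullish" if "beat" in text else "bearish")
filings = Analyst("filings", lambda text: "bullish" if "growth" in text else "neutral")
manager = Manager([news, filings])

inputs = {"news": "earnings beat estimates", "filings": "revenue growth"}
normal_action = manager.decide(inputs)
alert_action = manager.decide(inputs, risk_alert=True)
manager.propagate({"news": "weigh guidance changes more heavily"})
```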

Optimization Features

Token Efficiency
Reduced peer-to-peer communication to save tokens and latency
System Optimization
Selective propagation of conceptual beliefs to limit messages
Training Optimization
Textual prompt updates using overlap-based learning rate
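
One way to read "overlap-based learning rate" is sketched below: measure how often two episodes chose the same action on the same day, then use that to scale how strongly the prompt is edited. The mapping from overlap to step size (`1 - overlap` here) is an assumption, as are the function names and the wording of the update instruction. The summary does not specify them.

```python
def action_overlap(actions_a, actions_b):
    """Fraction of days on which two episodes took the same action."""
    if not actions_a or len(actions_a) != len(actions_b):
        raise ValueError("episodes must be non-empty and equally long")
    same = sum(a == b for a, b in zip(actions_a, actions_b))
    return same / len(actions_a)


def step_size(actions_a, actions_b):
    """Assumed mapping: the less two episodes agree, the larger the prompt edit."""
    return 1.0 - action_overlap(actions_a, actions_b)


def build_update_prompt(current_beliefs, insights, tau):
    """Hypothetical meta-prompt asking an LLM to revise beliefs by roughly `tau`."""
    return (
        f"Current investment beliefs:\n{current_beliefs}\n\n"
        f"Insights from comparing the last two episodes:\n{insights}\n\n"
        f"Revise the beliefs, changing about {tau:.0%} of their content."
    )


better = ["buy", "hold", "sell", "buy"]
worse = ["buy", "sell", "sell", "hold"]
tau = step_size(better, worse)
prompt = build_update_prompt("Favor momentum.", "Holding through earnings helped.", tau)
```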

Reproducibility

Code Available: Yes
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Occasional hallucinations (e.g., non-existent memory indices) noted in portfolio tasks.

Demonstrated only on small portfolios (3 assets); scaling to tens of assets not validated.

When Not To Use

As-is for large-scale portfolios (tens of assets) without redesign for context limits.

Direct live deployment with real capital before careful out-of-sample and paper trading.

Failure Modes

Hallucinated factual outputs leading to erroneous trades in long-context, multi-asset cases.

Outdated or misleading memory retrieval if procedural memory ranking is wrong.

Core Entities

Models

GPT-4-Turbo (backbone for all LLM agents); Whisper API (audio transcription used by ECC agent)

Metrics

Cumulative Return (CR%); Sharpe Ratio (SR); Max Drawdown (MDD%)
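
For readers reproducing these metrics, a minimal sketch follows. It uses common textbook definitions that may differ from the paper's exact formulas: simple daily returns compounded for CR, an annualized Sharpe ratio with 252 trading days and a zero risk-free rate by default, and peak-to-trough drawdown on the compounded equity curve.

```python
import math
import statistics


def cumulative_return_pct(daily_returns):
    """Compounded return over the period, in percent."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return (growth - 1.0) * 100.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    spread = statistics.stdev(excess)          # needs at least two returns
    if spread == 0:
        return float("nan")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def max_drawdown_pct(daily_returns):
    """Largest peak-to-trough loss of the equity curve, in percent."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst * 100.0


returns = [0.10, -0.50, 0.20]
cr = cumulative_return_pct(returns)    # 1.1 * 0.5 * 1.2 = 0.66, so about -34%
mdd = max_drawdown_pct(returns)        # peak 1.10 to trough 0.55, so about 50%
```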

Datasets

Multi-modal dataset, Jan 3 2022 – Jun 10 2023 (Yahoo Finance prices, Refinitiv/Alpaca news, SEC 10-Q/…)
Training period: Jan 3 2022 – Oct 4 2022; Testing: Oct 5 2022 – Jun 10 2023
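
A small sketch of splitting dated records into the training and testing periods listed above. The record layout, a list of `(date, payload)` pairs, and the function name `split_by_date` are assumptions made for illustration.

```python
from datetime import date

TRAIN_START, TRAIN_END = date(2022, 1, 3), date(2022, 10, 4)
TEST_START, TEST_END = date(2022, 10, 5), date(2023, 6, 10)


def split_by_date(records):
    """Split (date, payload) pairs into train and test lists; drop out-of-range days."""
    train, test = [], []
    for day, payload in records:
        if TRAIN_START <= day <= TRAIN_END:
            train.append((day, payload))
        elif TEST_START <= day <= TEST_END:
            test.append((day, payload))
    return train, test


records = [
    (date(2022, 1, 3), "a"),
    (date(2022, 10, 4), "b"),
    (date(2022, 10, 5), "c"),
    (date(2023, 6, 10), "d"),
    (date(2023, 6, 11), "e"),   # outside both windows, dropped
]
train, test = split_by_date(records)
```

Keeping the split strictly chronological avoids leaking test-period information into the beliefs learned during training episodes.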

Benchmarks

Buy-and-Hold; GENERATIVE AGENT (GA); FINGPT; FINMEM; FINAGENT; A2C; PPO; DQN; Markowitz MV; FinRL-A2C; Equal-Weighted ETF