iAgents: agents mirror human social networks to trade private info and solve group tasks under information asymmetry

June 21, 2024 · 8 min

Overview

Production Readiness

0.5

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

1

Authors

Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian

Links

Abstract / PDF

Why It Matters For Business

iAgents lets one agent per user coordinate across private data without centralizing it, enabling multi-user scheduling, concierge, and workflow automation. Expect higher token costs and privacy trade-offs.

Summary TLDR

This paper defines information asymmetry for multi-agent systems (each agent only sees its user's private data) and proposes iAgents: a system of one agent per user that proactively requests and exchanges only necessary human information. Two core ideas: InfoNav, an explicit plan that tracks which facts (rationales) are unknown and guides multi-turn questions; and Mixed Memory, combining exact-span 'Clear Memory' with embedding-based 'Fuzzy Memory' for retrieval. The authors release InformativeBench (5 datasets) and show GPT-4 achieves ~50% on average while smaller LLMs perform worse. Ablations show InfoNav is critical for small-network reasoning and mixed memory + recursive communication is

Problem Statement

Multi-agent systems assume shared context but real human collaborations are asymmetric: each agent only sees its user's private information. That breaks coordination. The challenge is to enable agents to acquire and exchange needed facts without centralizing private data, while scaling retrieval over many messages and keeping multi-turn communication focused.

Main Contribution

Formulate the problem of information asymmetry in multi-agent collaboration and shift focus from a single shared virtual entity to agents that mirror users.

Propose iAgents, which integrates InfoNav (plan-driven communication) and Mixed Memory (Clear + Fuzzy) to retrieve and exchange human information without centralizing all data.

Release InformativeBench, a benchmark with five datasets (Needle/Reasoning pipelines) to evaluate agent collaboration under information asymmetry and provide code/data.

Key Findings

GPT-4-backed iAgents solved many tasks but performance varies strongly by dataset difficulty.

Numbers: GPT-4: Schedule Easy 56.67%, Schedule Medium 51.00%, Schedule Hard 22.80%, NP 64.00%, FriendsTV 57.94%

iAgents scaled to a large simulated social network and retrieved many messages during runs.

Numbers: FriendsTV: 140 nodes, 588 edges; agents searched ~70,000 messages and completed tasks within ~3 minutes

Design components have measurable impact: InfoNav, Mixed Memory, and recursive communication improved accuracy.

Numbers: InfoNav adds 15–26% on small reasoning datasets; Mixed Memory adds 2.38–6.34% on FriendsTV; recursive communication gave

Privacy and pretraining knowledge affect performance.

Numbers: Anonymizing character names reduced FriendsTV accuracy from 35.71% to 32.54%; privacy-preserving vague outputs reduced 3

Results

Schedule Easy (precision)

Value: 56.67%

Schedule Medium (precision)

Value: 51.00%

Schedule Hard (precision)

Value: 22.80%

Needle in the Persona (precision)

Value: 64.00%

FriendsTV (precision)

Value: 57.94%

Accuracy

Value: 50.48%
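
The 50.48% figure is consistent with an unweighted mean of the five per-dataset GPT-4 precision values listed above. That reading is an inference from the arithmetic, not something this summary states, and the check can be sketched as:

```python
from statistics import mean

# Per-dataset GPT-4 precision values quoted in this summary (percent).
scores = {
    "Schedule Easy": 56.67,
    "Schedule Medium": 51.00,
    "Schedule Hard": 22.80,
    "Needle in the Persona": 64.00,
    "FriendsTV": 57.94,
}

# Unweighted mean across the five datasets.
average = mean(scores.values())
print(f"{average:.2f}%")
```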

Ablation: w/o InfoNav on Schedule

Value: drops to <=10% accuracy

Baseline: iAgents full (36.67% on Schedule Easy)

Impact of recursive communication on FriendsTV

Value: 12.7% improvement

Baseline: without recursive communication

Who Should Care

What To Try In 7 Days

Prototype InfoNav prompts on a small 4–6 person calendar use case to test multi-turn info exchange.

Build a mixed memory of exact spans + session summaries and compare retrieval quality.

Run InformativeBench (NP or ScheduleEasy) with your preferred LLM to measure baseline accuracy and token cost.

Agent Features

Memory

  • Mixed Memory: Clear Memory (exact spans)
  • Mixed Memory: Fuzzy Memory (session summaries + embeddings)
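
The two memory types listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MixedMemory`, its methods, and the toy bag-of-words `embed` function are all hypothetical stand-ins, and a real system would presumably call an embedding model and an approximate nearest-neighbor index instead.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MixedMemory:
    messages: list[str] = field(default_factory=list)   # Clear Memory: raw text
    summaries: list[str] = field(default_factory=list)  # Fuzzy Memory: session summaries

    def clear_search(self, keyword: str) -> list[str]:
        """Return messages that contain the keyword verbatim (case-insensitive)."""
        return [m for m in self.messages if keyword.lower() in m.lower()]

    def fuzzy_search(self, query: str, top_k: int = 1) -> list[str]:
        """Return the top_k summaries ranked by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.summaries, key=lambda s: cosine(q, embed(s)), reverse=True)
        return ranked[:top_k]


memory = MixedMemory(
    messages=["Alice: I am free on Tuesday after 3pm.", "Bob: Lunch was great."],
    summaries=[
        "Alice and Bob discussed availability for a Tuesday meeting.",
        "Bob talked about a restaurant he visited.",
    ],
)
exact_hits = memory.clear_search("Tuesday")
fuzzy_hits = memory.fuzzy_search("when is the meeting")
```

In this sketch the exact search returns verbatim text that can be quoted as evidence, while the fuzzy search trades precision for recall over summaries. That is one way to read the split between Clear and Fuzzy Memory.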

Planning

  • InfoNav (explicit plan tracking)
  • Consensus reasoning (plan-based merge)
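
As a rough illustration of explicit plan tracking, the sketch below keeps a plan whose rationales are either known or unknown and derives the next question from the first unknown one. The `Rationale` and `Plan` classes and the question template are hypothetical, and in the paper these steps are driven by LLM prompts rather than fixed code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rationale:
    description: str
    value: Optional[str] = None  # None means the fact is still unknown

    @property
    def known(self) -> bool:
        return self.value is not None


@dataclass
class Plan:
    task: str
    rationales: dict[str, Rationale] = field(default_factory=dict)

    def unknown(self) -> list[str]:
        """Keys of rationales that have not been filled yet."""
        return [key for key, r in self.rationales.items() if not r.known]

    def next_question(self) -> Optional[str]:
        """Ask about the first unknown rationale, or return None when done."""
        pending = self.unknown()
        if not pending:
            return None
        return f"Could you tell me: {self.rationales[pending[0]].description}?"

    def fill(self, key: str, value: str) -> None:
        """Record a fact obtained from another agent."""
        self.rationales[key].value = value

    def is_complete(self) -> bool:
        return not self.unknown()


plan = Plan(
    task="Find a meeting slot",
    rationales={
        "alice_free": Rationale("when Alice is free"),
        "bob_free": Rationale("when Bob is free"),
    },
)
plan.fill("alice_free", "Tuesday 3pm")
question = plan.next_question()
```

Keeping the unknowns explicit is what lets each turn stay focused: the agent asks only for facts the plan still lacks.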

Tool Use

  • embedding-based retrieval (ANN)
  • LLM summarizer for session-level summaries

Frameworks

  • iAgents (InfoNav + Mixed Memory)
  • InformativeBench

Is Agentic

true

Architectures

  • one-agent-per-user mirroring
  • role-play prompt-created agents

Collaboration

  • recursive inter-agent communication
  • multi-turn autonomous dialogs (max 10 turns in experiments)
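
One possible reading of recursive communication under a turn cap is sketched below: an agent that cannot answer from its own user's facts forwards the question to its contacts until the budget runs out. The `Agent` class, the shared `budget` dictionary, and counting one turn per forwarded question are simplifying assumptions for illustration. Only the cap of 10 comes from the experiments described here.

```python
MAX_TURNS = 10  # turn cap reported for the experiments


class Agent:
    def __init__(self, name, facts, contacts=None):
        self.name = name
        self.facts = facts              # private to this agent's user
        self.contacts = contacts or []  # other agents this one may ask

    def answer(self, key, budget, visited=None):
        """Answer from private facts, else ask contacts recursively within budget."""
        if visited is None:
            visited = set()
        visited.add(self.name)
        if key in self.facts:
            return self.facts[key]
        for contact in self.contacts:
            if contact.name in visited or budget["turns"] <= 0:
                continue
            budget["turns"] -= 1  # each forwarded question costs one turn
            reply = contact.answer(key, budget, visited)
            if reply is not None:
                return reply
        return None


carol = Agent("Carol", {"venue": "Room 4"})
bob = Agent("Bob", {}, [carol])
alice = Agent("Alice", {}, [bob])

budget = {"turns": MAX_TURNS}
result = alice.answer("venue", budget)
```

Here Alice reaches Carol's fact through Bob in two forwarded questions, and no agent's full fact store is ever shared.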

Optimization Features

Token Efficiency

  • The paper reports ~30k input tokens per task as a cost concern
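
To turn the per-task token figure into a budget, a back-of-the-envelope estimate can be sketched as follows. The function name and the default price of 10 USD per million input tokens are placeholders, not real pricing, so substitute your provider's current rate.

```python
def estimate_input_cost(
    num_tasks: int,
    tokens_per_task: int = 30_000,         # figure quoted in this summary
    usd_per_million_tokens: float = 10.0,  # hypothetical placeholder price
) -> float:
    """Estimate input-token spend in USD for a batch of tasks."""
    return num_tasks * tokens_per_task * usd_per_million_tokens / 1_000_000


cost_100_tasks = estimate_input_cost(100)
print(f"${cost_100_tasks:.2f}")
```

The estimate covers input tokens only. Output tokens and retries would add to it.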

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Privacy vs. utility trade-off: stronger privacy restrictions noticeably reduce accuracy.
  • High token and latency cost: experiments report ~30,000 tokens per task.
  • Dependence on closed LLM backends for top performance; smaller models lag far behind.
  • No statistical error bars reported; results are point estimates on the provided datasets.

When Not To Use

  • When absolute local-only privacy is required (L3 level) and edge models cannot match performance.
  • For tiny tasks where centralizing data is simpler and cheaper.
  • When token cost or latency must be minimal.

Failure Modes

  • Agents hallucinate 'fake solved' rationales and pass incorrect facts into consensus.
  • Pretrained model priors override user-provided evidence, leading to prior-distraction errors.
  • Excessive retrieval noise from fuzzy memory if summaries lose critical span details.

Core Entities

Models

  • gpt-4-0125-preview
  • gpt-3.5-turbo-16k
  • gemini-1.0-pro-latest
  • claude-sonnet 2

Metrics

  • Precision
  • F1
  • IoU
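
For reference, the standard set-based definitions of these three metrics can be written as below. How the paper applies each metric to each dataset is not stated in this summary, so the time-slot example is only an assumed use case.

```python
def precision(pred: set, gold: set) -> float:
    """Fraction of predicted items that are correct."""
    return len(pred & gold) / len(pred) if pred else 0.0


def recall(pred: set, gold: set) -> float:
    """Fraction of gold items that were predicted."""
    return len(pred & gold) / len(gold) if gold else 0.0


def f1(pred: set, gold: set) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(pred, gold), recall(pred, gold)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def iou(pred: set, gold: set) -> float:
    """Intersection over union of the two sets."""
    union = pred | gold
    return len(pred & gold) / len(union) if union else 0.0


# Hypothetical example: predicted vs. reference meeting slots.
pred = {"Mon 10:00", "Tue 15:00", "Wed 09:00"}
gold = {"Tue 15:00", "Wed 09:00", "Thu 10:00", "Fri 11:00"}
metrics = {
    "precision": precision(pred, gold),
    "f1": f1(pred, gold),
    "iou": iou(pred, gold),
}
```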

Datasets

  • InformativeBench
  • Needle in the Persona (NP)
  • FriendsTV
  • Schedule (Easy/Medium/Hard)

Benchmarks

  • InformativeBench

Context Entities

Models

  • role-play prompting agents (prior MAS baselines referenced)