A conversational LLM agent that automates buyer and seller workflows on a C2C marketplace, cutting interaction time and automating multi‑tap

September 4, 2025 · 8 min

Overview

Decision Snapshot: Needs Validation

Results are promising but come from synthetic datasets and an LLM-based simulator that used the same model as the agent. Field testing is required for production claims.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 2/6

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 50%

Authors

Yineng Yan, Xidong Wang, Jin Seng Cheng, Ran Hu, Wentao Guan, Nahid Farahmand, Hengte Lin, Yue Li

Why It Matters For Business

FaMA can reduce seller and buyer time on common tasks, improve scalability of messaging and search, and provide a safer conversational interface that reduces user errors. Measured gains are promising but come from synthetic tests and short timing studies, so expect differences in production.

Summary TLDR

FaMA is a conversational assistant built on Llama-4 that turns marketplace GUI workflows into natural-language commands. It uses a short-term 'scratchpad' memory, tool calling (listings, search, messaging), and an optional RAG tool for policy/help lookup. In synthetic tests FaMA solved tasks with ~98% success and halved interaction time for bulk replies. The evaluation is synthetic and uses an LLM-based simulator, so real-world gains may vary.

Problem Statement

C2C marketplaces are full of repetitive, multi-step GUI tasks (listing creation/renewal, bulk replies, filtered search). These tasks are slow and error-prone on mobile UIs. Users need a simpler, conversational entry point that can understand natural requests and operate platform tools safely.

Main Contribution

Design and implementation of FaMA: an LLM-based conversational assistant with tool calling, scratchpad short-term memory, and a RAG help tool.

A single-step interactive ReAct-style loop that asks users to confirm each state-changing action for safety.
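The confirmation gate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FaMA's implementation: `ProposedAction`, `STATE_CHANGING`, `run_step`, and the tool names are hypothetical, and the tools are stand-ins rather than real platform APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical tool names; which tools count as state-changing is an assumption.
STATE_CHANGING = {"update_listing", "send_message"}


@dataclass
class ProposedAction:
    tool: str
    args: Dict[str, str] = field(default_factory=dict)


def run_step(
    action: ProposedAction,
    tools: Dict[str, Callable[..., str]],
    confirm: Callable[[ProposedAction], bool],
) -> str:
    """Execute one proposed action, asking the user first if it changes state."""
    if action.tool not in tools:
        return f"error: unknown tool {action.tool!r}"
    if action.tool in STATE_CHANGING and not confirm(action):
        return "cancelled: user declined"
    return tools[action.tool](**action.args)


# Stand-in tools for illustration; real ones would call platform APIs.
demo_tools = {
    "search_inventory": lambda query: f"results for {query}",
    "update_listing": lambda listing_id, price: f"listing {listing_id} set to {price}",
}

action = ProposedAction("update_listing", {"listing_id": "L1", "price": "20"})
print(run_step(action, demo_tools, confirm=lambda a: False))  # cancelled: user declined
```

Read-only tools such as the search stand-in bypass the prompt, so only actions that modify listings or send messages cost the user an extra turn.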

Key Findings

High automated task success on the synthetic evaluation.

Numbers: 98% task success rate (synthetic 100-listing eval)

Practical Use: Expect strong automation for typical marketplace workflows in controlled settings; validate on real users before rollout because the test used synthetic data and an LLM-based simulator.

Evidence Ref: Section 4.1; Figure 3

Bulk replies can be much faster with the agent.

Numbers: Bulk messages: 25s with FaMA vs 50s manual (2x speedup)

Practical Use: Deploying a messaging tool can roughly halve time for batch replies, reducing seller time on repetitive communication.

Evidence Ref: Section 4.2; Table 1
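The 2x figure and the "roughly halve" phrasing both follow from the two reported timings. As a worked check, with T_manual and T_FaMA as symbols introduced here for illustration:

```latex
\[
\text{speedup} = \frac{T_{\text{manual}}}{T_{\text{FaMA}}} = \frac{50\,\text{s}}{25\,\text{s}} = 2,
\qquad
\text{time saved} = 1 - \frac{T_{\text{FaMA}}}{T_{\text{manual}}} = 1 - \frac{25}{50} = 50\%.
\]
```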

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Overall Task Success Rate (automated eval) | 98% success on evaluated tasks | - | - | synthetic 100-listing dataset | FaMA achieved ~98%+ overall success in automated evaluation | Section 4.1; Figure 3
Inventory Search success | 98% success; 100% of successful attempts in single optimal steps | - | - | synthetic 100-listing dataset | Single-step Inventory Search: 98% success and 100% optimality | Section 4.1; Figure 3
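One way the two rates in the table could be computed is sketched below. The exact definitions are not given in this summary, so this assumes that optimality is measured over successful attempts only, as the phrase "100% of successful attempts" suggests. `Episode` and `success_and_optimality` are hypothetical names, and the sample data in the usage lines is illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Episode:
    succeeded: bool      # did the agent complete the task?
    steps_taken: int     # tool-calling steps the agent used
    optimal_steps: int   # fewest steps known to solve the task


def success_and_optimality(episodes: Iterable[Episode]) -> Tuple[float, float]:
    """Return (success rate, share of successful episodes solved in optimal steps)."""
    eps = list(episodes)
    if not eps:
        raise ValueError("no episodes to score")
    successes = [e for e in eps if e.succeeded]
    success_rate = len(successes) / len(eps)
    optimal = sum(1 for e in successes if e.steps_taken <= e.optimal_steps)
    optimality_rate = optimal / len(successes) if successes else 0.0
    return success_rate, optimality_rate


# Illustrative data only: three optimal successes and one failure.
sample = [Episode(True, 1, 1)] * 3 + [Episode(False, 2, 1)]
print(success_and_optimality(sample))  # (0.75, 1.0)
```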

What To Try In 7 Days

Prototype a conversational entry point for one seller workflow (e.g., bulk replies) and measure time saved.

Add a scratchpad-style short-term memory to preserve multi-step state across confirmations.

Wrap three essential platform APIs (search, update listing, messaging) as callable tools for the LLM and test with synthetic scenarios.
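For the last item, a tool wrapper can start as a small name-to-function registry plus a dispatcher that executes the structured call emitted by the model. The sketch below is an assumption-laden starting point: `TOOLS`, `tool`, `dispatch`, the call format (`name` plus `arguments`), and the in-memory stand-ins for the search and messaging APIs are all hypothetical.

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a tool name the LLM can refer to."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("search")
def search(query: str) -> list:
    # Stand-in catalog; a real tool would call the marketplace search API.
    catalog = ["red bike", "blue bike", "desk lamp"]
    return [item for item in catalog if query.lower() in item]


@tool("send_message")
def send_message(buyer_id: str, text: str) -> dict:
    # Stand-in for a messaging API; nothing is actually sent.
    return {"to": buyer_id, "text": text, "status": "queued"}


def dispatch(call: Dict[str, Any]) -> Any:
    """Run a tool call of the assumed form {'name': ..., 'arguments': {...}}."""
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


print(dispatch({"name": "search", "arguments": {"query": "bike"}}))  # ['red bike', 'blue bike']
```

A bulk reply is then one `dispatch` call per buyer ID. Timing a scripted batch against the same task done by hand gives the time-saved measurement suggested in the first item.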

Agent Features

Memory: Scratchpad chronological Thought-Action-Observation log (short-term); Ephemeral dialog history (session-based purge); Listings Information Memory (title, desc, ID per session)
Planning: ReAct Thought-Action-Observation planning; Chain-of-Thought prompting for reasoning
Tool Use: Listing operation tools (create/update/renew); Inventory search tool (marketplace search API); Messaging tools (single and bulk); RAG-as-Tool for help articles; ASR front-end for voice
Frameworks: ReAct; RAG; Tool calling; ASR
Is Agentic

Yes

Architectures: Single-step interactive ReAct loop with user confirmation; LLM core: Llama-4-Maverick-17B-128E-Instruct
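The scratchpad memory listed above can be pictured as an append-only log of Thought, Action, and Observation entries that is rendered into the prompt and cleared at session end. The sketch below is one assumed shape for it. The `Scratchpad` class and its methods are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ALLOWED_KINDS = ("Thought", "Action", "Observation")


@dataclass
class Scratchpad:
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        """Append one entry in chronological order."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown entry kind: {kind}")
        self.entries.append((kind, text))

    def render(self) -> str:
        """Format the log as text that could be placed in the LLM prompt."""
        return "\n".join(f"{kind}: {text}" for kind, text in self.entries)

    def purge(self) -> None:
        """Drop all entries, mirroring the session-based purge."""
        self.entries.clear()


pad = Scratchpad()
pad.add("Thought", "User wants to renew a listing")
pad.add("Action", "renew_listing(id=L1)")
pad.add("Observation", "awaiting user confirmation")
print(pad.render())
```

Because the log persists across turns within a session, a pending action survives the confirmation round trip without the model having to re-derive it.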

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Evaluation uses a synthetic 100-listing dataset and an LLM-based user simulator, which can overestimate real-world performance.

Both agent and simulator use the same LLM, creating potential evaluation bias.

When Not To Use

Workflows that require persistent long-term memory across sessions.

High-risk operations needing strict audit trails without human confirmation.

Failure Modes

Misidentifying the target listing from ambiguous user text, especially outside session-stored listings.

LLM hallucinations when calling tools or synthesizing policy answers without reliable RAG grounding.
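One possible guard against the first failure mode is to resolve the user's reference against session-stored listings and ask for clarification whenever the match is not unique. The sketch below uses simple word overlap on titles. The matching rule, `match_listings`, `resolve`, and the sample listings are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List


def match_listings(user_text: str, session_listings: Dict[str, str]) -> List[str]:
    """Return IDs of session listings whose title shares the most words with user_text."""
    words = set(user_text.lower().split())
    scores = {
        listing_id: len(words & set(title.lower().split()))
        for listing_id, title in session_listings.items()
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return []
    return [listing_id for listing_id, score in scores.items() if score == best]


def resolve(user_text: str, session_listings: Dict[str, str]) -> str:
    """Pick a target listing only when exactly one candidate matches best."""
    matches = match_listings(user_text, session_listings)
    if len(matches) == 1:
        return f"target: {matches[0]}"
    if not matches:
        return "clarify: no session listing matches"
    return "clarify: which listing? " + ", ".join(matches)


listings = {"A1": "red mountain bike", "A2": "blue road bike"}
print(resolve("renew the red bike", listings))  # target: A1
print(resolve("renew the bike", listings))      # clarify: which listing? A1, A2
```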

Core Entities

Models

Llama-4-Maverick-17B-128E-Instruct

Metrics

Task Success Rate, Task Optimality Rate, Interaction Time, Speedup

Datasets

synthetic_100_listings_dataset (LLM-generated)