Overview
Production Readiness: 0.6
Novelty Score: 0.5
Cost Impact Score: 0.6
Citation Count: 0
Why It Matters For Business
FaMA can reduce the time sellers and buyers spend on common tasks, improve the scalability of messaging and search, and provide a safer conversational interface that reduces user errors. The measured gains are promising, but they come from synthetic tests and short timing studies, so expect production results to differ.
Summary TLDR
FaMA is a conversational assistant built on Llama-4 that turns marketplace GUI workflows into natural-language commands. It uses a short-term 'scratchpad' memory, tool calling (listings, search, messaging), and an optional RAG tool for policy/help lookup. In synthetic tests FaMA solved tasks with ~98% success and halved interaction time for bulk replies. The evaluation is synthetic and uses an LLM-based simulator, so real-world gains may vary.
Problem Statement
C2C marketplaces are full of repetitive, multi-step GUI tasks (listing creation/renewal, bulk replies, filtered search). These tasks are slow and error-prone on mobile UIs. Users need a simpler, conversational entry point that can understand natural requests and operate platform tools safely.
Main Contribution
Design and implementation of FaMA: an LLM-based conversational assistant with tool calling, scratchpad short-term memory, and a RAG help tool.
A single-step interactive ReAct-style loop that asks users to confirm each state-changing action for safety.
Automated evaluation on a synthetic 100-listing dataset showing high task success and a timing study showing up to 2x speedup on common tasks.
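The per-action confirmation in the second contribution can be sketched as a small gate in front of tool execution. This is a minimal illustration, not FaMA's code: the tool names in STATE_CHANGING, the ToolCall type, and the confirm callback are hypothetical. Read-only tools such as search run directly, while state-changing tools run only after the user approves the exact call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical names of tools that modify marketplace state; read-only tools are not listed.
STATE_CHANGING = {"create_listing", "update_listing", "renew_listing", "send_message", "bulk_reply"}


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def execute_with_confirmation(
    call: ToolCall,
    tools: dict[str, Callable[..., Any]],
    confirm: Callable[[str], bool],
) -> dict[str, Any]:
    """Run one tool call, asking the user first if the call changes state."""
    if call.name not in tools:
        return {"status": "error", "detail": f"unknown tool: {call.name}"}
    if call.name in STATE_CHANGING:
        # Show the exact action and arguments so the user confirms what will actually run.
        if not confirm(f"About to run {call.name} with {call.args}. Proceed?"):
            return {"status": "cancelled"}
    return {"status": "ok", "result": tools[call.name](**call.args)}
```

In an interactive client, confirm could wrap a yes/no button or a console prompt; a cancelled result would be fed back to the model as an observation so it can replan instead of retrying silently.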
Key Findings
High automated task success (~98%) on the synthetic evaluation.
Bulk replies are roughly 2x faster with the agent (interaction time halved in the timing study).
Inventory search and listing-renewal workflows complete efficiently, often in the minimal number of steps.
Design trades off autonomy for safety via explicit confirmations.
Results
Overall Task Success Rate (automated eval)
Inventory Search success
Renew Listing success
Bulk Reply success
Interaction time (Bulk Messages Reply)
Interaction time (Inventory Search)
Who Should Care
What To Try In 7 Days
Prototype a conversational entry point for one seller workflow (e.g., bulk replies) and measure time saved.
Add a scratchpad-style short-term memory to preserve multi-step state across confirmations.
Wrap three essential platform APIs (search, update listing, messaging) as callable tools for the LLM and test with synthetic scenarios.
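Wrapping platform APIs as tools mostly means pairing each callable with a description the model can read, then validating the model's proposed call before running it. The sketch below assumes hypothetical in-memory stand-ins (search_listings, update_listing, send_message) for real marketplace endpoints and a generic JSON-schema-like parameter description; it does not reproduce any specific model provider's tool-calling format.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory stand-ins for the marketplace APIs.
LISTINGS: dict[str, dict[str, Any]] = {
    "L1": {"title": "Road bike", "price": 300},
    "L2": {"title": "Desk lamp", "price": 20},
}
OUTBOX: list[tuple[str, str]] = []


def search_listings(query: str) -> list[str]:
    return [lid for lid, item in LISTINGS.items() if query.lower() in item["title"].lower()]


def update_listing(listing_id: str, price: int) -> dict[str, Any]:
    LISTINGS[listing_id]["price"] = price
    return LISTINGS[listing_id]


def send_message(buyer_id: str, text: str) -> str:
    OUTBOX.append((buyer_id, text))
    return "sent"


# Registry: each entry pairs a callable with the description the LLM sees in its prompt.
TOOLS: dict[str, dict[str, Any]] = {
    "search_listings": {"fn": search_listings, "description": "Find listings whose title contains a keyword.",
                        "parameters": {"query": "string"}},
    "update_listing": {"fn": update_listing, "description": "Change the price of one listing.",
                       "parameters": {"listing_id": "string", "price": "integer"}},
    "send_message": {"fn": send_message, "description": "Send a text message to one buyer.",
                     "parameters": {"buyer_id": "string", "text": "string"}},
}


def render_tool_specs() -> str:
    """Serialize names, descriptions, and parameters (not the callables) for the system prompt."""
    return json.dumps(
        [{"name": n, "description": t["description"], "parameters": t["parameters"]} for n, t in TOOLS.items()],
        indent=2,
    )


def dispatch(name: str, args: dict[str, Any]) -> Any:
    """Check a model-proposed call against the registry, then execute it."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    missing = set(TOOLS[name]["parameters"]) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return TOOLS[name]["fn"](**args)
```

Synthetic scenarios can then be replayed by feeding scripted (name, args) pairs into dispatch and checking the resulting LISTINGS and OUTBOX state.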
Agent Features
Memory
- Scratchpad chronological Thought-Action-Observation log (short-term)
- Ephemeral dialog history (session-based purge)
- Listings Information Memory (title, description, and ID, stored per session)
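The three memory components above can be modeled as one session object that is cleared when the session ends. This is a minimal sketch under stated assumptions: the class and field names (Step, SessionMemory, render_scratchpad) are hypothetical, and the prompt rendering only illustrates how a chronological Thought-Action-Observation log might be fed back to the model.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One ReAct iteration recorded in the scratchpad."""
    thought: str
    action: str
    observation: str


@dataclass
class SessionMemory:
    scratchpad: list[Step] = field(default_factory=list)
    dialog: list[tuple[str, str]] = field(default_factory=list)  # (role, text) turns
    listings: dict[str, dict[str, str]] = field(default_factory=dict)  # listing ID -> title/description

    def log(self, thought: str, action: str, observation: str) -> None:
        """Append a step; list order preserves chronology."""
        self.scratchpad.append(Step(thought, action, observation))

    def remember_listing(self, listing_id: str, title: str, description: str) -> None:
        self.listings[listing_id] = {"title": title, "description": description}

    def render_scratchpad(self) -> str:
        """Format the log for inclusion in the next prompt."""
        return "\n".join(
            f"Thought: {s.thought}\nAction: {s.action}\nObservation: {s.observation}"
            for s in self.scratchpad
        )

    def end_session(self) -> None:
        """Purge all three stores; nothing persists across sessions."""
        self.scratchpad.clear()
        self.dialog.clear()
        self.listings.clear()
```

Keeping the store per session matches the ephemeral design described here, and it is also why long-term personalization is out of scope.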
Planning
- ReAct Thought-Action-Observation planning
- Chain-of-Thought prompting for reasoning
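A generic ReAct loop alternates model output (Thought plus Action) with tool observations until the model emits a final answer. The sketch below uses the common text convention "Action: tool[input]" and "Final Answer: ..." from ReAct-style prompting; FaMA's actual prompt format and its use of native tool calling are not specified here, and the llm callable is a hypothetical stand-in for a model client.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react(
    question: str,
    llm: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Run Thought -> Action -> Observation cycles until a final answer or the step limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None:
            # Tell the model its output was malformed instead of failing outright.
            transcript += "Observation: could not parse an action; use tool[input] or Final Answer.\n"
            continue
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."
```

The step limit bounds cost when the model loops; in a confirmation-gated design, each state-changing tool in the tools mapping would additionally wait for user approval before returning its observation.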
Tool Use
- Listing operation tools (create/update/renew)
- Inventory search tool (marketplace search API)
- Messaging tools (single and bulk)
- RAG-as-Tool for help articles
- ASR front-end for voice
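RAG-as-Tool means retrieval is just another tool the agent can call, returning help-article passages that ground its policy answers. In the sketch below, a simple token-overlap score is swapped in for whatever retriever FaMA actually uses (typically embedding search), and the help articles are invented placeholders; the tool returns passages tagged with their article IDs so answers can cite a source.

```python
import re
from collections import Counter

# Hypothetical help-center snippets standing in for the real article corpus.
HELP_ARTICLES = {
    "renewing-listings": "You can renew a listing from your seller dashboard. Renewed listings move to the top of search.",
    "prohibited-items": "Weapons, alcohol, and live animals cannot be listed on the marketplace.",
    "shipping": "Sellers can offer shipping or local pickup. Shipping labels are generated after payment.",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank articles by token overlap with the query; drop zero-score articles."""
    q = Counter(tokenize(query))
    scored = []
    for doc_id, text in HELP_ARTICLES.items():
        d = Counter(tokenize(text))
        scored.append((sum(min(q[t], d[t]) for t in q), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def help_tool(query: str) -> str:
    """Tool entry point: passages tagged with article IDs, or an explicit miss."""
    hits = retrieve(query, k=2)
    if not hits:
        return "No relevant help article found."
    return "\n".join(f"[{doc_id}] {HELP_ARTICLES[doc_id]}" for doc_id in hits)
```

Returning an explicit miss, rather than nothing, gives the model a clear signal to say it does not know instead of answering policy questions from memory.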
Frameworks
- ReAct
- RAG
- Tool calling
- ASR
Is Agentic: true
Architectures
- Single-step interactive ReAct loop with user confirmation
- LLM core: Llama-4-Maverick-17B-128E-Instruct
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Evaluation uses a synthetic 100-listing dataset and an LLM-based user simulator, which can overestimate real-world performance.
- Both agent and simulator use the same LLM, creating potential evaluation bias.
- Session-based ephemeral memory limits long-term personalization and persistent workflows.
- Single-step confirmation improves safety but increases interaction overhead for users who want full automation.
When Not To Use
- Workflows that require persistent long-term memory across sessions.
- High-risk operations needing strict audit trails without human confirmation.
- Environments with low API reliability or where tool calls are restricted.
Failure Modes
- Misidentifying the target listing from ambiguous user text, especially outside session-stored listings.
- LLM hallucinations when calling tools or synthesizing policy answers without reliable RAG grounding.
- Degraded performance in real-world, noisy user conversations compared to synthetic simulator.
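One mitigation for the first failure mode is to resolve the target listing only when the match is unique, and otherwise ask the user to choose. This is a hypothetical sketch, not FaMA's logic: the STOPWORDS set, the word-overlap scoring, and the ("resolved" | "clarify") return convention are all assumptions for illustration.

```python
import re

# Hypothetical filler words that should not count as evidence for a listing match.
STOPWORDS = {"the", "a", "an", "my", "please", "renew", "listing", "for", "of", "to"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def resolve_listing(user_text: str, session_listings: dict[str, str]) -> tuple[str, object]:
    """Return ("resolved", id) for a unique match, else ("clarify", candidate ids)."""
    words = tokens(user_text)
    # An explicit listing ID in the request wins outright.
    for lid in session_listings:
        if lid.lower() in words:
            return ("resolved", lid)
    scores = {lid: len(tokens(title) & words) for lid, title in session_listings.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return ("clarify", sorted(session_listings))
    top = sorted(lid for lid, s in scores.items() if s == best)
    return ("resolved", top[0]) if len(top) == 1 else ("clarify", top)
```

For example, with session listings "Red road bike" and "Blue road bike", "renew the road bike" ties and triggers a clarifying question, while "renew my red bike" resolves to the red one.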
Core Entities
Models
- Llama-4-Maverick-17B-128E-Instruct
Metrics
- Task Success Rate
- Task Optimality Rate
- Interaction Time
- Speedup
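The metrics above can be computed from per-episode logs as sketched below. Task Success Rate (successful episodes over all episodes) and Speedup (mean baseline time over mean agent time) follow their usual meanings; the Task Optimality Rate definition here (share of all episodes solved in no more than a reference minimum number of steps) and the min_steps field are assumptions, and the source's exact definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    task: str
    success: bool
    steps: int      # tool calls the agent actually made
    min_steps: int  # hypothetical reference: fewest tool calls that solve the task


def task_success_rate(episodes: list[Episode]) -> float:
    return sum(e.success for e in episodes) / len(episodes)


def task_optimality_rate(episodes: list[Episode]) -> float:
    """Assumed definition: solved AND within the minimal step count, over all episodes."""
    return sum(e.success and e.steps <= e.min_steps for e in episodes) / len(episodes)


def speedup(baseline_secs: list[float], agent_secs: list[float]) -> float:
    """Mean GUI (baseline) interaction time divided by mean agent interaction time."""
    return mean(baseline_secs) / mean(agent_secs)
```

Under this definition, a speedup of 2.0 corresponds to the halved bulk-reply interaction time reported in the summary.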
Datasets
- synthetic_100_listings_dataset (LLM-generated)

