Route simple queries straight to fast tools; use memory + planner only for complex job-career requests to cut latency and improve accuracy.

August 19, 2025 · 8 min

Overview

Decision Snapshot: Ready For Pilot

Pilot deployment with real Walmart logs and statistical tests gives moderate-to-strong evidence. Internal data and closed code limit external reproduction.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 7/7

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 45%

Authors

Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, Bachir Aoun, Greg Hayworth, Han Li, Xintao Wu

Links

Abstract / PDF

Why It Matters For Business

You can keep advanced agentic reasoning for hard requests while giving fast answers for common lookups. That reduces user wait time and session rounds, which likely raises engagement and lowers operational cost.

Who Should Care

Summary (TL;DR)

AdaptJobRec is an agentic conversational job recommender built for Walmart. It classifies each incoming query as simple or complex. Simple queries bypass the planner and memory modules and call fast APIs directly; complex queries go through a few-shot memory filter and a nested task planner that groups subtasks that can run in parallel. On Walmart data, this routing plus personalization cuts average response latency by ~53% relative to a RAG baseline and reduces dialogue rounds, while slightly improving ranking metrics.
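The routing step can be sketched as follows. This is a minimal illustration, not AdaptJobRec's implementation: a hypothetical keyword rule stands in for the paper's LLM-based complexity classifier, and the intent phrases and tool names (`get_application_status`, `get_interview_schedule`) are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM complexity classifier: known lookup
# phrases map straight to a fast tool; everything else is "complex".
SIMPLE_INTENTS = {
    "application status": "get_application_status",
    "interview time": "get_interview_schedule",
}


@dataclass
class Route:
    path: str    # "fast_tool" or "agentic"
    target: str  # tool name for fast_tool, pipeline name for agentic


def classify(query: str) -> Route:
    """Route known simple intents to a tool; send the rest to the agentic flow."""
    q = query.lower()
    for phrase, tool in SIMPLE_INTENTS.items():
        if phrase in q:
            return Route("fast_tool", tool)
    return Route("agentic", "memory_filter+planner")


def handle(query: str,
           tools: dict[str, Callable[[], str]],
           agentic: Callable[[str], str]) -> str:
    route = classify(query)
    if route.path == "fast_tool":
        # Simple query: skip memory and planner, call the API directly.
        return tools[route.target]()
    # Complex query: memory filter + planner (supplied by the caller).
    return agentic(query)


tools = {
    "get_application_status": lambda: "Under review",
    "get_interview_schedule": lambda: "Tuesday 10:00",
}
print(handle("What is my application status?", tools, lambda q: "planned answer"))
# Under review
print(handle("Which roles could I move into after two years as a cashier?",
             tools, lambda q: "planned answer"))
# planned answer
```

Keeping the classifier behind a `query -> Route` interface means a fine-tuned model or LLM call could replace the keyword rule without changing the dispatch logic.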

Problem Statement

Agentic conversational recommenders give richer answers but are slow. Simple queries (e.g., 'check application status') waste time if they always trigger planner and memory modules. The paper asks: can we keep agentic reasoning for hard queries while giving fast responses for simple ones?

Main Contribution

AdaptJobRec: an LLM-powered agent that classifies query complexity and routes simple queries directly to fast tools while reserving full agentic flow for complex queries.

Few-shot memory processing module that filters chat history down to the relevant segments, reducing redundant planning.
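A minimal sketch of a few-shot memory filter, assuming an LLM that replies with the indices of relevant history turns. The prompt format, the example turns, and the helper names (`build_prompt`, `parse_indices`, `filter_memory`) are hypothetical; `llm` is any callable that maps a prompt string to a reply string.

```python
import re
from typing import Callable

# Hypothetical few-shot example showing the model what "relevant" means.
FEW_SHOT = (
    "Query: Which nearby stores are hiring pharmacists?\n"
    "History: [0] I live in Bentonville. [1] Thanks, that helps.\n"
    "Relevant: 0\n\n"
)


def build_prompt(query: str, history: list[str]) -> str:
    """Number each past turn and ask for the indices relevant to the query."""
    numbered = " ".join(f"[{i}] {turn}" for i, turn in enumerate(history))
    return f"{FEW_SHOT}Query: {query}\nHistory: {numbered}\nRelevant:"


def parse_indices(reply: str, n: int) -> list[int]:
    """Extract integer indices from the reply, dropping out-of-range ones."""
    return sorted({int(x) for x in re.findall(r"\d+", reply) if int(x) < n})


def filter_memory(query: str, history: list[str],
                  llm: Callable[[str], str]) -> list[str]:
    """Return only the history turns the model marks as relevant."""
    reply = llm(build_prompt(query, history))
    return [history[i] for i in parse_indices(reply, len(history))]


history = ["I want warehouse roles.", "What's the weather?", "I prefer night shifts."]
fake_llm = lambda prompt: "0, 2"  # stand-in for a real model call
print(filter_memory("Find me warehouse jobs", history, fake_llm))
```

Only the filtered turns are passed on to the planner, which keeps its context short when the conversation history is long.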

Key Findings

AdaptJobRec cuts average response latency roughly in half compared with a RAG baseline in the pilot user study.

Numbers: Latency 498 ms vs. RAG 1065 ms (≈53% lower)

Practical Use: Route simple queries to tool APIs to roughly halve user wait times; expect the biggest gains where many queries are simple status or lookup requests.

Evidence Ref: Table 5 (pilot study)

AdaptJobRec reduces the number of conversation rounds needed to get the target information.

Numbers: Avg. rounds 3.32 vs. RAG 7.10 (≈53% fewer rounds)

Practical Use: Filtering memory and grouping planner subtasks more effectively reduces back-and-forth, which improves UX and shortens sessions.

Evidence Ref: Table 5 (pilot study)
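Both pilot percentages are relative reductions against the RAG baseline. This short check recomputes them from the Table 5 values; it is an illustrative calculation, not the authors' analysis code.

```python
def relative_reduction(system: float, baseline: float) -> float:
    """Fraction by which the system's value is lower than the baseline's."""
    return (baseline - system) / baseline


latency = relative_reduction(498, 1065)   # ms, AdaptJobRec vs. RAG (Table 5)
rounds = relative_reduction(3.32, 7.10)   # avg conversation rounds (Table 5)
speedup = 1065 / 498                      # baseline latency / system latency

print(f"latency reduction: {latency:.1%}")   # latency reduction: 53.2%
print(f"rounds reduction:  {rounds:.1%}")    # rounds reduction:  53.2%
print(f"speedup factor:    {speedup:.2f}x")  # speedup factor:    2.14x
```

A ≈53% reduction in latency is equivalent to a speedup of about 2.1x.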

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Hit@10 (job recommendation) | 0.3176 | Plan & Execute: 0.3127 | +0.0049 | 10,014 Walmart users | Table 1: AdaptJobRec vs. baselines | Table 1
NDCG@10 (job recommendation) | 0.0810 | Plan & Execute: 0.0799 | +0.0011 | 10,014 Walmart users | Table 1: AdaptJobRec vs. baselines | Table 1
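The deltas in the results table are absolute; dividing by the baseline shows how small the relative gains are. A quick sketch using only the Table 1 values (the helper name `delta_and_lift` is hypothetical):

```python
def delta_and_lift(ours: float, baseline: float) -> tuple[float, float]:
    """Absolute delta and relative lift of a metric over a baseline."""
    delta = ours - baseline
    return delta, delta / baseline


# AdaptJobRec vs. Plan & Execute on 10,014 Walmart users (Table 1).
table1 = {"Hit@10": (0.3176, 0.3127), "NDCG@10": (0.0810, 0.0799)}

for metric, (ours, baseline) in table1.items():
    delta, lift = delta_and_lift(ours, baseline)
    print(f"{metric}: delta {delta:+.4f}, relative lift {lift:+.2%}")
```

Both relative lifts come out below 2% (about +1.6% for Hit@10 and +1.4% for NDCG@10), consistent with a slight rather than large ranking improvement.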

What To Try In 7 Days

Add a lightweight complexity classifier to route simple queries straight to existing APIs.

Implement a small memory filter (few-shot prompt) that returns only relevant chat snippets for complex queries.

Cache frequent tool responses (Redis) to avoid repeated LLM calls for high-frequency lookups.
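A minimal sketch of caching frequent tool responses with a time-to-live, assuming an in-process dictionary as a stand-in for Redis; in Redis the same idea is a `SET` with an expiry (the `EX` option) followed by `GET` on later lookups. The class name `ToolCache`, the key format, and the 60-second TTL are hypothetical choices for illustration.

```python
import time
from typing import Any, Callable, Hashable


class ToolCache:
    """In-process TTL cache standing in for Redis in this sketch."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_call(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call the tool and cache its result."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # hit: no tool or LLM call
        value = fetch()                  # miss or expired: call the tool
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def fetch_status() -> str:
    calls.append("status")               # count real tool invocations
    return "Under review"


cache = ToolCache(ttl_seconds=60)
cache.get_or_call(("application_status", "user-123"), fetch_status)
cache.get_or_call(("application_status", "user-123"), fetch_status)
print(len(calls))  # 1: the second lookup was served from the cache
```

The TTL bounds staleness: status-style answers can change, so a short expiry trades a little freshness for far fewer repeated calls.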

Agent Features

Memory
Few-shot memory processing module (filters chat history); integrates profile and recent activity into the query
Planning
Task decomposition into nested lists; grouping of asynchronously executable subtasks
Tool Use
Personalized recommendation engines; predefined Cypher templates; Text-to-Cypher generation; Job Application Microservice APIs
Frameworks
Model Context Protocol (MCP); People.AI knowledge graph
Is Agentic: Yes

Architectures
LLM-based reasoning agent; Planner + Memory + Tool invocation
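Tool use combines predefined Cypher templates with text-to-Cypher generation. One way to structure that split is sketched below, assuming a hypothetical graph schema (node labels `Role` and `Job`, relationship `TRANSITIONS_TO`) and a caller-supplied `text_to_cypher` function; neither is taken from the paper or the People.AI knowledge graph.

```python
from typing import Any, Callable

# Hypothetical graph schema for illustration only:
# (:Role)-[:TRANSITIONS_TO {count}]->(:Role), (:Job {id, title, status, location})
CYPHER_TEMPLATES = {
    "next_roles": (
        "MATCH (:Role {title: $title})-[t:TRANSITIONS_TO]->(next:Role) "
        "RETURN next.title AS role, t.count AS moves "
        "ORDER BY moves DESC LIMIT $k"
    ),
    "open_jobs": (
        "MATCH (j:Job {title: $title, status: 'open'}) "
        "RETURN j.id AS job_id, j.location AS location LIMIT $k"
    ),
}


def build_query(intent: str, params: dict[str, Any],
                text_to_cypher: Callable[[str], str]) -> tuple[str, dict[str, Any]]:
    """Prefer a vetted template; fall back to generated Cypher for the long tail."""
    if intent in CYPHER_TEMPLATES:
        # Values travel as query parameters, never spliced into the query text.
        return CYPHER_TEMPLATES[intent], params
    return text_to_cypher(params.get("question", "")), {}


query, args = build_query("next_roles", {"title": "Cashier", "k": 5},
                          text_to_cypher=lambda q: "")
print(query)
print(args)
```

The returned pair could be handed to a graph driver, for example `session.run(query, args)` in the official Neo4j Python driver. Keeping common intents on templates makes them predictable and cheap, while generated Cypher covers questions the templates do not anticipate.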

Optimization Features

Token Efficiency
Few-shot memory processing to reduce unnecessary context
Infra Optimization
Independent microservices for tools so each can scale separately; MCP server for tool execution and Cypher queries
System Optimization
Kafka for streaming and orchestration; Redis caching for frequent queries; Cassandra for conversation history
Inference Optimization
Complexity-based routing to avoid the planner for simple queries; planner parallelization via nested async groups; Cache Augmented Generation (Redis) to reuse tool results
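The planner's nested lists can be read as sequential groups of parallel subtasks. A minimal asyncio sketch under that assumption (the outer list runs in order, each inner list runs concurrently); the subtask names are hypothetical and simulated with `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable

# A plan is a list of groups; each group is a list of independent subtasks.
Plan = list[list[Callable[[], Awaitable[str]]]]


async def run_plan(plan: Plan) -> list[list[str]]:
    """Run groups in order; subtasks inside one group run concurrently."""
    results: list[list[str]] = []
    for group in plan:
        # Subtasks in a group don't depend on each other, so gather them.
        results.append(list(await asyncio.gather(*(task() for task in group))))
    return results


async def demo() -> tuple[list[list[str]], float]:
    # Hypothetical subtasks simulated with sleeps of 0.2 s and 0.1 s.
    plan: Plan = [
        [lambda: asyncio.sleep(0.2, result="profile"),
         lambda: asyncio.sleep(0.2, result="chat_history")],
        [lambda: asyncio.sleep(0.1, result="recommendations")],
    ]
    loop = asyncio.get_running_loop()
    start = loop.time()
    results = await run_plan(plan)
    return results, loop.time() - start


results, elapsed = asyncio.run(demo())
print(results)
print(f"{elapsed:.2f} s")  # close to 0.3 s, versus 0.5 s if run sequentially
```

Group boundaries encode dependencies: a later group can use results from earlier ones, while latency within a group is set by its slowest subtask rather than the sum.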

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Evaluation uses internal Walmart data and a small pilot (150 sessions); external generalization is untested.

System depends on a rich People.AI knowledge graph; missing or sparse KG coverage will hurt recommendations.

When Not To Use

You lack a structured knowledge graph or matching recommendation APIs.

Most queries are complex and require deep planning in every session (less benefit from routing).

Failure Modes

Misclassifying a complex query as simple leads to incomplete answers.

The memory filter omits relevant history, causing the planner to miss necessary context.

Core Entities

Models

Llama-3.1-8B (fine-tuned as Llama-Capa); DeepSeek-R1-Distill-Qwen-7B (fine-tuned as DeepSeek-Capa); AdaptJobRec (system integrating LLM agent + tools)

Metrics

Hit@10; NDCG@10; MAP@10; average response latency (s / ms); average conversation rounds; statistical significance (Welch's t-test, p-values)
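The ranking metrics and the significance test listed above can be sketched in plain Python. This assumes binary relevance (a recommended job either matches a target or not); the function names are hypothetical, and normalizing AP@k by min(|relevant|, k) is one common convention that may not match the paper's exact definition. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test directly.

```python
import math
from statistics import mean, variance


def hit_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: gain 1 at 1-indexed rank r, discounted by log2(r + 1)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def ap_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Average precision over the top k; MAP@k is its mean across users."""
    hits, total = 0, 0.0
    for r, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / r            # precision at this hit
    return total / min(len(relevant), k) if relevant else 0.0


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


ranked = ["job_a", "job_b", "job_c"]
print(hit_at_k(ranked, {"job_b"}), ndcg_at_k(ranked, {"job_b"}), ap_at_k(ranked, {"job_b"}))
# 1.0, about 0.631, 0.5
```

Welch's variant is the natural choice when comparing, for example, per-session latencies of two systems whose variances differ.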

Datasets

Walmart Job Recommendation logs (10,014 users); Walmart job transition records (932,854 training records); Walmart job transition test set (471,495 records, 2024)

Benchmarks

Job Recommendation (Hit@10, NDCG@10, MAP@10); Career Path Prediction (hits on real transitions, latency); Pilot user study (conversation rounds, response latency)