Route simple queries straight to fast tools; use memory + planner only for complex job-career requests to cut latency and improve accuracy.

Overview

Decision Snapshot: Ready For Pilot

Pilot deployment with real Walmart logs and statistical tests gives moderate-to-strong evidence. Internal data and closed code limit external reproduction.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 7/7

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 45%

Authors

Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, Bachir Aoun, Greg Hayworth, Han Li, Xintao Wu

Why It Matters For Business

You can keep advanced agentic reasoning for hard requests while giving fast answers for common lookups. That reduces user wait time and session rounds, which likely raises engagement and lowers operational cost.

Who Should Care

Product Manager, ML Engineer, Engineering Lead, Data Scientist

Summary TLDR

AdaptJobRec is an agentic conversational job recommender built for Walmart. It classifies incoming queries as simple or complex. Simple queries bypass planner/memory and call fast APIs; complex queries use a few-shot memory filter and a nested task planner that groups parallel subtasks. On Walmart data, this routing plus personalization cuts response latency by ~53% and reduces dialogue rounds while slightly improving ranking metrics.

Problem Statement

Agentic conversational recommenders give richer answers but are slow. Simple queries (e.g., 'check application status') waste time if they always trigger planner and memory modules. The paper asks: can we keep agentic reasoning for hard queries while giving fast responses for simple ones?

Main Contribution

AdaptJobRec: an LLM-powered agent that classifies query complexity and routes simple queries directly to fast tools while reserving full agentic flow for complex queries.

Few-shot memory processing module that filters chat history for only relevant segments, reducing redundant planning.
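The routing idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the source describes an LLM-powered agent doing the complexity classification, whereas the keyword heuristic below is a stand-in, and `SIMPLE_CUES`, `classify_complexity`, `route`, `fast_tool`, and `agent_flow` are all hypothetical names.

```python
from typing import Callable

# Hypothetical keyword cues. The source uses an LLM-based classifier;
# this heuristic only illustrates where the routing decision sits.
SIMPLE_CUES = ("application status", "check status", "job id")


def classify_complexity(query: str) -> str:
    """Return 'simple' or 'complex' using a toy keyword heuristic."""
    q = query.lower()
    return "simple" if any(cue in q for cue in SIMPLE_CUES) else "complex"


def route(
    query: str,
    fast_tool: Callable[[str], str],
    agent_flow: Callable[[str], str],
) -> str:
    """Send simple queries to a fast tool; everything else to the full agent."""
    if classify_complexity(query) == "simple":
        return fast_tool(query)  # bypasses memory filter and planner
    return agent_flow(query)     # memory filter + planner + tools
```

Keeping the classifier behind a single function makes it easy to swap the heuristic for a small fine-tuned model later without touching the dispatch logic.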

Key Findings

AdaptJobRec cuts average response latency by about half compared to a RAG baseline in pilot users.

Numbers: Latency 498 ms vs. RAG 1065 ms (≈53% lower)

Practical Use: Route simple queries to tool APIs to halve user wait times; expect the biggest gains where many queries are simple status or lookup requests.

Evidence Ref: Table 5 (Pilot study)

AdaptJobRec reduces the number of conversation rounds needed to get the target information.

Numbers: Avg rounds 3.32 vs. RAG 7.10 (≈53% fewer rounds)

Practical Use: Filtering memory and better planner grouping reduce back-and-forth; this improves UX and lowers session time.

Evidence Ref: Table 5 (Pilot study)
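The percentages in these findings are relative reductions against the RAG baseline. A short check, using only the pilot numbers quoted above (the helper name `relative_reduction` is ours):

```python
def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


latency = relative_reduction(498, 1065)   # ms, AdaptJobRec vs. RAG
rounds = relative_reduction(3.32, 7.10)   # average conversation rounds

print(f"latency reduction: {latency:.1%}")  # latency reduction: 53.2%
print(f"rounds reduction: {rounds:.1%}")    # rounds reduction: 53.2%
```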

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Hit@10 (job recommendation)	0.3176	Plan & Execute 0.3127	+0.0049	10,014 Walmart users	Table 1: AdaptJobRec vs baselines	Table 1
NDCG@10 (job recommendation)	0.0810	Plan & Execute 0.0799	+0.0011	10,014 Walmart users	Table 1: AdaptJobRec vs baselines	Table 1
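For readers unfamiliar with the two ranking metrics in the table, here is a sketch of their common binary-relevance definitions. The source does not state which exact variant the paper uses, so treat this as the textbook form rather than the authors' evaluation code; the function names are ours.

```python
import math
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # position 0 gets discount log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Both functions score one user; the reported figures would be averages of such per-user scores over the evaluation set.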

What To Try In 7 Days

Add a lightweight complexity classifier to route simple queries straight to existing APIs.

Implement a small memory filter (few-shot prompt) that returns only relevant chat snippets for complex queries.

Cache frequent tool responses (Redis) to avoid repeated LLM calls for high-frequency lookups.
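The caching step can be prototyped before wiring in Redis. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for a Redis key with an expiry; `TTLCache` and `get_or_compute` are hypothetical names, and a real deployment would replace the dictionary with Redis calls.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        """Return a fresh cached value, or call `compute` and cache its result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no tool or LLM call needed
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value
```

A short TTL suits status lookups that change during the day; longer TTLs fit slow-moving data such as job descriptions.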

Agent Features

Memory

Few-shot memory processing module (filters chat history)

Integrates profile and recent activity into query
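One way to picture a few-shot memory filter: show the model a worked example, list the numbered history turns, and ask which indices matter for the current query. The prompt text, the reply format, and the names `FEW_SHOT`, `build_filter_prompt`, and `filter_memory` are assumptions for illustration; the source does not publish its prompt. The `llm` argument is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, List

# Hypothetical few-shot example; the paper's actual prompt is not available.
FEW_SHOT = (
    "Select the history turns needed to answer the query.\n"
    "Reply with comma-separated turn indices, or 'none'.\n\n"
    "History:\n0: I work as a cashier.\n1: What is the weather today?\n"
    "Query: Which roles could I move into next?\nRelevant: 0\n"
)


def build_filter_prompt(history: List[str], query: str) -> str:
    """Append the numbered history and the current query to the few-shot example."""
    numbered = "\n".join(f"{i}: {turn}" for i, turn in enumerate(history))
    return f"{FEW_SHOT}\nHistory:\n{numbered}\nQuery: {query}\nRelevant:"


def filter_memory(history: List[str], query: str, llm: Callable[[str], str]) -> List[str]:
    """Keep only the turns whose indices the model returns; ignore anything else."""
    reply = llm(build_filter_prompt(history, query))
    kept = []
    for token in reply.replace(",", " ").split():
        if token.isdigit() and int(token) < len(history):
            kept.append(history[int(token)])
    return kept
```

Parsing defensively (digits only, in range) matters here because a malformed reply should degrade to "no history" rather than crash the planner.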

Planning

Task decomposition into nested lists

Grouping of asynchronously executable subtasks

Tool Use

Personalized recommendation engines

Predefined Cypher templates

Text-to-Cypher generation

Job Application Microservice APIs

Frameworks

Model Context Protocol (MCP)

People.AI knowledge graph

Is Agentic

Yes

Architectures

LLM-based reasoning agent

Planner + Memory + Tool invocation

Optimization Features

Token Efficiency

Few-shot memory processing to reduce unnecessary context

Infra Optimization

Independent microservices for tools to scale separately

Use of MCP server for tool execution and Cypher queries

System Optimization

Kafka for streaming and orchestration

Redis caching for frequent queries

Cassandra for conversation history

Inference Optimization

Complexity-based routing to avoid planner for simple queries

Planner parallelization via nested async groups

Cache Augmented Generation (Redis) to reuse tool results

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

Evaluation uses internal Walmart data and a small pilot (150 sessions); external generalization is untested.

The system depends on a rich People.AI knowledge graph; missing or sparse KG coverage will hurt recommendations.

When Not To Use

You lack a structured knowledge graph or matching recommendation APIs.

Most queries are complex and require deep planning in every session (less benefit from routing).

Failure Modes

Misclassifying a complex query as simple leads to incomplete answers.

Memory filter omits relevant history, causing planner to miss necessary context.

Core Entities

Models

Llama-3.1-8B (fine-tuned as Llama-Capa)

DeepSeek-R1-Distill-Qwen-7B (fine-tuned as DeepSeek-Capa)

AdaptJobRec (system integrating LLM agent + tools)

Metrics

Hit@10

NDCG@10

MAP@10

Average response latency (s / ms)

Average conversation rounds

Statistical significance (Welch's t-test, p-values)
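Welch's t-test compares two group means without assuming equal variances, which suits comparisons such as per-session latency under two systems. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; `welch_t` is our name, and any sample values you feed it are your own, since the paper's raw per-session data is not available.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

To obtain a p-value, pass the same samples to `scipy.stats.ttest_ind(a, b, equal_var=False)`, which performs the same test and handles the t-distribution lookup.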

Datasets

Walmart Job Recommendation logs (10,014 users)

Walmart job transition records (932,854 training)

Walmart job transition test set (471,495 records, 2024)

Benchmarks

Job Recommendation (Hit@10, NDCG@10, MAP@10)

Career Path Prediction (Hit real transitions, latency)

Pilot user study (conversation rounds, response latency)
