Route simple queries straight to fast tools, and reserve the memory and planner modules for complex job and career requests, to cut latency and improve accuracy.

August 19, 2025 · 8 min

Overview

Production Readiness

0.7

Novelty Score

0.45

Cost Impact Score

0.6

Citation Count

0

Authors

Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, Bachir Aoun, Greg Hayworth, Han Li, Xintao Wu

Why It Matters For Business

You can keep advanced agentic reasoning for hard requests while giving fast answers for common lookups. That reduces user wait time and session rounds, which likely raises engagement and lowers operational cost.

Summary TLDR

AdaptJobRec is an agentic conversational job recommender built for Walmart. It classifies incoming queries as simple or complex. Simple queries bypass planner/memory and call fast APIs; complex queries use a few-shot memory filter and a nested task planner that groups parallel subtasks. On Walmart data, this routing plus personalization cuts response latency by ~53% and reduces dialogue rounds while slightly improving ranking metrics.

Problem Statement

Agentic conversational recommenders give richer answers but are slow. Simple queries (e.g., 'check application status') waste time if they always trigger planner and memory modules. The paper asks: can we keep agentic reasoning for hard queries while giving fast responses for simple ones?

Main Contribution

AdaptJobRec: an LLM-powered agent that classifies query complexity and routes simple queries directly to fast tools while reserving full agentic flow for complex queries.

Few-shot memory processing module that filters chat history for only relevant segments, reducing redundant planning.

Task decomposition planner that outputs nested sub-task lists and groups subtasks that can run in parallel to save time.

End-to-end deployment design integrating People.AI knowledge graph (1.6M nodes, 83M edges), Redis caching, Kafka streaming, and Cypher-based tools.

Evaluation on real Walmart production data (job rec and career path tasks) with statistical tests showing latency and accuracy gains.

Key Findings

AdaptJobRec cuts average response latency by about half compared with a RAG baseline in the pilot user study.

Numbers: Latency 498 ms vs RAG 1065 ms (≈53% faster)

AdaptJobRec reduces the number of conversation rounds needed to get the target information.

Numbers: Avg rounds 3.32 vs RAG 7.10 (≈53% fewer rounds)

Job ranking metrics improve modestly over strong agentic baselines.

Numbers: Hit@10 0.3176 vs Plan & Execute 0.3127 (+0.0049)

Career path prediction accuracy is higher than tuned LLM baselines while keeping low latency.

Numbers: Hit (real transitions) 12.82% vs DeepSeek-Capa 11.24% (+1.58 pts); latency 0.36 s vs 0.81 s
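
The relative changes quoted in these findings can be rechecked from the raw figures. The snippet below is only an arithmetic check; the helper name `relative_reduction` is made up for illustration and is not from the paper.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Figures quoted in the findings above.
pilot_latency = relative_reduction(1065, 498)    # ms, RAG vs AdaptJobRec
pilot_rounds = relative_reduction(7.10, 3.32)    # average dialogue rounds
career_latency = relative_reduction(0.81, 0.36)  # s, DeepSeek-Capa vs AdaptJobRec

for name, cut in [
    ("pilot latency", pilot_latency),
    ("pilot rounds", pilot_rounds),
    ("career-path latency", career_latency),
]:
    print(f"{name}: {cut:.1%} lower")
```

Both pilot reductions work out to roughly 53.2%, and the career-path latency reduction to roughly 55.6%.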

Results

Hit@10 (job recommendation)

Value: 0.3176

Baseline: Plan & Execute 0.3127

NDCG@10 (job recommendation)

Value: 0.0810

Baseline: Plan & Execute 0.0799

MAP@10 (job recommendation)

Value: 0.0371

Baseline: Plan & Execute 0.0364

Hit (real transitions) (career path)

Value: 12.82%

Baseline: DeepSeek-Capa 11.24%

Latency (career path)

Value: 0.36 s

Baseline: DeepSeek-Capa 0.81 s

Average conversation rounds (pilot)

Value: 3.32

Baseline: RAG 7.10

Average response latency (pilot)

Value: 498 ms

Baseline: RAG 1065 ms

What To Try In 7 Days

Add a lightweight complexity classifier to route simple queries straight to existing APIs.

Implement a small memory filter (few-shot prompt) that returns only relevant chat snippets for complex queries.

Cache frequent tool responses (Redis) to avoid repeated LLM calls for high-frequency lookups.
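
As a starting point for the first step, routing can be prototyped before any model is trained. The sketch below uses keyword matching as a stand-in for the paper's LLM-based complexity classifier, which is a deliberate simplification. The intent names, trigger phrases, and the `route` signature are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical intents that map one-to-one onto an existing fast API.
SIMPLE_INTENTS: Dict[str, Tuple[str, ...]] = {
    "application_status": ("application status", "status of my application"),
    "job_details": ("job description", "job details"),
}


def classify(query: str) -> Optional[str]:
    """Return a simple-intent name, or None when the query looks complex."""
    text = query.lower()
    for intent, phrases in SIMPLE_INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return None


def route(
    query: str,
    fast_tools: Dict[str, Callable[[str], str]],
    agent_flow: Callable[[str], str],
) -> str:
    """Send simple queries to a fast tool; everything else gets the full agent."""
    intent = classify(query)
    if intent is not None and intent in fast_tools:
        return fast_tools[intent](query)
    # Unknown or complex: fall back to the memory + planner path.
    return agent_flow(query)
```

Defaulting to the full agent flow when the classifier is unsure trades some latency for completeness, which matches the failure mode noted later: a complex query misrouted as simple gets an incomplete answer.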

Agent Features

Memory

  • Few-shot memory processing module (filters chat history)
  • Integrates profile and recent activity into query
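
A minimal sketch of a few-shot memory filter follows, assuming the model is asked to name the relevant turns by index. The prompt wording, the reply format, and the `llm` callable are assumptions for illustration; the paper's actual prompt is not reproduced in this summary.

```python
import re
from typing import Callable, List

# Hypothetical few-shot prompt with one worked example.
FEW_SHOT_PROMPT = """You select chat turns that matter for the new query.
Reply with the turn numbers separated by commas, or "none".

Turns:
0: I work in the pharmacy department.
1: What's the weather like today?
Query: Which roles could I move into next?
Answer: 0

Turns:
{turns}
Query: {query}
Answer:"""


def build_prompt(history: List[str], query: str) -> str:
    turns = "\n".join(f"{i}: {turn}" for i, turn in enumerate(history))
    return FEW_SHOT_PROMPT.format(turns=turns, query=query)


def filter_history(
    history: List[str], query: str, llm: Callable[[str], str]
) -> List[str]:
    """Keep only the turns the model marks as relevant to `query`."""
    reply = llm(build_prompt(history, query))
    keep: List[int] = []
    for token in re.findall(r"\d+", reply):
        index = int(token)
        # Ignore indices the model invented and drop duplicates.
        if index < len(history) and index not in keep:
            keep.append(index)
    return [history[i] for i in sorted(keep)]
```

Passing `llm` in as a plain callable keeps the filter testable with a stub and independent of any particular model client.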

Planning

  • Task decomposition into nested lists
  • Grouping of asynchronously executable subtasks
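
One way to read "nested lists with parallel groups" is that the outer list is ordered and each inner list holds subtasks with no dependencies on each other. The sketch below executes such a plan with `asyncio.gather`. The plan shape, tool names, and sample data are hypothetical, not taken from the paper.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# A plan is a nested list: outer items run in order, inner items run together.
Plan = List[List[str]]
Tool = Callable[[Dict[str, Any]], Awaitable[Any]]


async def run_plan(plan: Plan, tools: Dict[str, Tool]) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    for group in plan:
        # Subtasks in one group are independent, so await them concurrently.
        outputs = await asyncio.gather(*(tools[name](results) for name in group))
        results.update(zip(group, outputs))
    return results


# Hypothetical tools standing in for real service calls.
async def fetch_profile(_: Dict[str, Any]) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"skills": {"forklift", "inventory"}}


async def fetch_open_jobs(_: Dict[str, Any]) -> List[Dict[str, Any]]:
    await asyncio.sleep(0.01)
    return [
        {"title": "Cashier", "skills": {"register"}},
        {"title": "Stocker", "skills": {"forklift", "inventory"}},
    ]


async def rank_jobs(done: Dict[str, Any]) -> List[str]:
    # Runs in a later group, so both earlier results are available.
    skills = done["fetch_profile"]["skills"]
    jobs = done["fetch_open_jobs"]
    ranked = sorted(jobs, key=lambda j: len(skills & j["skills"]), reverse=True)
    return [j["title"] for j in ranked]


TOOLS: Dict[str, Tool] = {
    "fetch_profile": fetch_profile,
    "fetch_open_jobs": fetch_open_jobs,
    "rank_jobs": rank_jobs,
}
PLAN: Plan = [["fetch_profile", "fetch_open_jobs"], ["rank_jobs"]]

if __name__ == "__main__":
    print(asyncio.run(run_plan(PLAN, TOOLS))["rank_jobs"])
```

With this layout the first group costs roughly the time of its slowest subtask rather than the sum of both, which is where the time saving from grouping comes from.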

Tool Use

  • Personalized recommendation engines
  • Predefined Cypher templates
  • Text-to-Cypher generation
  • Job Application Microservice APIs

Frameworks

  • Model Context Protocol (MCP)
  • People.AI knowledge graph

Is Agentic

true

Architectures

  • LLM-based reasoning agent
  • Planner + Memory + Tool invocation

Optimization Features

Token Efficiency

  • Few-shot memory processing to reduce unnecessary context

Infra Optimization

  • Independent microservices for tools to scale separately
  • Use of MCP server for tool execution and Cypher queries

System Optimization

  • Kafka for streaming and orchestration
  • Redis caching for frequent queries
  • Cassandra for conversation history

Inference Optimization

  • Complexity-based routing to avoid planner for simple queries
  • Planner parallelization via nested async groups
  • Cache Augmented Generation (Redis) to reuse tool results
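
The caching idea can be illustrated without a Redis server. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for Redis; the class name, key format, and TTL value are hypothetical, and a real deployment would swap the dictionary for Redis reads and writes with an expiry.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the tool or LLM call
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)

    def slow_lookup() -> str:
        # Placeholder for a tool call such as a Cypher query.
        return "3 open roles in Bentonville"

    print(cache.get_or_compute("open_roles:bentonville", slow_lookup))
    print(cache.get_or_compute("open_roles:bentonville", slow_lookup))  # served from cache
```

Injecting the clock makes expiry behaviour easy to test, and a short TTL limits how stale a cached job listing can get.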

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Evaluation uses internal Walmart data and a small pilot (150 sessions); external generalization is untested.
  • System depends on a rich People.AI knowledge graph — missing or sparse KG coverage will hurt recommendations.
  • Complexity classifier errors can misroute queries, causing either unnecessary latency or lower-quality answers.

When Not To Use

  • You lack a structured knowledge graph or matching recommendation APIs.
  • Most queries are complex and require deep planning in every session (less benefit from routing).
  • You need an open-source, reproducible baseline (paper uses internal data and closed deployment).

Failure Modes

  • Misclassifying a complex query as simple leads to incomplete answers.
  • Memory filter omits relevant history, causing planner to miss necessary context.
  • Text-to-Cypher can produce incorrect queries if the schema or prompt is off.
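
One common mitigation for the last failure mode is to check generated Cypher before running it. The sketch below is not from the paper: it is a deliberately conservative guard that rejects write clauses and node labels outside an allow-list. The label names are hypothetical, and a regex check like this does not replace a real parser (for example, it only inspects the first label on a node and will also flag keywords that appear inside string literals).

```python
import re
from typing import Iterable, List

# Clauses that modify the graph; generated queries should be read-only.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE
)
# First label of a node pattern such as (a:Associate).
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*`?(\w+)`?")


def check_generated_cypher(query: str, allowed_labels: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the query passed."""
    problems: List[str] = []
    if WRITE_CLAUSES.search(query):
        problems.append("contains a write clause")
    unknown = sorted(set(NODE_LABEL.findall(query)) - set(allowed_labels))
    if unknown:
        problems.append("unknown labels: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    schema = {"Associate", "Job"}  # hypothetical labels
    ok = "MATCH (a:Associate)-[:HELD]->(j:Job) RETURN j.title LIMIT 10"
    bad = "MATCH (n:Person) DETACH DELETE n"
    print(check_generated_cypher(ok, schema))
    print(check_generated_cypher(bad, schema))
```

A query that fails the check can be regenerated or handed to a predefined template instead of being executed.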

Core Entities

Models

  • Llama-3.1-8B (fine-tuned as Llama-Capa)
  • DeepSeek-R1-Distill-Qwen-7B (fine-tuned as DeepSeek-Capa)
  • AdaptJobRec (system integrating LLM agent + tools)

Metrics

  • Hit@10
  • NDCG@10
  • MAP@10
  • Average response latency (s / ms)
  • Average conversation rounds
  • Statistical significance (Welch's t-test, p-values)
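
For reference, the three ranking metrics can be computed per user as sketched below and then averaged across users (MAP@10 is the mean of the per-user average precision). The paper's exact definitions are not given in this summary, so the sketch assumes binary relevance and normalizes average precision by min(|relevant|, k), which is one common convention.

```python
import math
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def ap_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average precision at k for one user (assumes no duplicate items)."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / pos  # precision at this cut-off
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]
    relevant = {"b", "d"}
    print(hit_at_k(ranked, relevant), ndcg_at_k(ranked, relevant), ap_at_k(ranked, relevant))
```

In the small example, the relevant items sit at ranks 2 and 4, so average precision is (1/2 + 2/4) / 2 = 0.5.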

Datasets

  • Walmart Job Recommendation logs (10,014 users)
  • Walmart job transition records (932,854 training records)
  • Walmart job transition test set (471,495 records, 2024)

Benchmarks

  • Job Recommendation (Hit@10, NDCG@10, MAP@10)
  • Career Path Prediction (Hit real transitions, latency)
  • Pilot user study (conversation rounds, response latency)