ChatCRS: add a knowledge retriever and a goal planner to make LLMs useful conversational recommenders

May 3, 2024 · 7 min

Overview

Decision Snapshot: Needs Validation

The approach is practical: add a retriever and a small LoRA-tuned planner to an LLM and feed their outputs in via prompts. Evidence consists of automatic metrics and a small human study on two CRS datasets; limitations include one-hop retrieval and few-shot-only evaluation.

Citations: 2

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 40%

Production readiness: 60%

Novelty: 70%

Authors

Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

Links

Abstract / PDF / Code / Data

Why It Matters For Business

If you want LLMs to make real product recommendations in a specific domain, wrap them with a KB retriever and a goal planner. On the evaluated datasets, that combination turns an LLM from a brittle zero-shot text generator into a materially better recommender.

Summary TLDR

LLMs alone struggle with domain-specific conversational recommendation. ChatCRS is a modular framework that wraps an LLM with (1) a relation-based knowledge retrieval agent and (2) a LoRA fine-tuned goal-planning agent. Both agents feed external inputs into few-shot in-context prompts. On two multi-goal Chinese CRS datasets (DuRecDial, TG-Redial), ChatCRS raises human-rated informativeness (~+17%) and proactivity (~+27%) and improves recommendation NDCG/MRR over few-shot LLM baselines by more than an order of magnitude (for example, NDCG@10 from 0.024 to 0.549 on DuRecDial), approaching fully trained baselines.

Problem Statement

LLMs produce fluent text but lack the reliable domain facts and explicit dialogue goals needed for conversational recommendation. Without external knowledge and goal guidance, they give wrong facts, poor recommendations, or unproductive dialogue turns in domain-specific CRS.

Main Contribution

Empirical study showing external knowledge and explicit goals are necessary to make LLMs work for conversational recommendation in a domain (Chinese movies).

ChatCRS: a three-agent design (relation-based knowledge retriever, LoRA-based goal planner, and an LLM conversational agent) that adds knowledge and goals without heavy LLM fine-tuning.
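
As an illustration of how the planner and retriever outputs could reach the conversational LLM, here is a minimal sketch of few-shot prompt assembly. The `Example` structure, the prompt wording, and the `build_prompt` helper are assumptions made for this example, not the paper's actual templates.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure)."""
    history: str
    goal: str
    knowledge: str
    response: str


def format_triples(triples: List[Triple]) -> str:
    """Serialize KB triples into one line of prompt text."""
    return "; ".join(f"{h} - {r} - {t}" for h, r, t in triples)


def build_prompt(history: str, goal: str, triples: List[Triple],
                 shots: List[Example]) -> str:
    """Combine demonstrations, the predicted goal, and retrieved facts."""
    parts = [
        "You are a conversational recommender. "
        "Follow the dialogue goal and use the given knowledge."
    ]
    for ex in shots:
        parts.append(
            f"Dialogue: {ex.history}\nGoal: {ex.goal}\n"
            f"Knowledge: {ex.knowledge}\nResponse: {ex.response}"
        )
    # The final block is left open so the LLM completes the response.
    parts.append(
        f"Dialogue: {history}\nGoal: {goal}\n"
        f"Knowledge: {format_triples(triples)}\nResponse:"
    )
    return "\n\n".join(parts)


# Placeholder data; in a real system the goal comes from the planner
# and the triples from the retriever.
demo = [Example("User: hi", "Greeting", "", "Hello! How can I help?")]
prompt = build_prompt(
    "User: recommend a movie",
    "Movie recommendation",
    [("Movie A", "genre", "comedy")],
    demo,
)
print(prompt)
```

Keeping the goal and knowledge as separate labeled fields makes it easy to ablate either input when measuring its contribution.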

Key Findings

External knowledge improves LLM recommendation ranking on DuRecDial by more than an order of magnitude.

Numbers: ChatGPT NDCG@10: DG 0.024 -> Oracle 0.617 (DuRecDial, Table 1)

Practical Use: If you need usable item rankings from LLMs in a new domain, add a KB retrieval step; LLM-only few-shot setups are not enough on this dataset.

Evidence Ref: Table 1

Goal guidance and knowledge together improve response quality and dialog flow.

Numbers: Human scores: ChatCRS informativeness 1.76 vs ChatGPT 1.50 (+0.26, ≈17%); proactivity 1.69 vs 1.30 (+0.39, ≈30%) (Table 6)

Practical Use: Use a goal predictor to steer LLM replies toward proactive recommendations and use retrieved facts to keep responses informative and factual.

Evidence Ref: Table 6
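
The relative gains quoted for the human scores can be checked directly from the Table 6 values cited above. This is a small arithmetic sketch; the `relative_gain` helper is a name introduced here for illustration.

```python
def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Human-evaluation scores as quoted from Table 6.
info = relative_gain(1.76, 1.50)  # informativeness
pro = relative_gain(1.69, 1.30)   # proactivity

print(f"Informativeness: +{info:.1f}%")
print(f"Proactivity: +{pro:.1f}%")
```

With these inputs the informativeness gain is about 17.3% and the proactivity gain about 30.0%, measured relative to the ChatGPT baseline score.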

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| NDCG@10 | ChatCRS 0.549 | ChatGPT 0.024 (3-shot) | +0.525 | DuRecDial | Table 5 shows ChatCRS 0.549 vs ChatGPT 0.024 on DuRecDial | Table 5 |
| MRR@10 | ChatCRS 0.543 | ChatGPT 0.018 (3-shot) | +0.525 | DuRecDial | Table 5 shows ChatCRS 0.543 vs ChatGPT 0.018 on DuRecDial | Table 5 |
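
For readers who want to reproduce this kind of ranking comparison on their own data, the following is a minimal sketch of NDCG@k and MRR@k. It assumes exactly one ground-truth item per dialogue turn, in which case the ideal DCG is 1; the paper's exact evaluation protocol is not confirmed here.

```python
import math
from typing import List, Sequence, Tuple


def ndcg_at_k(ranked: Sequence[str], target: str, k: int = 10) -> float:
    """NDCG@k with a single relevant item: 1/log2(rank + 1) if it is
    in the top k, else 0. The ideal DCG is 1 under this assumption."""
    top = list(ranked[:k])
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)


def mrr_at_k(ranked: Sequence[str], target: str, k: int = 10) -> float:
    """Reciprocal rank of the target within the top k, else 0."""
    top = list(ranked[:k])
    if target not in top:
        return 0.0
    return 1.0 / (top.index(target) + 1)


def average(metric, runs: List[Tuple[Sequence[str], str]], k: int = 10) -> float:
    """Mean of a per-turn metric over (ranking, target) pairs."""
    if not runs:
        return 0.0
    return sum(metric(r, t, k) for r, t in runs) / len(runs)


# Toy rankings with placeholder item IDs.
runs = [(["a", "b", "c"], "a"), (["a", "b", "c"], "c"), (["a", "b", "c"], "z")]
print("NDCG@10:", round(average(ndcg_at_k, runs), 3))
print("MRR@10:", round(average(mrr_at_k, runs), 3))
```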

What To Try In 7 Days

Add a lightweight relation-based KB retriever that returns entity-relation triples and feed top triples into the LLM prompt.

Fine-tune a small goal planner via LoRA on your dialog-goal labels and use it to steer LLM replies.

Run a small human evaluation (100 dialogs) measuring informativeness and proactivity before/after adding KB+goals.
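
The first step above could be prototyped along these lines. This is a sketch, not the paper's implementation: `TripleStore` and `choose_relation` are hypothetical names, and the keyword-matching relation selector is only a stand-in for whatever model-based selection you use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class TripleStore:
    """In-memory KB indexed by (entity, relation) for one-hop lookup."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._by_entity_relation: Dict[Tuple[str, str], List[Triple]] = defaultdict(list)
        self._relations: Dict[str, Set[str]] = defaultdict(set)
        for head, relation, tail in triples:
            self._by_entity_relation[(head, relation)].append((head, relation, tail))
            self._relations[head].add(relation)

    def relations(self, entity: str) -> List[str]:
        """Candidate relations available for an entity."""
        return sorted(self._relations.get(entity, set()))

    def retrieve(self, entity: str, relation: str, top_k: int = 5) -> List[Triple]:
        """Return up to top_k triples for the chosen entity and relation."""
        return self._by_entity_relation.get((entity, relation), [])[:top_k]


def choose_relation(utterance: str, candidates: List[str]) -> Optional[str]:
    """Placeholder selector: pick the first candidate relation whose name
    appears in the utterance. A real system would use a learned or
    LLM-based selector here."""
    text = utterance.lower()
    for relation in candidates:
        if relation.lower() in text:
            return relation
    return None


# Placeholder KB content for illustration only.
store = TripleStore([
    ("Movie A", "director", "Director X"),
    ("Movie A", "genre", "comedy"),
    ("Movie A", "actor", "Actor Y"),
])
relation = choose_relation("Who is the director of Movie A?", store.relations("Movie A"))
facts = store.retrieve("Movie A", relation) if relation else []
print(facts)
```

Selecting the relation first and then fetching triples keeps retrieval to a single hop, which matches the limitation noted for this approach.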

Agent Features

Memory
short-term dialog history via prompts
Planning
goal planning for next-turn dialogue goal; relation-based planning to choose KB relation
Tool Use
KB retrieval agent; LoRA; ICL prompting as tool interface
Frameworks
tool-augmented LLM (LLM calls agents); in-context learning (ICL) orchestration
Is Agentic

Yes

Architectures
multi-agent (retriever + planner + conversational LLM)
Collaboration
agents coordinate: planner and retriever feed LLM

Optimization Features

Token Efficiency
limit to 50 item-based triples due to prompt token length
Infra Optimization
runs evaluated on a single A100 or via the OpenAI API (cost ≈ US$20 for dataset)
Model Optimization
LoRA
System Optimization
agent decomposition to reduce LLM input load
Training Optimization
LoRA
Inference Optimization
few-shot in-context learning (3-shot) to avoid full LLM fine-tune
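
The 50-triple limit listed under Token Efficiency can be enforced with a simple cap before prompt construction. The sketch below assumes triples arrive already ordered by relevance; the de-duplication step is an addition made for this example and is not described in the source.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

MAX_TRIPLES = 50  # cap quoted above, motivated by prompt token length


def cap_triples(triples: Iterable[Triple], max_triples: int = MAX_TRIPLES) -> List[Triple]:
    """Keep the first `max_triples` unique triples, preserving order."""
    seen = set()
    kept: List[Triple] = []
    for triple in triples:
        if len(kept) >= max_triples:
            break
        if triple not in seen:
            seen.add(triple)
            kept.append(triple)
    return kept


# Placeholder candidates: 120 triples, 60 of them unique.
candidates = [(f"item{i % 60}", "genre", "comedy") for i in range(120)]
print(len(cap_triples(candidates)))
```

A fixed count is a coarse proxy for token length; if your triples vary widely in size, counting tokens with your model's tokenizer would be a tighter budget.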

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Code URLs

Git4ChatCRS (paper states code publicly available at Git4ChatCRS)

Data URLs

DuRecDial, TG-Redial, KB CNpedia (cited for TG-Redial)

Risks & Boundaries

Limitations

Experiments focus on Chinese movie datasets; results may not generalize to other domains.

Knowledge retrieval is single-hop only; multi-hop needs are untested.

When Not To Use

If you require reasoning across multiple KB hops.

For production systems that require full collaborative-filtering recommendations based on rich user logs, without KB signals.

Failure Modes

Incorrect relation selection yields wrong retrieved facts and factual errors in replies.

Goal planner misprediction leads to unproductive dialog turns or wrong recommendations.

Core Entities

Models

ChatCRS, ChatGPT (gpt-3.5-turbo variants), LLaMA-7b, LLaMA-13b, UniMIND, MGCG, TPNet, SASRec

Metrics

NDCG@10, NDCG@50, MRR@10, MRR@50, BLEU-1, BLEU-2, F1, Dist-1/2, Human: Fluency/Coherence/Informativeness/Proactivity

Datasets

DuRecDial, TG-Redial, KB CNpedia (used for TG-Redial)

Benchmarks

DuRecDial CRS benchmark, TG-Redial CRS benchmark