Make LLM recommenders auditable: proposer LLM + deterministic verifier + repair

Overview

Decision Snapshot: Ready For Pilot

The idea is straightforward to integrate: combine an existing ranker with a deterministic verifier and a repair step. Empirical evidence on MovieLens-100K shows a high compliance rate at a small utility cost, but success depends on the candidate window size and on metadata quality.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 2/3

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 30%

Production readiness: 80%

Novelty: 60%

Authors

Aradhya Dixit, Shreem Dixit

Why It Matters For Business

PCN-Rec turns persuasive LLM outputs into auditable, machine-checked recommendations so platforms can guarantee and log policy compliance without trusting LLM explanations.

Who Should Care

Product Manager, Engineering Lead, ML Engineer, Data Scientist, CTO

Summary TLDR

PCN-Rec is a practical pipeline that keeps LLMs as creative proposers but enforces hard governance rules with deterministic code. A base recommender supplies the top-W candidates, two agents (User Advocate and Policy Agent) negotiate, a mediator LLM proposes a Top-N slate plus a JSON certificate, and a deterministic verifier checks the certificate. Failures trigger a deterministic constrained-greedy repair and are recorded in an auditable log. On MovieLens-100K with per-slate head/tail and genre constraints, PCN-Rec reaches a 98.5% pass rate on feasible users with only ~0.02 absolute NDCG@10 loss versus a one-shot LLM.
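The control flow described above (propose, verify, repair on failure, log) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `recommend`, the injected callables `propose`, `verify`, and `repair`, and the log record fields are all hypothetical names chosen for this example.

```python
import json

def recommend(user_id, window, propose, verify, repair, audit_log):
    """Run one propose -> verify -> repair cycle and append an audit record.

    Assumed (hypothetical) callable contracts:
      propose(user_id, window) -> (slate, certificate)   # LLM side
      verify(slate, certificate) -> (ok, errors)         # deterministic code
      repair(window) -> (slate, ok)                      # deterministic code
    """
    slate, certificate = propose(user_id, window)
    ok, errors = verify(slate, certificate)
    record = {
        "user": user_id,
        "proposed": list(slate),
        "verifier_errors": list(errors),
        "repaired": False,
    }
    if not ok:
        # The LLM proposal is discarded; only deterministic code decides the outcome.
        slate, ok = repair(window)
        record["repaired"] = True
    record["final"] = list(slate)
    record["passed"] = ok
    # One JSON line per decision keeps the log machine-checkable.
    audit_log.append(json.dumps(record, sort_keys=True))
    return slate
```

Passing the verifier and repair in as plain functions keeps the LLM out of the trusted path: the audit record depends only on what the deterministic code computed.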

Problem Statement

LLM-based recommenders can generate persuasive slates but often fail strict, auditable governance constraints (e.g., minimum long-tail exposure, diversity). Platforms need a way to keep LLM flexibility while guaranteeing and logging compliance in a machine-checkable way.

Main Contribution

Proof-carrying negotiation interface: LLM proposes slates plus a JSON certificate; a deterministic verifier enforces correctness.

Feasibility analysis by candidate window: separate true infeasibility from proposer failure using a top-W candidate bound.
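A proof-carrying check of this kind could look like the sketch below. The certificate schema (`slate`, `tail_count`, `genre_count`), the metadata fields (`is_tail`, a single primary `genre` per item), and the thresholds are assumptions made for illustration; the source does not specify the actual certificate format or constraint values.

```python
import json

def verify_certificate(slate, certificate_json, metadata,
                       n=10, min_tail=2, min_genres=3):
    """Recompute every claim from trusted metadata; never trust the LLM's numbers.

    Returns (ok, errors). All field names and thresholds are illustrative.
    """
    try:
        cert = json.loads(certificate_json)
    except (TypeError, ValueError):
        return False, ["certificate is not valid JSON"]
    if not isinstance(cert, dict):
        return False, ["certificate is not a JSON object"]

    errors = []
    if len(slate) != n or len(set(slate)) != n:
        errors.append("slate must contain exactly n distinct items")
    if any(item not in metadata for item in slate):
        return False, errors + ["slate contains items without metadata"]

    # Recompute the governed quantities from metadata.
    tail_count = sum(1 for item in slate if metadata[item]["is_tail"])
    genre_count = len({metadata[item]["genre"] for item in slate})
    if tail_count < min_tail:
        errors.append("tail quota not met")
    if genre_count < min_genres:
        errors.append("genre quota not met")

    # The certificate must agree with the recomputed values.
    if cert.get("slate") != list(slate):
        errors.append("certificate lists a different slate")
    if cert.get("tail_count") != tail_count:
        errors.append("claimed tail_count is wrong")
    if cert.get("genre_count") != genre_count:
        errors.append("claimed genre_count is wrong")
    return not errors, errors
```

In this sketch a slate that happens to satisfy the policy is still rejected when its certificate misstates a count, so an inconsistent certificate always surfaces in the error list rather than passing silently.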

Key Findings

On MovieLens-100K, selecting W=80 gave 551 of 943 users with at least one compliant slate inside the window.

Numbers: 551 / 943 users (W=80)

Practical Use: Before deploying verifier-only guarantees, run a window feasibility sweep and pick W so most users remain feasible.

Evidence Ref: Section 3, Figure 2
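A window feasibility sweep of this kind can be sketched as below. The policy used here (at least `min_tail` tail items and at least `min_genres` distinct genres in a slate of `n` items, with one primary genre per item) is an assumption for illustration, and the closed-form check is only valid under that assumption; the function names and metadata fields are hypothetical.

```python
def is_feasible(window, metadata, n=10, min_tail=2, min_genres=3):
    """True if some n-item slate inside `window` can satisfy the assumed policy."""
    if min_genres > n or min_tail > n or len(window) < n:
        return False
    genres = {metadata[i]["genre"] for i in window}
    if len(genres) < min_genres:
        return False
    tail_items = [i for i in window if metadata[i]["is_tail"]]
    tail_genres = {metadata[i]["genre"] for i in tail_items}
    # Best case: cover the genre quota with as many tail items as possible,
    # then spend the remaining slots on any tail items that are left.
    tail_reps = min(len(tail_genres), min_genres)
    free_slots = n - min_genres
    max_tail = tail_reps + min(free_slots, len(tail_items) - tail_reps)
    return max_tail >= min_tail

def feasibility_sweep(ranked_by_user, metadata, windows=(20, 40, 80), **policy):
    """Fraction of users with at least one compliant slate in their top-W list."""
    result = {}
    for w in windows:
        feasible = sum(
            is_feasible(ranked[:w], metadata, **policy)
            for ranked in ranked_by_user.values()
        )
        result[w] = feasible / len(ranked_by_user)
    return result
```

Users who are infeasible at a given W cannot be fixed by any proposer or repair step, so reporting this fraction separately keeps true infeasibility apart from proposer failure.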

For users feasible within W=80, PCN-Rec produced verifier-checked slates at a 0.985 pass rate versus 0.000 for a single-LLM baseline.

Numbers: PassRate (feasible): PCN-Rec 0.985 vs Single LLM 0.000

Practical Use: Use deterministic verification to eliminate silent policy violations for feasible cases; without it, one-shot LLMs can violate rules systematically.

Evidence Ref: Table 1, Section 3.6

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Governance pass rate (feasible users)	0.985	Single LLM: 0.000	↑0.985 vs baseline	MovieLens-100K, feasible users (n=551, W=80)	PCN-Rec verifier-checked passes at 0.985 on feasible subset	Table 1
NDCG@10 (utility)	0.403	Single LLM: 0.424	-0.022 absolute	MovieLens-100K, feasible users (n=551, W=80)	Small utility drop when enforcing deterministic constraints	Table 1

What To Try In 7 Days

Run a candidate-window feasibility sweep (vary W) on your data and report feasible user fraction.

Implement a simple verifier that recomputes a few codified constraints from item metadata.

Add a deterministic greedy repair that first fills the constraints, then maximizes relevance, and logs every change.
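The repair step in the list above could be prototyped roughly as follows. This is a sketch of one possible constrained-greedy strategy, not the paper's algorithm: it assumes a policy of `min_tail` tail items and `min_genres` distinct genres per slate, one primary genre per item, and hypothetical names (`greedy_repair`, `is_tail`, `genre`, the log fields).

```python
def greedy_repair(proposed, window, scores, metadata,
                  n=10, min_tail=2, min_genres=3):
    """Rebuild a slate from the top-W window: quotas first, relevance second.

    Returns (slate, ok, log). Ties are broken by item id so the result is
    deterministic for a given input.
    """
    ranked = sorted(window, key=lambda i: (-scores[i], i))
    chosen, log = [], []

    def tails():
        return sum(1 for i in chosen if metadata[i]["is_tail"])

    def genres():
        return {metadata[i]["genre"] for i in chosen}

    def pick(item, reason):
        chosen.append(item)
        log.append({"item": item, "reason": reason,
                    "in_proposal": item in proposed})

    # Phase 1: cover the genre quota, preferring tail items while needed.
    while len(chosen) < n and len(genres()) < min_genres:
        seen, need_tail = genres(), tails() < min_tail
        options = [i for i in ranked
                   if i not in chosen and metadata[i]["genre"] not in seen]
        if not options:
            break
        pick(min(options, key=lambda i: (
            not (need_tail and metadata[i]["is_tail"]), -scores[i], i)),
            "genre quota")

    # Phase 2: fill any remaining tail quota with the most relevant tail items.
    for i in ranked:
        if len(chosen) >= n or tails() >= min_tail:
            break
        if i not in chosen and metadata[i]["is_tail"]:
            pick(i, "tail quota")

    # Phase 3: spend the remaining slots purely on relevance.
    for i in ranked:
        if len(chosen) >= n:
            break
        if i not in chosen:
            pick(i, "relevance")

    ok = (len(chosen) == n and tails() >= min_tail
          and len(genres()) >= min_genres)
    slate = sorted(chosen, key=lambda i: (-scores[i], i))
    return slate, ok, log
```

The returned `ok` flag matters: when the window is infeasible the function still returns its best effort, and the caller should record the failure rather than treat the slate as compliant.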

Agent Features

Memory

Short-term candidate window (Top-W)

Planning

Mediator synthesizes the final Top-N slate from agent arguments

Tool Use

Deterministic verifier, JSON certificate, deterministic repair

Is Agentic

Yes

Architectures

User Advocate + Policy Agent + LLM mediator

Collaboration

Adversarial negotiation between two agents, mediated by an LLM

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Partial

License: Unknown

Data URLs

MovieLens-100K (public dataset referenced; see Harper & Konstan 2015)

Risks & Boundaries

Limitations

Guarantees hold only if a compliant slate exists inside the chosen Top-W candidate window.

Verifier correctness depends on accurate item metadata and formalized constraints.

When Not To Use

When the candidate generator cannot supply a feasible set within a practical W.

When item metadata or policy definitions are noisy or subjective.

Failure Modes

LLM proposes a certificate inconsistent with the slate, triggering repair.

Incorrect metadata causes false accepts or false rejects by the verifier.

Core Entities

Models

MF/CF (base recommender), Mediator LLM (unspecified)

Metrics

NDCG@10, Governance pass rate (verifier-checked)
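For readers who want to reproduce the two metrics on their own data, a minimal sketch follows. It assumes binary relevance (an item is either in the user's held-out set or not); the source does not state which NDCG variant was used, and the function names are hypothetical.

```python
import math

def ndcg_at_k(slate, relevant, k=10):
    """NDCG@k with binary gains: 1 if the item is in `relevant`, else 0."""
    dcg = sum(
        1.0 / math.log2(rank + 2)          # rank 0 gets discount log2(2) = 1
        for rank, item in enumerate(slate[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def pass_rate(verdicts):
    """Share of slates the verifier accepted; `verdicts` is a list of booleans."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Computing the pass rate only over users that a feasibility sweep marks as feasible matches how the figures above are reported (feasible users, W=80).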

Datasets

MovieLens-100K
