Make LLM recommenders auditable: proposer LLM + deterministic verifier + repair

January 14, 2026 · 6 min

Overview

Decision Snapshot: Ready For Pilot

The idea is straightforward to integrate: wrap an existing ranker with a deterministic verifier and a repair step. Empirical evidence on MovieLens shows strong compliance at a small utility cost, but success depends on the size of the candidate window and the quality of item metadata.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 2/3

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 30%

Production readiness: 80%

Novelty: 60%

Authors

Aradhya Dixit, Shreem Dixit

Links

Abstract / PDF / Data

Why It Matters For Business

PCN-Rec turns persuasive LLM outputs into auditable, machine-checked recommendations so platforms can guarantee and log policy compliance without trusting LLM explanations.

Who Should Care

Summary TLDR

PCN-Rec is a practical pipeline that keeps LLMs as creative proposers but enforces hard governance rules with deterministic code. A base recommender supplies top-W candidates, two agents (User Advocate and Policy Agent) negotiate, a mediator LLM proposes a Top-N slate plus a JSON certificate, and a deterministic verifier checks the certificate. Failures trigger a deterministic constrained-greedy repair and an auditable log. On MovieLens-100K with per-slate head/tail and genre constraints, PCN-Rec reaches 98.5% pass rate on feasible users with only ~0.02 absolute NDCG@10 loss versus a one-shot LLM.
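The control flow described above (propose, verify, repair only on failure, log the outcome) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `recommend_with_audit`, its callable parameters, and the toy stand-ins are all hypothetical names, and the "policy" here is deliberately trivial.

```python
from typing import Callable, Dict, List, Tuple

Slate = List[str]

def recommend_with_audit(
    candidates: List[str],
    propose: Callable[[List[str]], Tuple[Slate, Dict]],
    verify: Callable[[Slate, Dict], bool],
    repair: Callable[[List[str]], Slate],
) -> Dict:
    """Propose, verify, and repair only on failure; return an audit record."""
    slate, certificate = propose(candidates)   # untrusted LLM side
    if verify(slate, certificate):             # deterministic check
        return {"slate": slate, "source": "proposer", "certificate": certificate}
    repaired = repair(candidates)              # deterministic fallback
    return {
        "slate": repaired,
        "source": "repair",
        "rejected_slate": slate,               # kept so the failure is auditable
        "certificate": certificate,
    }

# Toy stand-ins (assumptions): the policy is "the slate must contain t1".
def fake_propose(cands: List[str]) -> Tuple[Slate, Dict]:
    return cands[:2], {"claims_tail": True}    # the claim is not trusted

def fake_verify(slate: Slate, cert: Dict) -> bool:
    return "t1" in slate                       # recomputed, ignores the claim

def fake_repair(cands: List[str]) -> Slate:
    return ["t1"] + [c for c in cands if c != "t1"][:1]

record = recommend_with_audit(["h1", "h2", "t1"], fake_propose, fake_verify, fake_repair)
print(record["source"], record["slate"])
```

The point of the design is that the accept or reject decision never depends on what the proposer says about its own output, only on what the deterministic code recomputes.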

Problem Statement

LLM-based recommenders can generate persuasive slates but often fail strict, auditable governance constraints (e.g., minimum long-tail exposure, diversity). Platforms need a way to keep LLM flexibility while guaranteeing and logging compliance in a machine-checkable way.

Main Contribution

Proof-carrying negotiation interface: LLM proposes slates plus a JSON certificate; a deterministic verifier enforces correctness.

Feasibility analysis by candidate window: separate true infeasibility from proposer failure using a top-W candidate bound.
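To make the certificate idea concrete, here is a hedged sketch of a verifier that recomputes every claimed quantity from item metadata. The certificate schema (`tail_count`, `genre_count`), the `ITEMS` table, and the thresholds `min_tail` and `min_genres` are illustrative assumptions; the source does not specify the paper's actual certificate format or constraint definitions. Each item carries a single genre here, which is a simplification.

```python
import json
from typing import Dict, List

# Hypothetical item metadata: popularity bucket and one genre per item.
ITEMS: Dict[str, Dict[str, str]] = {
    "m1": {"bucket": "head", "genre": "Drama"},
    "m2": {"bucket": "head", "genre": "Comedy"},
    "m3": {"bucket": "tail", "genre": "Drama"},
    "m4": {"bucket": "tail", "genre": "Horror"},
}

def measure(slate: List[str]) -> Dict[str, int]:
    """Recompute the constraint quantities from metadata only."""
    return {
        "tail_count": sum(ITEMS[i]["bucket"] == "tail" for i in slate),
        "genre_count": len({ITEMS[i]["genre"] for i in slate}),
    }

def verify(slate: List[str], certificate_json: str,
           min_tail: int, min_genres: int) -> Dict:
    """Accept only if the certificate matches the recomputed values
    and the recomputed values satisfy the policy thresholds."""
    if len(set(slate)) != len(slate) or any(i not in ITEMS for i in slate):
        return {"ok": False, "reasons": ["slate has duplicate or unknown items"]}
    try:
        claimed = json.loads(certificate_json)
    except json.JSONDecodeError:
        claimed = None
    if not isinstance(claimed, dict):
        return {"ok": False, "reasons": ["certificate is not a JSON object"]}
    actual = measure(slate)
    reasons = []
    for key, value in actual.items():
        if claimed.get(key) != value:
            reasons.append(f"certificate {key}={claimed.get(key)} but recomputed {value}")
    if actual["tail_count"] < min_tail:
        reasons.append("too few tail items")
    if actual["genre_count"] < min_genres:
        reasons.append("too few genres")
    return {"ok": not reasons, "reasons": reasons, "actual": actual}

good = verify(["m1", "m3", "m4"], '{"tail_count": 2, "genre_count": 2}', 2, 2)
lying = verify(["m1", "m2", "m3"], '{"tail_count": 2, "genre_count": 3}', 2, 2)
print(good["ok"], lying["ok"], lying["reasons"])
```

In this sketch a certificate that disagrees with the recomputed values is rejected even if the slate itself happens to be compliant, so the log records every inconsistency between what the proposer claimed and what the metadata shows.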

Key Findings

On MovieLens-100K with W=80, 551 of 943 users had at least one compliant slate inside the candidate window.

Numbers: 551 / 943 users (W=80)

Practical Use: Before deploying verifier-only guarantees, run a window feasibility sweep and pick W so most users remain feasible.

Evidence Ref: Section 3, Figure 2
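A feasibility sweep of the kind recommended here could look like the sketch below. It assumes a simplified policy (at least `min_tail` tail items and at least `min_genres` distinct genres in a slate of size `n`, one genre per item); these parameters, the function names, and the toy data are assumptions, not the paper's definitions. Under that policy, feasibility can be decided by counting rather than by enumerating slates. For reference, the reported 551 / 943 corresponds to roughly 58% of users being feasible at W=80.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (bucket, genre); one genre per item is a simplification

def window_is_feasible(window: List[Item], n: int,
                       min_tail: int, min_genres: int) -> bool:
    """True if some size-n slate drawn from the window meets both quotas."""
    tail_count = sum(1 for bucket, _ in window if bucket == "tail")
    tail_genres = {genre for bucket, genre in window if bucket == "tail"}
    all_genres = {genre for _, genre in window}
    if len(window) < n or tail_count < min_tail or len(all_genres) < min_genres:
        return False
    # The min_tail tail picks cover at most min(min_tail, |tail_genres|) genres;
    # every genre still missing costs one additional slot.
    slots_needed = min_tail + max(0, min_genres - min(min_tail, len(tail_genres)))
    return slots_needed <= n

def feasibility_sweep(ranked: Dict[str, List[Item]], windows: List[int],
                      n: int, min_tail: int, min_genres: int) -> Dict[int, float]:
    """Fraction of users whose top-W window contains a compliant slate."""
    result = {}
    for w in windows:
        feasible = sum(
            window_is_feasible(items[:w], n, min_tail, min_genres)
            for items in ranked.values()
        )
        result[w] = feasible / len(ranked)
    return result

# Toy ranked candidate lists per user (hypothetical data).
ranked = {
    "u1": [("head", "Drama"), ("head", "Comedy"), ("tail", "Drama"), ("tail", "Horror")],
    "u2": [("head", "Drama"), ("head", "Drama"), ("head", "Comedy"), ("tail", "Drama")],
}
sweep = feasibility_sweep(ranked, windows=[2, 3, 4], n=2, min_tail=1, min_genres=2)
print(sweep)  # {2: 0.0, 3: 0.5, 4: 1.0}
```

Users that are infeasible at the chosen W should be reported separately, since neither the proposer nor the repair step can produce a compliant slate for them.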

For users feasible within W=80, PCN-Rec produced verifier-checked slates at a 0.985 pass rate versus 0.000 for a single-LLM baseline.

Numbers: PassRate (feasible): PCN-Rec 0.985 vs Single LLM 0.000

Practical Use: Use deterministic verification to eliminate silent policy violations for feasible cases; without it, one-shot LLMs can violate rules systematically.

Evidence Ref: Table 1, Section 3.6

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Governance pass rate (feasible users) | 0.985 | Single LLM: 0.000 | 0.985 vs baseline | MovieLens-100K, feasible users (n=551, W=80) | PCN-Rec verifier-checked passes at 0.985 on feasible subset | Table 1 |
| NDCG@10 (utility) | 0.403 | Single LLM: 0.424 | -0.022 absolute | MovieLens-100K, feasible users (n=551, W=80) | Small utility drop when enforcing deterministic constraints | Table 1 |
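For readers who want to reproduce the utility column on their own data, a common binary-relevance form of NDCG@k is sketched below. The source does not state which NDCG variant or relevance definition the paper uses, so treat the gain and discount choices here as assumptions.

```python
import math
from typing import List, Set

def ndcg_at_k(slate: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the slate divided by the ideal DCG."""
    # rank is 0-based, so the discount for position rank+1 is log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(slate[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(["a", "b", "c"], {"a"}))   # relevant item at rank 1
print(ndcg_at_k(["b", "a", "c"], {"a"}))   # relevant item at rank 2
```

Averaging this per-user score over the feasible subset for each method, then subtracting, gives a delta comparable in form to the one in the table.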

What To Try In 7 Days

Run a candidate-window feasibility sweep (vary W) on your data and report feasible user fraction.

Implement a simple verifier that recomputes a few codified constraints from item metadata.

Add a deterministic greedy repair that first satisfies the constraints and then maximizes relevance, and log every change it makes.
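The repair step in the last item can be prototyped along these lines. This is a sketch under assumptions, not the paper's constrained-greedy algorithm: the `Candidate` fields, the two quotas (`min_tail`, `min_genres`), the single genre per item, and the log format are all hypothetical. Ties are broken by item id so the output is reproducible.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Candidate(NamedTuple):
    item_id: str
    score: float   # relevance from the base recommender
    bucket: str    # "head" or "tail"
    genre: str     # single genre per item (simplification)

def greedy_repair(cands: List[Candidate], n: int, min_tail: int,
                  min_genres: int) -> Tuple[Optional[List[str]], List[Dict]]:
    """Fill quotas first, then fill remaining slots by relevance; log each pick."""
    ranked = sorted(cands, key=lambda c: (-c.score, c.item_id))
    slate: List[Candidate] = []
    log: List[Dict] = []

    def pick(c: Candidate, reason: str) -> None:
        slate.append(c)
        log.append({"item": c.item_id, "reason": reason})

    def genres() -> set:
        return {c.genre for c in slate}

    # Phase 1a: tail quota, preferring tail items that also add a new genre.
    while sum(c.bucket == "tail" for c in slate) < min_tail:
        pool = [c for c in ranked if c.bucket == "tail" and c not in slate]
        if not pool:
            return None, log                      # infeasible in this window
        pick(next((c for c in pool if c.genre not in genres()), pool[0]), "tail quota")
    # Phase 1b: genre quota, taking the best-scored item of an uncovered genre.
    while len(genres()) < min_genres:
        best = next((c for c in ranked if c.genre not in genres()), None)
        if best is None:
            return None, log
        pick(best, "genre quota")
    if len(slate) > n:
        return None, log                          # quotas need more than n slots
    # Phase 2: fill the remaining slots purely by relevance.
    for c in ranked:
        if len(slate) == n:
            break
        if c not in slate:
            pick(c, "relevance fill")
    if len(slate) < n:
        return None, log
    ordered = sorted(slate, key=lambda c: (-c.score, c.item_id))
    return [c.item_id for c in ordered], log

cands = [
    Candidate("m1", 0.9, "head", "Drama"),
    Candidate("m2", 0.8, "head", "Drama"),
    Candidate("m3", 0.5, "tail", "Drama"),
    Candidate("m4", 0.4, "tail", "Horror"),
    Candidate("m5", 0.3, "head", "Comedy"),
]
slate, log = greedy_repair(cands, n=3, min_tail=1, min_genres=2)
print(slate)  # ['m1', 'm3', 'm4']
print(log)
```

Note that this greedy order is deterministic but not relevance-optimal: on the toy input, the slate m1, m2, m4 also satisfies both quotas and has a higher total score. Whether that trade-off is acceptable is worth measuring alongside the pass rate.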

Agent Features

Memory
Short-term candidate window (Top-W)
Planning
Mediator synthesizes final Top-N slate from agent arguments
Tool Use
Deterministic verifier, JSON certificate, deterministic repair
Is Agentic

Yes

Architectures
User Advocate + Policy Agent + LLM mediator
Collaboration
Adversarial negotiation between two agents mediated by LLM

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

MovieLens-100K (public dataset referenced; see Harper & Konstan 2015)

Risks & Boundaries

Limitations

Guarantees hold only if a compliant slate exists inside the chosen Top-W candidate window.

Verifier correctness depends on accurate item metadata and formalized constraints.

When Not To Use

When the candidate generator cannot supply a feasible set within a practical W.

When item metadata or policy definitions are noisy or subjective.

Failure Modes

LLM proposes a certificate inconsistent with the slate, triggering repair.

Incorrect metadata causes false accepts or false rejects by the verifier.

Core Entities

Models

MF/CF (base recommender), Mediator LLM (unspecified)

Metrics

NDCG@10, Governance pass rate (verifier-checked)

Datasets

MovieLens-100K