Make LLM recommenders auditable: proposer LLM + deterministic verifier + repair

January 14, 2026 · 6 min

Overview

Production Readiness

0.8

Novelty Score

0.6

Cost Impact Score

0.3

Citation Count

0

Authors

Aradhya Dixit, Shreem Dixit


Why It Matters For Business

PCN-Rec turns persuasive LLM outputs into auditable, machine-checked recommendations so platforms can guarantee and log policy compliance without trusting LLM explanations.

Summary TLDR

PCN-Rec is a practical pipeline that keeps LLMs as creative proposers but enforces hard governance rules with deterministic code. A base recommender supplies top-W candidates, two agents (User Advocate and Policy Agent) negotiate, a mediator LLM proposes a Top-N slate plus a JSON certificate, and a deterministic verifier checks the certificate. Failures trigger a deterministic constrained-greedy repair and an auditable log. On MovieLens-100K with per-slate head/tail and genre constraints, PCN-Rec reaches 98.5% pass rate on feasible users with only ~0.02 absolute NDCG@10 loss versus a one-shot LLM.
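The certificate check described above can be sketched as follows. This is a minimal illustration, not the authors' verifier: the certificate fields (`tail_count`, `genre_count`), the metadata layout, the one-genre-per-item simplification, and the thresholds are all assumptions made for the example. The point it shows is that the verifier recomputes every statistic from item metadata and never trusts the numbers the LLM claims.

```python
import json

# Hypothetical metadata, not from the paper: item_id -> tail flag and one genre.
ITEMS = {
    1: {"tail": False, "genre": "Drama"},
    2: {"tail": False, "genre": "Comedy"},
    3: {"tail": True, "genre": "Drama"},
    4: {"tail": True, "genre": "Horror"},
}


def recompute_stats(slate, items):
    """Recompute constraint statistics from metadata only."""
    return {
        "tail_count": sum(1 for i in slate if items[i]["tail"]),
        "genre_count": len({items[i]["genre"] for i in slate}),
    }


def verify(slate, certificate_json, items, window, n, min_tail, min_genres):
    """Return (passed, reasons). The window is assumed to be a subset of items."""
    try:
        cert = json.loads(certificate_json)
    except json.JSONDecodeError:
        return False, ["certificate is not valid JSON"]
    if not isinstance(cert, dict):
        return False, ["certificate is not a JSON object"]

    reasons = []
    if len(slate) != n or len(set(slate)) != n:
        reasons.append(f"slate must hold {n} distinct items")
    if any(i not in window for i in slate):
        # Items outside the window have no trusted metadata, so stop here.
        return False, reasons + ["slate item outside the candidate window"]

    stats = recompute_stats(slate, items)
    for key, value in stats.items():
        if cert.get(key) != value:
            reasons.append(f"certificate {key} does not match recomputed {value}")
    if stats["tail_count"] < min_tail:
        reasons.append("too few tail items")
    if stats["genre_count"] < min_genres:
        reasons.append("too few distinct genres")
    return not reasons, reasons


# Usage: a correct certificate passes, an inflated one is rejected.
good = '{"tail_count": 1, "genre_count": 2}'
inflated = '{"tail_count": 2, "genre_count": 2}'
print(verify([1, 2, 3], good, ITEMS, set(ITEMS), 3, 1, 2))
print(verify([1, 2, 3], inflated, ITEMS, set(ITEMS), 3, 1, 2))
```

Returning the list of reasons alongside the verdict gives the audit log something concrete to record for each rejected slate.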

Problem Statement

LLM-based recommenders can generate persuasive slates but often fail strict, auditable governance constraints (e.g., minimum long-tail exposure, diversity). Platforms need a way to keep LLM flexibility while guaranteeing and logging compliance in a machine-checkable way.

Main Contribution

Proof-carrying negotiation interface: LLM proposes slates plus a JSON certificate; a deterministic verifier enforces correctness.

Feasibility analysis by candidate window: separate true infeasibility from proposer failure using a top-W candidate bound.

Empirical validation: near-perfect governance compliance on feasible MovieLens-100K users with small utility loss.
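The feasibility analysis in the second contribution can be illustrated with a small sweep over W. The sketch below rests on assumptions that are not taken from the paper: each item carries exactly one genre, and the policy is "at least `min_tail` tail items and at least `min_genres` distinct genres in a slate of `n` items". Under those assumptions feasibility has a closed form, so no slate enumeration is needed. With multi-genre items or other constraint types, a different check would be required.

```python
def feasible_in_window(window, n, min_tail, min_genres):
    """window: list of (is_tail, genre) pairs for one user's top-W candidates.

    Returns True if some slate of n distinct items from the window meets
    both quotas (assumes one genre per item).
    """
    if len(window) < n:
        return False
    tail_total = sum(1 for is_tail, _ in window if is_tail)
    genres_all = {genre for _, genre in window}
    genres_with_tail = {genre for is_tail, genre in window if is_tail}
    if tail_total < min_tail or len(genres_all) < min_genres:
        return False
    # Tail picks can double as genre representatives, at most one per genre.
    covered_by_tail = min(min_tail, len(genres_with_tail))
    slots_needed = min_tail + max(0, min_genres - covered_by_tail)
    return slots_needed <= n


def feasible_fraction(users, w, n, min_tail, min_genres):
    """Fraction of users with at least one compliant slate inside their top-w."""
    hits = sum(
        feasible_in_window(ranked[:w], n, min_tail, min_genres)
        for ranked in users.values()
    )
    return hits / len(users)


# Toy data (hypothetical): candidates are listed in base-recommender order.
USERS = {
    "u1": [(False, "Drama"), (False, "Drama"), (True, "Comedy"),
           (False, "Horror"), (True, "Drama")],
    "u2": [(False, "Drama"), (False, "Drama"), (False, "Drama"),
           (True, "Drama"), (True, "Comedy")],
}

for w in (3, 4, 5):
    print(w, feasible_fraction(USERS, w, n=3, min_tail=1, min_genres=2))
```

Users who fail this check at a given W are infeasible by construction, so a verifier failure for them says nothing about the proposer.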

Key Findings

On MovieLens-100K, a candidate window of W=80 left 551 of 943 users (58.4%) with at least one compliant slate inside the window.

Numbers: 551 / 943 users (W=80)

For users feasible within W=80, PCN-Rec produced verifier-checked slates at a 0.985 pass rate versus 0.000 for a single-LLM baseline.

Numbers: Pass rate (feasible): PCN-Rec 0.985 vs Single LLM 0.000

Enforcing verification caused a small utility drop: NDCG@10 fell from 0.424 (Single LLM) to 0.403 (PCN-Rec).

Numbers: Δ NDCG@10 = -0.022 as reported (0.424 → 0.403; the rounded endpoints differ by 0.021)
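For readers who want to reproduce the utility comparison, here is a minimal NDCG@10 sketch. It uses the common binary-relevance form with a log2 position discount. The summary does not say which variant the paper uses, so treat the gain and discount choices as assumptions.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: 1.0 means every relevant item is ranked first."""
    gains = [1.0 if item in relevant_ids else 0.0 for item in ranked_ids]
    ideal = [1.0] * min(len(relevant_ids), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Usage: the same relevant item scores lower when pushed down one position.
print(ndcg_at_k([1, 2, 3], {1}))
print(ndcg_at_k([2, 1, 3], {1}))
```

Averaging this value over users for the verified slates and for the one-shot LLM slates gives the two numbers being compared.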

Results

Governance pass rate (feasible users)

Value: 0.985

Baseline (Single LLM): 0.000

NDCG@10 (utility)

Value: 0.403

Baseline (Single LLM): 0.424

Feasible users within window

Value: 551 / 943

Who Should Care

What To Try In 7 Days

Run a candidate-window feasibility sweep (vary W) on your data and report feasible user fraction.

Implement a simple verifier that recomputes a few codified constraints from item metadata.

Add a deterministic greedy repair that fills constraints first and then maximizes relevance, and log every change it makes.
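The third step could start from something like the sketch below: fill the tail quota, then the genre quota, then top up by relevance, logging each decision. It is a heuristic under assumed constraints (one genre per item, hypothetical `min_tail` and `min_genres` thresholds, illustrative field names), not the paper's repair procedure. Because it picks the highest-scoring tail items first, it can fail even when a compliant slate exists, so the result carries an `ok` flag from a final re-check.

```python
def greedy_repair(proposed, candidates, n, min_tail, min_genres):
    """Deterministic two-phase repair.

    proposed: item ids from the rejected slate (used only for the audit log).
    candidates: dicts with keys id, score, tail, genre (the top-W window).
    """
    # Ties on score break by id so the output never depends on input order.
    ranked = sorted(candidates, key=lambda c: (-c["score"], c["id"]))
    slate, chosen, log = [], set(), []

    def add(item, reason):
        slate.append(item)
        chosen.add(item["id"])
        log.append({"action": "add", "item": item["id"], "reason": reason})

    # Phase 1: satisfy the tail quota with the most relevant tail items.
    for c in ranked:
        if sum(x["tail"] for x in slate) >= min_tail:
            break
        if c["tail"]:
            add(c, "tail quota")

    # Phase 2: satisfy the genre quota with the best item of each new genre.
    for c in ranked:
        covered = {x["genre"] for x in slate}
        if len(covered) >= min_genres:
            break
        if c["id"] not in chosen and c["genre"] not in covered:
            add(c, "genre quota")

    # Phase 3: spend the remaining slots on relevance alone.
    for c in ranked:
        if len(slate) >= n:
            break
        if c["id"] not in chosen:
            add(c, "relevance fill")

    for item_id in proposed:
        if item_id not in chosen:
            log.append({"action": "drop", "item": item_id, "reason": "repair"})

    ok = (
        len(slate) == n
        and sum(x["tail"] for x in slate) >= min_tail
        and len({x["genre"] for x in slate}) >= min_genres
    )
    slate.sort(key=lambda c: (-c["score"], c["id"]))
    return {"slate": [c["id"] for c in slate], "ok": ok, "log": log}


# Toy window (hypothetical scores and metadata).
WINDOW = [
    {"id": 1, "score": 0.9, "tail": False, "genre": "Drama"},
    {"id": 2, "score": 0.8, "tail": False, "genre": "Drama"},
    {"id": 3, "score": 0.7, "tail": False, "genre": "Comedy"},
    {"id": 4, "score": 0.6, "tail": True, "genre": "Drama"},
    {"id": 5, "score": 0.5, "tail": True, "genre": "Horror"},
]
result = greedy_repair([1, 2, 3], WINDOW, n=3, min_tail=1, min_genres=2)
print(result["slate"], result["ok"])
for entry in result["log"]:
    print(entry)
```

Each log entry names the item and the reason it was added or dropped, which is the record an auditor would need to reconstruct why the served slate differs from the proposal.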

Agent Features

Memory

  • Short-term candidate window (Top-W)

Planning

  • Mediator synthesizes final Top-N slate from agent arguments

Tool Use

  • Deterministic verifier
  • JSON certificate
  • Deterministic repair

Is Agentic

true

Architectures

  • User Advocate + Policy Agent + LLM mediator

Collaboration

  • Adversarial negotiation between two agents, mediated by an LLM

Reproducibility

Data Urls

  • MovieLens-100K (public dataset referenced; see Harper & Konstan 2015)

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Guarantees hold only if a compliant slate exists inside the chosen Top-W candidate window.
  • Verifier correctness depends on accurate item metadata and formalized constraints.
  • Deterministic repair can reduce utility versus an optimal constrained ranking.

When Not To Use

  • When the candidate generator cannot supply a feasible set within a practical W.
  • When item metadata or policy definitions are noisy or subjective.
  • When explanations themselves must be verified but are not codified.

Failure Modes

  • LLM proposes a certificate inconsistent with the slate, triggering repair.
  • Incorrect metadata causes false accepts or false rejects by the verifier.
  • Repair produces acceptable compliance but noticeably lower user utility.

Core Entities

Models

  • MF/CF (base recommender)
  • Mediator LLM (unspecified)

Metrics

  • NDCG@10
  • Governance pass rate (verifier-checked)

Datasets

  • MovieLens-100K