Use a personal causal graph so an LLM recommends foods that better lower your post-meal glucose

February 28, 2025 (7 min read)

Overview

Production Readiness

0.4

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

1

Authors

Zhongqi Yang, Amir Rahmani


Why It Matters For Business

Personalized causal reasoning makes LLM-driven dietary advice measurably more tailored and stable for multi-hour glucose control; that can improve product trust and clinical usefulness compared to one-size-fits-all LLM responses.

Summary TLDR

The paper presents Personalized Causal Graph Reasoning: build an individualized causal graph from a person's glucose, activity, and meal logs, let an LLM traverse and rank causal paths, retrieve candidate foods, and verify suggestions by simulating counterfactuals on a fuller personal graph. The framework is implemented for glucose control using CGM and meal records from 34 users. In counterfactual evaluation, mean glucose reduction (MGR) at 1h and 2h is significantly higher than for the retrieval baseline ChatDiet (158.21 vs 120.45 and 411.56 vs 307.12). An LLM-as-a-judge (Llama-3 70B) preferred the method 98.43% of the time, and human raters preferred it 86.5% of the time. Main caveats: the evaluation is simulation-based on observational data, the cohort is small, and there is no prospective clinical trial.

Problem Statement

General LLMs give generic dietary advice because they reason from population-level correlations. This fails when individuals have unique metabolic patterns. The paper aims to make LLM recommendations individualized by reasoning over a person-specific causal graph built from their longitudinal data.

Main Contribution

Introduce Personalized Causal Graph Reasoning: combine a person's causal graph with an LLM that traverses and ranks causal paths to generate tailored interventions.

Implement the framework for dietary recommendations using CGM, activity, and meal logs; verify recommendations via counterfactual simulation on a full-data personal graph.

Evaluate results quantitatively (Mean Glucose Reduction across 30m/1h/2h) and qualitatively with LLM-as-a-judge and human blind comparisons.

Key Findings

The personalized causal-graph method outperforms the RAG baseline (ChatDiet) at the 1h and 2h horizons, but not at 30 min.

Numbers: 1h MGR 158.21 vs ChatDiet 120.45 (p=0.046); 2h MGR 411.56 vs ChatDiet 307.12 (p≈1e-4).

Outputs are judged far more personalized by both automatic and human judges.

Numbers: LLM-as-a-judge win rate 98.43%; human win rate 86.50%.

Removing the personal causal graph collapses long-term performance.

Numbers: 2h MGR 411.56 (proposed method) vs −149.89 (LLM alone, no causal graph) (p≈1e-8 to 1e-10).

Verification and path ranking both contribute to stability and mean performance.

Numbers: removing verification drops 1h MGR from 158.21 to 116.40 (p=0.028); removing path ranking increases variance and lowers the mean.

Results

30 min MGR (mean [95% CI])

Proposed method: 19.84 [9.12, 30.56]

Baseline (ChatDiet): 33.92 [20.70, 47.14]

1 hour MGR (mean [95% CI])

Proposed method: 158.21 [137.40, 179.02]

Baseline (ChatDiet): 120.45 [90.11, 150.79]

2 hour MGR (mean [95% CI])

Proposed method: 411.56 [385.07, 438.05]

Baseline (ChatDiet): 307.12 [264.44, 349.80]
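The intervals above are symmetric about their means (for example, 158.21 ± 20.81 at 1 hour), which is consistent with a "mean ± margin" interval. This summary does not say how they were computed, so the sketch below is only one plausible recipe: a normal-approximation 95% interval over per-user MGR values. The function name `mean_ci`, the multiplier 1.96, and the sample values are illustrative assumptions, not taken from the paper.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval.

    Assumption: z=1.96 (95% under a normal approximation). The paper may
    have used a t-distribution or a bootstrap instead.
    """
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical per-user 1h MGR values, for illustration only.
per_user_mgr = [140.0, 150.0, 160.0, 170.0, 180.0]
mean, lower, upper = mean_ci(per_user_mgr)
print(f"{mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

With only 34 users, a t-based multiplier would give a slightly wider interval than 1.96.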

Who Should Care

What To Try In 7 Days

Build a tiny personal causal graph for one test user from a week of CGM+meal logs and see if LLM-guided recommendations differ from your current rules.

Add a path-ranking step (edge weights × historical usage) and compare top-5 nutrient drivers versus simple correlation ranking.

Implement a counterfactual simulator on full-user data to validate a small set of recommendations before rollout.
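As a starting point for the last step, here is a minimal sketch of a counterfactual simulator over a linear structural causal model, following the usual abduction, action, prediction recipe. Everything concrete in it is an assumption made for illustration: the node names (including the mediator `glycemic_load`), the edge weights, and the observed record are hypothetical, and the paper's own simulator may differ.

```python
# Hypothetical linear SCM: node -> {parent: weight}. Weights are made up.
SCM = {
    "carbs": {},
    "fiber": {},
    "activity": {},
    "glycemic_load": {"carbs": 0.5, "fiber": -1.0},
    "glucose_rise": {"glycemic_load": 2.0, "activity": -3.0},
}
# Topological order: every parent appears before its children.
ORDER = ["carbs", "fiber", "activity", "glycemic_load", "glucose_rise"]


def counterfactual(observed, interventions):
    """Predict what the record would have looked like under `interventions`."""
    # Abduction: recover each node's residual from the observed record.
    noise = {}
    for node in ORDER:
        predicted = sum(w * observed[p] for p, w in SCM[node].items())
        noise[node] = observed[node] - predicted
    # Action + prediction: fix intervened nodes, recompute the rest in order.
    result = {}
    for node in ORDER:
        if node in interventions:
            result[node] = interventions[node]
        else:
            structural = sum(w * result[p] for p, w in SCM[node].items())
            result[node] = structural + noise[node]
    return result


# Hypothetical logged meal and a candidate swap (fewer carbs, more fiber).
observed = {"carbs": 80.0, "fiber": 2.0, "activity": 1.0,
            "glycemic_load": 40.0, "glucose_rise": 75.0}
cf = counterfactual(observed, {"carbs": 50.0, "fiber": 8.0})
reduction = observed["glucose_rise"] - cf["glucose_rise"]
print(f"simulated glucose rise: {cf['glucose_rise']:.1f}, reduction: {reduction:.1f}")
```

Keeping the residuals fixed is what makes this a counterfactual for that specific meal rather than a population-average prediction.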

Agent Features

Memory

  • uses longitudinal personal data (CGM, meal logs, MET)

Planning

  • graph traversal to find causal paths
  • path ranking to prioritize interventions
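The two planning steps can be sketched as a depth-first traversal that enumerates directed paths from each nutrient to the glucose node, followed by a scoring pass. The score used here, the absolute product of edge weights along the path multiplied by how often the user eats the source nutrient, is one reading of "edge weights × historical usage"; the exact formula, the graph, and the usage frequencies below are assumptions for illustration.

```python
# Hypothetical personal graph: parent -> {child: edge weight}.
GRAPH = {
    "carbs": {"glycemic_load": 0.5},
    "fiber": {"glycemic_load": -1.0},
    "protein": {"glucose": -0.2},
    "glycemic_load": {"glucose": 2.0},
    "glucose": {},
}
# Hypothetical share of logged meals that contain each nutrient source.
USAGE = {"carbs": 0.9, "fiber": 0.3, "protein": 0.6}


def paths_to(graph, start, target, prefix=None):
    """Yield every directed path from start to target (graph must be acyclic)."""
    prefix = (prefix or []) + [start]
    if start == target:
        yield prefix
        return
    for child in graph.get(start, {}):
        yield from paths_to(graph, child, target, prefix)


def path_effect(graph, path):
    """Multiply edge weights along a path (total linear effect)."""
    effect = 1.0
    for parent, child in zip(path, path[1:]):
        effect *= graph[parent][child]
    return effect


def rank_paths(graph, usage, target="glucose"):
    """Return (score, effect, path) tuples, highest score first."""
    scored = []
    for source, freq in usage.items():
        for path in paths_to(graph, source, target):
            effect = path_effect(graph, path)
            scored.append((abs(effect) * freq, effect, path))
    return sorted(scored, key=lambda item: item[0], reverse=True)


ranked = rank_paths(GRAPH, USAGE)
for score, effect, path in ranked:
    print(f"{score:.2f}  effect={effect:+.2f}  {' -> '.join(path)}")
```

The sign of `effect` is kept alongside the score so a later step can tell which nutrients to increase and which to reduce.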

Tool Use

  • external food nutrient database retrieval
  • counterfactual simulation using a full-data causal graph
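For the retrieval step, a rough sketch is to score each candidate food by the glucose effect its nutrients would be expected to have and keep the lowest-scoring foods. The food table, its nutrient amounts, and the per-nutrient effects below are invented for illustration and do not come from any real nutrient database or from the paper.

```python
# Hypothetical nutrient table (grams per serving); values are illustrative only.
FOODS = {
    "lentils": {"carbs": 20.0, "fiber": 8.0, "protein": 9.0},
    "white rice": {"carbs": 28.0, "fiber": 0.4, "protein": 2.7},
    "greek yogurt": {"carbs": 4.0, "fiber": 0.0, "protein": 10.0},
}
# Hypothetical per-gram effect of each nutrient on this user's glucose rise.
NUTRIENT_EFFECTS = {"carbs": 1.0, "fiber": -2.0, "protein": -0.2}


def retrieve(foods, nutrient_effects, k=2):
    """Return the k foods with the lowest predicted glucose effect."""
    def predicted_effect(nutrients):
        # Nutrients without a known effect contribute nothing.
        return sum(nutrient_effects.get(name, 0.0) * amount
                   for name, amount in nutrients.items())

    return sorted(foods, key=lambda food: predicted_effect(foods[food]))[:k]


candidates = retrieve(FOODS, NUTRIENT_EFFECTS)
print(candidates)
```

In the described pipeline these candidates would then go to the counterfactual check rather than straight to the user.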

Frameworks

  • Personalized Causal Graph Reasoning

Is Agentic

true

Architectures

  • LLM + per-user causal graph
  • RAG-style external retrieval

Reproducibility

Data Urls

  • CGMacros (referenced as a public dataset; the exact URL is not provided in the paper)

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Evaluation uses counterfactual simulation on observational data, not prospective clinical trials.
  • The small cohort (34 users) limits generalization to the wider population.
  • Important confounders like sleep, stress, hormones, and microbiome were not modeled.
  • Causal estimates rely on the assumptions of the PC algorithm and a linear structural causal model (SCM) fit to observational data, and can be biased by unmeasured confounding.
  • Computational demands of graph construction and LLM prompting may hinder large-scale deployment.

When Not To Use

  • No or very limited longitudinal personal data available for a user.
  • Acute clinical decision-making where randomized trial evidence is required.
  • Cases where glucose is only one of several competing clinical priorities, unless the method is extended to handle multiple objectives.

Failure Modes

  • Incorrect causal edges learned by the PC algorithm under confounding lead to bad recommendations.
  • LLM misreads or misapplies the causal summary and suggests inconsistent foods.
  • User non-adherence or portion-size differences invalidate simulated benefits.
  • Missing physiological covariates lead to overconfident counterfactuals.

Core Entities

Models

  • GPT-4o
  • Llama-3 70B

Metrics

  • Mean Glucose Reduction (MGR)
  • iAUC (incremental Area Under Curve)
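For reference, iAUC is commonly computed with the trapezoidal rule on glucose readings measured relative to a pre-meal baseline. The sketch below is a simplified variant that takes the first reading as the baseline and clips any dip below it to zero; the paper's exact definition is not given in this summary, and the sample readings are hypothetical.

```python
def iauc(times_min, glucose, baseline=None):
    """Incremental area under the curve, in (glucose units) x minutes.

    Simplification: increments below baseline are clipped to zero at each
    sample, rather than interpolating the exact crossing point.
    """
    if baseline is None:
        baseline = glucose[0]  # assumption: first reading is the pre-meal level
    increments = [max(g - baseline, 0.0) for g in glucose]
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        # Trapezoid between consecutive samples.
        area += 0.5 * (increments[i] + increments[i - 1]) * dt
    return area


# Hypothetical 2-hour post-meal CGM trace (minutes, mg/dL).
times = [0, 30, 60, 90, 120]
readings = [100, 140, 160, 130, 95]
print(iauc(times, readings))
```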

Datasets

  • CGMacros (personal CGM + meal + MET data, 34 participants used)