Use modal logic + Kripke belief states to constrain LMs and produce verifiable autonomous diagnostics

September 15, 20257 min

Overview

Production Readiness

0.4

Novelty Score

0.65

Cost Impact Score

0.5

Citation Count

0

Authors

Antonin Sulc, Thorsten Hellert

Links

Abstract / PDF

Why It Matters For Business

Formal symbolic rules combined with LMs reduce risky hallucinations in diagnostics, making automated troubleshooting auditable and safer for critical infrastructure.

Summary TLDR

The paper presents a neuro-symbolic multi-agent system that pairs language models (LMs) as hypothesis generators with formally defined belief states (Kripke models) and a library of expert axioms in modal logic. LMs classify anomalies into a fixed set of propositions; symbolic checks reject logically or physically impossible hypotheses. In a simulated particle-accelerator sector the system solved three diagnostic scenarios (including cascading failures) by pruning impossible causes and isolating roots. Code is public. This is a systems-level proof-of-concept, not a full real-world deployment.

Problem Statement

Language models can suggest plausible-sounding but logically or physically impossible diagnoses (hallucinations). Critical control systems need diagnoses that are verifiable, physically consistent, and auditable. The gap is how to combine LM semantic power with formal checks so agents remain autonomous but not unsafe.

Main Contribution

A neuro-symbolic multi-agent architecture that stores each agent's beliefs as a Kripke model (possible-world representation) and uses modal logic operators (□ necessity, ♢ possibility).

A practical loop: perception → LM hypothesis (constrained JSON classification) → deterministic mapping to atomic propositions → symbolic validation against expert axioms → Kripke-model update.

A set of modal-logic axioms (expert rules) that encode immutable physical constraints to block impossible causal theories and prune hypothesis space.

An experimental demonstration on a simplified particle-accelerator simulation with three diagnostic scenarios showing correct root-cause isolation.

Open-source code release to reproduce the simulated experiments.

Key Findings

The system produced correct end-to-end diagnoses on all simulated test scenarios.

Numbers3/3 scenarios solved (cascading, direct causal, confounded)

Modal axioms successfully prevented physically impossible causal theories.

Structured prompting plus deterministic mapping constrained LM outputs to a fixed set of atomic propositions.

Results

End-to-end correct diagnoses

Value3/3 simulated scenarios (100%)

Who Should Care

What To Try In 7 Days

Clone the repo and run the simulator to reproduce the three scenarios.

Define a small fixed vocabulary of atomic propositions for a sub-system you care about.

Add a few high-confidence axioms that capture unavoidable causal directions and test LM outputs with structured JSON prompts.

Agent Features

Memory

  • beliefs stored as Kripke models (possible-world representation)
  • short-term model updates on each verified hypothesis

Planning

  • hierarchical reasoning
  • task decomposition (component-level monitoring)

Tool Use

  • language model as hypothesis generator
  • deterministic mapping to symbolic propositions

Frameworks

  • modal logic (Kripke semantics)
  • neuro-symbolic validation loop

Is Agentic

true

Architectures

  • multi-agent (component monitors + hierarchical reasoning + physical knowledge agent)
  • neuro-symbolic loop

Collaboration

  • structured reporting from monitors to a central Reasoning Agent
  • query-response with Physical Knowledge Agent

Reproducibility

Code Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Evaluation is limited to a simplified simulator, not full accelerator physics or beam dynamics.
  • System uses a small, fixed vocabulary of propositions, limiting expressiveness for open-ended faults.
  • Axioms are manually authored; incorrect axioms risk forcing wrong conclusions.
  • LM role is constrained to classification via structured prompts; semantic parser advances are needed for richer hypotheses.

When Not To Use

  • Where you cannot specify reliable expert axioms or a clear vocabulary of atomic propositions.
  • Systems that require detailed continuous physics (e.g., full beam dynamics) rather than coarse operational coupling.
  • Immediate deployment in safety-critical live control without extensive real-world validation.

Failure Modes

  • LM misclassification maps an event to the wrong atomic proposition, leading to incorrect but formally consistent updates.
  • Incorrect or overconfident axioms prune valid hypotheses and create systematic blind spots.
  • Simulation simplifications hide edge cases that appear in real hardware and cause unexpected agent behavior.

Core Entities

Models

  • Large language model (unspecified)

Metrics

  • correct diagnosis rate (simulated scenarios)