Overview
Production Readiness
0.4
Novelty Score
0.65
Cost Impact Score
0.5
Citation Count
0
Why It Matters For Business
Formal symbolic rules combined with LMs reduce risky hallucinations in diagnostics, making automated troubleshooting auditable and safer for critical infrastructure.
Summary TLDR
The paper presents a neuro-symbolic multi-agent system that pairs language models (LMs) as hypothesis generators with formally defined belief states (Kripke models) and a library of expert axioms in modal logic. LMs classify anomalies into a fixed set of propositions; symbolic checks reject logically or physically impossible hypotheses. In a simulated particle-accelerator sector the system solved three diagnostic scenarios (including cascading failures) by pruning impossible causes and isolating roots. Code is public. This is a systems-level proof-of-concept, not a full real-world deployment.
Problem Statement
Language models can suggest plausible-sounding but logically or physically impossible diagnoses (hallucinations). Critical control systems need diagnoses that are verifiable, physically consistent, and auditable. The gap is how to combine LM semantic power with formal checks so agents remain autonomous but not unsafe.
Main Contribution
A neuro-symbolic multi-agent architecture that stores each agent's beliefs as a Kripke model (possible-world representation) and uses modal logic operators (□ necessity, ♢ possibility).
A practical loop: perception → LM hypothesis (constrained JSON classification) → deterministic mapping to atomic propositions → symbolic validation against expert axioms → Kripke-model update.
A set of modal-logic axioms (expert rules) that encode immutable physical constraints to block impossible causal theories and prune hypothesis space.
An experimental demonstration on a simplified particle-accelerator simulation with three diagnostic scenarios showing correct root-cause isolation.
Open-source code release to reproduce the simulated experiments.
Key Findings
The system produced correct end-to-end diagnoses on all simulated test scenarios.
Modal axioms successfully prevented physically impossible causal theories.
Structured prompting plus deterministic mapping constrained LM outputs to a fixed set of atomic propositions.
Results
End-to-end correct diagnoses
Who Should Care
What To Try In 7 Days
Clone the repo and run the simulator to reproduce the three scenarios.
Define a small fixed vocabulary of atomic propositions for a sub-system you care about.
Add a few high-confidence axioms that capture unavoidable causal directions and test LM outputs with structured JSON prompts.
Agent Features
Memory
- beliefs stored as Kripke models (possible-world representation)
- short-term model updates on each verified hypothesis
Planning
- hierarchical reasoning
- task decomposition (component-level monitoring)
Tool Use
- language model as hypothesis generator
- deterministic mapping to symbolic propositions
Frameworks
- modal logic (Kripke semantics)
- neuro-symbolic validation loop
Is Agentic
true
Architectures
- multi-agent (component monitors + hierarchical reasoning + physical knowledge agent)
- neuro-symbolic loop
Collaboration
- structured reporting from monitors to a central Reasoning Agent
- query-response with Physical Knowledge Agent
Reproducibility
Code Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Evaluation is limited to a simplified simulator, not full accelerator physics or beam dynamics.
- System uses a small, fixed vocabulary of propositions, limiting expressiveness for open-ended faults.
- Axioms are manually authored; incorrect axioms risk forcing wrong conclusions.
- LM role is constrained to classification via structured prompts; semantic parser advances are needed for richer hypotheses.
When Not To Use
- Where you cannot specify reliable expert axioms or a clear vocabulary of atomic propositions.
- Systems that require detailed continuous physics (e.g., full beam dynamics) rather than coarse operational coupling.
- Immediate deployment in safety-critical live control without extensive real-world validation.
Failure Modes
- LM misclassification maps an event to the wrong atomic proposition, leading to incorrect but formally consistent updates.
- Incorrect or overconfident axioms prune valid hypotheses and create systematic blind spots.
- Simulation simplifications hide edge cases that appear in real hardware and cause unexpected agent behavior.
Core Entities
Models
- Large language model (unspecified)
Metrics
- correct diagnosis rate (simulated scenarios)

