A governed multi-agent runtime that makes retrieval, tools, and agent roles auditable and safe for lab-scale science

November 18, 2025 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.5

Cost Impact Score

0.4

Citation Count

0

Authors

Chandrachur Bhattacharya, Sibendu Som

Links

Abstract / PDF

Why It Matters For Business

AISAC reduces operational risk when deploying agentic AI in regulated lab settings. It enforces auditable tool use, explicit data indexing, and per-agent knowledge scopes so teams can adopt LLM-driven assistants without losing provenance or control.

Summary TLDR

AISAC is a systems-level runtime that enforces governance for multi-agent, retrieval-grounded scientific assistants. It separates reasoning agents (drivers) from execution agents (helpers), scopes retrieval per agent, uses hybrid persistent memory (SQLite + dual FAISS indices), and logs a replayable execution trace. The goal is deployable, auditable AI assistance for constrained lab environments rather than new learning algorithms.

Problem Statement

Existing agentic LLM frameworks assume permissive cloud environments, ephemeral memory, and shared retrieval. Those assumptions break in restricted scientific settings that require provenance, reproducibility, and controlled tool access.

Main Contribution

An opinionated runtime architecture that enforces four structural guarantees: declarative agent registration, budgeted orchestration, role-aligned memory access, and trace-driven transparency.

A driver-helper execution model where drivers plan and delegate but never call tools; helpers execute tools under schema validation and logging.

Role-scoped retrieval with a conservative retrieval lifecycle: agent-specific corpora, explicit index builds, and dual FAISS indices plus SQLite persistence.

A strict, declarative project bootstrap contract that isolates project customizations from the shared core and records execution metadata for replay and audit.
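
As a rough illustration of what a declarative bootstrap contract could look like, the sketch below declares agents and a tool catalog as plain data and checks them before anything runs. The names `AgentSpec`, `ProjectContract`, and `validate` are hypothetical and are not taken from AISAC; the rules enforced (drivers declare no tools, helpers may only use cataloged tools) are an assumption based on the contributions listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    # Hypothetical record for one declared agent.
    name: str
    role: str            # "driver" or "helper"
    tools: tuple = ()    # tool names; expected to be empty for drivers
    corpora: tuple = ()  # retrieval corpora this agent may query


@dataclass(frozen=True)
class ProjectContract:
    # Hypothetical project-level declaration, kept separate from core code.
    project: str
    agents: tuple
    tool_catalog: frozenset


def validate(contract):
    """Return a list of violations; an empty list means the contract passes."""
    errors = []
    seen = set()
    for agent in contract.agents:
        if agent.name in seen:
            errors.append(f"duplicate agent: {agent.name}")
        seen.add(agent.name)
        if agent.role not in ("driver", "helper"):
            errors.append(f"{agent.name}: unknown role {agent.role!r}")
        if agent.role == "driver" and agent.tools:
            errors.append(f"{agent.name}: drivers may not declare tools")
        for tool in agent.tools:
            if tool not in contract.tool_catalog:
                errors.append(f"{agent.name}: undeclared tool {tool!r}")
    return errors
```

Validating the declaration up front means a misconfigured project is rejected at startup rather than at the moment a tool is first called.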

Key Findings

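AISAC enforces four structural guarantees for scientific reasoning: declarative agent registration, budgeted orchestration, role-aligned memory access, and trace-driven transparency.

Numbers: 4 guarantees (declared in abstract)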

Drivers never invoke tools; helpers are the only agents allowed to execute tools and return structured results.
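
The driver/helper split can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AISAC's implementation: the `Helper` and `Driver` classes, the type-based schema format, and the in-memory log list are all hypothetical. The point it shows is structural: the driver holds only references to helpers, so it has no way to call a tool directly.

```python
class Helper:
    """Executes tools, validating arguments against a simple schema and logging each call."""

    def __init__(self, name, tools, schemas, log):
        self.name = name
        self._tools = tools      # tool name -> callable
        self._schemas = schemas  # tool name -> {argument name: expected type}
        self._log = log          # shared list standing in for a trace store

    def execute(self, tool, args):
        if tool not in self._tools:
            raise PermissionError(f"{self.name} has no tool {tool!r}")
        schema = self._schemas[tool]
        if set(args) != set(schema):
            raise ValueError(f"{tool}: expected arguments {sorted(schema)}")
        for key, expected in schema.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{tool}.{key} must be {expected.__name__}")
        result = self._tools[tool](**args)
        self._log.append(
            {"helper": self.name, "tool": tool, "args": args, "result": result}
        )
        return {"status": "ok", "result": result}


class Driver:
    """Plans and delegates; it holds no tool callables of its own."""

    def __init__(self, name, helpers):
        self.name = name
        self._helpers = helpers  # helper name -> Helper

    def run(self, plan):
        # plan: list of (helper name, tool name, args) steps produced by reasoning
        return [self._helpers[h].execute(tool, args) for h, tool, args in plan]
```

A call such as `Driver("planner", {"calc": helper}).run([("calc", "add", {"a": 2, "b": 3})])` returns structured results from the helper, and every executed call leaves an entry in the log.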

Persistent knowledge uses hybrid storage: SQLite for execution traces and two FAISS indices for dialogue and evidence.

Numbers: SQLite + dual FAISS indices (stated in abstract & sections 3/5)

Retrieval is conservative and scoped: indices are never built or refreshed implicitly and helper agents can only query assigned corpora.

Numbers: Index builds/refreshes require explicit user action (stated in Section 5)
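
To make the conservative lifecycle concrete, the sketch below uses a toy keyword index as a stand-in for a FAISS vector index, so it runs with the standard library only. The class `ScopedRetriever` and its methods are hypothetical. It illustrates two behaviors described above: a query never triggers a build, and an agent can only query corpora assigned to it.

```python
class ScopedRetriever:
    """Toy keyword index standing in for a vector index; it builds only on request."""

    def __init__(self, corpora, access):
        self._corpora = corpora  # corpus name -> list of documents
        self._access = access    # agent name -> set of corpus names
        self._index = {}         # corpus name -> {token: set of document ids}

    def build(self, corpus):
        """Explicit user action; nothing else in this class triggers a build."""
        index = {}
        for doc_id, text in enumerate(self._corpora[corpus]):
            for token in set(text.lower().split()):
                index.setdefault(token, set()).add(doc_id)
        self._index[corpus] = index

    def query(self, agent, corpus, term):
        if corpus not in self._access.get(agent, set()):
            raise PermissionError(f"{agent} is not assigned corpus {corpus!r}")
        if corpus not in self._index:
            raise RuntimeError(f"index for {corpus!r} has not been built")
        ids = sorted(self._index[corpus].get(term.lower(), set()))
        return [self._corpora[corpus][i] for i in ids]
```

One consequence of this design is that documents added after a build stay invisible to queries until someone rebuilds the index, which is the trade-off the explicit lifecycle accepts in exchange for predictability.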

Results

Deployment domains

Value: Internal deployments in combustion science; materials research; energy process safety

Who Should Care

What To Try In 7 Days

Prototype a helper tool that wraps a single external action (e.g., job submit) and register it in AISAC to test governance and logging.

Create a small agent-specific retrieval corpus and run a manual index build to exercise the explicit retrieval lifecycle.

Simulate planner-driven delegation for a simple workflow (decomposition → helper calls) and inspect the execution trace in SQLite.
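
For the trace-inspection step, a minimal sketch of an append-only execution trace in SQLite might look like the following. The table name `trace`, its columns, and the functions `open_trace`, `record`, and `replay` are assumptions made for illustration; the source does not describe AISAC's actual schema.

```python
import json
import sqlite3


def open_trace(path=":memory:"):
    """Open (or create) a trace database with one append-only table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trace (
               step INTEGER PRIMARY KEY AUTOINCREMENT,
               run_id TEXT NOT NULL,
               agent TEXT NOT NULL,
               event TEXT NOT NULL,
               payload TEXT NOT NULL
           )"""
    )
    return conn


def record(conn, run_id, agent, event, payload):
    """Append one event; the payload is stored as canonical JSON text."""
    conn.execute(
        "INSERT INTO trace (run_id, agent, event, payload) VALUES (?, ?, ?, ?)",
        (run_id, agent, event, json.dumps(payload, sort_keys=True)),
    )
    conn.commit()


def replay(conn, run_id):
    """Return the events of one run in the order they were recorded."""
    rows = conn.execute(
        "SELECT agent, event, payload FROM trace WHERE run_id = ? ORDER BY step",
        (run_id,),
    )
    return [(agent, event, json.loads(payload)) for agent, event, payload in rows]
```

Recording a delegation followed by a tool call and then calling `replay(conn, "run-1")` yields the two events in order, which is the kind of inspection the exercise above asks for.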

Agent Features

Memory

  • Hybrid persistence: SQLite execution traces + dual FAISS indices
  • Role-scoped retrieval (agent-specific corpora)
  • Per-turn context budgets and explicit history selection
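
A per-turn context budget can be sketched as a selection over prior turns. The version below is an assumption for illustration: it keeps the most recent turns that fit and approximates token counts by whitespace-separated words. The source does not state which criteria AISAC's router uses, and the function name `select_history` is hypothetical.

```python
def select_history(turns, budget, count_tokens=lambda text: len(text.split())):
    """Pick the newest turns whose combined size fits the budget, in original order."""
    chosen = []
    used = 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # stop at the first turn that would exceed the budget
        chosen.append(turn)
        used += cost
    chosen.reverse()
    return chosen
```

Passing a real tokenizer as `count_tokens` would make the budget match an actual model context limit; the word-count default is only a placeholder.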

Planning

  • Planner-directed task decomposition
  • Tournament and critique driver patterns supported

Tool Use

  • Helpers-only tool execution
  • Tool schema validation and logging
  • Project-declared tool catalog

Frameworks

  • Declarative bootstrap contracts for project customization
  • Event-stream interface for live observability

Is Agentic

true

Architectures

  • router/planner/coordinator driver-helper hierarchy
  • depth-bounded hierarchical delegation

Collaboration

  • Driver coordinates multiple helpers
  • Drivers may delegate to other drivers with bounded depth
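
Depth-bounded delegation can be illustrated with a small recursive walk. This is a sketch under assumptions: the `plan` dictionary, the `delegate` function, and the `DelegationDepthExceeded` exception are hypothetical names, and the real runtime presumably tracks depth per agent call rather than over a static tree.

```python
class DelegationDepthExceeded(RuntimeError):
    """Raised when a delegation chain would go deeper than the configured bound."""


def delegate(task, plan, max_depth, depth=0, trail=None):
    """Walk a delegation tree; `plan` maps a task to the subtasks it delegates."""
    trail = [] if trail is None else trail
    if depth > max_depth:
        raise DelegationDepthExceeded(
            f"{task!r} would run at depth {depth}, limit is {max_depth}"
        )
    trail.append((depth, task))
    for subtask in plan.get(task, []):
        delegate(subtask, plan, max_depth, depth + 1, trail)
    return trail
```

Failing loudly at the bound, rather than silently truncating the chain, keeps the limit visible in whatever log or trace records the run.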

Optimization Features

Token Efficiency

  • Router-selected historical turns to fit context budget

Infra Optimization

  • Endpoint selection resolved outside agent logic to adapt to available backends

System Optimization

  • Budgeted per-turn context and delegation depth limits

Inference Optimization

  • Support for heterogeneous inference endpoints
  • Decouples chat and embedding endpoints

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • No systematic, quantitative evaluation on representative scientific tasks; the evidence is architectural and anecdotal (Section 8).
  • Governance and safety posture are project-dependent because externally consequential actions are exposed via project-level tools.
  • Does not solve long-horizon memory selection or hypothesis life-cycle consolidation; storage ≠ scientific model management.
  • Conservative retrieval lifecycle (manual indexing) can be inconvenient for frequently changing corpora.
  • Tool sandboxing, partial-failure recovery, and strong resource isolation are noted but not fully implemented.

When Not To Use

  • If you need a fully autonomous, unconstrained agent that can modify its own execution policy.
  • When your workflow requires automatic, continuous reindexing of rapidly changing corpora without manual control.
  • If you need an off-the-shelf open-source runtime today: the AISAC code is internal and its release is pending.

Failure Modes

  • Governance gaps if project owners misconfigure tool capabilities or retrieval roots, leading to unintended access.
  • Replayability issues if inference endpoints change or model versions are not pinned across runs.
  • Operational fragility from endpoint heterogeneity: latency and capability mismatches can disrupt coordinated delegation.
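
One way to guard against the replayability failure mode is to store a fingerprint of the inference settings alongside each run and compare it before replaying. The sketch below is an assumption, not something the source describes: the function names and the configuration keys in the usage note are hypothetical.

```python
import hashlib
import json


def fingerprint(config):
    """Stable SHA-256 digest of the settings that influence model output."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_replayable(recorded, current):
    """Return the keys whose values differ between recorded and current settings."""
    keys = set(recorded) | set(current)
    return sorted(k for k in keys if recorded.get(k) != current.get(k))
```

For example, if a run was recorded with a hypothetical `{"chat_model": "model-a@2025-01"}` and the current settings name a different version, `check_replayable` returns `["chat_model"]`, pointing to the setting that would make the replay diverge.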

Core Entities

Models

  • LLMs (unspecified vendor/model)

Context Entities

Models

  • Heterogeneous inference endpoints (chat vs embedding endpoints)