Declarative agent spec plus a runtime that enforces safety, memory, and low-latency execution

February 27, 2026 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.7

Citation Count

0

Authors

Sheng Cao, Zhao Chang, Chang Li, Hannan Li, Liyao Fu, Ji Tang

Why It Matters For Business

A declarative agent spec plus a governed runtime is intended to reduce vendor lock-in, make agent behavior auditable, and lower operational risk by enforcing safety before actions are emitted.

Summary TLDR

This paper proposes an architecture for production autonomous agents that separates agent definition (the declarative "Cognitive Blueprint", written in AgenticFormat) from execution (the Runtime Engine and SDKs). Safety is enforced by projecting the policy onto a token-level Constraint Manifold. Memory is hierarchical, with a Reflector-driven consolidation pipeline. Latency is reduced through dependency-based parallelism, speculative inference, and dynamic context pruning. The design is largely conceptual and engineering-focused: the paper describes mechanisms and APIs rather than benchmarks.

Problem Statement

LLMs output stochastic, loosely structured text while real-world tools and services require deterministic, schema-conformant inputs. This mismatch (the "Integration Paradox") and the lack of a standard agent definition cause vendor lock-in, brittle glue code, auditability gaps, and high operational cost.

Main Contribution

AgenticFormat: a language-agnostic, declarative schema to specify agent identity, tools, memory, output contracts, and safety constraints.

Constraint Manifold: a mechanism that enforces safety by projecting the raw policy onto a formally defined safe subspace at token decode time.

Hierarchical memory with a Reflector-driven Consolidation Protocol to compress event streams into retrievable long-term memories.

Formal execution model: augmented POMDP with a latent reasoning space and a factorized policy (reasoning then action).

Three-level self-evolution: in-context lessons, supervised fine-tuning on successful traces (STaR), and on-policy RL (e.g., GRPO/PPO).

Runtime optimizations: Cognitive Map-Reduce (parallel DAG execution), speculative execution, and attention-guided context pruning.

Integration plan with the Model Context Protocol (MCP) and cross-language SDKs (agentic-py, agentic-java) for portability.

Key Findings

Decoupling agent specification from runtime enables portable, auditable agents.

Safety is enforced by policy projection: unsafe token sequences get zero probability via token-level masking.

Latency can be reduced by executing independent plan steps in parallel and hiding tool latency with speculative inference.

Who Should Care

What To Try In 7 Days

Write a small AgenticFormat YAML for a simple task (e.g., PR reviewer) with an explicit output JSON schema.

Integrate one MCP connector (e.g., GitHub) and bind it in the blueprint to test tool permissions.

Prototype token-level constraint masking for one risky action and validate that unsafe outputs are prevented at decode time.
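As a starting point for the first two items above, here is a minimal sketch of a blueprint with tool bindings and an output contract. The real AgenticFormat schema is not reproduced in this summary, so every field name here (identity, tools, permissions, output_contract) is a hypothetical stand-in, and the blueprint is written as a Python dict rather than YAML so the example needs only the standard library.

```python
import json

# Hypothetical blueprint: field names are illustrative, not the AgenticFormat schema.
BLUEPRINT = {
    "identity": {"name": "pr-reviewer", "role": "Review pull requests"},
    "tools": [{"name": "github", "permissions": ["read_pr", "post_comment"]}],
    "output_contract": {
        "required": {"verdict": str, "comments": list},
        "verdict_values": {"approve", "request_changes"},
    },
}

def tool_allowed(blueprint, tool, permission):
    """Return True only if the blueprint binds `tool` with `permission`."""
    for binding in blueprint["tools"]:
        if binding["name"] == tool:
            return permission in binding["permissions"]
    return False

def validate_output(blueprint, raw_text):
    """Parse model output and check it against the output contract.

    Returns (ok, errors), where errors is a list of human-readable strings.
    """
    contract = blueprint["output_contract"]
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]
    errors = []
    for field, expected_type in contract["required"].items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    verdict = data.get("verdict")
    if isinstance(verdict, str) and verdict not in contract["verdict_values"]:
        errors.append("verdict outside allowed values")
    return not errors, errors
```

Checking permissions and output shape in one place, driven by the blueprint, is what makes the agent's behavior auditable: the same file that declares the agent also says what it may call and what it must return.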

Agent Features

Memory

  • hierarchical memory (short-term event stream, long-term semantic/episodic/procedural)
  • Reflector-driven consolidation
  • embedding-based retrieval from vector stores
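The three memory features above can be sketched together as follows. This is an illustration under stated assumptions, not the paper's design: the bag-of-words embed function stands in for a real embedding model, the summarize callable stands in for the LLM-driven Reflector, and a plain list stands in for a vector store. The class and method names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class Memory:
    def __init__(self, summarize):
        self.events = []            # short-term event stream
        self.long_term = []         # (embedding, text) pairs; stand-in for a vector store
        self.summarize = summarize  # stand-in for the Reflector

    def record(self, event):
        self.events.append(event)

    def consolidate(self):
        """Compress the event stream into one long-term entry, then clear it."""
        if not self.events:
            return
        summary = self.summarize(self.events)
        self.long_term.append((embed(summary), summary))
        self.events.clear()

    def retrieve(self, query, k=1):
        """Return the k long-term entries most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

Because consolidation is the only path into long-term memory, a bad summary is stored and retrieved like a good one, which is the failure mode noted under Risks & Boundaries.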

Planning

  • think-before-act reasoning traces
  • speculative planning (lookahead predictions)
  • parallel plan execution via DAG analysis
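Parallel plan execution via DAG analysis can be sketched with asyncio: each step starts as soon as its dependencies finish, so independent steps overlap. This is a minimal sketch, not the paper's Cognitive Map-Reduce implementation; run_plan and the steps/deps dictionaries are hypothetical names, and the cycle check uses the standard-library graphlib module (Python 3.9+).

```python
import asyncio
from graphlib import TopologicalSorter

async def run_plan(steps, deps):
    """Run plan steps concurrently, respecting dependencies.

    steps: dict name -> async callable taking a dict of dependency results
    deps:  dict name -> list of step names it depends on
    Returns a dict name -> result.
    """
    # Fail fast on cyclic plans; prepare() raises graphlib.CycleError.
    TopologicalSorter({name: deps.get(name, []) for name in steps}).prepare()

    tasks = {}

    async def run(name):
        # Wait for every dependency, then run this step with their results.
        inputs = {d: await tasks[d] for d in deps.get(name, [])}
        return await steps[name](inputs)

    # All tasks are created before any of them runs, so `tasks` is fully
    # populated by the time a step looks up its dependencies.
    for name in steps:
        tasks[name] = asyncio.create_task(run(name))
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))
```

With this structure, end-to-end latency is bounded by the longest dependency chain rather than by the sum of all step latencies.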

Tool Use

  • MCP-based tool connectors
  • explicit tool bindings in AgenticFormat blueprint
  • token-level masking of unsafe tool calls

Frameworks

  • AgenticFormat
  • Agentic AI Platform SDK (agentic-py, agentic-java)
  • Model Context Protocol (MCP)

Is Agentic

true

Architectures

  • augmented POMDP with latent reasoning space
  • factorized policy (reasoning then action)
  • declarative blueprint/runtime separation
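One common way to write a "reasoning then action" factorization is shown below. The paper's own notation is not reproduced in this summary, so the symbols are assumptions: h_t is the agent's history (or belief state) at step t, z_t is the latent reasoning trace, a_t is the emitted action, and θ are the model parameters.

```latex
\pi_\theta(z_t, a_t \mid h_t)
  \;=\;
  \underbrace{\pi_\theta^{\text{reason}}(z_t \mid h_t)}_{\text{think}}
  \;\cdot\;
  \underbrace{\pi_\theta^{\text{act}}(a_t \mid h_t, z_t)}_{\text{act}}
```

Under this reading, the action is conditioned on the reasoning trace, so constraints applied to the action factor leave the reasoning step free-form.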

Collaboration

  • local agent composition (a blueprint can reference local agents)
  • runtime SDKs enabling cross-language deployment

Optimization Features

Token Efficiency

  • token budget controller with KKT-based shadow pricing
  • budget-aware biasing of reasoning depth
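For readers unfamiliar with shadow pricing, the textbook form of the idea is sketched below. This is not the paper's formulation, which the summary does not give. Assumed symbols: x_i is the number of tokens allotted to reasoning step i, u_i is a concave utility for that step, B is the total token budget, and λ is the Lagrange multiplier on the budget constraint.

```latex
\begin{align}
  &\max_{x_1,\dots,x_n \ge 0} \; \sum_{i=1}^{n} u_i(x_i)
    \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i \le B \\
  &\mathcal{L}(x, \lambda) = \sum_{i=1}^{n} u_i(x_i)
    - \lambda \Bigl( \sum_{i=1}^{n} x_i - B \Bigr) \\
  &u_i'(x_i^*) \le \lambda^*, \qquad
    x_i^* \bigl( u_i'(x_i^*) - \lambda^* \bigr) = 0, \qquad
    \lambda^* \ge 0, \qquad
    \lambda^* \Bigl( \sum_{i=1}^{n} x_i^* - B \Bigr) = 0
\end{align}
```

At the optimum, λ* is the shadow price of a token: every step that receives tokens is funded until its marginal utility equals λ*, and λ* is zero whenever the budget is not binding. A controller can use that price to bias reasoning depth up or down.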

Infra Optimization

  • cross-language SDKs for low-latency runtimes (Java)
  • asynchronous tool call execution and commit/rollback
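Asynchronous tool calls with commit/rollback can be illustrated as follows: the real tool call runs in the background while reasoning continues on a predicted result, and the speculative work is kept only if the prediction turns out to be correct. This is a sketch under assumptions; speculative_step and its three callables are hypothetical names, and equality is used as the simplest possible match test.

```python
import asyncio

async def speculative_step(call_tool, predict_result, continue_reasoning):
    """Overlap a slow tool call with reasoning on a predicted result.

    call_tool:          async callable returning the real tool result
    predict_result:     callable returning a guess at that result
    continue_reasoning: async callable mapping a tool result to the next output
    Returns (output, "commit") if the guess was right, else (output, "rollback").
    """
    tool_task = asyncio.create_task(call_tool())  # real call runs in the background
    guess = predict_result()
    speculative = asyncio.create_task(continue_reasoning(guess))

    actual = await tool_task
    if actual == guess:
        # Prediction held: keep the work that was done while the tool ran.
        return await speculative, "commit"

    # Prediction diverged: discard speculative work and redo from the real result.
    speculative.cancel()
    await asyncio.gather(speculative, return_exceptions=True)
    return await continue_reasoning(actual), "rollback"
```

On the rollback path the speculative work is wasted and the reasoning step runs a second time, so speculation only pays off when predictions are usually right.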

Model Optimization

  • SFT

System Optimization

  • token-level masking to enforce constraint manifold
  • dependency analyzer to bound critical path latency
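Token-level masking can be sketched in a few lines: before sampling, every token that fails the safety check has its logit set to negative infinity, so the softmax assigns it exactly zero probability and renormalizes over the safe tokens. This is a minimal illustration, not the paper's Constraint Manifold implementation. The is_safe predicate is a hypothetical stand-in; it judges single tokens here, whereas a real check would likely depend on the tokens decoded so far.

```python
import math

def masked_distribution(logits, is_safe):
    """Project a next-token distribution onto the safe set.

    logits:  dict token -> raw score from the model
    is_safe: predicate token -> bool (stand-in for the constraint check)
    Returns dict token -> probability, with unsafe tokens at exactly 0.0.
    """
    masked = {t: (s if is_safe(t) else -math.inf) for t, s in logits.items()}
    peak = max(masked.values())
    if peak == -math.inf:
        raise ValueError("no safe token available")
    # Subtract the peak for numerical stability; exp(-inf) evaluates to 0.0.
    exps = {t: math.exp(s - peak) for t, s in masked.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Usage: the highest-scoring token is unsafe, so it can never be sampled.
logits = {"read_file": 1.0, "delete_repo": 3.0, "post_comment": 1.0}
probs = masked_distribution(logits, lambda tok: tok != "delete_repo")
```

Because the mask is applied before sampling, safety does not depend on the model choosing well; it depends only on the predicate being correct.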

Training Optimization

  • self-purified datasets (filtering to keep only correct trajectories)
  • GRPO

Inference Optimization

  • Cognitive Map-Reduce (parallel DAG execution)
  • speculative inference (prediction + lookahead)
  • dynamic KV-cache pruning via attention scores
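Pruning by attention scores can be sketched as a selection problem: rank cached positions by how much attention recent queries paid to them, and keep only the top ones. The sketch below works on plain lists instead of real KV tensors. The function name, the use of summed attention as the score, and the protect_last rule are all assumptions, not details from the paper.

```python
def prune_context(attention_rows, keep, protect_last=1):
    """Choose which cached positions to keep, ranked by total attention received.

    attention_rows: list of rows, one per recent query; row[j] is the
                    attention weight that query placed on position j
    keep:           target number of positions to retain
    protect_last:   number of most recent positions that are always kept
    Returns the sorted indices of the positions to keep.
    """
    n = len(attention_rows[0])
    # Score each position by the attention it received across all recent queries.
    scores = [sum(row[j] for row in attention_rows) for j in range(n)]
    protected = set(range(max(0, n - protect_last), n))
    candidates = sorted(
        (j for j in range(n) if j not in protected),
        key=lambda j: scores[j],
        reverse=True,
    )
    extra = max(0, keep - len(protected))
    return sorted(protected | set(candidates[:extra]))
```

Protecting the most recent positions is a common safeguard, since new tokens have had little chance to accumulate attention yet.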

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Paper is mostly architectural and conceptual; lacks empirical benchmarks demonstrating latency or safety gains.
  • Implementing token-level constraint masking and a full Reflector pipeline requires substantial engineering effort.
  • Adoption depends on MCP and third-party runtimes supporting AgenticFormat.

When Not To Use

  • For quick prototypes where simple scripts are faster to ship.
  • When the team lacks resources to implement a custom runtime and safety enforcement.
  • When agent tasks are low-risk and do not require cross-language portability or auditability.

Failure Modes

  • Mis-specified blueprints or incorrect constraint predicates could block valid actions or permit unsafe ones.
  • Reflector consolidation may extract incorrect 'lessons' and bias future behavior.
  • Speculative execution can lead to wasted compute when predictions diverge, increasing cost and latency.

Core Entities

Models

  • LLMs (unspecified)
  • gemini-3-pro-preview (example in listing)

Context Entities

Models

  • ReAct, Chain-of-Thought (referenced prior work)