Make software teams of humans + autonomous, norm-aware AI agents that plan, remember, and self-regulate

December 2, 2025 · 6 min

Overview

Decision Snapshot: Needs Validation

Conceptual design and partial prototype only. The ideas are novel (normative multi-agent SE and BDIM-SE memory), but there are no quantitative evaluations yet, so readiness and evidence are low.

Citations: 0

Evidence Strength: 0.20

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 0/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 20%

Novelty: 60%

Authors

Hoa Khanh Dam, Geeta Mahala, Rashina Hoda, Xi Zheng, Cristina Conati

Why It Matters For Business

This design lets teams scale software work with many specialized AI agents while enforcing rules (privacy, security, legal). That reduces manual coordination and speeds routine work, but requires rule design and human oversight.

Summary (TL;DR)

This is a vision and system-design paper proposing BDIM-SE: a cognitive agent architecture for autonomous software-engineering (SE) agents that adds persistent memory and LLM links to classic BDI agents, and NorMAS-SE: a normative multi-agent team framework that uses commitments and deontic norms to ensure compliant coordination between human developers and SE agents. The authors implemented core components and lay out a practical roadmap for single-agent tests, multi-agent experiments, and human studies.

Problem Statement

Current LLM-based multi-agent SE frameworks use scripted workflows and role prompts. They lack genuine agency, persistent memory, norm-awareness, and scalable coordination that supports human collaboration and legal/ethical compliance.

Main Contribution

BDIM-SE: A cognitive architecture that extends BDI (Belief-Desire-Intention) agents with persistent, LLM-empowered memory (episodic/semantic/procedural) and query hooks to LLMs.

NorMAS-SE: A normative multi-agent systems design that represents software-development norms as deontic modalities (obligations, prohibitions, permissions) and encodes coordination as commitments.
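To make the BDIM-SE idea concrete, here is a minimal sketch of a BDI-style agent extended with episodic, semantic, and procedural memory plus an LLM query hook. It is an illustration under stated assumptions, not the authors' implementation: the class names `Memory` and `BDIMAgent`, the method names, and the stubbed `query_llm` are all hypothetical, and the deliberation policy is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical persistent memory with the three stores named above."""
    episodic: list = field(default_factory=list)    # records of past task episodes
    semantic: dict = field(default_factory=dict)    # facts about the project
    procedural: dict = field(default_factory=dict)  # learned steps per goal


@dataclass
class BDIMAgent:
    """Hypothetical BDI agent with memory; not the paper's actual API."""
    beliefs: dict = field(default_factory=dict)
    desires: list = field(default_factory=list)
    intentions: list = field(default_factory=list)
    memory: Memory = field(default_factory=Memory)

    def query_llm(self, prompt: str) -> str:
        # Stub standing in for a real LLM call (assumption: no network access).
        return f"[stubbed LLM answer to: {prompt}]"

    def deliberate(self) -> None:
        # Naive policy: adopt every desire that is not already an intention.
        for goal in self.desires:
            if goal not in self.intentions:
                self.intentions.append(goal)

    def act(self) -> list:
        # Pursue each intention, preferring remembered procedures over the LLM.
        done = []
        while self.intentions:
            goal = self.intentions.pop(0)
            steps = self.memory.procedural.get(goal)
            source = "memory"
            if steps is None:
                steps = [self.query_llm(f"How do I {goal}?")]
                self.memory.procedural[goal] = steps  # remember for next time
                source = "llm"
            self.memory.episodic.append({"goal": goal, "source": source})
            done.append((goal, source))
        return done


agent = BDIMAgent(desires=["run unit tests"])
agent.deliberate()
first_run = agent.act()    # no procedure stored yet, so the LLM stub is asked
agent.deliberate()
second_run = agent.act()   # the stored procedure is reused
```

The point of the sketch is the difference between the two runs: the first consults the LLM hook, the second answers from procedural memory, which is the kind of cross-session reuse a stateless prompt cannot provide.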

Key Findings

BDIM-SE extends BDI agents by adding persistent memory and direct LLM queries to support longitudinal reasoning.

Practical Use: If you need agents that remember past tasks and learn across sessions, use BDIM-SE's mix of symbolic beliefs plus LLM-backed memory rather than stateless LLM prompts.

Evidence Ref: Section 2 (Figure 2)

Norms are modeled as deontic modalities and encoded as JSON norms and commitments to make agent interactions accountable.

Practical Use: Encode team rules (e.g., privacy checks) as machine-readable norms so agents can automatically filter illegal or undesirable plans.

Evidence Ref: Section 3 (Software development norms example and commitments)
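The norm-filtering finding can be sketched as follows. The summary does not give the JSON schema used in the paper, so the fields `id`, `modality`, and `action`, the example norms, and the functions `violations` and `filter_plans` are assumptions made for illustration only.

```python
import json

# Hypothetical norm schema: one deontic modality applied to one named action.
NORMS_JSON = """
[
  {"id": "n1", "modality": "prohibition", "action": "log_personal_data"},
  {"id": "n2", "modality": "obligation",  "action": "run_privacy_scan"},
  {"id": "n3", "modality": "permission",  "action": "auto_merge_docs"}
]
"""


def violations(plan, norms):
    """Return the ids of norms that the plan's steps would violate."""
    steps = set(plan["steps"])
    broken = []
    for norm in norms:
        if norm["modality"] == "prohibition" and norm["action"] in steps:
            broken.append(norm["id"])   # a forbidden step is present
        elif norm["modality"] == "obligation" and norm["action"] not in steps:
            broken.append(norm["id"])   # a required step is missing
        # permissions never cause a violation in this simple model
    return broken


def filter_plans(plans, norms):
    """Keep only the plans that violate no norm."""
    return [p for p in plans if not violations(p, norms)]


norms = json.loads(NORMS_JSON)
plans = [
    {"name": "fast", "steps": ["build", "log_personal_data", "deploy"]},
    {"name": "safe", "steps": ["build", "run_privacy_scan", "deploy"]},
    {"name": "bare", "steps": ["build", "deploy"]},
]
compliant = [p["name"] for p in filter_plans(plans, norms)]
```

Here only the `safe` plan survives: `fast` contains a prohibited step and omits the obligated scan, and `bare` omits the scan. A real normative reasoner would also need activation conditions and deadlines, which this sketch leaves out.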

What To Try In 7 Days

Map two recurring development tasks (e.g., PR review, privacy scan) and express their rules as simple JSON norms.

Prototype one BDIM-SE agent by connecting an LLM to a small belief store (task list + repo URL) and implement a plan that runs tests and reports results.

Run a tabletop test where the agent generates a commit and a separate 'testing' agent enacts the commitment, observing failure and remedy behavior.
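The tabletop test in the last step can be rehearsed in code before involving real agents. The sketch below models a commitment with the lifecycle commonly used in commitment-based coordination (conditional, detached, satisfied, violated). The event names, the tick-based deadline, and the `remedy` function are hypothetical choices for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CONDITIONAL = "conditional"  # created, antecedent not yet true
    DETACHED = "detached"        # antecedent true, consequent now owed
    SATISFIED = "satisfied"      # consequent delivered in time
    VIOLATED = "violated"        # deadline passed without the consequent


@dataclass
class Commitment:
    debtor: str       # agent that owes the consequent
    creditor: str     # agent that is owed it
    antecedent: str   # event that activates the commitment
    consequent: str   # event that discharges it
    deadline: int     # last tick at which the consequent still counts
    state: State = State.CONDITIONAL

    def observe(self, event: str, tick: int) -> State:
        """Advance the lifecycle given one observed event at a given tick."""
        if self.state in (State.SATISFIED, State.VIOLATED):
            return self.state  # terminal states do not change
        if event == self.consequent and tick <= self.deadline:
            self.state = State.SATISFIED
        elif event == self.antecedent and self.state is State.CONDITIONAL:
            self.state = State.DETACHED
        if self.state is State.DETACHED and tick > self.deadline:
            self.state = State.VIOLATED
        return self.state


def remedy(c: Commitment) -> str:
    # Hypothetical remedy: hand a violated commitment to a human reviewer.
    if c.state is State.VIOLATED:
        return f"escalate_to_human:{c.creditor}"
    return "none"


ok = Commitment("testing_agent", "coding_agent", "commit_pushed", "tests_reported", deadline=3)
ok.observe("commit_pushed", tick=1)
ok.observe("tests_reported", tick=2)

late = Commitment("testing_agent", "coding_agent", "commit_pushed", "tests_reported", deadline=3)
late.observe("commit_pushed", tick=1)
late.observe("idle", tick=4)
```

After these events `ok` ends in the satisfied state, while `late` is violated and `remedy(late)` returns an escalation to the creditor, which is the failure-and-remedy path the tabletop exercise is meant to surface.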

Agent Features

Memory
Short-term working memory, long-term episodic memory, semantic and procedural memory, LLM-backed belief queries
Planning
Plan library with invocation and context conditions, plan selection with fallback alternatives
Tool Use
LLM integration for code and feasibility queries, tool connectors for GitHub, JIRA, CI/CD
Frameworks
NorMAS-SE
Is Agentic

Yes

Architectures
BDIM-SE
Collaboration
Commitment-based coordination, normative reasoner for compliance, auto-generated interaction protocols
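
The Planning features listed above (a plan library with invocation and context conditions, and selection with fallback alternatives) follow the usual BDI pattern, which might look roughly like this. The `Plan` fields, the `handle` function, and the example plans are hypothetical and only illustrate the selection logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    name: str
    trigger: str                     # invocation condition: the event this plan handles
    context: Callable[[dict], bool]  # context condition evaluated over current beliefs
    body: Callable[[dict], bool]     # plan body; returns True on success


def handle(event, beliefs, library):
    """Try applicable plans in library order, falling back to the next on failure."""
    tried = []
    for plan in library:
        if plan.trigger != event or not plan.context(beliefs):
            continue  # not relevant to this event, or not applicable right now
        tried.append(plan.name)
        if plan.body(beliefs):
            return plan.name, tried
    return None, tried  # no applicable plan succeeded


library = [
    Plan("run_ci", "verify_change",
         context=lambda b: b.get("ci_available", False),
         body=lambda b: b.get("ci_healthy", False)),
    Plan("run_local_tests", "verify_change",
         context=lambda b: True,
         body=lambda b: True),
    Plan("open_ticket", "bug_reported",
         context=lambda b: True,
         body=lambda b: True),
]

chosen, attempts = handle("verify_change", {"ci_available": True, "ci_healthy": False}, library)
```

With CI available but unhealthy, `run_ci` is attempted first and fails, so the agent falls back to `run_local_tests`. If CI is unavailable, the context condition rules out `run_ci` before it is ever attempted.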

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

No empirical results yet; evaluation is future work.

Relies on current LLMs, which can hallucinate and behave unreliably.

When Not To Use

In high-assurance or safety-critical systems without strong human oversight and formal verification.

When you cannot invest in norm design, governance, and monitoring.

Failure Modes

LLM hallucination producing norm-violating outputs.

Conflicting or incomplete norms causing incorrect plan filtering.
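
Some norm-set problems can be caught before any agent runs. The sketch below flags two simple cases: an action that is both obligated and prohibited, and actions that no norm mentions at all. The norm fields (`id`, `modality`, `action`) and both function names are assumptions for illustration. Real deontic conflicts can depend on context and timing, which this check ignores.

```python
def find_conflicts(norms):
    """Actions that are both obligated and prohibited (a direct conflict)."""
    obligated = {n["action"] for n in norms if n["modality"] == "obligation"}
    prohibited = {n["action"] for n in norms if n["modality"] == "prohibition"}
    return sorted(obligated & prohibited)


def find_uncovered(actions, norms):
    """Known actions that no norm mentions (a possible gap in the norm set)."""
    covered = {n["action"] for n in norms}
    return sorted(set(actions) - covered)


# Hypothetical norm set with one deliberate conflict on "push_to_main".
norms = [
    {"id": "n1", "modality": "obligation", "action": "push_to_main"},
    {"id": "n2", "modality": "prohibition", "action": "push_to_main"},
    {"id": "n3", "modality": "obligation", "action": "run_privacy_scan"},
]
known_actions = ["push_to_main", "run_privacy_scan", "rotate_secrets"]

conflicts = find_conflicts(norms)
gaps = find_uncovered(known_actions, norms)
```

In this example `push_to_main` is reported as a conflict and `rotate_secrets` as uncovered. Both are signals for a human to review the norm set rather than something the agents should resolve on their own.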

Core Entities

Models

foundational LLM (unnamed)

Context Entities

Models

LLM-based code generation and testing models (surveyed references)