Make software teams of humans + autonomous, norm-aware AI agents that plan, remember, and self-regulate

Overview

Decision Snapshot: Needs Validation

Conceptual design and partial prototype only. Ideas are novel (normative multi-agent SE and BDIM-SE memory), but no quantitative evaluations yet, so readiness and evidence are low.

Citations: 0

Evidence Strength: 0.20

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 0/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 20%

Novelty: 60%

Authors

Hoa Khanh Dam, Geeta Mahala, Rashina Hoda, Xi Zheng, Cristina Conati

Links

Abstract / PDF

Why It Matters For Business

This design lets teams scale software work with many specialized AI agents while enforcing rules (privacy, security, legal). That reduces manual coordination and speeds routine work, but requires rule design and human oversight.

Who Should Care

CTO, Engineering Lead, Product Manager, ML Engineer, Founder

Summary TLDR

This vision and system-design paper proposes two things. BDIM-SE is a cognitive architecture for autonomous software-engineering (SE) agents that adds persistent memory and LLM links to classic BDI agents. NorMAS-SE is a normative multi-agent team framework that uses commitments and deontic norms to keep coordination between human developers and SE agents compliant. The authors implemented core components and lay out a roadmap for single-agent tests, multi-agent experiments, and human studies.

Problem Statement

Current LLM-based multi-agent SE frameworks use scripted workflows and role prompts. They lack genuine agency, persistent memory, norm-awareness, and scalable coordination that supports human collaboration and legal/ethical compliance.

Main Contribution

BDIM-SE: A cognitive architecture that extends BDI (Belief-Desire-Intention) agents with persistent, LLM-empowered memory (episodic/semantic/procedural) and query hooks to LLMs.

NorMAS-SE: A normative multi-agent systems design that represents software-development norms as deontic modalities (obligations, prohibitions, permissions) and encodes coordination as commitments.

Key Findings

BDIM-SE extends BDI agents by adding persistent memory and direct LLM queries to support longitudinal reasoning.

Practical Use: If you need agents that remember past tasks and learn across sessions, use BDIM-SE's mix of symbolic beliefs plus LLM-backed memory rather than stateless LLM prompts.

Evidence Ref: Sections 2 and 2 (Figure 2)
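The combination of symbolic beliefs, persistent memory, and an LLM query hook can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `MemoryAgent`, its methods, the keyword-based recall, and the `fake_llm` stand-in are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Episode:
    """One remembered event, e.g. the outcome of a past task."""
    task: str
    outcome: str


@dataclass
class MemoryAgent:
    """Hypothetical agent: symbolic beliefs + episodic memory + LLM hook."""
    llm: Callable[[str], str]  # assumed LLM interface: prompt in, text out
    beliefs: Dict[str, str] = field(default_factory=dict)
    episodes: List[Episode] = field(default_factory=list)

    def remember(self, task: str, outcome: str) -> None:
        # Append to episodic memory; a real system would persist this.
        self.episodes.append(Episode(task, outcome))

    def recall(self, keyword: str) -> List[Episode]:
        # Naive keyword match stands in for real retrieval.
        return [e for e in self.episodes if keyword.lower() in e.task.lower()]

    def query_belief(self, key: str) -> str:
        # Prefer a stored symbolic belief; otherwise ask the LLM,
        # passing recalled episodes as context, and cache the answer.
        if key in self.beliefs:
            return self.beliefs[key]
        context = "; ".join(f"{e.task} -> {e.outcome}" for e in self.recall(key))
        answer = self.llm(f"Context: {context}\nQuestion: {key}")
        self.beliefs[key] = answer
        return answer


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call; echoes the context line.
    return "LLM answer based on: " + prompt.splitlines()[0]


agent = MemoryAgent(llm=fake_llm, beliefs={"repo": "example/repo"})
agent.remember("run tests", "2 failures in auth module")
print(agent.query_belief("repo"))   # answered from symbolic beliefs
print(agent.query_belief("tests"))  # answered via the LLM hook with memory context
```

The point of the sketch is the lookup order: cheap symbolic beliefs first, then an LLM call grounded in what the agent remembers, with the answer written back so later sessions can reuse it.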

Norms are modeled as deontic modalities and encoded as JSON norms and commitments to make agent interactions accountable.

Practical Use: Encode team rules (e.g., privacy checks) as machine-readable norms so agents can automatically filter illegal or undesirable plans.

Evidence Ref: Section 3 (Software development norms example and commitments)
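As a rough illustration of norms encoded as JSON and used to filter plans, consider the sketch below. The JSON schema (`id`, `modality`, `action`), the action names, and the functions `violations` and `filter_plans` are hypothetical; the paper's actual norm format may differ.

```python
import json

# Hypothetical norm schema: one deontic modality applied to one action.
NORMS_JSON = """
[
  {"id": "n1", "modality": "prohibition", "action": "commit_secrets"},
  {"id": "n2", "modality": "obligation", "action": "run_privacy_scan"},
  {"id": "n3", "modality": "permission", "action": "auto_format"}
]
"""


def violations(plan_steps, norms):
    """Return the ids of norms that a plan (a list of actions) violates."""
    steps = set(plan_steps)
    violated = []
    for norm in norms:
        if norm["modality"] == "prohibition" and norm["action"] in steps:
            violated.append(norm["id"])  # did something forbidden
        elif norm["modality"] == "obligation" and norm["action"] not in steps:
            violated.append(norm["id"])  # skipped something required
        # Permissions never cause a violation in this simplified model.
    return violated


def filter_plans(plans, norms):
    """Keep only the plans that violate no norm."""
    return {name: steps for name, steps in plans.items()
            if not violations(steps, norms)}


norms = json.loads(NORMS_JSON)
plans = {
    "fast_merge": ["write_code", "commit_secrets", "open_pr"],
    "skip_scan": ["write_code", "open_pr"],
    "compliant": ["write_code", "run_privacy_scan", "auto_format", "open_pr"],
}
compliant = filter_plans(plans, norms)
print(list(compliant))
```

Here only the plan that runs the privacy scan and avoids committing secrets survives filtering. Returning the violated norm ids, rather than a bare yes or no, is what makes the rejection explainable to a human reviewer.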

What To Try In 7 Days

Map two recurring development tasks (e.g., PR review, privacy scan) and express their rules as simple JSON norms.

Prototype one BDIM-SE agent by connecting an LLM to a small belief store (task list + repo URL) and implement a plan that runs tests and reports results.

Run a tabletop test where the agent generates a commit and a separate 'testing' agent enacts the commitment, observing failure and remedy behavior.
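For the tabletop test in the last step, a commitment between two agents can be modeled as a small state machine. The sketch below follows the common debtor/creditor/antecedent/consequent reading of commitments; the state names, the event strings, and the `remedy` policy are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CONDITIONAL = "conditional"  # antecedent has not happened yet
    DETACHED = "detached"        # antecedent happened; consequent is now owed
    SATISFIED = "satisfied"      # consequent delivered
    VIOLATED = "violated"        # deadline passed while consequent was owed


@dataclass
class Commitment:
    debtor: str
    creditor: str
    antecedent: str
    consequent: str
    state: State = State.CONDITIONAL

    def observe(self, event: str) -> None:
        # Advance the lifecycle when a relevant event is seen.
        if self.state is State.CONDITIONAL and event == self.antecedent:
            self.state = State.DETACHED
        elif self.state is State.DETACHED and event == self.consequent:
            self.state = State.SATISFIED

    def deadline_passed(self) -> None:
        # Only an owed (detached) commitment can be violated.
        if self.state is State.DETACHED:
            self.state = State.VIOLATED


def remedy(c: Commitment) -> str:
    # Hypothetical remedy policy: hand violations to a human.
    if c.state is State.VIOLATED:
        return (f"escalate to human: {c.debtor} did not deliver "
                f"'{c.consequent}' to {c.creditor}")
    return "no action needed"


# The testing agent commits to post a report once code is committed.
c = Commitment("testing_agent", "dev_agent",
               "code_committed", "test_report_posted")
c.observe("code_committed")  # antecedent occurs, report is now owed
c.deadline_passed()          # no report arrived in time
print(c.state.value, "|", remedy(c))
```

Running the scenario both ways, with and without the `test_report_posted` event, gives the failure and remedy behavior the tabletop exercise is meant to observe.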

Agent Features

Memory

Short-term working memory

Long-term episodic memory

Semantic and procedural memory

LLM-backed belief queries

Planning

Plan library with invocation and context conditions

Plan selection with fallback alternatives
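The planning features listed above follow the usual BDI pattern, which can be sketched as below. The `Plan` fields, the `achieve` function, and the example goals are hypothetical names chosen for this illustration; trying plans in library order is one simple selection strategy among several.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Beliefs = Dict[str, object]


@dataclass
class Plan:
    name: str
    trigger: str                        # invocation condition: the goal it handles
    context: Callable[[Beliefs], bool]  # context condition checked against beliefs
    body: Callable[[Beliefs], bool]     # plan body; returns True on success


def achieve(goal: str, beliefs: Beliefs, library: List[Plan]) -> Optional[str]:
    """Try applicable plans in library order, falling back on failure."""
    for plan in library:
        if plan.trigger == goal and plan.context(beliefs):
            if plan.body(beliefs):
                return plan.name  # first applicable plan that succeeds
            # Otherwise fall through to the next alternative.
    return None                   # no plan achieved the goal


library = [
    # Applicable only when CI is available; fails here to show the fallback.
    Plan("run_ci", "verify_change",
         lambda b: bool(b.get("ci_available")), lambda b: False),
    Plan("run_local_tests", "verify_change", lambda b: True, lambda b: True),
    Plan("deploy", "release", lambda b: True, lambda b: True),
]

chosen = achieve("verify_change", {"ci_available": True}, library)
print(chosen)
```

In this run the CI plan is applicable but fails, so the agent falls back to running tests locally instead of abandoning the goal.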

Tool Use

LLM integration for code and feasibility queries

Tool connectors for GitHub, JIRA, CI/CD

Frameworks

NorMAS-SE

Is Agentic

Yes

Architectures

BDIM-SE

Collaboration

Commitment-based coordination

Normative reasoner for compliance

Auto-generated interaction protocols

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

No empirical results yet; evaluation is future work.

Relies on current LLMs, which can hallucinate and be brittle.

When Not To Use

In high-assurance or safety-critical systems without strong human oversight and formal verification.

When you cannot invest in norm design, governance, and monitoring.

Failure Modes

LLM hallucination producing norm-violating outputs.

Conflicting or incomplete norms causing plans to be filtered incorrectly.
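One way to catch the conflicting-norms failure mode early is a static check over the norm set before agents use it. The sketch below assumes the same hypothetical JSON-style norm fields (`id`, `modality`, `action`) and flags only the simplest case, an obligation and a prohibition on the same action; a fuller check would also cover permission versus prohibition and context-dependent norms.

```python
from itertools import combinations


def find_conflicts(norms):
    """Return id pairs where one norm obliges and another prohibits the same action."""
    conflicts = []
    for a, b in combinations(norms, 2):
        same_action = a["action"] == b["action"]
        opposed = {a["modality"], b["modality"]} == {"obligation", "prohibition"}
        if same_action and opposed:
            conflicts.append((a["id"], b["id"]))
    return conflicts


norms = [
    {"id": "n1", "modality": "obligation", "action": "log_user_activity"},
    {"id": "n2", "modality": "prohibition", "action": "log_user_activity"},
    {"id": "n3", "modality": "permission", "action": "auto_format"},
]
conflicts = find_conflicts(norms)
print(conflicts)
```

With a conflict like this in place, every plan violates at least one norm, so a plan filter would reject everything; surfacing the pair to a human norm designer is the safer response.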

Core Entities

Models

foundational LLM (unnamed)

Context Entities

Models

LLM-based code generation and testing models (surveyed references)

