Use LLM agents plus DRL and tiny adapters to meet operator intents while cutting active radio units and memory use

February 26, 2026 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.75

Citation Count

0

Authors

Mohammad Hossein Shokouhi, Vincent W. S. Wong

Why It Matters For Business

This approach translates high-level operator goals into automated RAN actions. It reduces the number of active radio units and the memory needed to host the LLM agents, so operators can save energy and cloud/edge costs without hand-coding policies.

Summary TLDR

The paper proposes an agent-based system for cell-free O-RAN where a non-RT supervisor LLM translates operator intents into objectives, near-RT LLM agents coordinate precoding weights and O-RU on/off decisions, and a monitoring agent enforces minimum rates. O-RU activation is solved via multi-agent DRL (MAPPO). To scale LLM agents, the authors use 4-bit quantization plus low-rank adapters (QLoRA) so that one shared backbone serves multiple agents. In simulations, the system reduces active O-RUs by up to 41.93% vs baselines in energy-saving mode and cuts near-RT agent memory by ~92% versus three full-precision (FP16) LLMs. All evidence comes from simulated cell-free networks (L=50 O-RUs, K up to 20 users).

Problem Statement

Operators want to express high-level intents (e.g., save energy, meet minimum rates) and have the RAN automatically translate them into control actions. Prior work uses separate LLM agents or treats intents as independent tasks. This paper addresses complex, overlapping intents requiring coordination across agents, and the scalability problem of deploying multiple large LLMs in near-real-time RICs.

Main Contribution

An agentic AI architecture for cell-free O-RAN: a non-RT supervisor LLM translates operator intents; near-RT LLM agents set user weights and O-RU activations; a monitoring agent closes the loop.
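To make the intent-translation step concrete, the sketch below validates a supervisor LLM's structured output before it is handed to the near-RT agents. The JSON field names (`mode`, `min_rate_mbps`), the allowed mode names, and the `Objective` type are hypothetical and chosen only for illustration; the paper's actual message format is not described here.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Hypothetical set of operating modes; only "energy saving" is named in the summary.
ALLOWED_MODES = {"energy_saving", "max_throughput"}


@dataclass
class Objective:
    mode: str                        # which objective the near-RT agents optimize
    min_rate_mbps: Dict[str, float]  # per-user minimum-rate constraints


def parse_supervisor_output(text: str) -> Objective:
    """Parse and validate the supervisor's JSON reply (assumed format)."""
    data = json.loads(text)
    mode = data["mode"]
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    min_rates = {str(k): float(v) for k, v in data.get("min_rate_mbps", {}).items()}
    if any(v < 0 for v in min_rates.values()):
        raise ValueError("minimum rates must be non-negative")
    return Objective(mode=mode, min_rate_mbps=min_rates)
```

Rejecting malformed or out-of-range output at this boundary is one way to keep a free-form LLM reply from propagating into control decisions.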

A distributed O-RU activation method using multi-agent PPO (MAPPO) to learn on/off policies per O-RU under a shared reward that balances energy and minimum-rate violations.
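A shared reward of this kind could look like the following minimal sketch. The exact functional form, the `energy_weight` parameter, and the use of the active fraction as the energy term are assumptions made for illustration, not the paper's expression; only the idea of trading energy against penalized minimum-rate violations comes from the summary.

```python
from typing import Sequence


def shared_reward(
    active: Sequence[int],       # 1 if the O-RU is on, 0 if off
    rates: Sequence[float],      # achieved rate per user
    min_rates: Sequence[float],  # required minimum rate per user
    penalties: Sequence[float],  # per-user violation penalty coefficients
    energy_weight: float = 1.0,  # hypothetical trade-off weight
) -> float:
    """Team reward given to every per-O-RU agent (illustrative form)."""
    active_fraction = sum(active) / len(active)
    # Only shortfalls below the minimum rate are penalized.
    violation = sum(
        lam * max(0.0, r_min - r)
        for lam, r, r_min in zip(penalties, rates, min_rates)
    )
    return -(energy_weight * active_fraction + violation)
```

Because every agent receives the same scalar, an O-RU that switches off is rewarded only if the resulting rate shortfall, weighted by the penalties, costs less than the energy it saves.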

A retrieval-augmented tuning memory that stores past coefficient vectors (user priorities and violation penalties) to speed coefficient tuning on similar environments.
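The retrieval step can be sketched as a small key-value store queried by cosine similarity. The class name `TuningMemory`, the similarity threshold, and the choice to return nothing below that threshold are assumptions for illustration; the keys stand in for the low-dimensional environment embeddings mentioned in the summary.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class TuningMemory:
    """Maps environment embeddings (keys) to coefficient vectors (values)."""

    def __init__(self, min_similarity: float = 0.9) -> None:
        self.min_similarity = min_similarity  # hypothetical acceptance threshold
        self._entries: List[Tuple[List[float], List[float]]] = []

    def store(self, key: Sequence[float], coefficients: Sequence[float]) -> None:
        self._entries.append((list(key), list(coefficients)))

    def retrieve(self, query: Sequence[float]) -> Optional[List[float]]:
        """Return the coefficients of the most similar key, or None if too dissimilar."""
        best_score, best_value = -1.0, None
        for key, value in self._entries:
            score = cosine(key, query)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.min_similarity else None
```

Returning `None` for a weak match lets the caller fall back to a fresh coefficient search instead of warm-starting from a misleading entry.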

Scalability via QLoRA: quantize the backbone to 4-bit and train small low-rank adapters per agent, reducing near-RT memory dramatically.
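A back-of-envelope estimate shows why the saving is large. The sketch below counts weight storage only, and assumes a 7B backbone, three agents, FP16 adapters, and a hypothetical 40M parameters per adapter; it ignores activations, KV cache, and quantization metadata, so it does not reproduce the paper's measured figures.

```python
from typing import Tuple


def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_param / 8.0


def near_rt_memory(
    params_billion: float = 7.0,          # backbone size (assumed)
    n_agents: int = 3,                    # number of near-RT LLM agents
    adapter_params_million: float = 40.0  # hypothetical adapter size per agent
) -> Tuple[float, float, float]:
    """Return (full_gb, shared_gb, fractional_reduction), weights only."""
    # Baseline: one full FP16 model per agent.
    full = n_agents * weights_gb(params_billion, 16)
    # Shared setup: one 4-bit backbone plus one FP16 adapter per agent.
    adapters = n_agents * weights_gb(adapter_params_million / 1000.0, 16)
    shared = weights_gb(params_billion, 4) + adapters
    return full, shared, 1.0 - shared / full
```

With zero adapter overhead the weights-only bound is 1 - 4/(3 × 16), about 91.7%, which is in the same range as the ~92% reduction reported in the summary.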

Key Findings

Energy-saving mode reduces active O-RUs by up to 41.93% compared with three baselines.

Numbers: 41.93% fewer active O-RUs vs baselines (Fig. 3a)

Shared quantized backbone with QLoRA adapters cuts near-RT agent memory by ~92% versus three full LLMs.

Numbers: 92% memory reduction (Table I)

Performance with 7B and 14B student models was similar on the evaluated metrics.

Numbers: No large gap between 7B and 14B in active O-RU fraction (Fig. 3a)

Uncoordinated coefficient updates cause instability and unnecessary O-RU activations.

Numbers: None reported; the paper describes qualitatively that the DRL+GA baseline activates many O-RUs because the coefficients λk grow quickly

Results

Fraction of active O-RUs

Value: Reduced by up to 41.93% vs baselines in energy-saving mode

Baseline: Greedy, DRL+GA, full-power

Near-RT agent memory usage

Value: Reduced by ~92% using 1× FP4 backbone + 3 adapters

Baseline: 3× full-precision LLMs (FP16)

Model-size sensitivity

Value: 7B and 14B adapter-equipped models perform similarly on the active O-RU metric

Baseline: Comparison between 7B and 14B Qwen 2.5 students

Who Should Care

What To Try In 7 Days

Prototype a supervisor prompt that maps two real operator intents to numeric objectives and constraints.

Train tiny QLoRA adapters for one near-RT task on a small student model (7B) and measure memory use.

Run a MAPPO-style simulation for O-RU on/off in your topology to compare active unit fraction vs a greedy baseline.
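For the greedy comparison point, one possible baseline is sketched below: start with all O-RUs on and switch off the weakest contributors while every user still meets its minimum rate. The toy rate model (log2 of one plus summed gains, with no interference) and the function names are assumptions for illustration; the paper's own greedy baseline may differ.

```python
import math
from typing import List, Sequence


def user_rates(active: Sequence[int], gains: Sequence[Sequence[float]]) -> List[float]:
    """Toy rate model: gains[l][k] is the gain from O-RU l to user k; interference is ignored."""
    n_users = len(gains[0])
    return [
        math.log2(1.0 + sum(gains[l][k] for l in range(len(gains)) if active[l]))
        for k in range(n_users)
    ]


def greedy_switch_off(gains: Sequence[Sequence[float]], min_rates: Sequence[float]) -> List[int]:
    """Turn off O-RUs in order of increasing total gain, keeping minimum rates satisfied."""
    active = [1] * len(gains)
    order = sorted(range(len(gains)), key=lambda l: sum(gains[l]))
    for l in order:
        active[l] = 0
        rates = user_rates(active, gains)
        if any(r < r_min for r, r_min in zip(rates, min_rates)):
            active[l] = 1  # switching off would violate a minimum rate, so revert
    return active


def active_fraction(active: Sequence[int]) -> float:
    return sum(active) / len(active)
```

The active fraction from this baseline can then be compared against the fraction a trained MAPPO policy reaches on the same topology and rate constraints.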

Agent Features

Memory

  • key-value memory of coefficient vectors
  • cosine-retrieval over low-dim embeddings

Planning

  • intent translation to objective and constraints
  • iterative coordination loop via monitoring agent

Tool Use

  • LLM supervisor (GPT-5) for intent parsing
  • LoRA
  • MAPPO for O-RU activation

Frameworks

  • LoRA
  • MAPPO
  • WMMSE
  • Lagrange dual optimization

Is Agentic

true

Architectures

  • shared-LLM-with-adapters
  • supervisor (non-RT) + near-RT agents (xApps)
  • per-O-RU MAPPO agents

Collaboration

  • monitoring agent coordinates weighting and O-RU agents
  • A1/E2 interfaces for supervisor and agents

Optimization Features

Infra Optimization

  • ≈92% memory reduction for near-RT agents (Table I)

Model Optimization

  • 4-bit FP4 quantization of backbone
  • LoRA

System Optimization

  • single shared LLM instance replaces multiple full models
  • retrieval of prior coefficients to skip re-search

Training Optimization

  • teacher-model interaction collection to build student datasets

Inference Optimization

  • load only small adapter per agent on a shared quantized backbone
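The adapter-per-agent idea can be illustrated with a single toy linear layer: one frozen weight matrix is shared, and each agent contributes only a low-rank update. This is a plain-Python sketch of the standard LoRA forward pass, y = Wx + scale · B(Ax); the class name and agent names are hypothetical, and the real system would hold the shared weights in 4-bit form.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedBackboneLayer:
    """One frozen weight matrix shared by all agents, plus per-agent low-rank adapters."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # shared and never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, agent: str, A: Matrix, B: Matrix, scale: float) -> None:
        # A has shape (r, d_in), B has shape (d_out, r), with rank r much smaller than d.
        self.adapters[agent] = (A, B, scale)

    def forward(self, x: Sequence[float], agent: str) -> List[float]:
        base = matvec(self.weight, x)
        A, B, scale = self.adapters[agent]
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        return [b + scale * d for b, d in zip(base, delta)]
```

Switching agents only changes which small (A, B) pair is applied, so the memory cost of adding an agent is the adapter, not another copy of the backbone.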

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Evaluations are simulation-only; real-world channel, latency, and orchestration issues not validated
  • Relies on proprietary GPT-5 as supervisor and teacher; operational cost and access matter
  • MAPPO training and DRL stability require careful tuning and may be slow in large deployments
  • Memory retrieval depends on quality of autoencoder embeddings and past coverage of environments

When Not To Use

  • If you require strict sub-10 ms control in all loops (near-RT RIC control loops are specified for roughly 10 ms to 1 s, so they do not cover that regime)
  • When proprietary LLM access (GPT-5) is unavailable or too costly
  • For very small networks where multiple small models are cheaper than integration effort

Failure Modes

  • Oscillation or excessive activation if user priority and penalty coefficients are updated independently
  • DRL policies can learn suboptimal on/off behaviors for unseen topologies
  • Retrieved coefficients may mislead if current environment differs from stored keys

Core Entities

Models

  • GPT-5
  • Qwen 2.5
  • LoRA

Metrics

  • fraction_active_O-RUs
  • memory_usage

Context Entities

Models

  • GPT-5 (supervisor and teacher)
  • Qwen 2.5 (7B and 14B student models)

Metrics

  • Active O-RU fraction vs baselines
  • Memory usage (GB) under different quantization/adapters

Datasets

  • Simulated cell-free O-RAN scenarios (L=50 O-RUs, area 500 m^2)