RadioLLM: use LLMs for radio tasks via hybrid prompts and token reprogramming

January 28, 20257 min

Overview

Decision SnapshotNeeds Validation

The approach is promising for prototyping and lab deployments: it shows consistent gains on public benchmarks and pays attention to latency. Real-world readiness needs code release, edge profiling, and tests on live air data.

Citations0

Evidence Strength0.70

Confidence0.80

Risk Signals10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Shuai Chen, Yong Zu, Zhixi Feng, Shuyuan Yang, Mengchang Li

Links

Abstract / PDF / Data

Why It Matters For Business

RadioLLM lets you reuse LLM priors for multiple radio tasks, improving classification and denoising while cutting prompt overhead and latency in many benchmark scenarios.

Who Should Care

Summary TLDR

RadioLLM adapts large language models (LLMs — big neural nets trained on text) to radio tasks by two ideas: HPTR (Hybrid Prompt and Token Reprogramming) maps raw I/Q signal patches into LLM token space and replaces long text prompts with top‑K semantic anchors; FAF (Frequency‑Attuned Fusion) injects CNN‑extracted high‑frequency features to recover transient signal details. Using GPT-2/LLaMA variants and LoRA fine‑tuning, RadioLLM outperforms many baselines across seven public radio datasets on classification and denoising, improves SSIM for denoising (e.g., 0.838–0.893), and reduces inference latency via compact prompts. Results are strong on benchmarks but come with class confusion for very

Problem Statement

Current deep models for cognitive radio are task-specific and struggle to scale across diverse signal types. LLMs have strong cross‑domain priors but are trained on text and lose native radio features when forced through textual prompts. The paper aims to (1) map raw radio I/Q signals into LLM input space without textualization, (2) inject compact expert knowledge into prompts, and (3) restore LLM sensitivity to high‑frequency signal details for unified denoising and classification.

Main Contribution

RadioLLM: a unified LLM‑based system that handles denoising, recovery, and modulation classification from raw I/Q signals.

HPTR (Hybrid Prompt + Token Reprogramming): replace long text prompts with top‑K semantic token anchors and reprogram I/Q patches into LLM tokens via cross‑attention.

Key Findings

RadioLLM beats many baselines on modulation classification.

NumbersOA: 58.10% (RML16A), 58.35% (RML16B), 68.19% (RML16C)

Practical UseUse RadioLLM pretraining+LoRA when you need higher classification accuracy across multiple public radio datasets.

Evidence RefTable I

RadioLLM gives better denoising structural quality.

NumbersSSIM: 0.838 (RML16A), 0.893 (RML16B), 0.846 (RML16C)

Practical UseAdopt FAF fusion to preserve waveform structure when denoising noisy I/Q data.

Evidence RefTable II

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
OA (RML16A)58.10%55.39% (baseline)+2.71 ppRML16A 100‑shotTable I shows RadioLLM OA 58.10% vs baseline rowsTable I
OA (RML16B)58.35%56.17% (SemiAMC runner-up)+2.18 ppRML16B 100‑shotTable I and text discussionTable I

What To Try In 7 Days

Run LoRA fine‑tuning of GPT‑2/LLaMA on a small sample of your I/Q data using HPTR mapping.

Implement top‑K semantic anchors (start with K=7) to replace long text prompts and measure latency.

Add a small CNN FAF block to inject high‑frequency features and check SSIM on denoising tasks.

Optimization Features

Token Efficiency
Top‑K anchor selection reduces redundant prompt tokens
Model Optimization
LoRA
System Optimization
Selective freezing of most LLM weights to cut storage and runtime updatesBatch inference measured with 128 samples
Training Optimization
Multi‑dataset pretraining with loss balancing to avoid dataset bias
Inference Optimization
Hybrid prompts using top‑K semantic anchors to reduce prefix token loadAccuracy

Reproducibility

Code AvailableNo
Data AvailableYes
Open Source StatusPartial
LicenseUnknown

Data URLs

RadioML2016/2018/2022 series (publicly available datasets cited in paper)ADS-B dataset (public)Wi‑Fi dataset (cited)

Risks & Boundaries

Limitations

Evaluations use public benchmarks and simulated SNR mixes; real operational environments may differ.

Model confuses closely related modulations (e.g., 16QAM vs 64QAM) and some noise‑sensitive classes.

When Not To Use

On ultra low‑power edge devices without hardware acceleration (LLMs are still heavy).

When strict model interpretability and regulatory explainability are required.

Failure Modes

Misclassification among high‑order QAM or similar modulation classes under ambiguous SNRs.

Performance drops when domain pretraining bias mismatches target data.

Core Entities

Models

RadioLLMGPT-2LLaMA3BERTLoRA

Metrics

AccuracyCohen's KappaSSIM

Datasets

RML16ARML16BRML16CRML22RML18AADS-BWi-Fi

Benchmarks

modulation classification benchmarks (RadioML series)ADS-B real-world signalsWi‑Fi over-the-air set