RadioLLM: use LLMs for radio tasks via hybrid prompts and token reprogramming

January 28, 2025 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Shuai Chen, Yong Zu, Zhixi Feng, Shuyuan Yang, Mengchang Li

Why It Matters For Business

RadioLLM lets you reuse LLM priors for multiple radio tasks, improving classification and denoising while cutting prompt overhead and latency in many benchmark scenarios.

Summary TLDR

RadioLLM adapts large language models (LLMs, large neural networks trained on text) to radio tasks through two ideas. HPTR (Hybrid Prompt and Token Reprogramming) maps raw I/Q signal patches into LLM token space and replaces long text prompts with top‑K semantic anchors. FAF (Frequency‑Attuned Fusion) injects CNN‑extracted high‑frequency features to recover transient signal details. Using GPT-2/LLaMA variants and LoRA fine‑tuning, RadioLLM outperforms many baselines across seven public radio datasets on classification and denoising, improves SSIM for denoising (e.g., 0.838–0.893), and reduces inference latency via compact prompts. Results are strong on benchmarks but come with class confusion for very

Problem Statement

Current deep models for cognitive radio are task-specific and struggle to scale across diverse signal types. LLMs have strong cross‑domain priors but are trained on text and lose native radio features when forced through textual prompts. The paper aims to (1) map raw radio I/Q signals into LLM input space without textualization, (2) inject compact expert knowledge into prompts, and (3) restore LLM sensitivity to high‑frequency signal details for unified denoising and classification.

Main Contribution

RadioLLM: a unified LLM‑based system that handles denoising, recovery, and modulation classification from raw I/Q signals.

HPTR (Hybrid Prompt + Token Reprogramming): replace long text prompts with top‑K semantic token anchors and reprogram I/Q patches into LLM tokens via cross‑attention.

FAF (Frequency‑Attuned Fusion): fuse CNN high‑frequency features with LLM low‑frequency context to recover transient signal details.

Practical training recipe: LoRA fine‑tuning to limit LLM parameter updates, dataset mix for pretraining, and task‑specific decoders.
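
The two HPTR ideas above can be sketched in NumPy: selecting top‑K anchor embeddings by cosine similarity, and reprogramming I/Q patches into the LLM embedding space with single-head cross‑attention. This is a minimal illustration under assumptions, not the paper's implementation: the function names (`patchify`, `top_k_anchors`, `reprogram`), patch length, stride, dimensions, and random weights are all hypothetical, and a real system would use the LLM's own embedding table and learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify(iq, patch_len, stride):
    """Split a (2, T) I/Q frame into flattened patches of shape (n_patches, 2 * patch_len)."""
    starts = range(0, iq.shape[1] - patch_len + 1, stride)
    return np.stack([iq[:, s:s + patch_len].reshape(-1) for s in starts])

def top_k_anchors(query, vocab_emb, k):
    """Indices of the k vocabulary embeddings most cosine-similar to the query vector."""
    q = query / np.linalg.norm(query)
    v = vocab_emb / np.linalg.norm(vocab_emb, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]

def reprogram(patches, prototypes, w_q, w_k, w_v):
    """Cross-attention: signal patches are queries, text prototypes are keys and values."""
    q = patches @ w_q                       # (n_patches, d_attn)
    k = prototypes @ w_k                    # (n_proto, d_attn)
    v = prototypes @ w_v                    # (n_proto, d_model)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    return attn @ v, attn                   # tokens live in the LLM embedding space

rng = np.random.default_rng(0)
d_model, d_attn = 64, 48                    # illustrative sizes (assumption)
iq = rng.standard_normal((2, 128))          # one synthetic I/Q frame
patches = patchify(iq, patch_len=16, stride=8)

vocab_emb = rng.standard_normal((1000, d_model))   # stand-in for the LLM embedding table
prototypes = vocab_emb[:100]                       # stand-in for a reduced prototype set
w_q = rng.standard_normal((patches.shape[1], d_attn)) * 0.1
w_k = rng.standard_normal((d_model, d_attn)) * 0.1
w_v = rng.standard_normal((d_model, d_model)) * 0.1
tokens, attn = reprogram(patches, prototypes, w_q, w_k, w_v)

task_query = rng.standard_normal(d_model)   # stand-in for an embedded task description
anchor_ids = top_k_anchors(task_query, vocab_emb, k=7)
prefix = vocab_emb[anchor_ids]              # compact prompt: 7 anchor embeddings
llm_input = np.concatenate([prefix, tokens], axis=0)
print(llm_input.shape)
```

The prefix here costs 7 positions regardless of how long the original text prompt was, which is the mechanism behind the reported latency reduction.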

Key Findings

RadioLLM beats many baselines on modulation classification.

Numbers: OA 58.10% (RML16A), 58.35% (RML16B), 68.19% (RML16C)

RadioLLM gives better denoising structural quality.

Numbers: SSIM 0.838 (RML16A), 0.893 (RML16B), 0.846 (RML16C)

Hybrid prompts speed up inference and slightly boost accuracy vs long text prompts.

Numbers: 31.85% faster inference; +0.85% accuracy (reported vs long text prompts)

Ablation shows modules are complementary.

Numbers: baseline OA 55.39% → combined OA 58.10%; inference 1131 ms → 783.8 ms
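
The ablation figures can be turned into relative gains with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the batch size of 128 comes from the batch-inference note elsewhere in this summary, and applying it to these timings is an assumption.

```python
# Ablation numbers as quoted in this summary.
baseline_ms, combined_ms = 1131.0, 783.8
baseline_oa, combined_oa = 55.39, 58.10
batch_size = 128  # assumption: the timings are per batch of 128 samples

latency_cut = (baseline_ms - combined_ms) / baseline_ms   # fraction of time saved
speedup = baseline_ms / combined_ms                       # throughput ratio
per_sample_ms = combined_ms / batch_size                  # amortized cost per sample
oa_gain = combined_oa - baseline_oa                       # percentage points

print(f"latency cut: {latency_cut:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"per-sample latency: {per_sample_ms:.2f} ms")
print(f"OA gain: {oa_gain:.2f} points")
```

The ablation latency cut works out to about 30.7%, which is a different measurement from the 31.85% figure reported for hybrid prompts versus long text prompts.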

Results

OA (RML16A)

Value: 58.10%

Baseline: 55.39%

OA (RML16B)

Value: 58.35%

Baseline: 56.17% (SemiAMC, runner-up)

SSIM (denoise)

Value: 0.838 (RML16A); 0.893 (RML16B); 0.846 (RML16C)

Baseline: SGFilter 0.782 (RML16A); 0.821 (RML16B); 0.777 (RML16C)

Ablation (HPTR+FAF)

Value: OA 58.10%, Kappa 0.5391, SSIM 0.838

Baseline: OA 55.39%, Kappa 0.5097, SSIM 0.805

Inference time (ms per batch)

Value: 783.8 ms (HPTR+FAF)

Baseline: 1131.0 ms (no modules)
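
For readers who want to reproduce an SSIM-style denoising score on their own signals, here is a minimal sketch of the standard SSIM formula applied over a whole 1-D signal as a single window. How the paper windows the signal and what dynamic range it uses are not stated in this summary, so `global_ssim`, `data_range`, and the synthetic signals are assumptions for illustration.

```python
import math
import random

def global_ssim(x, y, data_range=2.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

rng = random.Random(0)
clean = [math.sin(2 * math.pi * 4 * i / 256) for i in range(256)]
noise = [rng.gauss(0.0, 0.5) for _ in clean]
noisy = [c + n for c, n in zip(clean, noise)]
denoised = [c + 0.1 * n for c, n in zip(clean, noise)]  # pretend 90% of the noise was removed

print(f"noisy:    {global_ssim(clean, noisy):.3f}")
print(f"denoised: {global_ssim(clean, denoised):.3f}")
```

Because I/Q signals are close to zero-mean, the mean term of a single-window SSIM is sensitive to small offsets, so windowed variants may give different absolute values than this sketch.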

Who Should Care

What To Try In 7 Days

Run LoRA fine‑tuning of GPT‑2/LLaMA on a small sample of your I/Q data using HPTR mapping.

Implement top‑K semantic anchors (start with K=7) to replace long text prompts and measure latency.

Add a small CNN FAF block to inject high‑frequency features and check SSIM on denoising tasks.
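
As a starting point for the FAF experiment in the last step, the sketch below shows the general idea of adding high‑frequency detail back onto a smooth representation. The summary does not specify the FAF architecture, so everything here is an assumption: a fixed high‑pass kernel stands in for a learned CNN, a moving average stands in for the LLM's low‑frequency output, and gated addition is one plausible fusion rule.

```python
import numpy as np

def conv_same(x, kernel):
    """Apply the same 1-D kernel to every channel row, keeping the length."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])

t = np.arange(128)
i_chan = np.sin(2 * np.pi * t / 64)
q_chan = np.cos(2 * np.pi * t / 64)
i_chan[40] += 1.0                       # a one-sample transient on the I channel
x = np.stack([i_chan, q_chan])          # (2, 128) I/Q frame

smooth = conv_same(x, np.ones(5) / 5)                        # low-frequency stand-in
high_freq = conv_same(x, np.array([-0.25, 0.5, -0.25]))      # high-pass stand-in
gate = 1.0                                                   # hypothetical fusion weight
fused = smooth + gate * high_freq

err_smooth = abs(x[0, 40] - smooth[0, 40])
err_fused = abs(x[0, 40] - fused[0, 40])
print(f"error at transient without fusion: {err_smooth:.3f}")
print(f"error at transient with fusion:    {err_fused:.3f}")
```

In a trained model the kernel and gate would be learned, and the check of interest would be SSIM on held-out denoising data rather than the error at a single sample.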

Optimization Features

Token Efficiency

  • Top‑K anchor selection reduces redundant prompt tokens

Model Optimization

  • LoRA
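
A minimal sketch of what LoRA does to one linear layer: the pretrained weight stays frozen and only a low‑rank pair of matrices is trained. The rank, scaling factor, and which layers RadioLLM adapts are not given in this summary, so `r`, `alpha`, and the layer size below are illustrative assumptions (768 matches the hidden size of the smallest GPT-2 and is used only for scale).

```python
import numpy as np

def lora_forward(x, w_frozen, a, b, alpha):
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trainable."""
    r = a.shape[0]
    return x @ w_frozen.T + (alpha / r) * (x @ a.T) @ b.T

rng = np.random.default_rng(0)
d_in = d_out = 768
r, alpha = 8, 16                                   # hypothetical rank and scale

w_frozen = rng.standard_normal((d_out, d_in)) * 0.02
a = rng.standard_normal((r, d_in)) * 0.01          # A starts random
b = np.zeros((d_out, r))                           # B starts at zero, so the update starts at zero
x = rng.standard_normal((4, d_in))

y_init = lora_forward(x, w_frozen, a, b, alpha)    # identical to the frozen layer at init

b[:] = 0.01                                        # pretend a training step changed B
y_after = lora_forward(x, w_frozen, a, b, alpha)

full_params = w_frozen.size
lora_params = a.size + b.size
print(f"trainable fraction: {lora_params / full_params:.2%}")
```

With these sizes the adapter adds 12,288 trainable values against 589,824 frozen ones for the layer, which is why per-task storage stays small.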

System Optimization

  • Selective freezing of most LLM weights to reduce the parameters that must be stored and updated
  • Batch inference measured with 128 samples

Training Optimization

  • Multi‑dataset pretraining with loss balancing to avoid dataset bias

Inference Optimization

  • Hybrid prompts using top‑K semantic anchors to reduce prefix token load
  • Accuracy

Reproducibility

Data Urls

  • RadioML2016/2018/2022 series (publicly available datasets cited in paper)
  • ADS-B dataset (public)
  • Wi‑Fi dataset (cited)

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Evaluations use public benchmarks and simulated SNR mixes; real operational environments may differ.
  • Model confuses closely related modulations (e.g., 16QAM vs 64QAM) and some noise‑sensitive classes.
  • Pretraining filtered RML samples to SNR ≥ 14 dB, which may bias learned priors toward cleaner signals.
  • No public code or production deployment details provided.
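
To see how much data an SNR ≥ 14 dB filter can remove, the sketch below applies it to a toy dataset. It assumes the common RadioML 2016 layout of a dictionary keyed by `(modulation, snr_db)` with SNR levels from -20 to 18 dB in 2 dB steps; the function name `filter_by_snr` and the placeholder frames are hypothetical.

```python
def filter_by_snr(dataset, min_snr_db):
    """Keep only entries whose key (modulation, snr_db) meets the SNR threshold."""
    return {key: frames for key, frames in dataset.items() if key[1] >= min_snr_db}

snr_levels = list(range(-20, 20, 2))          # -20, -18, ..., 18 dB
modulations = ["QAM16", "QAM64"]              # two classes the summary flags as confusable
toy = {(m, s): [f"{m}@{s}dB"] for m in modulations for s in snr_levels}

kept = filter_by_snr(toy, min_snr_db=14)
kept_snrs = sorted({snr for _, snr in kept})
fraction = len(kept) / len(toy)

print(kept_snrs)
print(f"kept {fraction:.0%} of (modulation, SNR) groups")
```

Under these assumptions only 3 of 20 SNR levels survive the filter, so evaluating on low-SNR target data separately is a reasonable check for the bias noted above.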

When Not To Use

  • On ultra low‑power edge devices without hardware acceleration (LLMs are still heavy).
  • When strict model interpretability and regulatory explainability are required.
  • If you need immediate on‑device training from scratch with no access to GPUs.

Failure Modes

  • Misclassification among high‑order QAM or similar modulation classes under ambiguous SNRs.
  • Performance drops when domain pretraining bias mismatches target data.
  • Latency gains depend on prompt anchor quality; poor anchors can hurt accuracy or speed.

Core Entities

Models

  • RadioLLM
  • GPT-2
  • LLaMA3
  • BERT
  • LoRA

Metrics

  • Accuracy
  • Cohen's Kappa
  • SSIM
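
Overall accuracy (OA) and Cohen's Kappa can both be computed from a confusion matrix, as in the sketch below. The 3‑class matrix is made up for illustration. The `balanced_kappa` shortcut assumes every true class has the same number of samples, in which case chance agreement is 1/C; treating RML16A as 11 balanced classes is an assumption about the dataset, not something stated in this summary.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohens_kappa(cm):
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e from row and column marginals."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

def balanced_kappa(oa, n_classes):
    """Shortcut when true classes are balanced: chance agreement is 1 / n_classes."""
    chance = 1 / n_classes
    return (oa - chance) / (1 - chance)

# Rows are true classes, columns are predictions (synthetic counts).
cm = [
    [40, 5, 5],
    [4, 30, 16],
    [6, 14, 30],
]
oa = overall_accuracy(cm)
kappa = cohens_kappa(cm)
print(f"OA = {oa:.4f}, kappa = {kappa:.4f}")
print(f"balanced shortcut for OA 58.10%, 11 classes: {balanced_kappa(0.5810, 11):.4f}")
```

The shortcut gives 0.5391 for an OA of 58.10% with 11 classes, which agrees with the Kappa reported for the combined model.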

Datasets

  • RML16A
  • RML16B
  • RML16C
  • RML22
  • RML18A
  • ADS-B
  • Wi-Fi

Benchmarks

  • modulation classification benchmarks (RadioML series)
  • ADS-B real-world signals
  • Wi‑Fi over-the-air set