Generative AI can synthesize virtual IMU data to augment and pretrain HAR models

October 18, 2023 · 6 min

Overview

Production Readiness: 0.3

Novelty Score: 0.6

Cost Impact Score: 0.7

Citation Count: 3

Authors

Zikang Leng, Hyeokhyen Kwon, Thomas Plötz

Why It Matters For Business

Synthetic IMU data can cut labeling costs and accelerate development of wearable activity features, but synthetic-to-real gaps require small calibration sets and validation for product safety.

Summary TLDR

The paper argues that modern generative models (LLMs plus text-driven motion synthesis) can create virtual IMU (inertial sensor) data from text prompts. The authors describe a pipeline: ChatGPT generates varied activity descriptions, T2M-GPT creates 3D motion, inverse kinematics and IMUSim convert the motion to IMU streams, and a small real-data calibration step closes the domain gap. They report improved classifier performance on three public HAR datasets (RealWorld, Pamap2, USC-HAD). The paper is a position piece that also outlines future work: large synthetic benchmarks, hierarchical decomposition of activities, self-supervised pretraining, and health sensing. Benefits include lower data-collect/…

Problem Statement

Wearable-based human activity recognition (HAR) needs labeled IMU data. Collecting and manually labeling that data is costly, slow, and privacy-sensitive, so labeled datasets remain scarce. The paper proposes using generative foundation models to automatically produce diverse, labeled virtual IMU data, which would reduce labeling costs and broaden the training data.

Main Contribution

Describes a practical pipeline that turns text prompts into virtual IMU data using ChatGPT, T2M-GPT, inverse kinematics, IMUSim, and a small real-data calibration step.

Reports that adding generated virtual IMU data improved downstream HAR classifier performance on three public datasets: RealWorld, Pamap2, and USC-HAD.

Outlines actionable research directions: build large synthetic benchmark datasets, learn hierarchical and temporal decompositions of activities, apply self-supervised pretraining, and target clinical/health sensing use cases.
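To make the motion-to-IMU step of the pipeline concrete, here is a minimal sketch of how a virtual accelerometer signal could be derived from a synthesized 3D joint trajectory. It is not the paper's code or IMUSim's API. The function name `virtual_accelerometer`, the z-up world frame, and the use of finite differences are all assumptions made for illustration. The sketch also ignores sensor orientation and sensor noise, which a full simulator such as IMUSim is meant to model.

```python
import numpy as np

# Assumption: world frame with z pointing up, units in metres and seconds.
GRAVITY = np.array([0.0, 0.0, -9.81])


def virtual_accelerometer(positions, fs):
    """Approximate accelerometer readings for one tracked joint.

    positions: (T, 3) world-frame joint positions in metres.
    fs: sampling rate in Hz.
    Returns (T, 3) specific force expressed in the world frame.
    """
    positions = np.asarray(positions, dtype=float)
    dt = 1.0 / fs
    # Differentiate position twice with finite differences.
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    # An accelerometer senses specific force: linear acceleration minus gravity.
    return acceleration - GRAVITY


# Usage: a joint accelerating along x at 2 m/s^2, sampled at 50 Hz.
fs = 50
t = np.arange(100) / fs
trajectory = np.stack([0.5 * 2.0 * t**2, np.zeros_like(t), np.zeros_like(t)], axis=1)
signal = virtual_accelerometer(trajectory, fs)
```

Finite differences are less accurate in the first and last few samples, so a real pipeline would likely trim or smooth the edges and then rotate the result into the sensor's local frame.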

Key Findings

A text→motion→IMU pipeline can produce labeled virtual IMU data and boost HAR performance on standard datasets.

The motion synthesis model (T2M-GPT) uses a discrete codebook of 512 latent entries.

Numbers: codebook size = 512
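A discrete codebook is usually applied by mapping each continuous latent vector to its nearest codebook entry. The sketch below illustrates that lookup with 512 entries. It is not T2M-GPT's implementation: the latent dimension of 16, the random codebook, and the function name `quantize` are assumptions chosen only to keep the example small.

```python
import numpy as np


def quantize(latents, codebook):
    """Nearest-neighbour vector quantization.

    latents: (T, D) continuous latent vectors.
    codebook: (K, D) codebook entries.
    Returns the (T,) entry indices and the (T, D) quantized vectors.
    """
    # Squared Euclidean distance between every latent and every entry.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]


rng = np.random.default_rng(0)
# 512 entries as stated in the summary; D = 16 is an illustrative assumption.
codebook = rng.normal(size=(512, 16))
# Latents placed very close to three known entries.
latents = codebook[[3, 100, 511]] + 0.01 * rng.normal(size=(3, 16))
indices, quantized = quantize(latents, codebook)
```

With this representation a motion clip becomes a sequence of integers in the range 0 to 511, which is the kind of token sequence an autoregressive model can generate from text.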

The pipeline removes the need for video data used by prior cross-modality methods like IMUTube, reducing manual video selection.

Results

downstream HAR classifier performance

Value: reported 'significant improvement' when augmented with generated virtual IMU data

What To Try In 7 Days

Prototype: generate 50–200 textual variants per target activity using an LLM and feed them to a motion synthesis model to get 3D motion.

Convert a subset to IMU streams via IMUSim, then fine-tune a small HAR classifier using a mix of synthetic and 5–10% real labeled sensor data.

Measure validation accuracy vs. a real-only baseline and inspect failure cases for realistic motion mismatch.
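The comparison in the steps above can be sketched as follows. This is an illustrative setup, not the paper's protocol: a nearest-centroid classifier stands in for the HAR model, inputs are assumed to be precomputed per-window feature vectors, the real-only baseline is assumed to use the same small real subset, and all function names are hypothetical.

```python
import numpy as np


def sample_real_fraction(X, y, fraction, rng):
    """Keep roughly `fraction` of the real windows per class (at least one)."""
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n = max(1, int(round(fraction * len(idx))))
        keep.append(rng.choice(idx, size=n, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


def fit_centroids(X, y):
    """Stand-in classifier: one mean feature vector per class."""
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])


def accuracy(model, X, y):
    """Fraction of windows whose nearest centroid has the correct label."""
    labels, centroids = model
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float((labels[d2.argmin(axis=1)] == y).mean())


def compare(real_X, real_y, synth_X, synth_y, val_X, val_y, fraction=0.1, seed=0):
    """Validation accuracy of a real-only baseline vs. real plus synthetic."""
    rng = np.random.default_rng(seed)
    small_X, small_y = sample_real_fraction(real_X, real_y, fraction, rng)
    baseline = fit_centroids(small_X, small_y)
    mixed = fit_centroids(
        np.concatenate([small_X, synth_X]),
        np.concatenate([small_y, synth_y]),
    )
    return {
        "real_only": accuracy(baseline, val_X, val_y),
        "real_plus_synthetic": accuracy(mixed, val_X, val_y),
    }
```

Calling `compare(...)` with held-out real windows as the validation set gives the two numbers to compare. In a real prototype the centroid model would be replaced by the actual HAR classifier, and the validation set should contain only real sensor data.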

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • No public code or numeric results are presented, so the reported gains are hard to reproduce.
  • Synthetic realism depends on motion synthesis quality; mismatch can hurt real-world generalization.
  • The method needs calibration with real IMU data; going from pure synthetic data to deployment without validation is risky.
  • The evaluation covers three datasets but lacks detailed metrics and ablation studies.
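The summary does not specify how the calibration step works. As one simple possibility, the sketch below rescales each virtual IMU channel so that its mean and standard deviation match a small real sample. This moment matching is a swapped-in technique chosen for illustration, not the authors' method, and the function name `calibrate` is hypothetical.

```python
import numpy as np


def calibrate(virtual, real, eps=1e-8):
    """Match per-channel mean and standard deviation of virtual data to real data.

    virtual: (N, C) virtual IMU samples.
    real: (M, C) real IMU samples from a small calibration set.
    Returns the (N, C) calibrated virtual samples.
    """
    v_mean, v_std = virtual.mean(axis=0), virtual.std(axis=0)
    r_mean, r_std = real.mean(axis=0), real.std(axis=0)
    # Standardize the virtual channels, then rescale to the real statistics.
    return (virtual - v_mean) / (v_std + eps) * r_std + r_mean


rng = np.random.default_rng(0)
virtual = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
real = rng.normal(loc=[0.2, -0.1, 9.8], scale=[0.5, 0.4, 1.5], size=(200, 3))
calibrated = calibrate(virtual, real)
```

Matching only the first two moments cannot correct differences in distribution shape or temporal structure, which is one reason validation on real data remains necessary.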

When Not To Use

  • You already have a large, well-labeled real IMU dataset, so synthetic augmentation adds little.
  • You are targeting regulated clinical deployments and the synthetic data has not been clinically validated.
  • The motion synthesis model does not capture motion nuances that are critical to safety.

Failure Modes

  • Generated IMU streams diverge from real sensor noise/placement, reducing model accuracy.
  • LLM prompt bias leads to non-representative activity styles and dataset bias.
  • Motion synthesis model cannot capture micro-movements, causing blind spots.
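One way to watch for the first failure mode is to compare the value distribution of each virtual channel against real data before training. The sketch below averages absolute differences between matching quantiles, which approximates a one-dimensional Wasserstein-1 distance per channel. The function name `channel_gap` and the choice of 101 quantiles are assumptions, and the check covers only marginal distributions, not temporal patterns.

```python
import numpy as np


def channel_gap(virtual, real, n_quantiles=101):
    """Per-channel distribution gap between virtual and real IMU samples.

    virtual: (N, C) virtual samples; real: (M, C) real samples.
    Returns a (C,) array; larger values mean a larger mismatch.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    v_quantiles = np.quantile(virtual, q, axis=0)  # shape (n_quantiles, C)
    r_quantiles = np.quantile(real, q, axis=0)
    return np.abs(v_quantiles - r_quantiles).mean(axis=0)


rng = np.random.default_rng(0)
virtual = rng.normal(size=(500, 3))
# Simulated mismatch: a constant offset of 1.0 on the first channel only.
real = virtual + np.array([1.0, 0.0, 0.0])
gaps = channel_gap(virtual, real)
```

Channels with a large gap are candidates for recalibration or for a closer look at sensor placement assumptions in the simulation.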

Core Entities

Models

  • ChatGPT
  • T2M-GPT
  • IMUTube
  • IMUSim

Metrics

  • Accuracy

Datasets

  • RealWorld
  • Pamap2
  • USC-HAD
  • HumanML3D

Benchmarks

  • none (proposes new synthetic benchmarks)

Context Entities

Models

  • GPT-3 (cited)
  • T2M family (cited)

Datasets

  • ImageNet (analogy to large benchmark datasets)
  • Human motion datasets referenced in citations