Overview
Production Readiness
0.45
Novelty Score
0.4
Cost Impact Score
0.65
Citation Count
3
Why It Matters For Business
You can run a practical ViT-style segmenter on a $100–$150 Jetson Nano by combining distillation with fp16 quantization. The result is near-teacher accuracy from a 5.6M-parameter model, although the reported ~3.7 GB RAM footprint leans on swap on the 4GB device.
Summary TLDR
The authors build a practical compression pipeline (structured pruning of heads/weights, distillation from a Swin teacher, and fp16/post-training quantization) to deploy a lightweight ViT-based segmentation model on an NVIDIA Jetson Nano (4GB). The final UPerNet + MobileViT student reaches about 0.61 mean IoU on the LPCV UAV disaster parsing dataset with 5.6M params, and runs on the Jetson at ~3–4.5 FPS while consuming ~3.7 GB RAM and relying on swap. Pruning MobileViT is fragile (accuracy drops early); distillation adds about 0.06 mean IoU; fp16 improved throughput for MobileViT but not for ResNet18. Code is available.
Problem Statement
Vision transformers give high segmentation accuracy but are too large for battery- and memory-constrained edge devices (e.g., a Jetson Nano with 4GB). The goal is to compress ViT-style models so they run quickly on-device with minimal loss of segmentation accuracy for UAV disaster scenes.
Main Contribution
A unified, practical compression pipeline combining structured pruning (heads/filters/linear rows), logit+feature distillation, and post-training/fp16 quantization targeted at edge deployment.
Empirical evaluation on the LPCV 2023 UAV disaster scene parsing dataset showing a distilled + fp16-quantized UPerNet+MobileViT student that approaches teacher accuracy while fitting a Jetson Nano memory budget.
A focused ablation: effects of pruning types (filter/channel/unstructured), head pruning, iterative pruning, and fp16 quantization on accuracy, speed, and memory.
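The difference between the structured (filter) and unstructured pruning variants named above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the L1-norm and magnitude criteria, the function names, and the random weight tensor are all assumptions made for the example.

```python
import numpy as np

def prune_filters_l1(weight, keep_ratio):
    """Structured pruning: drop whole output filters with the smallest L1 norm.

    weight has shape (out_channels, in_channels, kH, kW).
    Returns the smaller weight tensor and the indices of the kept filters.
    """
    n_out = weight.shape[0]
    n_keep = max(1, int(round(n_out * keep_ratio)))
    l1 = np.abs(weight).reshape(n_out, -1).sum(axis=1)
    keep = np.sort(np.argsort(l1)[-n_keep:])  # largest-norm filters, original order
    return weight[keep], keep

def prune_unstructured(weight, sparsity):
    """Unstructured pruning: zero the smallest-magnitude weights; shape is unchanged."""
    k = int(weight.size * sparsity)
    if k == 0:
        return weight.copy()
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weight) <= threshold, 0.0, weight)

# Hypothetical conv weight, used only to show the shapes involved.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
w_struct, kept = prune_filters_l1(w, keep_ratio=0.5)
w_unstruct = prune_unstructured(w, sparsity=0.5)
print("structured:", w_struct.shape, "unstructured:", w_unstruct.shape)
```

Only the structured variant yields a physically smaller tensor. The unstructured variant keeps the original shape and needs sparse kernels before it can save time or memory.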
Key Findings
Distillation substantially boosts MobileViT segmentation accuracy.
Final compressed model fits and runs on a 4GB Jetson Nano but uses swap.
fp16 quantization increased MobileViT throughput but not ResNet18.
Pruning MobileViT degrades accuracy quickly; Swin tolerates more sparsity.
Head pruning reduces theoretical params but not runtime when dimensions stay the same.
Iterative pruning gave only marginal gains over bulk pruning on MobileViT.
Results
mean IoU (UPerNet + MobileViT, baseline)
mean IoU (UPerNet + MobileViT, +KD from Swin-v2-T)
parameters (UPerNet + MobileViT): 5.6M
Jetson Nano RAM usage (final model): ~3.7 GB (uses swap)
Jetson Nano throughput (MobileViT, fp32 -> fp16)
Pruning sensitivity (MobileViT)
Head pruning example (MobileViT 4->2 heads)
Who Should Care
What To Try In 7 Days
Pick a strong off-the-shelf teacher (Swin-v2-T) and train a MobileViT student via logit+feature distillation on your task data.
Measure on-device RAM and FPS on your Jetson; convert student to fp16 and rerun to check throughput gains.
Avoid aggressive pruning on already compact MobileViTs; try head pruning only if you can change tensor shapes or use optimized kernels.
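For the FPS measurement step, a small timing harness along these lines may help. It is a sketch under stated assumptions: `measure_fps`, `run_once`, and `sync` are hypothetical names, and the stand-in workload replaces a real model so the snippet runs anywhere. With PyTorch on a Jetson, `run_once` would wrap one forward pass under `torch.no_grad()`, `sync` would be `torch.cuda.synchronize`, and the fp16 run would use a model converted with `model.half()` and fed half-precision inputs.

```python
import time

def measure_fps(run_once, warmup=5, iters=20, sync=None):
    """Time repeated calls to run_once() and return frames per second.

    sync is an optional callable that blocks until asynchronous device work
    has finished (for CUDA models, torch.cuda.synchronize); without it,
    GPU timings can look faster than they really are.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / elapsed

def fake_inference():
    """Stand-in workload (an assumption) so the sketch runs without a model."""
    sum(i * i for i in range(10_000))

fps = measure_fps(fake_inference)
print(f"{fps:.1f} FPS")
```

Running the same harness on the fp32 and fp16 versions of the student, with identical input sizes, gives a like-for-like throughput comparison.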
Optimization Features
Infra Optimization
- targeted deployment on Jetson Nano (4GB)
Model Optimization
- structured pruning (heads/filters/linear rows)
- head pruning (zeroing heads)
- distillation (logit + feature-level)
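A quick parameter count shows why zeroing heads differs from physically removing them. The sketch below assumes a standard multi-head attention layout (one fused QKV projection plus an output projection) and hypothetical sizes (embedding width 144, head width 36); neither is taken from the paper.

```python
def attention_params(embed_dim, num_heads, head_dim, bias=True):
    """Parameter count of one attention block: fused QKV plus output projection."""
    inner = num_heads * head_dim
    qkv = embed_dim * 3 * inner + (3 * inner if bias else 0)
    out = inner * embed_dim + (embed_dim if bias else 0)
    return qkv + out

EMBED_DIM, HEAD_DIM = 144, 36  # hypothetical sizes for illustration

dense = attention_params(EMBED_DIM, num_heads=4, head_dim=HEAD_DIM)
sliced = attention_params(EMBED_DIM, num_heads=2, head_dim=HEAD_DIM)

# Zeroing two of four heads leaves the stored tensors at their dense size;
# only slicing the projections actually shrinks what the kernels process.
stored_after_zeroing = dense
stored_after_slicing = sliced

print("dense:", dense)
print("stored after zeroing 2 heads:", stored_after_zeroing)
print("stored after slicing to 2 heads:", stored_after_slicing)
```

With zeroed heads the matrix multiplications run at their original size, which is consistent with the observation that head pruning cut theoretical parameters but not runtime.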
System Optimization
- measure RAM and swap on Jetson Nano
- recommend moving to optimized inference engines
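On a Linux device such as the Jetson Nano, RAM and swap usage can be read from `/proc/meminfo`. The following is a minimal sketch: the function names are hypothetical, and the sample text contains made-up numbers (not measurements from the paper) so that the snippet runs on any machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def ram_and_swap_used_gb(info):
    """Return (RAM used, swap used) in GiB from parsed meminfo values."""
    kb_per_gb = 1024 * 1024
    ram_used = info["MemTotal"] - info["MemAvailable"]
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return ram_used / kb_per_gb, swap_used / kb_per_gb

# Made-up sample in /proc/meminfo format, for illustration only.
sample = """MemTotal:        4051048 kB
MemAvailable:     262144 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""

ram_gb, swap_gb = ram_and_swap_used_gb(parse_meminfo(sample))
print(f"RAM used: {ram_gb:.2f} GiB, swap used: {swap_gb:.2f} GiB")

# On the device itself, read the live file instead of the sample:
# with open("/proc/meminfo") as f:
#     ram_gb, swap_gb = ram_and_swap_used_gb(parse_meminfo(f.read()))
```

Sampling these values before loading the model and again during steady-state inference separates the model's footprint from the baseline system usage.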
Training Optimization
- knowledge distillation from Swin teacher
- fine-tuning after pruning
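A combined logit and feature distillation objective can be sketched as follows. This is one common formulation (temperature-softened KL divergence plus mean squared error on features), written in NumPy for clarity; the source does not state the exact loss, so the temperature, the weights `alpha` and `beta`, and the assumption that student features have already been projected to the teacher's shape are all illustrative.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over classes, averaged over pixels.

    Logits have shape (N, C, H, W); the class axis is 1.
    """
    log_p_s = log_softmax(student_logits / temperature, axis=1)
    log_p_t = log_softmax(teacher_logits / temperature, axis=1)
    p_t = np.exp(log_p_t)
    kl_per_pixel = (p_t * (log_p_t - log_p_s)).sum(axis=1)
    return float(kl_per_pixel.mean() * temperature ** 2)

def feature_kd_loss(student_feat, teacher_feat):
    """Mean squared error between same-shaped feature maps."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

def distillation_loss(task_loss, s_logits, t_logits, s_feat, t_feat,
                      alpha=0.5, beta=0.5, temperature=2.0):
    """Task loss plus weighted logit and feature distillation terms."""
    return (task_loss
            + alpha * logit_kd_loss(s_logits, t_logits, temperature)
            + beta * feature_kd_loss(s_feat, t_feat))

# Random tensors stand in for real network outputs.
rng = np.random.default_rng(0)
t_logits = rng.normal(size=(2, 3, 4, 4))
s_logits = rng.normal(size=(2, 3, 4, 4))
t_feat = rng.normal(size=(2, 8, 4, 4))
s_feat = rng.normal(size=(2, 8, 4, 4))
total = distillation_loss(1.0, s_logits, t_logits, s_feat, t_feat)
print(f"total loss: {total:.4f}")
```

In a real training loop the same terms would be computed with differentiable tensors, and a small adapter layer would typically map student features to the teacher's channel count.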
Inference Optimization
- fp16 quantization
- post-training quantization (limited by PyTorch's quantized CUDA support)
- recommendation to use ONNX/TensorRT for further gains
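As a rough sanity check on what reduced precision buys, the weight storage for the reported 5.6M parameters can be worked out directly. The bytes-per-value table is standard; the helper name is hypothetical, and the int8 row is included only for comparison.

```python
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_megabytes(num_params, dtype):
    """Storage needed for the weights alone, in MB (10^6 bytes)."""
    return num_params * BYTES_PER_DTYPE[dtype] / 1e6

NUM_PARAMS = 5_600_000  # parameter count reported for UPerNet + MobileViT

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_megabytes(NUM_PARAMS, dtype):.1f} MB")
```

The weights come to roughly 22 MB in fp32 and 11 MB in fp16, a small fraction of the ~3.7 GB reported on the device. The remainder must come from other sources such as the runtime, activations, and buffers, which the summary does not break down.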
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Experiments were run on a single, small LPCV dataset (1,120 train images); small-data effects may bias results.
- The final on-device setup relied on swap under memory pressure, so headroom within the 4GB of physical RAM is limited.
- PyTorch has limited quantized CUDA support; reported speed/memory depends on current kernel implementations.
- Head pruning zeroed heads without changing attention tensor shapes, so it produced no real runtime gain in these experiments.
When Not To Use
- When you need per-class performance on rare but critical classes (low-frequency classes suffer under compression).
- When you cannot tolerate swap usage or have stricter memory or latency SLAs.
- When your deployment stack cannot run fp16 or lacks optimized attention kernels.
Failure Modes
- Loss of fine-grained details and low-frequency class recall after compression.
- No runtime improvement from pruning if tensor shapes are not reduced.
- Quantization or fp16 may not speed up all backbones due to kernel differences (ResNet vs attention).
- Iterative pruning may not yield meaningful extra compression for already compact student models.
Core Entities
Models
- MobileViT
- Swin-v2-T
- Swin-v2-B
- ResNet18
- UPerNet
- DeepLabv3
- FANet
Metrics
- mean IoU
- Accuracy
- FPS
- model size (params)
- RAM usage
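Mean IoU, the headline accuracy metric here, is usually computed from a confusion matrix. The sketch below uses the common definition (per-class intersection over union, averaged over classes that appear in the prediction or the ground truth); the exact averaging used by the LPCV evaluation may differ, and the tiny label maps are invented for the example.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU per class; classes absent from both pred and target become NaN."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.full(cm.shape[0], np.nan)
    valid = union > 0
    iou[valid] = tp[valid] / union[valid]
    return iou

def mean_iou(cm):
    """Average IoU over the classes that are present."""
    return float(np.nanmean(per_class_iou(cm)))

# Invented 2x3 label maps with three classes.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(pred, target, num_classes=3)
print("per-class IoU:", per_class_iou(cm))
print("mean IoU:", mean_iou(cm))
```

Reporting the per-class values alongside the mean makes drops on low-frequency classes visible, which a single averaged number can hide.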
Datasets
- LPCV 2023 UAV disaster scene dataset
- ImageNet-1k
- ADE20K
Benchmarks
- LPCV 2023 challenge
- ADE20K

