Overview
Production Readiness: 0.6
Novelty Score: 0.45
Cost Impact Score: 0.65
Citation Count: 1
Why It Matters For Business
IDG can turn unlabeled text into usable labeled data for aspect-based sentiment analysis (ABSA), lowering annotation cost and making it faster to bootstrap sentiment models in new domains.
Summary TLDR
The paper presents IDG, a three-stage pipeline that uses an LLM (GPT‑3.5‑turbo) to extract domain aspects from unlabeled text and expand them, generate single- and multi-aspect sentence-aspect-polarity triplets via iterative prompting, and filter the outputs with an LLM-based discriminator. On four SemEval ABSA benchmarks, synthetic data from IDG matches or improves the performance of five baseline ABSA models. Key wins: training on generated data alone often approaches the results of training on manual labels; mixing generated and original data yields consistent gains (up to +4.01% F1); and both the discriminator and multi-aspect generation materially help. The method requires access to an LLM and depends on careful aspect extraction and filtering.
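The three stages can be sketched as plain functions around any prompt-to-completion callable. This is a simplified illustration, not the authors' code: the prompts, the comma-separated aspect format, the 0-10 judge scale, and the `>=` threshold comparison are all assumptions, and the aspect-expansion step and the iterative feedback loop are omitted.

```python
from typing import Callable, List, Tuple

# (sentence, aspect, polarity) training triplet
Triplet = Tuple[str, str, str]
LLM = Callable[[str], str]  # any prompt -> completion function (assumed interface)

POLARITIES = ("positive", "negative", "neutral")


def extract_aspects(llm: LLM, corpus: List[str]) -> List[str]:
    """Stage 1: collect comma-separated aspect terms the LLM finds in each sentence."""
    aspects = set()
    for sentence in corpus:
        reply = llm(f"List the aspect terms in: {sentence}")
        aspects.update(a.strip().lower() for a in reply.split(",") if a.strip())
    return sorted(aspects)


def generate(llm: LLM, aspects: List[str]) -> List[Triplet]:
    """Stage 2: one synthetic sentence per (aspect, polarity) pair (hypothetical prompt)."""
    return [
        (llm(f"Write a review sentence with {p} sentiment about '{a}'.").strip(), a, p)
        for a in aspects
        for p in POLARITIES
    ]


def evaluate(llm: LLM, samples: List[Triplet], threshold: float = 6.0) -> List[Triplet]:
    """Stage 3: keep samples the LLM judge scores at or above the threshold (0-10 scale assumed)."""
    kept = []
    for sentence, aspect, polarity in samples:
        reply = llm(f"Score 0-10: does '{sentence}' express {polarity} sentiment about '{aspect}'?")
        try:
            score = float(reply.strip())
        except ValueError:
            continue  # unparseable judge output is treated as a rejection
        if score >= threshold:
            kept.append((sentence, aspect, polarity))
    return kept


def run_pipeline(llm: LLM, corpus: List[str], threshold: float = 6.0) -> List[Triplet]:
    """Chain extraction, generation and filtering."""
    return evaluate(llm, generate(llm, extract_aspects(llm, corpus)), threshold)
```

Passing the LLM in as a callable keeps the sketch testable with a stub and lets you swap in whichever API client you actually use.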
Problem Statement
Aspect-based sentiment models need many labeled sentence–aspect–polarity examples, but manual annotation is expensive. Existing augmentation methods either perturb individual words or paraphrase whole sentences, and they still suffer from poor fluency or low diversity, or they require labeled seed data. Directly prompting LLMs is promising but produces hallucinations and low-quality pseudo labels. The goal is to produce diverse, fluent, high-quality ABSA training data from an unlabeled corpus using LLMs while keeping hallucination under control.
Main Contribution
IDG: a three-stage, iterative LLM pipeline (aspect extraction/extension, iterative generation, LLM-based evaluation/filtering) to produce pseudo-labeled ABSA data from unlabeled text.
A self-reflection discriminator that uses the LLM as a judge plus automatic scoring to remove low-quality outputs.
Comprehensive evaluation on four SemEval ABSA benchmarks showing generated data can match or improve over manual labels and improves multiple baseline ABSA models when mixed with real data.
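The discriminator combines an LLM verdict with an automatic check. The exact prompt and scoring rules are not given in this summary, so the sketch below assumes a judge reply of the hypothetical form `polarity: <label>; score: <number>` and uses a simple substring test (aspect must appear in the sentence) as the automatic part.

```python
import re
from typing import Optional, Tuple

# Hypothetical judge reply format, e.g. "polarity: positive; score: 7"
JUDGE_PATTERN = re.compile(r"polarity:\s*(\w+)\s*;\s*score:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_judge_reply(reply: str) -> Optional[Tuple[str, float]]:
    """Extract (judged polarity, quality score) from the judge's reply, or None."""
    match = JUDGE_PATTERN.search(reply)
    if match is None:
        return None
    return match.group(1).lower(), float(match.group(2))


def accept_sample(sentence: str, aspect: str, polarity: str,
                  judge_reply: str, threshold: float = 6.0) -> bool:
    """Keep a generated triplet only if every check passes."""
    # Automatic check (assumption): the aspect term must occur in the sentence.
    if aspect.lower() not in sentence.lower():
        return False
    parsed = parse_judge_reply(judge_reply)
    if parsed is None:
        return False  # the judge's answer could not be parsed
    judged_polarity, score = parsed
    # The judge must agree with the intended label, and the score must clear the threshold.
    return judged_polarity == polarity and score >= threshold
```

Rejecting unparseable replies errs on the side of discarding data, which matches the finding that noisy samples hurt more than a smaller training set.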
Key Findings
IDG-generated data can match or exceed manual training data on ABSA models.
Mixing IDG synthetic data with original labeled data consistently improves models.
Filtering generated samples is critical for final model quality.
Generating multi-aspect sentences improves ABSA training over single-aspect only.
Aspect extraction benefits from few-shot demonstrations, and its quality directly affects final performance.
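A multi-aspect sentence yields one sentence-aspect-polarity example per aspect, so the same text is seen with different, sometimes contrasting, labels. The sketch below flattens such a sample and mixes generated with original data by deduplicated concatenation plus a shuffle; the paper's actual mixing ratio and procedure are not stated here, so treat this as an assumption.

```python
import random
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (sentence, aspect, polarity)


def flatten_multi_aspect(sentence: str, labels: Dict[str, str]) -> List[Triplet]:
    """Turn one multi-aspect sentence into one training triplet per aspect."""
    return [(sentence, aspect, polarity) for aspect, polarity in labels.items()]


def mix_datasets(original: List[Triplet], generated: List[Triplet], seed: int = 0) -> List[Triplet]:
    """Union of original and generated triplets, dropping exact duplicates, then shuffled."""
    seen = set()
    mixed = []
    for item in original + generated:
        if item not in seen:
            seen.add(item)
            mixed.append(item)
    random.Random(seed).shuffle(mixed)  # fixed seed for reproducible ordering
    return mixed
```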
Results
Accuracy
R-GAT F1
Mixing generated + original data: F1 gain (up to +4.01%)
Aspect extraction F1
ASGCN F1 drop without discriminator
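Aspect extraction F1 compares the aspects the LLM extracts against gold aspects. A common convention, assumed here rather than taken from the paper, is exact match on lowercased aspect sets:

```python
from typing import Iterable, Tuple


def aspect_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between predicted and gold aspect sets."""
    pred = {a.strip().lower() for a in predicted}
    ref = {a.strip().lower() for a in gold}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, predicting {food, service, decor} against gold {food, service, price, staff} gives precision 2/3, recall 1/2 and F1 4/7.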
Who Should Care
What To Try In 7 Days
Run IDG on an unlabeled corpus from your domain to generate synthetic data roughly equal in size (~1×) to your training set, then train a BERT-based ABSA model on it.
Add few-shot demonstrations to the aspect-extraction prompt; this is a quick way to raise aspect-extraction F1.
Run the discriminator (LLM-as-judge plus a score threshold) before training so that noisy samples do not degrade performance.
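A few-shot aspect-extraction prompt can be assembled from a handful of labeled in-domain sentences. The wording and the `Sentence:`/`Aspects:` layout below are hypothetical; the point is that each demonstration shows the model the expected output format, including sentences with no aspects.

```python
from typing import List, Tuple

# Hypothetical demonstrations: (sentence, aspect terms) pairs from the target domain.
Demo = Tuple[str, List[str]]


def build_aspect_prompt(sentence: str, demos: List[Demo]) -> str:
    """Assemble a few-shot prompt that asks for comma-separated aspect terms."""
    lines = ["Extract the aspect terms mentioned in each review sentence."]
    for demo_sentence, aspects in demos:
        lines.append(f"Sentence: {demo_sentence}")
        lines.append(f"Aspects: {', '.join(aspects) if aspects else 'none'}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Aspects:")  # the model completes this line
    return "\n".join(lines)
```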
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Requires access to a high-quality LLM (authors use GPT‑3.5‑turbo); API cost and privacy may limit adoption.
- Performance depends on the accuracy of extracted aspects; using gold aspects gives a clear upper bound.
- The filtering threshold T needs tuning (the authors find T=6 best); over-filtering reduces the amount of usable data.
- Evaluations are on SemEval restaurant/laptop datasets; cross-domain generalization needs more tests.
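Because T trades sample quality against sample quantity, a cheap first check is how much generated data each candidate T keeps; downstream dev-set F1 should still decide the final value. The sketch assumes judge scores on a 0-10 scale and a keep-if-`score >= T` rule.

```python
from typing import Dict, Iterable, List


def retention_by_threshold(scores: List[float], thresholds: Iterable[float]) -> Dict[float, float]:
    """Fraction of generated samples that survive each candidate threshold T (score >= T)."""
    total = len(scores)
    return {t: (sum(s >= t for s in scores) / total if total else 0.0) for t in thresholds}
```

With illustrative scores [3, 5, 6, 6, 7, 8, 9, 10], T=6 keeps 75% of samples while T=8 keeps only 37.5%.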
When Not To Use
- You already have ample, high-quality labeled ABSA data; manual labels may then be the better choice.
- When LLM use is disallowed for privacy or compliance reasons.
- If you lack compute/budget for repeated LLM calls for generation and self-reflection.
Failure Modes
- LLM hallucination produces wrong aspect–polarity pairs that degrade training if not filtered.
- Without iterative feedback, generated outputs can be repetitive and low in diversity, which reduces model gains.
- Overly strict filtering removes too much data and hurts downstream learning.
- Poor few-shot or domain demonstrations cause weak aspect extraction and noisy generation.
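Repetitive output can be caught before training with a simple diversity statistic. Distinct-n (unique n-grams divided by total n-grams) is a standard metric for generated text, swapped in here as an illustration; it is not a method reported in this summary.

```python
from typing import List


def distinct_n(sentences: List[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams across all sentences."""
    total = 0
    unique = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```

A value that drops sharply across generation rounds suggests the prompts need more variation.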
Core Entities
Models
- GPT-3.5-turbo (LLM for generation and judging)
- BERT-base-uncased (backbone for downstream ABSA)
- ATAE-LSTM
- ASGCN
- BERT-SPC
- R-GAT (used heavily in comparisons)
- KGAN
Metrics
- Accuracy
- F1
- Precision
- Recall
- Macro-F1
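For the downstream sentiment classifiers, accuracy and Macro-F1 are computed over polarity labels. The sketch uses the standard definitions (per-class F1 = 2TP / (2TP + FP + FN), averaged without weighting); the three-label set is an assumption, since some SemEval setups handle a "conflict" label differently.

```python
from typing import Sequence, Tuple

LABELS = ("positive", "negative", "neutral")


def accuracy_and_macro_f1(gold: Sequence[str], pred: Sequence[str],
                          labels: Sequence[str] = LABELS) -> Tuple[float, float]:
    """Accuracy and unweighted mean of per-class F1 over the polarity labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)
```

Macro-F1 weights each polarity equally, so it penalizes a model that ignores the minority neutral class even when accuracy looks acceptable.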
Datasets
- Laptop14 (SemEval2014)
- Restaurant14 (SemEval2014)
- Restaurant15 (SemEval2015)
- Restaurant16 (SemEval2016)
Benchmarks
- SemEval 2014/2015/2016 ABSA benchmarks

