Iteratively prompt an LLM to produce filtered, diverse ABSA training data that rivals manual labels

June 29, 2024 · 8 min

Overview

Decision Snapshot: Needs Validation

The method is experimentally validated across four standard ABSA datasets with multiple baselines, but it depends on a closed LLM API and on aspect-extraction quality.

Citations: 1

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 6/6

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 65%

Production readiness: 60%

Novelty: 45%

Authors

Qihuang Zhong, Haiyun Li, Luyao Zhuang, Juhua Liu, Bo Du

Links

Abstract / PDF

Why It Matters For Business

IDG can produce usable labeled ABSA data from unlabeled text, lowering annotation cost and quickly bootstrapping sentiment models in new domains.

Who Should Care

Summary TLDR

The paper presents IDG, a three-stage pipeline that uses an LLM (GPT‑3.5‑turbo) to extract domain aspects from unlabeled text, expand them, generate single- and multi-aspect sentence-aspect-polarity triplets via iterative prompting, and filter outputs with an LLM-based discriminator. On four SemEval ABSA benchmarks, synthetic data from IDG matches or improves performance of five baseline ABSA models. Key wins: generated-only training often approaches manual labels; mixing generated + original data yields consistent gains (up to +4.01% F1); discriminator and multi-aspect generation materially help. The method requires access to an LLM and careful aspect extraction and filtering.
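The three stages can be sketched as a small loop, assuming the LLM is wrapped as a plain function from prompt to reply. All function names, the prompt wording, and the `sentence ||| aspect ||| polarity` reply format are illustrative assumptions rather than the paper's implementation. Aspect extension and multi-aspect generation are omitted for brevity, and the stub LLM exists only so the sketch runs offline.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (sentence, aspect, polarity)
LLM = Callable[[str], str]      # hypothetical wrapper: prompt in, reply out


def extract_aspects(llm: LLM, corpus: List[str]) -> List[str]:
    """Stage 1: ask the LLM for aspect terms in each unlabeled sentence."""
    aspects: List[str] = []
    for text in corpus:
        reply = llm(f"EXTRACT aspects, comma-separated: {text}")
        for aspect in reply.split(","):
            aspect = aspect.strip().lower()
            if aspect and aspect not in aspects:
                aspects.append(aspect)
    return aspects


def generate(llm: LLM, aspects: List[str], rounds: int) -> List[Triplet]:
    """Stage 2: prompt repeatedly, feeding earlier outputs back as context."""
    out: List[Triplet] = []
    for _ in range(rounds):
        for aspect in aspects:
            seen = [s for s, a, _ in out if a == aspect]
            reply = llm(f"GENERATE review about '{aspect}'; avoid repeating: {seen}")
            parts = [p.strip() for p in reply.split("|||")]
            if len(parts) == 3:  # drop malformed replies
                out.append((parts[0], parts[1], parts[2]))
    return out


def keep(llm: LLM, t: Triplet) -> bool:
    """Stage 3: LLM-as-judge filter (a yes/no verdict in this sketch)."""
    verdict = llm(f"JUDGE sentence={t[0]!r} aspect={t[1]!r} polarity={t[2]!r}")
    return verdict.strip().lower() == "yes"


def idg(llm: LLM, corpus: List[str], rounds: int = 2) -> List[Triplet]:
    aspects = extract_aspects(llm, corpus)
    return [t for t in generate(llm, aspects, rounds) if keep(llm, t)]


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call; replace with an API client."""
    if prompt.startswith("EXTRACT"):
        return "battery, screen" if "battery" in prompt else "screen"
    if prompt.startswith("GENERATE"):
        aspect = prompt.split("'")[1]
        return f"The {aspect} is great. ||| {aspect} ||| positive"
    if prompt.startswith("JUDGE"):
        return "yes" if "great" in prompt else "no"
    return ""


data = idg(stub_llm, ["The battery lasts long and the screen is sharp."], rounds=1)
```

Keeping the LLM behind a single callable makes it straightforward to swap the stub for a real client or a local model without touching the stage logic.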

Problem Statement

Aspect-based sentiment models need many labeled sentence–aspect–polarity examples, but manual annotation is expensive. Existing augmentation methods either tweak words or paraphrase; they still suffer from poor fluency and low diversity, or they require labeled seeds. Directly prompting LLMs is promising but leads to hallucinations and low-quality pseudo labels. The goal is to produce diverse, fluent, high-quality ABSA training data from an unlabeled corpus using LLMs while controlling hallucination.

Main Contribution

IDG: a three-stage, iterative LLM pipeline (aspect extraction/extension, iterative generation, LLM-based evaluation/filtering) to produce pseudo-labeled ABSA data from unlabeled text.

A self-reflection discriminator that uses the LLM as a judge plus automatic scoring to remove low-quality outputs.
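One way to picture the discriminator is a filter that combines cheap automatic checks with a numeric judge score. The sketch below is a guess at the general shape only: the 1 to 10 scale, the threshold of 6, the prompt wording, and the rule that the aspect must appear verbatim in the sentence are all assumptions, not details taken from the paper.

```python
import re
from typing import Callable, Dict, List

VALID_POLARITIES = {"positive", "negative", "neutral"}
Sample = Dict[str, str]  # keys: sentence, aspect, polarity


def judge_score(llm: Callable[[str], str], sample: Sample) -> float:
    """Ask the LLM for a numeric rating and parse the first number in its reply."""
    prompt = (
        "Rate from 1 to 10 how well the polarity label fits the aspect.\n"
        f"Sentence: {sample['sentence']}\n"
        f"Aspect: {sample['aspect']}\n"
        f"Polarity: {sample['polarity']}\n"
        "Answer with a number only."
    )
    match = re.search(r"\d+(?:\.\d+)?", llm(prompt))
    return float(match.group()) if match else 0.0  # unparseable reply scores 0


def filter_samples(
    llm: Callable[[str], str], samples: List[Sample], threshold: float = 6.0
) -> List[Sample]:
    kept = []
    for s in samples:
        if s["polarity"] not in VALID_POLARITIES:
            continue  # automatic check: label must be a known class
        if s["aspect"].lower() not in s["sentence"].lower():
            continue  # automatic check: aspect must occur in the sentence
        if judge_score(llm, s) >= threshold:
            kept.append(s)
    return kept


def stub_judge(prompt: str) -> str:
    """Offline stand-in for the judging LLM."""
    ok = "great" in prompt and "Polarity: positive" in prompt
    return "Score: 9" if ok else "3"


samples = [
    {"sentence": "The battery is great.", "aspect": "battery", "polarity": "positive"},
    {"sentence": "The screen is great.", "aspect": "screen", "polarity": "negative"},
    {"sentence": "Nice laptop.", "aspect": "keyboard", "polarity": "positive"},
]
kept = filter_samples(stub_judge, samples)
```

Running the automatic checks first avoids spending judge calls on samples that are malformed anyway.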

Key Findings

IDG-generated data can match or exceed manual training data on ABSA models.

Numbers: R-GAT: Laptop14 F1 73.92 → 76.18 (+2.26); Rest14 F1 80.74 → 82.04 (+1.30)

Practical Use: You can train ABSA models without labeled data in some domains and expect near-manual performance; use IDG when annotations are costly.

Evidence Ref: Table IV; Table II

Mixing IDG synthetic data with original labeled data consistently improves models.

Numbers: Up to +4.01% F1 when mixing generated + original data on evaluated models

Practical Use: If you have some labeled data, add IDG data to get a reliable boost; it's low-cost augmentation.

Evidence Ref: Section IV-B, Table II
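Mixing can be as simple as appending deduplicated synthetic triplets to the labeled set before training. The sketch below assumes triplets are `(sentence, aspect, polarity)` tuples; the `ratio` cap and the deduplication rule are hypothetical choices, since the mixing proportions used in the paper are not given here.

```python
import random
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (sentence, aspect, polarity)


def mix(original: List[Triplet], generated: List[Triplet],
        ratio: float = 1.0, seed: int = 0) -> List[Triplet]:
    """Return original data plus at most ratio * len(original) new synthetic samples."""
    rng = random.Random(seed)
    seen = {(s, a) for s, a, _ in original}
    fresh = []
    for s, a, p in generated:
        if (s, a) not in seen:  # skip sentence/aspect pairs already present
            seen.add((s, a))
            fresh.append((s, a, p))
    budget = min(len(fresh), int(ratio * len(original)))
    mixed = list(original) + rng.sample(fresh, budget)
    rng.shuffle(mixed)
    return mixed


original = [
    ("The battery lasts all day.", "battery", "positive"),
    ("The fan is loud.", "fan", "negative"),
]
generated = [
    ("The fan is loud.", "fan", "negative"),  # duplicate of a labeled sample
    ("The screen looks washed out.", "screen", "negative"),
    ("The keyboard feels solid.", "keyboard", "positive"),
]
mixed = mix(original, generated)
```

Fixing the seed keeps the sampled subset reproducible across training runs.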

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Accuracy | 80.25 | 78.37 (R-GAT base) | +1.88 | Laptop14 (Generated data) | Table IV: R-GAT + IDG Acc 80.25 vs baseline 78.37 | Table IV
R-GAT F1 | 76.18 | 73.92 (R-GAT base) | +2.26 | Laptop14 (Generated data) | Table IV: R-GAT + IDG F1 76.18 vs baseline 73.92 | Table IV

What To Try In 7 Days

Run IDG on your domain unlabeled corpus to generate ~1× training data and train a BERT-based ABSA model.

Enable few-shot examples for aspect extraction to raise aspect F1 quickly.

Include the discriminator (LLM-as-judge + score threshold) before training to avoid noisy samples harming performance.
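For the few-shot step, a prompt builder along these lines may be a reasonable starting point. The instruction wording, the `NONE` convention, and the comma-separated reply format are assumptions for illustration; the paper's actual prompts are not reproduced here.

```python
from typing import List, Sequence, Tuple


def build_aspect_prompt(sentence: str,
                        examples: Sequence[Tuple[str, List[str]]]) -> str:
    """Build a few-shot aspect-extraction prompt from (sentence, aspects) examples."""
    lines = [
        "Extract the aspect terms from each review sentence.",
        "Return them as a comma-separated list, or NONE if there are none.",
        "",
    ]
    for text, aspects in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Aspects: {', '.join(aspects) if aspects else 'NONE'}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Aspects:")
    return "\n".join(lines)


def parse_aspects(reply: str) -> List[str]:
    """Turn the model's reply into a normalized list of aspect terms."""
    reply = reply.strip()
    if reply.upper() == "NONE":
        return []
    return [a.strip().lower() for a in reply.split(",") if a.strip()]


prompt = build_aspect_prompt(
    "The keyboard feels cheap.",
    [("Great battery life.", ["battery life"]), ("I love it.", [])],
)
```

Including at least one example with no aspects gives the model a pattern for sentences that should yield nothing.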

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Requires access to a high-quality LLM (authors use GPT‑3.5‑turbo); API cost and privacy may limit adoption.

Performance depends on accuracy of extracted aspects; gold aspects give a clear upper bound.

When Not To Use

You already have ample, high-quality labeled ABSA data — manual labels may be better.

When LLM use is disallowed for privacy or compliance reasons.

Failure Modes

LLM hallucination produces wrong aspect–polarity pairs that degrade training if not filtered.

Repetitive low-diversity outputs without iterative feedback reduce model gains.
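A quick way to watch for the low-diversity failure mode is to track the share of unique n-grams in each generated batch. Distinct-n is a common proxy swapped in here for illustration; it is not stated to be the measure the authors use, and whitespace tokenization is a simplifying assumption.

```python
from typing import List


def distinct_n(sentences: List[str], n: int = 2) -> float:
    """Fraction of n-grams across all sentences that are unique (1.0 = no repeats)."""
    total = 0
    unique = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


repetitive = ["the screen is great", "the screen is great"]
varied = ["the screen is great", "battery life lasts long"]
low = distinct_n(repetitive)
high = distinct_n(varied)
```

A score that drops from one generation round to the next suggests the prompts need more varied feedback or examples.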

Core Entities

Models

GPT-3.5-turbo (LLM for generation and judging)

BERT-base-uncased (backbone for downstream ABSA)

ATAE-LSTM

ASGCN

BERT-SPC

R-GAT (used heavily in comparisons)

KGAN

Metrics

Accuracy

F1

Precision

Recall

Macro-F1
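For reference, accuracy and macro-F1 over the three polarity classes can be computed as below. This is a plain-Python sketch of the standard definitions; the label set is an assumption, and a class with no gold or predicted instances contributes an F1 of 0 to the average.

```python
from typing import List, Sequence


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of predictions that match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str],
             labels: Sequence[str] = ("positive", "negative", "neutral")) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

Because macro-F1 weights every class equally, it penalizes a model that neglects the usually smaller neutral class more than accuracy does.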

Datasets

Laptop14 (SemEval 2014)

Restaurant14 (SemEval 2014)

Restaurant15 (SemEval 2015)

Restaurant16 (SemEval 2016)

Benchmarks

SemEval 2014/2015/2016 ABSA benchmarks