Use LLMs to automatically invent better tensor-network structure search algorithms

Overview

Decision SnapshotNeeds Validation

The method shows practical wins on benchmarks, but performance depends on LLM choice and needs compute to evaluate candidates; sandboxing reduces but does not eliminate risk.

Citations0

Evidence Strength0.70

Confidence0.80

Risk Signals10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 70%

Authors

Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Automating algorithm discovery with LLMs can find improved model-compression strategies and reduce expert hours needed to hand-craft search heuristics.

Who Should Care

ML Engineer Data Scientist Engineering Lead CTO

Summary TLDR

tnGPS is a prompt-driven pipeline that uses large language models to generate new tensor-network-structure-search (TN-SS) algorithms, runs them in a sandboxed evaluator, and iterates. On image compression and Gaussian-process model-compression benchmarks the discovered algorithms beat several prior heuristics; an ablation shows the LLM-driven steps (recombine, increment, inject diversity) matter. Results vary with LLM choice and require careful evaluation.

Problem Statement

Choosing tensor-network topology and ranks is a high-dimensional discrete search. Existing TN-SS heuristics are hand-designed, often trade off exploration vs exploitation poorly, and can get stuck or need heavy expert work. The paper asks if LLMs can automate discovery of better TN-SS algorithms.

Main Contribution

A practical framework, tnGPS, that uses LLMs (prompt pipeline) to generate, refine, and diversify TN-SS sampling functions and then evaluates them automatically.

Empirical evidence that tnGPS can produce new TN-SS algorithms that outperform several state-of-the-art heuristics on in-domain (image compression) and out-of-domain (Gaussian-process compression) benchmarks.

Key Findings

tnGPS produced best-found algorithm with a much lower objective value than baselines on the reported benchmark.

NumbersObjective: baseline 0.1558 -> tnGPS 0.1102

Practical UseRun tnGPS if you want to automatically search for better TN-SS sampling strategies; expect measurable objective improvement on similar benchmarks.

Evidence RefTable 4

A discovered algorithm (Ho-2) showed higher compression on held-out images than prior methods.

NumbersLog compression ratio (test): TNGA 1.332 vs Ho-2 1.352 (Δ +0.020)

Practical UseUse tnGPS-discovered strategies to get slightly better model compression with similar error (better parameter efficiency on images).

Evidence RefTable 2 (image compression test)

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Objective (lower better)	tnGPS 0.1102	baseline (best of TNGA,TNLS,GREEDY,TnALE) 0.1558	-0.0456	MG variational mean (model compression training set used in ablation)	Table 4: full tnGPS vs baseline and component ablations	Table 4
Log compression ratio (higher better)	Ho-2 test 1.352 (avg)	TNGA test 1.332 (avg)	+0.020	Image compression test (BSD500 subset)	Table 2 averaged test compression ratios	Table 2

What To Try In 7 Days

Clone the tnGPS repo and run the sandbox on a small image (BSD500 subset) to reproduce Ho-2.

Use GPT-4 with the provided prompts to generate a few GenerateSample variants and run them locally.

Run the ablation (disable KR/II/DI) to see how each prompt stage affects results on your data.

Agent Features

Memory

history_populations (past sampled algorithms and scores) used as context

Planning

iterative generate-evaluate-refine loopbi-level roulette selection for idea dropout

Tool Use

LLM code generationsandboxed code execution for evaluation

Frameworks

tnGPS

Is Agentic

Yes

Architectures

prompt-pipeline

Collaboration

LLM uses in-context examples (existing algorithms) as meta-prompts

Reproducibility

Code AvailableYes

Data AvailableYes

Open Source StatusPartial

LicenseUnknown

Code URLs

https://github.com/ChaoLiAtRIKEN/tngps

Data URLs

BSD500 (public), CCPP (public), MG (public)

Risks & Boundaries

Limitations

Final algorithm quality depends on the LLM used; results vary across models.

Requires nontrivial compute to evaluate many generated algorithms.

When Not To Use

When you need provable algorithmic guarantees or formal correctness.

If you lack compute to evaluate many candidate algorithms.

Failure Modes

LLM produces plausible-looking but incorrect or inefficient code that escapes sandbox checks.

Overfitting discovered heuristics to training images or a narrow dataset.

Core Entities

Models

gpt-4-1106-previewgpt-3.5-turbo-16k-0613claude-instant-1claude-2TNGATNLSGREEDYTnALEHo-1Ho-2Ho-3tnGPS

Metrics

Objective (Eq.1 combining complexity and relative squared error)Log compression ratioRelative squared error (RSE)

Datasets

BSD500 (subset)CCPPMG

Benchmarks

Image compression (tensor decomposition)Model compression for tensorial Gaussian Process

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

tnGPS produced best-found algorithm with a much lower objective value than baselines on the reported benchmark.

A discovered algorithm (Ho-2) showed higher compression on held-out images than prior methods.

Results

What To Try In 7 Days

Agent Features

Reproducibility

Code URLs

Data URLs

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Datasets

Benchmarks

You May Also Want to Read

BLPO: jointly optimize judge and caption prompts to better align multimodal LLM judges with human image judgments

Key finding

AutoPDL: AutoML that finds and returns editable, executable prompt programs for LLM agents

Key finding

Use evolutionary search to generate harmless prompts that trigger unnecessary LLM refusals, build tests and alignment data, and reduce over‑

Key finding

Find a model's true knowledge boundary by optimizing prompts that preserve meaning

Key finding

IPOMP: pick a small, diverse evaluation set and refine it from live model feedback to get better and more stable prompts

Key finding