Use LLMs to automatically invent better tensor-network structure search algorithms

February 4, 20247 min

Overview

Decision SnapshotNeeds Validation

The method shows practical wins on benchmarks, but performance depends on LLM choice and needs compute to evaluate candidates; sandboxing reduces but does not eliminate risk.

Citations0

Evidence Strength0.70

Confidence0.80

Risk Signals10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 70%

Authors

Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Automating algorithm discovery with LLMs can find improved model-compression strategies and reduce expert hours needed to hand-craft search heuristics.

Who Should Care

Summary TLDR

tnGPS is a prompt-driven pipeline that uses large language models to generate new tensor-network-structure-search (TN-SS) algorithms, runs them in a sandboxed evaluator, and iterates. On image compression and Gaussian-process model-compression benchmarks the discovered algorithms beat several prior heuristics; an ablation shows the LLM-driven steps (recombine, increment, inject diversity) matter. Results vary with LLM choice and require careful evaluation.

Problem Statement

Choosing tensor-network topology and ranks is a high-dimensional discrete search. Existing TN-SS heuristics are hand-designed, often trade off exploration vs exploitation poorly, and can get stuck or need heavy expert work. The paper asks if LLMs can automate discovery of better TN-SS algorithms.

Main Contribution

A practical framework, tnGPS, that uses LLMs (prompt pipeline) to generate, refine, and diversify TN-SS sampling functions and then evaluates them automatically.

Empirical evidence that tnGPS can produce new TN-SS algorithms that outperform several state-of-the-art heuristics on in-domain (image compression) and out-of-domain (Gaussian-process compression) benchmarks.

Key Findings

tnGPS produced best-found algorithm with a much lower objective value than baselines on the reported benchmark.

NumbersObjective: baseline 0.1558 -> tnGPS 0.1102

Practical UseRun tnGPS if you want to automatically search for better TN-SS sampling strategies; expect measurable objective improvement on similar benchmarks.

Evidence RefTable 4

A discovered algorithm (Ho-2) showed higher compression on held-out images than prior methods.

NumbersLog compression ratio (test): TNGA 1.332 vs Ho-2 1.352+0.020)

Practical UseUse tnGPS-discovered strategies to get slightly better model compression with similar error (better parameter efficiency on images).

Evidence RefTable 2 (image compression test)

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
Objective (lower better)tnGPS 0.1102baseline (best of TNGA,TNLS,GREEDY,TnALE) 0.1558-0.0456MG variational mean (model compression training set used in ablation)Table 4: full tnGPS vs baseline and component ablationsTable 4
Log compression ratio (higher better)Ho-2 test 1.352 (avg)TNGA test 1.332 (avg)+0.020Image compression test (BSD500 subset)Table 2 averaged test compression ratiosTable 2

What To Try In 7 Days

Clone the tnGPS repo and run the sandbox on a small image (BSD500 subset) to reproduce Ho-2.

Use GPT-4 with the provided prompts to generate a few GenerateSample variants and run them locally.

Run the ablation (disable KR/II/DI) to see how each prompt stage affects results on your data.

Agent Features

Memory
history_populations (past sampled algorithms and scores) used as context
Planning
iterative generate-evaluate-refine loopbi-level roulette selection for idea dropout
Tool Use
LLM code generationsandboxed code execution for evaluation
Frameworks
tnGPS
Is Agentic

Yes

Architectures
prompt-pipeline
Collaboration
LLM uses in-context examples (existing algorithms) as meta-prompts

Reproducibility

Code AvailableYes
Data AvailableYes
Open Source StatusPartial
LicenseUnknown

Data URLs

BSD500 (public), CCPP (public), MG (public)

Risks & Boundaries

Limitations

Final algorithm quality depends on the LLM used; results vary across models.

Requires nontrivial compute to evaluate many generated algorithms.

When Not To Use

When you need provable algorithmic guarantees or formal correctness.

If you lack compute to evaluate many candidate algorithms.

Failure Modes

LLM produces plausible-looking but incorrect or inefficient code that escapes sandbox checks.

Overfitting discovered heuristics to training images or a narrow dataset.

Core Entities

Models

gpt-4-1106-previewgpt-3.5-turbo-16k-0613claude-instant-1claude-2TNGATNLSGREEDYTnALEHo-1Ho-2Ho-3tnGPS

Metrics

Objective (Eq.1 combining complexity and relative squared error)Log compression ratioRelative squared error (RSE)

Datasets

BSD500 (subset)CCPPMG

Benchmarks

Image compression (tensor decomposition)Model compression for tensorial Gaussian Process