Use LLMs to automatically invent better tensor-network structure search algorithms

February 4, 20247 min

Overview

Production Readiness

0.4

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

0

Authors

Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

Links

Abstract / PDF

Why It Matters For Business

Automating algorithm discovery with LLMs can find improved model-compression strategies and reduce expert hours needed to hand-craft search heuristics.

Summary TLDR

tnGPS is a prompt-driven pipeline that uses large language models to generate new tensor-network-structure-search (TN-SS) algorithms, runs them in a sandboxed evaluator, and iterates. On image compression and Gaussian-process model-compression benchmarks the discovered algorithms beat several prior heuristics; an ablation shows the LLM-driven steps (recombine, increment, inject diversity) matter. Results vary with LLM choice and require careful evaluation.

Problem Statement

Choosing tensor-network topology and ranks is a high-dimensional discrete search. Existing TN-SS heuristics are hand-designed, often trade off exploration vs exploitation poorly, and can get stuck or need heavy expert work. The paper asks if LLMs can automate discovery of better TN-SS algorithms.

Main Contribution

A practical framework, tnGPS, that uses LLMs (prompt pipeline) to generate, refine, and diversify TN-SS sampling functions and then evaluates them automatically.

Empirical evidence that tnGPS can produce new TN-SS algorithms that outperform several state-of-the-art heuristics on in-domain (image compression) and out-of-domain (Gaussian-process compression) benchmarks.

Key Findings

tnGPS produced best-found algorithm with a much lower objective value than baselines on the reported benchmark.

NumbersObjective: baseline 0.1558 -> tnGPS 0.1102

A discovered algorithm (Ho-2) showed higher compression on held-out images than prior methods.

NumbersLog compression ratio (test): TNGA 1.332 vs Ho-2 1.352 (Δ +0.020)

Ablation of LLM-driven phases hurts final performance.

NumbersObjective: tnGPS 0.1102 vs KR 0.1308, II 0.1273, DI 0.1239

Stronger LLMs give better discovered algorithms; differences are present but modest in tested setup.

NumbersLLM results (lower objective better): GPT-4 0.1813 vs GPT-3.5 0.1842

Results

Objective (lower better)

ValuetnGPS 0.1102

Baselinebaseline (best of TNGA,TNLS,GREEDY,TnALE) 0.1558

Log compression ratio (higher better)

ValueHo-2 test 1.352 (avg)

BaselineTNGA test 1.332 (avg)

Objective across LLMs (lower better)

ValueGPT-4 0.1813, GPT-3.5 0.1842, Claude-1 0.1840, Claude-2 0.1834

Baselinebaseline 0.1847

Who Should Care

What To Try In 7 Days

Clone the tnGPS repo and run the sandbox on a small image (BSD500 subset) to reproduce Ho-2.

Use GPT-4 with the provided prompts to generate a few GenerateSample variants and run them locally.

Run the ablation (disable KR/II/DI) to see how each prompt stage affects results on your data.

Agent Features

Memory

  • history_populations (past sampled algorithms and scores) used as context

Planning

  • iterative generate-evaluate-refine loop
  • bi-level roulette selection for idea dropout

Tool Use

  • LLM code generation
  • sandboxed code execution for evaluation

Frameworks

  • tnGPS

Is Agentic

true

Architectures

  • prompt-pipeline

Collaboration

  • LLM uses in-context examples (existing algorithms) as meta-prompts

Reproducibility

Data Urls

  • BSD500 (public), CCPP (public), MG (public)

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Final algorithm quality depends on the LLM used; results vary across models.
  • Requires nontrivial compute to evaluate many generated algorithms.
  • LLMs can hallucinate runnable code; sandbox checks help but may not catch subtle bugs.
  • Discovery is driven by the initial pool of algorithms and prompt design, which can bias outcomes.

When Not To Use

  • When you need provable algorithmic guarantees or formal correctness.
  • If you lack compute to evaluate many candidate algorithms.
  • When data or code cannot be exposed to external LLM APIs due to privacy.

Failure Modes

  • LLM produces plausible-looking but incorrect or inefficient code that escapes sandbox checks.
  • Overfitting discovered heuristics to training images or a narrow dataset.
  • Path dependence: generated ideas remain close to the initial pool and miss fundamentally different tactics.

Core Entities

Models

  • gpt-4-1106-preview
  • gpt-3.5-turbo-16k-0613
  • claude-instant-1
  • claude-2
  • TNGA
  • TNLS
  • GREEDY
  • TnALE
  • Ho-1
  • Ho-2
  • Ho-3
  • tnGPS

Metrics

  • Objective (Eq.1 combining complexity and relative squared error)
  • Log compression ratio
  • Relative squared error (RSE)

Datasets

  • BSD500 (subset)
  • CCPP
  • MG

Benchmarks

  • Image compression (tensor decomposition)
  • Model compression for tensorial Gaussian Process