When prompted to propose hyperparameters and training code, LLMs match or beat standard hyperparameter optimization (HPO) methods early in the search.
LLMs can find better hyperparameters faster than random search in low-budget settings, speeding model iteration and cutting compute cost when trials are expensive.
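The loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the search space, `toy_loss`, and the stub proposer are all assumptions made for the example. The stub samples at random and stands in for a real LLM call so the sketch runs offline.

```python
import json
import math
import random
from typing import Callable

# Hypothetical search space: name -> (low, high). Not taken from the source.
SEARCH_SPACE = {"learning_rate": (1e-5, 1e-1), "weight_decay": (0.0, 0.1)}


def build_prompt(space: dict, history: list, remaining: int) -> str:
    """Describe the search space and all past trials as plain text."""
    lines = ["Propose the next hyperparameter configuration as a JSON object."]
    lines.append(f"Search space (low, high): {json.dumps(space)}")
    lines.append(f"Evaluations remaining: {remaining}")
    for config, loss in history:
        lines.append(f"Tried {json.dumps(config)} -> validation loss {loss:.4f}")
    return "\n".join(lines)


def llm_guided_search(
    propose: Callable[[str], str],
    evaluate: Callable[[dict], float],
    space: dict,
    budget: int = 10,
):
    """Ask `propose` for a config, evaluate it, feed the result back, repeat."""
    history = []
    for step in range(budget):
        prompt = build_prompt(space, history, budget - step)
        config = json.loads(propose(prompt))  # the reply is assumed to be JSON
        loss = evaluate(config)
        history.append((config, loss))
    best_config, best_loss = min(history, key=lambda trial: trial[1])
    return best_config, best_loss, history


def make_stub_proposer(seed: int = 0) -> Callable[[str], str]:
    """Placeholder for an LLM client: ignores the prompt and samples randomly."""
    rng = random.Random(seed)

    def propose(prompt: str) -> str:
        lr = 10 ** rng.uniform(-5, -1)  # log-uniform learning rate
        wd = rng.uniform(0.0, 0.1)
        return json.dumps({"learning_rate": lr, "weight_decay": wd})

    return propose


def toy_loss(config: dict) -> float:
    """Made-up objective with its minimum at learning_rate=1e-3, weight_decay=0."""
    return (math.log10(config["learning_rate"]) + 3) ** 2 + config["weight_decay"]


best_config, best_loss, history = llm_guided_search(
    make_stub_proposer(0), toy_loss, SEARCH_SPACE, budget=10
)
```

To try this with an actual model, `make_stub_proposer` would be replaced by a function that sends the prompt to an LLM and returns its text reply. A real reply may not be valid JSON, so some parsing and retry logic would be needed around `json.loads`.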
Key finding
GPT-4 Turbo beats random search on HPOBench in the 10-evaluation setting.
Numbers (Table 1): GPT-4 Turbo beats random search 81.25% of the time, with a median change in error of 13.70% and a mean change of 19.83% relative to random search.
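For readers who want to reproduce this style of summary on their own runs, here is one plausible way to compute the three statistics from per-task errors. The definitions are an assumption: the summary above does not say how Table 1 defines them, so "change" is taken here to be the percentage reduction in error relative to random search. The input numbers are invented for illustration only.

```python
from statistics import mean, median


def summarize_vs_random(method_errors: list, random_errors: list) -> dict:
    """Compare per-task errors of a method against random search.

    Assumed definitions (not confirmed by the source):
      beats_random_pct: share of tasks where the method's error is strictly lower
      change: 100 * (random_error - method_error) / random_error, per task
    """
    if len(method_errors) != len(random_errors) or not method_errors:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(method_errors, random_errors))
    wins = sum(1 for m, r in pairs if m < r)
    changes = [100.0 * (r - m) / r for m, r in pairs]
    return {
        "beats_random_pct": 100.0 * wins / len(pairs),
        "median_change_pct": median(changes),
        "mean_change_pct": mean(changes),
    }


# Toy per-task validation errors, invented for this example.
toy_method = [0.10, 0.20, 0.30, 0.50]
toy_random = [0.20, 0.25, 0.30, 0.40]
summary = summarize_vs_random(toy_method, toy_random)
```

Under these definitions a tie counts as a loss, and a positive change means the method's error is lower than random search's on that task.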

