Overview
Production Readiness
0.5
Novelty Score
0.6
Cost Impact Score
0.6
Citation Count
8
Why It Matters For Business
You can forecast structured future events from past facts using off‑the‑shelf LLMs without costly retraining, which speeds deployment and reduces model maintenance.
Summary TLDR
The authors recast temporal knowledge graph (TKG) forecasting as an in‑context learning (ICL) problem for large language models (LLMs). They serialize historical graph facts into structured prompts and use the model's token probabilities to rank candidate future facts. Across WIKI, YAGO, ICEWS14/18 and an ACLED slice, open LLMs (e.g., GPT‑NeoX) come close to supervised state‑of‑the‑art (SOTA) models, with a median Hits@1 gap between -3.6% and +1.5%, and beat simple frequency/recency baselines by large margins. Replacing entity and relation names with numeric IDs barely changes the results, which implies that LLMs mainly exploit symbolic patterns in the prompt rather than prior semantic knowledge.
Problem Statement
Temporal knowledge graph forecasting asks: given past time‑stamped facts, predict missing future facts. Current methods need supervised training and custom architectures. The paper asks whether pre‑trained LLMs, given only in‑context examples built from historical facts, can forecast future links without any fine‑tuning.
Main Contribution
A simple three‑stage ICL pipeline that (1) retrieves relevant past facts, (2) serializes them into structured prompts (index or lexical), and (3) decodes LLM token probabilities to score candidate entities.
A broad experimental comparison showing that pre‑trained LLMs (GPT‑2, GPT‑J, GPT‑NeoX and gpt‑3.5‑turbo) match or nearly match supervised TKG models on common benchmarks without training.
A targeted analysis showing that LLMs still perform well when entity/relation names are replaced with numeric indices, suggesting pattern learning from symbolic sequences rather than reliance on semantic priors.
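The first two pipeline stages can be illustrated with a minimal sketch. The `Fact` type, the function names, the retrieval rule (same subject and relation, most recent first) and the exact line layout are assumptions made for illustration; they are not taken from the authors' code. Setting `anonymize=True` mimics the index variant, where names are replaced with numeric IDs.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    time: int


def retrieve_history(facts, subject, relation, before, max_len=5):
    """Stage 1 (assumed rule): keep the latest facts that share the
    query's subject and relation and precede the query time."""
    matching = [f for f in facts
                if f.subject == subject and f.relation == relation
                and f.time < before]
    matching.sort(key=lambda f: f.time)
    return matching[-max_len:]


def build_prompt(history, subject, relation, query_time, anonymize=False):
    """Stage 2: serialize history into one line per fact, then an
    open-ended query line. Each distinct object gets a numeric label
    that the model is expected to emit as its answer."""
    ent_ids, rel_ids, obj_labels = {}, {}, {}

    def name(value, table):
        # Lexical mode keeps the name; index mode maps it to an integer ID.
        if not anonymize:
            return value
        return str(table.setdefault(value, len(table)))

    lines = []
    for f in history:
        label = obj_labels.setdefault(f.obj, len(obj_labels))
        lines.append(f"{f.time}: [{name(f.subject, ent_ids)}, "
                     f"{name(f.relation, rel_ids)}, "
                     f"{label}. {name(f.obj, ent_ids)}]")
    lines.append(f"{query_time}: [{name(subject, ent_ids)}, "
                 f"{name(relation, rel_ids)},")
    label_to_entity = {str(lab): obj for obj, lab in obj_labels.items()}
    return "\n".join(lines), label_to_entity


facts = [
    Fact("A", "wins", "X", 2000),
    Fact("A", "wins", "Y", 2001),
    Fact("A", "wins", "X", 2002),
    Fact("B", "wins", "Z", 2001),
]
history = retrieve_history(facts, "A", "wins", before=2003)
prompt, labels = build_prompt(history, "A", "wins", 2003)
anon_prompt, _ = build_prompt(history, "A", "wins", 2003, anonymize=True)
print(prompt)
print(anon_prompt)
```

The returned `label_to_entity` map is what the third stage would use to translate the model's next-token probabilities over numeric labels back into a ranking of candidate entities.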
Key Findings
Pretrained LLMs (ICL) reach near‑SOTA forecasting performance without fine‑tuning.
LLMs outperform simple heuristics based on frequency or recency by a meaningful margin.
Semantic entity names are not required for good ICL performance.
Results
Hits@1 (single-step)
Hits@1 vs heuristics
Robustness to anonymization
Who Should Care
What To Try In 7 Days
Serialize a small historical slice of your domain graph into the paper's 'index' prompt format and call a large pre‑trained LLM to rank candidate next facts.
Compare ICL predictions to simple heuristics (most recent/most frequent) and your existing supervised model on Hits@1 to gauge parity.
If data privacy is a concern, test anonymized numeric IDs in prompts; performance often stays similar.
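For the heuristic comparison, a minimal sketch of the two baselines follows. The function names and the tie-breaking rule are assumptions, not the paper's definitions; the input is simply the list of objects seen in the retrieved history, oldest first.

```python
from collections import Counter


def frequency_baseline(history_objects):
    """Rank candidates by how often they appeared in the history.
    Ties are broken by most recent occurrence (an assumed rule)."""
    counts = Counter(history_objects)
    last_seen = {obj: i for i, obj in enumerate(history_objects)}
    return sorted(counts, key=lambda o: (-counts[o], -last_seen[o]))


def recency_baseline(history_objects):
    """Rank candidates by how recently they last appeared."""
    last_seen = {obj: i for i, obj in enumerate(history_objects)}
    return sorted(last_seen, key=lambda o: -last_seen[o])


history_objects = ["X", "Y", "X", "Z"]  # oldest first
by_frequency = frequency_baseline(history_objects)
by_recency = recency_baseline(history_objects)
print(by_frequency)  # ['X', 'Z', 'Y']
print(by_recency)    # ['Z', 'X', 'Y']
```

Scoring the first element of each ranking against the true next object gives a heuristic Hits@1 to set beside the ICL and supervised numbers.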
Reproducibility
Data Urls
- WIKI (Leblay and Chekol 2018)
- YAGO (Mahdisoltani et al. 2014)
- ICEWS14/ICEWS18 (García‑Durán et al. 2018)
- https://data.humdata.org/organization/acled (ACLED)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Experiments limited to small/medium open models due to compute; results may change with larger or different models.
- Method assumes candidate answers appear in the observed history (transductive setting); it does not handle entities unseen in that history (inductive setting).
- Approach can struggle with tokenizers that lack multi‑digit numeric tokens (noted for some model families).
When Not To Use
- When answers can be entities never observed in the history (inductive setting).
- When you require calibrated probability estimates for downstream decision making.
- If your deployment cannot afford repeated LLM API calls or large model inference cost.
Failure Modes
- Top‑token decoding may omit the numeric label of the correct answer; the paper assigns rank 100 to such missing tokens, so they count as misses (false negatives).
- Accumulation of errors in multi‑step mode when the model's predictions are re‑fed as history.
- Performance depends on careful history selection; including unrelated bidirectional facts can reduce accuracy.
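The missing-token failure mode can be made concrete with a small sketch. It assumes the model API returns a dictionary of top tokens and their log-probabilities for the first generated token; that input shape, the function name and the label map are hypothetical. Only the fallback rank of 100 comes from the summary above.

```python
MISSING_RANK = 100  # fallback rank reported for labels absent from the top tokens


def rank_of_gold(top_token_logprobs, label_to_entity, gold_entity):
    """Turn top-token log-probabilities into a rank for the gold entity.

    top_token_logprobs: dict of token string -> log-probability
        (hypothetical shape; real APIs differ).
    label_to_entity: dict of numeric label string -> entity name.
    """
    scored = []
    for token, logprob in top_token_logprobs.items():
        label = token.strip()  # tokenizers often prefix a space
        if label in label_to_entity:
            scored.append((logprob, label_to_entity[label]))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    ranked = []
    for _, entity in scored:
        if entity not in ranked:  # keep the best score per entity
            ranked.append(entity)

    if gold_entity in ranked:
        return ranked.index(gold_entity) + 1
    return MISSING_RANK  # label never surfaced among the top tokens


label_map = {"0": "X", "1": "Y", "2": "Z"}
top_tokens = {" 1": -0.4, " 0": -1.3, "\n": -3.0}
rank_x = rank_of_gold(top_tokens, label_map, "X")
rank_z = rank_of_gold(top_tokens, label_map, "Z")
print(rank_x, rank_z)  # 2 100
```

In this toy case the label for Z never appears among the returned tokens, so Z receives rank 100 even if the model assigned it non-trivial probability, which is the false negative described in the first bullet.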
Core Entities
Models
- GPT2
- gpt-j-6b
- gpt-neox-20b
- gpt-3.5-turbo
- GPT-NeoX
- GPT-J
Metrics
- Hits@1
- Hits@3
- Hits@10
- Time-aware filter
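As a rough illustration of how these metrics fit together, the sketch below computes Hits@k from a list of ranks and applies a filter before ranking. The filter shown reflects a common reading of time-aware filtering (drop other entities that are also correct answers for the same query at the same timestamp); treat that reading, and all function names, as assumptions rather than the paper's exact protocol.

```python
def filtered_rank(ranked_entities, gold, other_valid_at_t):
    """Rank of the gold entity after removing other entities that are
    also valid answers at the query timestamp. Assumes gold is present
    in ranked_entities."""
    kept = [e for e in ranked_entities
            if e == gold or e not in other_valid_at_t]
    return kept.index(gold) + 1


def hits_at_k(ranks, k):
    """Fraction of queries whose gold entity is ranked k or better."""
    return sum(r <= k for r in ranks) / len(ranks)


# Y is also a correct answer at this timestamp, so it is filtered out
# and the gold entity X moves from rank 2 to rank 1.
example_rank = filtered_rank(["Y", "X", "Z"], gold="X", other_valid_at_t={"Y"})

ranks = [1, 2, 4, 100]
print(example_rank)          # 1
print(hits_at_k(ranks, 1))   # 0.25
print(hits_at_k(ranks, 3))   # 0.5
print(hits_at_k(ranks, 10))  # 0.75
```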
Datasets
- WIKI
- YAGO
- ICEWS14
- ICEWS18
- ACLED-CD22
Benchmarks
- Temporal Knowledge Graph (TKG) forecasting

