Overview
SKETCH is a practical hybrid retrieval method with clear per-dataset gains; it is ready for prototyping but costly at scale due to KG build and LLM dependence.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 40%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
SKETCH gives more accurate, context-preserving retrieval for complex, multi-part queries, which improves downstream answers and traceability, at the cost of KG construction effort and heavier LLM use.
Who Should Care
Summary TLDR
SKETCH combines semantic chunking (splitting text into meaning-preserving units) with a knowledge graph (structured entities and relations) and a hybrid retriever to improve Retrieval-Augmented Generation (RAG). Evaluated with RAGAS metrics on four datasets (Italian Cuisine, QuALITY, QASPER, NarrativeQA), SKETCH raises answer relevancy and context precision versus Naive RAG and several baselines. Gains are largest for small-domain tests and long-document comprehension, but building large KGs and relying on GPT models raise cost and reproducibility concerns.
Problem Statement
Current RAG systems often lose context when they split text arbitrarily and struggle to combine evidence spread across distant parts of a corpus. This reduces answer relevancy and multi-hop reasoning for complex queries.
Main Contribution
SKETCH: a hybrid retrieval method that fuses semantic chunking (meaningful text chunks) with a knowledge graph (entities + relations).
A concrete indexing pipeline: semantic splitting, recursive character splitting (100-token chunks, 16-token overlap), FAISS embeddings, and a KG built from extracted entities.
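The fixed-window part of that pipeline (100-token chunks, 16-token overlap) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes whitespace-separated words stand in for tokenizer tokens, it skips the semantic splitting and recursive separator logic, and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, chunk_size=100, overlap=16):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Assumption: whitespace words approximate tokens; a real pipeline would use a tokenizer.
text = " ".join(f"w{i}" for i in range(250))
chunks = chunk_tokens(text.split())
print([len(c) for c in chunks])
```

Each resulting chunk would then be embedded and added to the vector index; the overlap keeps sentences that straddle a boundary visible in both neighbouring chunks.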
Key Findings
On the small Italian Cuisine test, SKETCH reached answer_relevancy 0.94 and context_precision 0.99, versus 0.61 and 0.81 for Naive RAG (Table 1).
On QuALITY (long-document comprehension), SKETCH improved answer relevancy over Naive RAG.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Italian Cuisine - answer_relevancy | 0.94 | Naive RAG=0.61 | +54.1% | Italian Cuisine | Table 1: SKETCH=0.94 vs Naive RAG=0.61 | Table 1 |
| Italian Cuisine - context_precision | 0.99 | Naive RAG=0.81 | +22.2% | Italian Cuisine | Table 1: SKETCH context_precision=0.99 | Table 1 |
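
The Delta column is consistent with a relative improvement over the baseline, (value - baseline) / baseline. The short check below recomputes it from the two table rows; the function name `relative_delta_pct` is only an illustrative choice.

```python
def relative_delta_pct(value, baseline):
    """Relative change of `value` over `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# (SKETCH, Naive RAG) pairs taken from the table rows above.
rows = {
    "answer_relevancy": (0.94, 0.61),
    "context_precision": (0.99, 0.81),
}
for name, (sketch, naive) in rows.items():
    print(f"{name}: {relative_delta_pct(sketch, naive):+.1f}%")
```

Note that these are relative gains; the absolute differences are 0.33 and 0.18 points respectively.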
What To Try In 7 Days
Run semantic chunking on a small domain corpus, index the embeddings in FAISS, and check whether retrieval improves over fixed-size chunking.
Extract entities with an LLM for a small subset, build a toy KG, and run Cypher queries to compare structured (KG) vs unstructured (embedding) hits.
Combine KG results and embedding results with a simple overlap-weight rule and measure relevancy on a handful of multi-context queries using an LLM judge.
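The summary does not define the overlap-weight rule, so the following is one possible reading rather than the paper's method: chunks found by both retrievers get a fixed bonus on top of their embedding similarity, and KG-only chunks get a small default score. The function `fuse`, its parameters, and the example scores are all assumptions made for illustration.

```python
def fuse(embedding_hits, kg_hits, overlap_bonus=0.5, kg_only_score=0.3):
    """Rank chunk ids from two retrievers.

    embedding_hits: dict of chunk_id -> similarity in [0, 1]
    kg_hits: set of chunk_ids returned by the KG query
    """
    scores = {}
    for chunk_id, similarity in embedding_hits.items():
        # Hypothetical rule: boost chunks that both retrievers agree on.
        bonus = overlap_bonus if chunk_id in kg_hits else 0.0
        scores[chunk_id] = similarity + bonus
    for chunk_id in kg_hits:
        # Chunks seen only by the KG still enter the ranking with a low default score.
        scores.setdefault(chunk_id, kg_only_score)
    # Sort by descending score, breaking ties by chunk id for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


embedding_hits = {"c1": 0.82, "c2": 0.75, "c3": 0.40}
kg_hits = {"c2", "c4"}
ranked = fuse(embedding_hits, kg_hits)
print(ranked)
```

The two weights are the obvious knobs to tune on a handful of multi-context queries before handing the top-ranked chunks to the LLM judge.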
Risks & Boundaries
Limitations
KG construction is labor intensive and may not scale cheaply to very large corpora.
Dependency on GPT models for NER and judging increases cost and adds variance from sampling and prompt sensitivity.
When Not To Use
If you cannot afford LLM API costs or KG construction at scale.
When absolute recall is required and a simpler KG-only approach already gives higher recall.
Failure Modes
Erroneous entity extraction from GPT NER leading to wrong KG traversals.
Sparse or incomplete KG causing missed multi-hop links and low recall.
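The sparse-KG failure mode can be seen in a toy traversal. The sketch below uses a plain adjacency dictionary in place of a graph database, and the entities and relation names are made-up illustrative data, not taken from the paper's KG.

```python
from collections import deque


def reachable(graph, start, max_hops=2):
    """Return the entities reachable from `start` within `max_hops` edges (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}


# Toy data (assumed): entity -> list of (relation, entity) edges.
complete = {
    "carbonara": [("HAS_INGREDIENT", "guanciale")],
    "guanciale": [("ORIGINATES_IN", "Lazio")],
}
# Same graph with one extraction miss: the second edge was never created.
sparse = {"carbonara": [("HAS_INGREDIENT", "guanciale")]}

print(sorted(reachable(complete, "carbonara")))
print(sorted(reachable(sparse, "carbonara")))
```

With the second edge missing, the two-hop answer is silently absent from the result, which is why a single missed extraction can lower recall on multi-hop questions without raising any error.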

