Contrato360 2.0 — agent-orchestrated RAG + text-to-SQL Q&A for contract management

December 23, 2024 · 7 min

Overview

Decision Snapshot: Needs Validation

The system is practical and engineering-focused; evaluation is small (75 contracts, two experts) so expect a reliable prototype but limited generalization without wider testing.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 0/3

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 40%

Authors

Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

Why It Matters For Business

A small engineering effort that wires RAG, text-to-SQL, and lightweight agents gives contract teams fast, accurate answers across PDFs and contract databases without retraining LLMs, cutting manual search time.

Summary TLDR

Contrato360 2.0 is a practical Q&A system for contract managers that combines document retrieval (RAG), a text-to-SQL agent over a contract database, prompt engineering, and a router agent that directs each query to the right tool. It is implemented with OpenAI models (embeddings: text-davinci-002; answers: gpt-4-turbo), a ChromaDb vectorstore, a LangChain SQL agent, a Streamlit UI, and Plotly for graphs. On an internal test with 75 contracts, assessed by two domain experts, direct contract lookups were consistently correct, while semantically complex indirect queries gave mixed results. The design avoids fine-tuning by orchestrating retrieval and SQL execution with agents.

Problem Statement

Contract managers need fast, reliable answers that combine facts buried in long contract PDFs with up-to-date records in contract management systems (CMS). Because contracts share standard layouts and repeated wording, plain similarity search often returns text from the wrong contract. The paper addresses how to combine PDFs and structured CMS data without retraining LLMs, and how to route each query to the right tool.

Main Contribution

A production-style Q&A architecture that routes queries via a Router Agent to a RAG agent and a text-to-SQL agent in parallel.

Semantic chunking by contract section plus metadata (source, contract number, clause) to reduce cross-contract retrieval errors.
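A minimal sketch of section-level chunking with metadata, assuming clause headings look like "CLAUSE 1 - OBJECT". The heading pattern, the `Chunk` type, the `chunk_by_clause` helper, and the sample contract are all hypothetical illustrations, not taken from the paper.

```python
import re
from dataclasses import dataclass

# Assumed heading style; real contracts will need their own pattern.
CLAUSE_HEADING = re.compile(r"^CLAUSE\s+(\d+)\b.*$", re.MULTILINE)


@dataclass
class Chunk:
    text: str
    source: str           # file the chunk came from
    contract_number: str  # identifier used later for filtering
    clause: str           # clause number, or "preamble" / "unknown"


def chunk_by_clause(text: str, source: str, contract_number: str) -> list[Chunk]:
    """Split a contract into one chunk per clause and tag each with metadata."""
    matches = list(CLAUSE_HEADING.finditer(text))
    if not matches:
        # No recognizable headings: keep the whole text as a single chunk.
        return [Chunk(text.strip(), source, contract_number, "unknown")]

    chunks = []
    preamble = text[: matches[0].start()].strip()
    if preamble:
        chunks.append(Chunk(preamble, source, contract_number, "preamble"))

    for i, match in enumerate(matches):
        # A clause runs from its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.start():end].strip()
        chunks.append(Chunk(body, source, contract_number, match.group(1)))
    return chunks


sample = (
    "CLAUSE 1 - OBJECT\nSupply of software licenses.\n"
    "CLAUSE 2 - TERM\nTwelve months from signature.\n"
)
chunks = chunk_by_clause(sample, source="contract_001.pdf", contract_number="001/2024")
for chunk in chunks:
    print(chunk.contract_number, chunk.clause, repr(chunk.text))
```

Keeping the contract number on every chunk is what later allows retrieval to be restricted to a single contract.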

Key Findings

Direct document lookups returned correct answers on the evaluated benchmark questions.

Numbers: Table 1: direct questions show 10/10 correct for listed items

Practical Use: For factual, document-contained queries (who, what, term), use RAG with section-based chunking and metadata filtering for reliable results.

Evidence Ref: Section 5, Table 1

Indirect queries that require database facts were mostly correct but showed gaps for some semantic topics.

Numbers: Table 2: many indirect items 10/10 correct; some items 8/10 or 9/10

Practical Use: Combine text-to-SQL and RAG in parallel, but expect some indirect queries to need prompt or schema adjustments to capture domain concepts.

Evidence Ref: Section 5, Table 2
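The database side of an indirect query can be sketched as running model-generated SQL under a read-only guard. This is an illustration under stated assumptions: the `contracts` table, its columns, the sample rows, and the `run_read_only` helper are hypothetical, and the generated SQL string stands in for what a text-to-SQL agent might return.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Execute generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative checks: reject anything with a second statement or a
    # non-SELECT verb. This also rejects SELECTs containing ';' in literals.
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(statement)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical schema and data standing in for a contract management export.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (contract_number TEXT, supplier TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [("001/2024", "Acme", "2025-06-30"), ("002/2024", "Globex", "2026-01-15")],
)

# Stand-in for the SQL an agent might generate for
# "Which contracts end before 2026?"
generated_sql = (
    "SELECT contract_number, supplier FROM contracts "
    "WHERE end_date < '2026-01-01';"
)
rows = run_read_only(conn, generated_sql)
print(rows)
```

When an indirect question comes back incomplete, the usual adjustment is on the input side: describing the relevant tables and domain terms more explicitly in the prompt that produces the SQL.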

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Direct question correctness | 10/10 per listed question | not reported | not reported | Direct QA benchmark (Table 1) | All listed direct questions labeled 'Correct' across evaluations | Section 5, Table 1
Indirect question correctness (mixed) | 8/10 to 10/10 depending on question | not reported | not reported | Indirect QA benchmark (Table 2) | Some indirect questions returned 10/10, others 8/10 or 9/10 correct | Section 5, Table 2

What To Try In 7 Days

Index a pilot set of contracts as section-level chunks and add contract-id metadata to each chunk.

Implement a simple router: run RAG and a safe (read-only) text-to-SQL step in parallel and merge both outputs for the LLM to synthesize.

Add prompt rules that force the model to return contract identifiers and to avoid using prior knowledge.
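The router and prompt steps above could be prototyped roughly as follows. This is a sketch, not the paper's implementation: `rag_search` and `sql_search` are hypothetical stubs returning canned data, the rule wording in `PROMPT_RULES` is illustrative, and the final LLM call is left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative rule text; tune the wording for your own model and domain.
PROMPT_RULES = (
    "Answer only from the context below. Do not use prior knowledge.\n"
    "Cite the contract number for every fact you state.\n"
    "If the context does not contain the answer, say so."
)


def rag_search(question: str) -> list[dict]:
    """Stub for vector retrieval over section-level chunks."""
    return [{"contract_number": "001/2024",
             "text": "CLAUSE 2 - TERM: twelve months from signature."}]


def sql_search(question: str) -> list[dict]:
    """Stub for a read-only text-to-SQL lookup."""
    return [{"contract_number": "001/2024", "end_date": "2025-06-30"}]


def gather_context(question: str) -> tuple[list[dict], list[dict]]:
    """Run both retrieval paths concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rag_future = pool.submit(rag_search, question)
        sql_future = pool.submit(sql_search, question)
        return rag_future.result(), sql_future.result()


def build_prompt(question: str, chunks: list[dict], rows: list[dict]) -> str:
    """Merge document excerpts and database rows into one grounded prompt."""
    doc_lines = [f"[contract {c['contract_number']}] {c['text']}" for c in chunks]
    row_lines = [", ".join(f"{k}={v}" for k, v in r.items()) for r in rows]
    return "\n\n".join([
        PROMPT_RULES,
        "Document excerpts:\n" + "\n".join(doc_lines),
        "Database rows:\n" + "\n".join(row_lines),
        "Question: " + question,
    ])


question = "When does contract 001/2024 end?"
chunks, rows = gather_context(question)
prompt = build_prompt(question, chunks, rows)
print(prompt)
```

Threads fit here because both paths are I/O-bound (a vector store query and a database query); the merged prompt is then passed to whichever answer model is in use.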

Agent Features

Memory
Chat history used as context per session
Planning
Task routing based on query domain; parallel retrieval and SQL execution
Tool Use
LangChain agents, OpenAI models, ChromaDb vectorstore, SQLite
Frameworks
LangChain, OpenAI API
Is Agentic

Yes

Architectures
Router Agent, RAG Agent, Text-to-SQL (SQL) Agent, Graph Agent, LLM Answer Generation Agent
Collaboration
Agents exchange retrieved chunks, SQL results, and prompts

Optimization Features

Token Efficiency
Section-based chunking to limit irrelevant context
System Optimization
Metadata filtering at retrieval to reduce wrong-contract hits
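To illustrate why metadata filtering reduces wrong-contract hits, here is a toy retrieval sketch in plain Python. The two-dimensional vectors, the `index` entries, and the `retrieve` helper are made up for the example; a real system would apply the equivalent filter through its vector store's query interface.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, contract_number=None, k=2):
    """Rank chunks by similarity, optionally restricted to one contract."""
    candidates = [
        entry for entry in index
        if contract_number is None or entry["contract_number"] == contract_number
    ]
    ranked = sorted(
        candidates, key=lambda e: cosine(query_vec, e["vector"]), reverse=True
    )
    return ranked[:k]


# Toy index: the same clause in two contracts has nearly identical vectors,
# mimicking boilerplate wording shared across contracts.
index = [
    {"contract_number": "001/2024", "clause": "2", "vector": [0.9, 0.1]},
    {"contract_number": "002/2024", "clause": "2", "vector": [1.0, 0.0]},
    {"contract_number": "001/2024", "clause": "5", "vector": [0.1, 0.9]},
]
query = [1.0, 0.0]

unfiltered = retrieve(query, index, k=1)
filtered = retrieve(query, index, contract_number="001/2024", k=1)
print(unfiltered[0]["contract_number"], filtered[0]["contract_number"])
```

In this toy data the unfiltered search picks the near-duplicate clause from the other contract, while the filtered search stays inside the contract the question is about.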

Reproducibility

Code Available: No
Data Available: No
Open Source Status: No
License: Unknown

Risks & Boundaries

Limitations

Small evaluation: two specialists and a 75-contract corpus limit external validity.

Domain and language specific (Portuguese contracts); performance on other languages/domains not shown.

When Not To Use

For automated legal advice or decisions without human review.

When data privacy or regulations forbid external LLM APIs without strict controls.

Failure Modes

Wrong-contract retrieval if chunk metadata is missing or mis-tagged.

Incomplete or vague answers for semantic concepts absent from prompts or schema mapping.

Core Entities

Models

gpt-4-turbo (answers), text-davinci-002 (embeddings)

Metrics

Relevance categories: Correct, Incomplete (no Incorrect observed)

Datasets

BNDES contract PDFs (75 documents), Contract Management System export (SQLite sample)

Benchmarks

Internal direct/indirect QA benchmark (prepared questions; Tables 1 & 2)