Contrato360 2.0 — agent-orchestrated RAG + text-to-SQL Q&A for contract management

Overview

Decision Snapshot: Needs Validation

The system is practical and engineering-focused, but the evaluation is small (75 contracts, two experts). Expect a reliable prototype with limited evidence of generalization until it is tested more widely.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 0/3

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 40%

Authors

Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz


Why It Matters For Business

A modest engineering effort that wires together RAG, text-to-SQL, and lightweight agents can give contract teams fast, accurate answers across PDFs and contract databases without retraining LLMs, cutting manual search time.

Who Should Care

Product Manager, ML Engineer, Engineering Lead, CTO

Summary TLDR

Contrato360 2.0 is a practical Q&A system for contract managers that combines document retrieval (RAG), a text-to-SQL agent over a contract database, prompt engineering, and a Router Agent that directs each query to the right tools. It is implemented with OpenAI models (embeddings: text-davinci-002; answers: gpt-4-turbo), a ChromaDB vector store, a LangChain SQL agent, a Streamlit UI, and Plotly for graphs. In an internal test with 75 contracts and two domain experts, direct contract lookups were consistently correct, while semantically complex indirect queries were mostly correct but occasionally incomplete. The design avoids fine-tuning by using agents to orchestrate retrieval and query execution.

Problem Statement

Contract managers need fast, reliable answers that combine facts from long contract PDFs with up-to-date records in contract management systems (CMS). Because contracts share standard layouts and repeated wording, plain similarity search often returns a passage from the wrong contract. The paper addresses two questions: how to combine PDFs and structured CMS data without retraining LLMs, and how to route each query to the right tool.

Main Contribution

A production-style Q&A architecture in which a Router Agent dispatches each query to a RAG agent and a text-to-SQL agent running in parallel.

Semantic chunking by contract section, with metadata (source, contract number, clause) attached to each chunk, to reduce cross-contract retrieval errors.
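A minimal sketch of section-level chunking with per-chunk metadata. It assumes each section begins on its own line with the word "CLÁUSULA" (the actual heading convention of the evaluated contracts is not given here), and the Chunk type and field names are illustrative, not the paper's code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heading convention: each section starts on its own line with
# "CLÁUSULA" (Portuguese for "clause"), e.g. "CLÁUSULA PRIMEIRA - DO OBJETO".
SECTION_RE = re.compile(r"^(CLÁUSULA\b.*)$", re.MULTILINE)


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_by_section(doc_text: str, source: str, contract_number: str) -> list:
    """Split one contract into section-level chunks tagged with their origin."""
    base = {"source": source, "contract_number": contract_number}
    matches = list(SECTION_RE.finditer(doc_text))
    chunks = []
    # Text before the first heading (parties, preamble) becomes its own chunk.
    first = matches[0].start() if matches else len(doc_text)
    preamble = doc_text[:first].strip()
    if preamble:
        chunks.append(Chunk(preamble, {**base, "clause": "PREAMBLE"}))
    # Each heading starts a chunk that runs until the next heading (or end of text).
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(doc_text)
        body = doc_text[m.start():end].strip()
        chunks.append(Chunk(body, {**base, "clause": m.group(1).strip()}))
    return chunks
```

Because every chunk carries its contract number, two contracts with identical boilerplate clauses still produce distinguishable chunks at retrieval time.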

Key Findings

Direct document lookups returned correct answers on the evaluated benchmark questions.

Numbers: Table 1 reports 10/10 correct for every listed direct question.

Practical Use: For factual queries answered inside a document (who, what, term), use RAG with section-based chunking and metadata filtering for reliable results.

Evidence Ref: Section 5, Table 1
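The key idea behind metadata filtering is to restrict candidates to the right contract before ranking by similarity. The sketch below shows that ordering with a toy bag-of-words cosine score standing in for real embeddings; in a vector store this corresponds to a metadata filter such as the where argument of ChromaDB's Collection.query. The retrieve function and its parameters are illustrative assumptions.

```python
import math
import re
from collections import Counter


def _vec(text):
    # Toy bag-of-words vector; a real system would use an embedding model.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, where=None, k=3):
    """chunks: list of (text, metadata). `where` is an exact-match metadata filter
    applied BEFORE similarity ranking, so other contracts can never be returned."""
    where = where or {}
    candidates = [
        (text, meta) for text, meta in chunks
        if all(meta.get(key) == value for key, value in where.items())
    ]
    q = _vec(query)
    ranked = sorted(candidates, key=lambda c: _cosine(q, _vec(c[0])), reverse=True)
    return ranked[:k]
```

Filtering first means near-duplicate clauses from other contracts cannot outrank the correct one, however similar their wording.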

Indirect queries that require database facts were mostly correct but showed gaps for some semantic topics.

Numbers: Table 2 reports 10/10 correct for many indirect questions and 8/10 or 9/10 for others.

Practical Use: Run text-to-SQL and RAG in parallel, but expect some indirect queries to need prompt or schema adjustments before the system captures the relevant domain concepts.

Evidence Ref: Section 5, Table 2
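Once RAG and text-to-SQL have both run, their outputs must be merged into a single grounded prompt for the answer-generation model. This is a minimal sketch of that merge; the rule wording, section labels, and the build_answer_prompt name are assumptions, not the paper's actual prompt.

```python
def build_answer_prompt(question, rag_chunks, sql_rows, sql_columns):
    """Merge retrieved chunks and SQL results into one grounded prompt.

    rag_chunks: list of (text, metadata) pairs with 'contract_number' and 'clause' keys.
    sql_rows / sql_columns: rows and column names returned by the text-to-SQL step.
    """
    # Hypothetical rule wording: keep answers grounded and traceable to a contract.
    rules = [
        "Answer only from the CONTEXT below; do not use prior knowledge.",
        "Cite the contract number for every fact you state.",
        "If the context does not contain the answer, say so explicitly.",
    ]
    doc_lines = [
        f"[contract {m.get('contract_number', 'UNKNOWN')} | {m.get('clause', '?')}] {t}"
        for t, m in rag_chunks
    ]
    table = [" | ".join(sql_columns)] + [" | ".join(str(v) for v in row) for row in sql_rows]
    return "\n".join([
        "RULES:", *(f"- {r}" for r in rules),
        "", "CONTEXT (documents):", *doc_lines,
        "", "CONTEXT (database):", *table,
        "", f"QUESTION: {question}",
    ])
```

Keeping the two sources in separately labeled blocks lets the model, and a human reviewer, see whether a fact came from a PDF clause or from the database.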

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Direct question correctness	10/10 per listed question	—	—	Direct QA benchmark (Table 1)	All listed direct questions labeled 'Correct' across evaluations	Section 5, Table 1
Indirect question correctness (mixed)	8–10/10 depending on question	—	—	Indirect QA benchmark (Table 2)	Some indirect questions returned 10/10, others 8/10 or 9/10 correct	Section 5, Table 2

What To Try In 7 Days

Index a pilot set of contracts as section-level chunks and add contract-id metadata to each chunk.

Implement a simple router: run RAG and a safe (read-only) text-to-SQL query in parallel, then merge both outputs for the LLM to synthesize.

Add prompt rules that instruct the model to cite contract identifiers and not to draw on prior knowledge.
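One way to make the text-to-SQL step "safe" is to execute LLM-generated SQL on a connection that SQLite itself restricts to reads. This sketch uses Python's sqlite3 set_authorizer to deny every action except SELECT, column reads, and function calls. It is an assumption about how one might harden the pilot, not the paper's LangChain setup, and the helper names are hypothetical.

```python
import sqlite3

# SQLITE_FUNCTION is 31 in sqlite3.h; fall back to the raw code if the module lacks the name.
_SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)
_ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, _SQLITE_FUNCTION}


def _read_only(action, arg1, arg2, db_name, trigger_or_view):
    # Called by SQLite while preparing each statement; deny anything that is not a read.
    return sqlite3.SQLITE_OK if action in _ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def make_agent_connection(conn):
    """Install the read-only authorizer. Intended for a connection used only by the SQL agent."""
    conn.set_authorizer(_read_only)
    return conn


def run_generated_sql(conn, sql, max_rows=100):
    """Run one LLM-generated statement and return (column_names, rows)."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # conservative: also rejects ';' inside string literals
        raise ValueError("only a single statement is allowed")
    cur = conn.execute(stmt)  # writes fail here with sqlite3.DatabaseError
    columns = [d[0] for d in cur.description or []]
    return columns, cur.fetchmany(max_rows)
```

Enforcing read-only access in the database engine, rather than by inspecting the SQL text alone, also blocks writes hidden inside otherwise plausible queries; the row cap keeps oversized results out of the prompt.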

Agent Features

Memory

Chat history used as context per session
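Per-session chat history can be kept as a bounded window of recent turns and prepended to the prompt. This is a minimal sketch; the window size, the in-process dictionary store, and the class and function names are assumptions.

```python
from collections import deque


class SessionMemory:
    """Per-session chat history kept as a bounded window of recent turns."""

    def __init__(self, max_turns=6):  # window size is an assumption
        self._turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, role, content):
        self._turns.append((role, content))

    def as_context(self):
        return "\n".join(f"{role}: {content}" for role, content in self._turns)


# One memory per session id, so conversations never leak into each other.
_sessions = {}


def memory_for(session_id):
    if session_id not in _sessions:
        _sessions[session_id] = SessionMemory()
    return _sessions[session_id]
```

Bounding the window keeps prompt size predictable in long sessions, at the cost of forgetting early turns.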

Planning

Task routing based on query domain; parallel retrieval and SQL execution
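This sketch pairs routing with a parallel fan-out. The route function uses hypothetical keyword rules in place of whatever logic the paper's Router Agent applies: RAG and SQL always run together, and a graph agent is added when a chart is requested. The agents themselves are passed in as plain callables.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword hints that a chart is wanted (Portuguese and English).
GRAPH_HINTS = ("gráfico", "chart", "plot")


def route(question):
    """Return the names of the agents to call for this question."""
    agents = ["rag", "sql"]  # documents and database are always queried together
    if any(hint in question.lower() for hint in GRAPH_HINTS):
        agents.append("graph")
    return agents


def fan_out(question, agents, registry):
    """Run the selected agents concurrently and collect their outputs by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(registry[name], question) for name in agents}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this workload because each agent mostly waits on network or database I/O; the collected dictionary is what a prompt builder would then merge for the answer-generation model.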

Tool Use

LangChain agents, OpenAI models, ChromaDB vector store, SQLite

Frameworks

LangChain, OpenAI API

Is Agentic

Yes

Architectures

Router Agent, RAG Agent, Text-to-SQL (SQL) Agent, Graph Agent, LLM Answer Generation Agent

Collaboration

Agents exchange retrieved chunks, SQL results, and prompts

Optimization Features

Token Efficiency

Section-based chunking to limit irrelevant context

System Optimization

Metadata filtering at retrieval to reduce wrong-contract hits

Reproducibility

Code Available: No

Data Available: No

Open Source Status: No

License: Unknown

Risks & Boundaries

Limitations

Small evaluation: two specialists and a 75-contract corpus limit external validity.

Domain- and language-specific (Portuguese contracts); performance on other languages and domains is not shown.

When Not To Use

For automated legal advice or decisions without human review.

When data privacy or regulations forbid external LLM APIs without strict controls.

Failure Modes

Wrong-contract retrieval when chunk metadata is missing or mis-tagged.

Incomplete or vague answers for semantic concepts absent from prompts or schema mapping.
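A cheap guard against the first failure mode is to validate chunk metadata before indexing. This sketch flags chunks with missing keys and, assuming a hypothetical naming convention where the source filename contains the contract number, chunks whose tags disagree with their source.

```python
REQUIRED_KEYS = ("source", "contract_number", "clause")


def find_bad_chunks(chunks):
    """Return (index, problem) pairs for chunks whose metadata could cause wrong-contract hits.

    chunks: list of (text, metadata) pairs.
    """
    problems = []
    for i, (_text, meta) in enumerate(chunks):
        missing = [k for k in REQUIRED_KEYS if not meta.get(k)]
        if missing:
            problems.append((i, "missing " + ", ".join(missing)))
        # Assumed convention: the source filename embeds the contract number.
        elif meta["contract_number"] not in meta["source"]:
            problems.append((i, "contract_number does not match source"))
    return problems
```

Running this check at ingestion time surfaces mis-tagged chunks before they can be served as another contract's answer.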

Core Entities

Models

gpt-4-turbo (answers), text-davinci-002 (embeddings)

Metrics

Relevance categories: Correct, Incomplete (no Incorrect observed)
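Per-question relevance labels might be tallied into the k/n figures used in Tables 1 and 2 as sketched below. How many judgments stand behind each question is not stated here, so the example inputs are illustrative, not the paper's data.

```python
from collections import Counter

LABELS = ("Correct", "Incomplete", "Incorrect")


def summarize(judgments):
    """judgments: {question: [label, ...]}.

    Returns ({question: 'k/n correct'}, overall Counter of labels).
    """
    per_question = {}
    totals = Counter()
    for question, labels in judgments.items():
        unknown = set(labels) - set(LABELS)
        if unknown:
            raise ValueError(f"unexpected labels: {sorted(unknown)}")
        totals.update(labels)
        per_question[question] = f"{labels.count('Correct')}/{len(labels)} correct"
    return per_question, totals
```

Keeping Incomplete separate from Incorrect preserves the distinction the evaluation draws between partial answers and wrong ones.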

Datasets

BNDES contract PDFs (75 documents), Contract Management System export (SQLite sample)

Benchmarks

Internal direct/indirect QA benchmark (prepared questions; Tables 1 & 2)

