Contrato360 2.0 — agent-orchestrated RAG + text-to-SQL Q&A for contract management

December 23, 20247 min

Overview

Production Readiness

0.7

Novelty Score

0.4

Cost Impact Score

0.6

Citation Count

0

Authors

Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

Links

Abstract / PDF

Why It Matters For Business

A small engineering effort that wires RAG, text-to-SQL, and lightweight agents gives contract teams fast, accurate answers across PDFs and contract databases without retraining LLMs, cutting manual search time.

Summary TLDR

Contrato360 2.0 is a practical Q&A system for contract managers that combines document retrieval (RAG), a text-to-SQL agent over a contract database, prompt engineering, and a router agent to direct queries. Implemented with OpenAI models (embeddings: text-davinci-002; answers: gpt-4-turbo), ChromaDb vectorstore, LangChain SQL agent, Streamlit UI and Plotly for graphs. On an internal test with 75 contracts and domain experts, direct contract lookups were consistently correct and mixed results occurred for semantically complex indirect queries. The design avoids fine-tuning by orchestrating retrieval and execution with agents.

Problem Statement

Contract managers need fast, reliable answers that combine facts inside long contract PDFs and up-to-date records in contract management systems. Standard contract layouts and repeated wording make simple similarity search return the wrong contract. The paper solves how to combine PDFs and structured CMS data without retraining LLMs and how to route queries to the right tool.

Main Contribution

A production-style Q&A architecture that routes queries via a Router Agent to a RAG agent and a text-to-SQL agent in parallel.

Semantic chunking by contract section plus metadata (source, contract number, clause) to reduce cross-contract retrieval errors.

Prompt engineering patterns and role-based contexts to constrain LLM outputs and require contract identifiers in answers.

A safety step that validates generated SQL to block destructive commands before execution.

Key Findings

Direct document lookups returned correct answers on the evaluated benchmark questions.

NumbersTable 1: direct questions show 10/10 correct for listed items

Indirect queries that require database facts were mostly correct but showed gaps for some semantic topics.

NumbersTable 2: many indirect items 10/10 correct; some items 8/10 or 9/10

A specific indirect query type ('DLs / exemptions from tenders') produced incomplete answers in all evaluated cases.

NumbersTable 2: 0 correct, 10 incomplete for 'How many DLs were contracted in yy?'

Results

Direct question correctness

Value10/10 per listed question

Indirect question correctness (mixed)

Value8–10/10 depending on question

Specific semantic failure (DLs/exemptions)

Value0/10 correct, 10/10 incomplete

Who Should Care

What To Try In 7 Days

Index a pilot set of contracts as section-level chunks and add contract-id metadata to each chunk.

Implement a simple router: run RAG and a safe text-to-SQL in parallel and merge outputs for the LLM to synthesize.

Add prompt rules that force the model to return contract identifiers and to avoid using prior knowledge.

Agent Features

Memory

  • Chat history used as context per session

Planning

  • Task routing based on query domain
  • Parallel retrieval and SQL execution

Tool Use

  • LangChain agents
  • OpenAI models
  • ChromaDb vectorstore
  • SQLite

Frameworks

  • LangChain
  • OpenAI API

Is Agentic

true

Architectures

  • Router Agent
  • RAG Agent
  • Text-to-SQL (SQL) Agent
  • Graph Agent
  • LLM Answer Generation Agent

Collaboration

  • Agents exchange retrieved chunks, SQL results, and prompts

Optimization Features

Token Efficiency

  • Section-based chunking to limit irrelevant context

System Optimization

  • Metadata filtering at retrieval to reduce wrong-contract hits

Reproducibility

Open Source Status

  • no

Risks & Boundaries

Limitations

  • Small evaluation: two specialists and a 75-contract corpus limit external validity.
  • Domain and language specific (Portuguese contracts); performance on other languages/domains not shown.
  • Relies on external LLM API (OpenAI) and a private contract database; costs and data governance matter.
  • Some indirect semantic concepts were not captured and need prompt/schema tuning (e.g., 'DLs').

When Not To Use

  • For automated legal advice or decisions without human review.
  • When data privacy or regulations forbid external LLM APIs without strict controls.
  • If you lack structured metadata or a contract database to filter by contract id.

Failure Modes

  • Wrong-contract retrieval if metadata missing or mis-tagged in chunks.
  • Incomplete or vague answers for semantic concepts absent from prompts or schema mapping.
  • Incorrect SQL generation if entity recognition mismaps terms to schema.

Core Entities

Models

  • gpt-4-turbo (answers)
  • text-davinci-002 (embeddings)

Metrics

  • Relevance categories: Correct, Incomplete (no Incorrect observed)

Datasets

  • BNDES contract PDFs (75 documents)
  • Contract Management System export (SQLite sample)

Benchmarks

  • Internal direct/indirect QA benchmark (prepared questions; Tables 1 & 2)