Overview
Production Readiness
0.7
Novelty Score
0.4
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
A small engineering effort that wires RAG, text-to-SQL, and lightweight agents gives contract teams fast, accurate answers across PDFs and contract databases without retraining LLMs, cutting manual search time.
Summary TLDR
Contrato360 2.0 is a practical Q&A system for contract managers that combines document retrieval (RAG), a text-to-SQL agent over a contract database, prompt engineering, and a router agent to direct queries. Implemented with OpenAI models (embeddings: text-davinci-002; answers: gpt-4-turbo), ChromaDb vectorstore, LangChain SQL agent, Streamlit UI and Plotly for graphs. On an internal test with 75 contracts and domain experts, direct contract lookups were consistently correct and mixed results occurred for semantically complex indirect queries. The design avoids fine-tuning by orchestrating retrieval and execution with agents.
Problem Statement
Contract managers need fast, reliable answers that combine facts inside long contract PDFs and up-to-date records in contract management systems. Standard contract layouts and repeated wording make simple similarity search return the wrong contract. The paper solves how to combine PDFs and structured CMS data without retraining LLMs and how to route queries to the right tool.
Main Contribution
A production-style Q&A architecture that routes queries via a Router Agent to a RAG agent and a text-to-SQL agent in parallel.
Semantic chunking by contract section plus metadata (source, contract number, clause) to reduce cross-contract retrieval errors.
Prompt engineering patterns and role-based contexts to constrain LLM outputs and require contract identifiers in answers.
A safety step that validates generated SQL to block destructive commands before execution.
Key Findings
Direct document lookups returned correct answers on the evaluated benchmark questions.
Indirect queries that require database facts were mostly correct but showed gaps for some semantic topics.
A specific indirect query type ('DLs / exemptions from tenders') produced incomplete answers in all evaluated cases.
Results
Direct question correctness
Indirect question correctness (mixed)
Specific semantic failure (DLs/exemptions)
Who Should Care
What To Try In 7 Days
Index a pilot set of contracts as section-level chunks and add contract-id metadata to each chunk.
Implement a simple router: run RAG and a safe text-to-SQL in parallel and merge outputs for the LLM to synthesize.
Add prompt rules that force the model to return contract identifiers and to avoid using prior knowledge.
Agent Features
Memory
- Chat history used as context per session
Planning
- Task routing based on query domain
- Parallel retrieval and SQL execution
Tool Use
- LangChain agents
- OpenAI models
- ChromaDb vectorstore
- SQLite
Frameworks
- LangChain
- OpenAI API
Is Agentic
true
Architectures
- Router Agent
- RAG Agent
- Text-to-SQL (SQL) Agent
- Graph Agent
- LLM Answer Generation Agent
Collaboration
- Agents exchange retrieved chunks, SQL results, and prompts
Optimization Features
Token Efficiency
- Section-based chunking to limit irrelevant context
System Optimization
- Metadata filtering at retrieval to reduce wrong-contract hits
Reproducibility
Open Source Status
- no
Risks & Boundaries
Limitations
- Small evaluation: two specialists and a 75-contract corpus limit external validity.
- Domain and language specific (Portuguese contracts); performance on other languages/domains not shown.
- Relies on external LLM API (OpenAI) and a private contract database; costs and data governance matter.
- Some indirect semantic concepts were not captured and need prompt/schema tuning (e.g., 'DLs').
When Not To Use
- For automated legal advice or decisions without human review.
- When data privacy or regulations forbid external LLM APIs without strict controls.
- If you lack structured metadata or a contract database to filter by contract id.
Failure Modes
- Wrong-contract retrieval if metadata missing or mis-tagged in chunks.
- Incomplete or vague answers for semantic concepts absent from prompts or schema mapping.
- Incorrect SQL generation if entity recognition mismaps terms to schema.
Core Entities
Models
- gpt-4-turbo (answers)
- text-davinci-002 (embeddings)
Metrics
- Relevance categories: Correct, Incomplete (no Incorrect observed)
Datasets
- BNDES contract PDFs (75 documents)
- Contract Management System export (SQLite sample)
Benchmarks
- Internal direct/indirect QA benchmark (prepared questions; Tables 1 & 2)

