Overview
The system is practical and engineering-focused. The evaluation is small (75 contracts, two experts), so expect a reliable prototype but limited evidence of generalization without wider testing.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 0/3
Reproducibility
Status: No open assets linked
Open source: No
At A Glance
Cost impact: 60%
Production readiness: 70%
Novelty: 40%
Why It Matters For Business
A small engineering effort that wires together RAG, text-to-SQL, and lightweight agents gives contract teams fast, accurate answers across contract PDFs and contract databases without retraining LLMs, cutting manual search time.
Summary TLDR
Contrato360 2.0 is a practical Q&A system for contract managers that combines document retrieval (RAG), a text-to-SQL agent over a contract database, prompt engineering, and a router agent that directs each query. It is implemented with OpenAI models (embeddings: text-davinci-002; answers: gpt-4-turbo), a ChromaDb vector store, a LangChain SQL agent, a Streamlit UI, and Plotly for graphs. In an internal test with 75 contracts and two domain experts, direct contract lookups were consistently correct, while semantically complex indirect queries gave mixed results. The design avoids fine-tuning by orchestrating retrieval and execution with agents.
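The paper relies on a LangChain SQL agent, which is not reproduced here. As a hedged sketch of one guard such a setup may need, the Python below executes model-generated SQL only if it looks like a single read-only SELECT. The `contracts` table, its columns, and the keyword deny-list are hypothetical assumptions; a read-only database account (or SQLite's `mode=ro` URI) is the stronger control in practice.

```python
import re
import sqlite3

# Hypothetical, deliberately conservative deny-list of write/admin keywords.
# It may reject harmless queries (e.g. a string literal containing "delete").
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept one SELECT (or WITH ... SELECT) statement with no write keywords."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # reject stacked statements such as "SELECT 1; DROP ..."
        return False
    if not re.match(r"(select|with)\b", stripped, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(stripped) is None


def run_generated_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 50) -> list:
    """Execute model-generated SQL only if it passes the read-only check."""
    if not is_read_only(sql):
        raise ValueError(f"Rejected non-read-only SQL: {sql!r}")
    return conn.execute(sql).fetchmany(max_rows)


if __name__ == "__main__":
    # Hypothetical table standing in for the contract management database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contracts (number TEXT, supplier TEXT, end_date TEXT)")
    conn.execute("INSERT INTO contracts VALUES ('202/2023', 'ACME', '2025-06-30')")
    print(run_generated_sql(conn, "SELECT supplier, end_date FROM contracts WHERE number = '202/2023'"))
    # [('ACME', '2025-06-30')]
```

Capping rows with `fetchmany` also keeps oversized result sets out of the answering prompt.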
Problem Statement
Contract managers need fast, reliable answers that combine facts from long contract PDFs with up-to-date records in contract management systems (CMS). Because contracts share standard layouts and repeated wording, plain similarity search can return passages from the wrong contract. The paper addresses how to combine PDFs and structured CMS data without retraining LLMs, and how to route each query to the right tool.
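To illustrate the wrong-contract problem, here is a toy sketch, assuming a bag-of-words cosine score in place of real embeddings and a hypothetical "number/year" contract-number format. Two clauses from different contracts with the same boilerplate score identically, so similarity alone picks whichever comes first; filtering on contract metadata before ranking removes the ambiguity. Chunk texts and numbers are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical contract-number format ("number/year").
CONTRACT_NO = re.compile(r"\b\d{1,4}/\d{4}\b")

# Two chunks from different contracts that share the same standard wording.
CHUNKS = [
    {"contract_number": "101/2022", "text": "Clause 5 - Term. This contract is valid for 12 months from signature."},
    {"contract_number": "202/2023", "text": "Clause 5 - Term. This contract is valid for 24 months from signature."},
]


def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for real embeddings."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_contract_number(question: str):
    """Return the first contract number mentioned in the question, or None."""
    match = CONTRACT_NO.search(question)
    return match.group(0) if match else None


def retrieve(question: str, chunks: list, k: int = 1, contract_number=None) -> list:
    """Rank chunks by similarity, optionally restricted to one contract's metadata."""
    pool = [c for c in chunks if contract_number is None or c["contract_number"] == contract_number]
    q = bow(question)
    return sorted(pool, key=lambda c: cosine(q, bow(c["text"])), reverse=True)[:k]


if __name__ == "__main__":
    question = "What is the term of contract 202/2023?"
    # Similarity alone ties the two near-identical chunks; the stable sort keeps the first.
    print(retrieve(question, CHUNKS)[0]["contract_number"])  # 101/2022 (wrong contract)
    # Filtering on metadata first keeps only the contract the user asked about.
    wanted = extract_contract_number(question)
    print(retrieve(question, CHUNKS, contract_number=wanted)[0]["contract_number"])  # 202/2023
```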
Main Contribution
A production-style Q&A architecture in which a Router Agent sends queries to a RAG agent and a text-to-SQL agent that run in parallel.
Semantic chunking by contract section plus metadata (source, contract number, clause) to reduce cross-contract retrieval errors.
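A minimal sketch of section-level chunking with per-chunk metadata (source, contract number, clause), assuming clauses start on a line beginning with "Clause <n>". That heading convention, the sample text, and the file name are hypothetical; Portuguese contracts would need their own heading pattern. Each chunk can then be stored in the vector store with `asdict(chunk)` minus the text as its metadata.

```python
import re
from dataclasses import asdict, dataclass

# Hypothetical heading convention: a line starting with "Clause <n>".
HEADING = re.compile(r"^(clause\s+\d+)\b.*$", re.IGNORECASE | re.MULTILINE)


@dataclass
class Chunk:
    text: str
    source: str
    contract_number: str
    clause: str


def chunk_by_section(text: str, source: str, contract_number: str) -> list:
    """Split one contract into clause-level chunks, each tagged with metadata."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return [Chunk(text.strip(), source, contract_number, "Full text")]
    chunks = []
    preamble = text[: matches[0].start()].strip()
    if preamble:
        chunks.append(Chunk(preamble, source, contract_number, "Preamble"))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        label = " ".join(m.group(1).split()).title()  # normalise "CLAUSE  2" -> "Clause 2"
        chunks.append(Chunk(text[m.start():end].strip(), source, contract_number, label))
    return chunks


SAMPLE = """Contract 202/2023 between Agency X and ACME.
Clause 1 - Object
Supply of IT services.
Clause 2 - Term
Valid for 24 months from signature.
"""

if __name__ == "__main__":
    for chunk in chunk_by_section(SAMPLE, "contract_202_2023.pdf", "202/2023"):
        print(asdict(chunk))
```

Splitting on clause boundaries keeps each retrieved passage inside one contractual topic, and the metadata lets retrieval filter by contract before ranking.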
Key Findings
Direct document lookups returned correct answers on the evaluated benchmark questions.
Indirect queries that require database facts were mostly correct but showed gaps for some semantic topics.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Direct question correctness | 10/10 per listed question | — | — | Direct QA benchmark (Table 1) | All listed direct questions labeled 'Correct' across evaluations | Section 5, Table 1 |
| Indirect question correctness (mixed) | 8–10/10 depending on question | — | — | Indirect QA benchmark (Table 2) | Some indirect questions returned 10/10, others 8/10 or 9/10 correct | Section 5, Table 2 |
What To Try In 7 Days
Index a pilot set of contracts as section-level chunks and add contract-id metadata to each chunk.
Implement a simple router: run RAG and a safe text-to-SQL in parallel and merge outputs for the LLM to synthesize.
Add prompt rules that instruct the model to return contract identifiers and not to use prior knowledge.
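The router and prompt-rule steps in the list above can be sketched as follows, assuming two hypothetical tool functions (`rag_stub`, `sql_stub`) in place of the real RAG and text-to-SQL agents. Both tools run in parallel threads, a failing tool yields a note instead of aborting the query, and the outputs are merged under prompt rules whose exact wording is an assumption. The call to the answering LLM is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Prompt rules in the spirit of the list above; exact wording is an assumption.
PROMPT_RULES = (
    "Answer only from the context below; do not use prior knowledge.\n"
    "Always state the contract number(s) your answer is based on.\n"
    "If the context does not contain the answer, say so."
)


def rag_stub(question: str) -> str:
    # Hypothetical stand-in for the RAG agent (vector search over contract chunks).
    return "[202/2023, Clause 2] Valid for 24 months from signature."


def sql_stub(question: str) -> str:
    # Hypothetical stand-in for the text-to-SQL agent over the contract database.
    return "number=202/2023 | supplier=ACME | end_date=2025-06-30"


def route(question: str, tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run every tool on the question in parallel; a failing tool yields a note, not a crash."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in tools.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # keep the other tool's output usable
                results[name] = f"[{name} unavailable: {exc}]"
    return results


def build_prompt(question: str, results: Dict[str, str]) -> str:
    """Merge tool outputs into one prompt for the answering LLM."""
    context = "\n\n".join(f"### {name}\n{output}" for name, output in results.items())
    return f"{PROMPT_RULES}\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    q = "When does contract 202/2023 end?"
    print(build_prompt(q, route(q, {"documents": rag_stub, "database": sql_stub})))
```

Running both tools on every query trades some extra cost for not having to decide up front whether the answer lives in the PDFs or the database; the LLM reconciles the two in its answer.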
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
System Optimization
Reproducibility
Risks & Boundaries
Limitations
Small evaluation: two specialists and a 75-contract corpus limit external validity.
Domain and language specific (Portuguese contracts); performance on other languages/domains not shown.
When Not To Use
For automated legal advice or decisions without human review.
When data privacy or regulations forbid external LLM APIs without strict controls.
Failure Modes
Wrong-contract retrieval if chunk metadata is missing or mis-tagged.
Incomplete or vague answers for semantic concepts absent from prompts or schema mapping.
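One way to catch the first failure mode before indexing is a metadata audit. The sketch below is a heuristic, assuming chunks are dicts with hypothetical `source`, `contract_number`, and `clause` keys and a "number/year" contract-number format. It flags chunks with missing metadata, and chunks whose text cites contract numbers but never the one they are tagged with. Legitimate cross-references can trigger the second check, so flagged chunks are meant for human review.

```python
import re

# Hypothetical contract-number format ("number/year") and required metadata keys.
CONTRACT_NO = re.compile(r"\b\d{1,4}/\d{4}\b")
REQUIRED = ("source", "contract_number", "clause")


def audit_chunks(chunks: list) -> list:
    """Return (index, problem) pairs for chunks likely to cause wrong-contract retrieval."""
    problems = []
    for i, chunk in enumerate(chunks):
        missing = [key for key in REQUIRED if not chunk.get(key)]
        if missing:
            problems.append((i, "missing metadata: " + ", ".join(missing)))
            continue
        mentioned = set(CONTRACT_NO.findall(chunk.get("text", "")))
        # Heuristic: the text cites contract numbers but never the one it is tagged with.
        if mentioned and chunk["contract_number"] not in mentioned:
            problems.append((i, f"text mentions {sorted(mentioned)} but is tagged {chunk['contract_number']}"))
    return problems


if __name__ == "__main__":
    sample = [
        {"text": "Clause 1 - Object.", "source": "b.pdf", "contract_number": "", "clause": "Clause 1"},
        {"text": "Contract 101/2022 between Agency X and ACME.", "source": "c.pdf",
         "contract_number": "202/2023", "clause": "Preamble"},
    ]
    for index, problem in audit_chunks(sample):
        print(index, problem)
```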

