Combine multiple OCR engines + two LLMs and pick the best JSON by majority voting to boost invoice OCR accuracy and speed.

Overview

Decision Snapshot: Needs Validation

Results are promising but based on a small (100-image) invoice dataset and free-tier API constraints; verify on larger, real-world data before production rollout.

Citations: 1

Evidence Strength: 0.55

Confidence: 0.60

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 2/2

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 60%

Authors

Osama Abdellatif, Ahmed Ayman, Ali Hamdi

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Combining multiple OCR engines with LLM-based JSON conversion and majority voting can cut extraction errors and improve throughput for invoice automation, reducing manual fixes and speeding up batch processing.

Who Should Care

Product Manager, CTO, ML Engineer, Engineering Lead, Data Scientist, Founder

Summary TLDR

LMV-RPA is a production-style pipeline that runs four OCR engines (PaddleOCR, Tesseract, EasyOCR, DocTR), sends each OCR output to two Large Language Models (LLMs) that convert the text into JSON, and picks the final structured result by majority voting. On a 100-image invoice test set, the authors report 99% extraction accuracy versus a 94% baseline, and an average runtime of 121.27s versus 212–218s for UiPath and Automation Anywhere. The study is promising but uses a small, invoice-focused dataset and constrained free-tier APIs, so it needs more validation before deployment at scale.

Problem Statement

Standard OCR in RPA struggles with ambiguous characters and complex layouts. Single-engine OCR often misreads noisy or varied invoices, and manual fixes are expensive. The paper seeks a reliable, automated pipeline that both improves the accuracy of extracted fields and outputs structured JSON for downstream RPA tasks.

Main Contribution

LMV-RPA pipeline: multi-OCR (PaddleOCR, Tesseract, EasyOCR, DocTR) + two LLMs to convert each OCR output to JSON, then majority-vote across JSON outputs.

Empirical comparison showing higher extraction accuracy (reported 99% vs 94%) on a 100-image invoice dataset.
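The voting step can be illustrated with a minimal sketch. It assumes whole-document voting over canonicalized JSON and breaks ties by first-seen order; this summary does not state the granularity the authors vote at, and `majority_vote` and the sample invoices are hypothetical names and data used only for illustration.

```python
import json
from collections import Counter

def majority_vote(candidates):
    """Return the most frequent JSON object among candidates and its vote count.

    Assumption: votes are cast on whole documents, not on individual fields.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Canonicalize with sorted keys so key order does not split votes.
    keys = [json.dumps(c, sort_keys=True) for c in candidates]
    winner, votes = Counter(keys).most_common(1)[0]
    return json.loads(winner), votes

# Hypothetical outputs from different OCR+LLM branches for one invoice.
candidates = [
    {"invoice_no": "INV-001", "total": "120.50"},
    {"total": "120.50", "invoice_no": "INV-001"},  # same content, other key order
    {"invoice_no": "INV-OO1", "total": "120.50"},  # one branch misread 0 as O
    {"invoice_no": "INV-001", "total": "120.50"},
]
result, votes = majority_vote(candidates)
print(result, votes)
```

Whole-document voting is strict: a single differing field makes two outputs count as different candidates, so a field-level vote may be a better fit when outputs rarely agree in full.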

Key Findings

LMV-RPA achieved higher extraction accuracy than the baseline.

Numbers: 99% vs 94% (reported on 100 invoices)

Practical Use: Use multi-engine OCR plus LLM-based JSON conversion and voting to reduce field extraction errors in invoice pipelines.

Evidence Ref: Table 2 (Accuracy comparison)

LMV-RPA ran faster, on average, than two commercial RPA platforms in the authors' tests.

Numbers: Average runtime 121.27s vs UiPath 212.33s and Automation Anywhere 217.80s

Practical Use: The pipeline can lower end-to-end processing time under similar hardware and dataset constraints, helping throughput for batch invoice jobs.

Evidence Ref: Table 1 (Average Run Time)

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy	99%	94% (traditional: 1 OCR + 1-layer LLM)	+5 percentage points	100-image invoice test set	Table 2 reports 99% for LMV-RPA vs 94% for traditional model	Table 2
Average runtime	121.27 sec	UiPath 212.33 sec; Automation Anywhere 217.80 sec	~43–44% faster than UiPath/Automation Anywhere	Benchmark tasks (same hardware, paper's test)	Table 1 reports runtimes per platform	Table 1
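The deltas in the table follow directly from the reported figures. The short calculation below reproduces them; `relative_speedup` is a hypothetical helper name, and "faster" is read here as the fraction of baseline runtime saved.

```python
def relative_speedup(baseline_s, new_s):
    """Fraction of the baseline runtime saved by the new pipeline."""
    return (baseline_s - new_s) / baseline_s

# Figures as reported in this summary (Table 1 and Table 2 of the paper).
lmv_rpa_s = 121.27
uipath_s = 212.33
automation_anywhere_s = 217.80

accuracy_delta_pp = 99 - 94  # percentage points, not percent
vs_uipath = round(100 * relative_speedup(uipath_s, lmv_rpa_s), 1)
vs_aa = round(100 * relative_speedup(automation_anywhere_s, lmv_rpa_s), 1)

print(accuracy_delta_pp)  # 5
print(vs_uipath)          # 42.9
print(vs_aa)              # 44.3
```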

What To Try In 7 Days

Run LMV-RPA on a small sample of your invoices (50–200) and compare field-level accuracy to your current tool.

Log per-file runtimes to check throughput under your hardware and adjust polling interval.

Test failure cases: low-quality scans, unusual layouts, and non-invoice documents to map limitations.
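For the comparison steps above, one possible way to score field-level accuracy and log per-call runtime is sketched below. The exact-match rule (after trimming whitespace), the helper names `field_accuracy` and `timed`, and the sample fields are assumptions for illustration; the paper's own accuracy definition is not given in this summary.

```python
import time

def field_accuracy(predicted, expected):
    """Share of expected fields whose predicted value matches exactly after trimming."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    hits = sum(
        1
        for key, value in expected.items()
        if str(predicted.get(key, "")).strip() == str(value).strip()
    )
    return hits / len(expected)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical ground truth and pipeline output for one invoice.
expected = {"invoice_no": "INV-001", "date": "2024-01-05", "total": "120.50", "vendor": "Acme"}
predicted = {"invoice_no": "INV-001", "date": "2024-01-05", "total": "120.50 ", "vendor": "Acne"}

score, elapsed = timed(field_accuracy, predicted, expected)
print(f"field accuracy: {score:.2f}, elapsed: {elapsed:.6f}s")
```

In practice `timed` would wrap the full per-file pipeline call rather than the scoring function, so the logged time reflects OCR and LLM latency.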

Agent Features

Memory

Short-term file tracking (seen vs new files)

Planning

Simple event-driven monitoring (detect new files and process)
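The seen-versus-new file tracking and the monitoring loop described above could look roughly like the polling sketch below. It is not the authors' code: `find_new_files`, `watch`, the in-memory `seen` set, and the default polling interval are all illustrative assumptions.

```python
import tempfile
import time
from pathlib import Path

def find_new_files(directory, seen):
    """Return files in directory not yet in seen, and record them as seen."""
    new_files = sorted(
        p for p in Path(directory).iterdir() if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files

def watch(directory, handle, poll_seconds=5.0, max_polls=None):
    """Poll directory and call handle(path) once per new file.

    max_polls=None loops forever; a finite value makes the loop testable.
    """
    seen = set()  # short-term memory: lost when the process restarts
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            handle(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)

# Demo: one file, two polls; the file should be handled exactly once.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "invoice_001.png").write_bytes(b"")
    processed = []
    watch(tmp, lambda p: processed.append(p.name), poll_seconds=0, max_polls=2)

print(processed)  # ['invoice_001.png']
```

Because the `seen` set lives only in memory, a restart would reprocess every file; persisting it to disk is a natural extension if that matters for your workload.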

Tool Use

Uses multiple OCR engines and LLMs as tools

Frameworks

Custom RPA pipeline (LMV-RPA)

Is Agentic

Yes

Architectures

Pipeline: multi-OCR → 2 LLMs → majority voting

Continuous directory-watching loop

Collaboration

Voting across independent OCR+LLM outputs

Optimization Features

System Optimization

Asynchronous multi-engine processing to shorten end-to-end time

Inference Optimization

Parallel OCR engines to improve robustness

Majority voting reduces need for heavier single-model correction
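Running the engines concurrently, as described above, can be sketched with a thread pool. The engines here are stub functions that only sleep and echo their input; they stand in for real OCR calls and do not reflect any library's actual API. `fake_engine` and `run_engines` are hypothetical names, and threads are one option among several (the summary does not say which concurrency mechanism the authors use).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_engine(name, delay):
    """Build a stub OCR engine that sleeps, then returns a tagged string."""
    def run(image_path):
        time.sleep(delay)  # stands in for OCR latency
        return f"{name}:{image_path}"
    return run

# Stubs named after the four engines in the paper; delays are arbitrary.
ENGINES = {
    "paddleocr": fake_engine("paddleocr", 0.05),
    "tesseract": fake_engine("tesseract", 0.05),
    "easyocr": fake_engine("easyocr", 0.05),
    "doctr": fake_engine("doctr", 0.05),
}

def run_engines(image_path, engines):
    """Run every engine on the same image concurrently and collect outputs by name."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, image_path) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

start = time.perf_counter()
outputs = run_engines("invoice_001.png", ENGINES)
elapsed = time.perf_counter() - start
print(outputs)
print(f"elapsed: {elapsed:.2f}s")
```

Threads help most when engines wait on I/O or release the interpreter lock in native code; for CPU-bound pure-Python work, a process pool may be the better choice.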

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

Repo

Data URLs

data set

Risks & Boundaries

Limitations

Small dataset (100 images) focused on invoices only.

Authors used free-tier OCR/APIs and added a 5s delay, which may alter runtime behavior.

When Not To Use

For non-invoice or very different document types without retesting.

When strict, provable privacy constraints forbid sending text to external LLMs.

Failure Modes

Majority voting can reinforce a common OCR misread if all engines err the same way.

LLMs may alter or hallucinate critical field text when converting to JSON.
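Two lightweight guards can reduce exposure to these failure modes, though neither is part of the paper as summarized here. The sketch below flags low-agreement fields for manual review and checks that an extracted value appears verbatim in at least one raw OCR text. The names `vote_field` and `is_grounded`, the 0.6 threshold, and the sample data are assumptions for illustration.

```python
from collections import Counter

def vote_field(values, min_agreement=0.6):
    """Return (winning_value, agreement_ratio, needs_review) for one field."""
    if not values:
        raise ValueError("need at least one value")
    winner, count = Counter(values).most_common(1)[0]
    ratio = count / len(values)
    return winner, ratio, ratio < min_agreement

def is_grounded(value, ocr_texts):
    """True if the value appears verbatim in at least one raw OCR text.

    A crude substring check: it catches values the LLM invented outright,
    not values that were reformatted (for example, a changed date format).
    """
    return any(value in text for text in ocr_texts)

# Hypothetical raw OCR texts and per-branch values for the "total" field.
ocr_texts = ["Invoice INV-001 Total 120.50", "Invoice INV-OO1 Total 120.50"]
totals = ["120.50", "120.50", "120.50", "1,120.50"]

winner, ratio, review = vote_field(totals)
print(winner, ratio, review)               # 120.50 0.75 False
print(is_grounded("120.50", ocr_texts))    # True
print(is_grounded("1,120.50", ocr_texts))  # False
```

Neither guard helps when every engine makes the same misread: the vote is then unanimous and the wrong value is fully grounded in the OCR text, so spot checks against source images remain necessary.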

Core Entities

Models

LLaMA 3, Gemini-1.5-pro, PaddleOCR, Tesseract, EasyOCR, DocTR

Metrics

Accuracy, average runtime (seconds)

Datasets

100-image invoice dataset (Kaggle/Roboflow/Kozlowski samples)

Kaggle invoice-like images

Roboflow invoice_data

Kozlowski Samples of Electronic Invoices
