Scaling enterprise OCR means solving quality and cost. At the same time.

Most OCR solutions force a choice between quality and cost. LightOnOCR-2-1B doesn't.

March 6, 2026

TL;DR

At small volumes, OCR quality is the only thing that matters. At scale, cost becomes just as structural. A model that reads perfectly but prices you out of production isn't a solution, it's a pilot that never ships.

We built LightOnOCR-2-1B for production.

Measured by someone else, on documents we didn't choose

A developer recently published an open source workbench to compare OCR engines on real-world documents. Four publicly available PDFs: a corporate document, a handwriting sample, a multi-column annual report, a German medical bulletin. Full methodology on GitHub.

| Engine | Coca-Cola | NIST Handwriting | World Food Bank | RKI Bulletin (DE) | Cost / page |
|---|---|---|---|---|---|
| LightOnOCR-2-1B | 🏆 Excellent | 🏆 Excellent | 🏆 Excellent | 🏆 Excellent | 0.5 ct |
| Azure Document Intelligence | 🟢 Very good | 🟢 Good | 🟢 Very good | 🏆 Excellent | 1 ct |
| Docling (suryaocr) | 🟢 Very good | 🔴 Poor | 🟢 Good | 🟢 Very good | 0.17 ct |
| Docling (EasyOCR) | 🟡 Medium | 🔴 Poor | 🟡 Medium | 🟢 Very good | 0.11 ct |
| Docling (RapidOCR) | 🟢 Good | 🔴 Poor | 🟢 Good | 🟢 Very good | 0.06 ct |
| MinerU | 🟢 Good | 🔴 Poor | 🟢 Good | 🟢 Very good | 0.17 ct |
| Marker | 🟢 Good | 🔴 Poor | 🟢 Good | 🟡 Medium | 0.11 ct |
| Docling (Granite) | 🔴 Poor | 🟡 Medium | 🔴 Poor | 🟢 Good | 1.16 ct |
| Docling (Tesseract) | 🔴 Poor | 🔴 Poor | 🔴 Poor | 🟢 Good | 0.13 ct |

Source: Jonas Wacker / ocr-workbench

Nine engines tested. One scored Excellent on all four documents. Handwriting is where the rest of the field falls apart: six engines score Poor, Docling with Granite reaches Medium, and Azure Document Intelligence only reaches Good. LightOnOCR-2-1B is the only one to get it right. At 0.5 ct per page, it costs half as much as Azure, with better results on three documents and an equal result on the fourth.
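
For readers who want to check those comparisons against the table themselves, here is a minimal sketch. The grades are transcribed from the table above. The ordering of the labels (Poor < Medium < Good < Very good < Excellent) is our assumption for illustration, and the helper names `excellent_everywhere` and `head_to_head` are hypothetical, not part of the OCR Workbench.

```python
# Grades transcribed from the benchmark table, in column order:
# Coca-Cola, NIST Handwriting, World Food Bank, RKI Bulletin (DE).
GRADES = {
    "LightOnOCR-2-1B": ("Excellent", "Excellent", "Excellent", "Excellent"),
    "Azure Document Intelligence": ("Very good", "Good", "Very good", "Excellent"),
    "Docling (suryaocr)": ("Very good", "Poor", "Good", "Very good"),
    "Docling (EasyOCR)": ("Medium", "Poor", "Medium", "Very good"),
    "Docling (RapidOCR)": ("Good", "Poor", "Good", "Very good"),
    "MinerU": ("Good", "Poor", "Good", "Very good"),
    "Marker": ("Good", "Poor", "Good", "Medium"),
    "Docling (Granite)": ("Poor", "Medium", "Poor", "Good"),
    "Docling (Tesseract)": ("Poor", "Poor", "Poor", "Good"),
}

# Assumed ordering of the labels, worst to best (not stated in the post).
RANK = {"Poor": 0, "Medium": 1, "Good": 2, "Very good": 3, "Excellent": 4}


def excellent_everywhere(grades):
    """Return the engines graded Excellent on every document."""
    return [
        name
        for name, row in grades.items()
        if all(grade == "Excellent" for grade in row)
    ]


def head_to_head(grades, a, b):
    """Count documents where engine a is graded better than, equal to, or worse than b."""
    better = equal = worse = 0
    for grade_a, grade_b in zip(grades[a], grades[b]):
        if RANK[grade_a] > RANK[grade_b]:
            better += 1
        elif RANK[grade_a] == RANK[grade_b]:
            equal += 1
        else:
            worse += 1
    return better, equal, worse


print(excellent_everywhere(GRADES))
print(head_to_head(GRADES, "LightOnOCR-2-1B", "Azure Document Intelligence"))
```

Run as written, this lists LightOnOCR-2-1B as the only engine graded Excellent on all four documents and reports 3 better, 1 equal, 0 worse against Azure Document Intelligence.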

Small model. Serious results.

LightOnOCR-2-1B runs on standard GPU hardware. It deploys inside your own infrastructure. Your documents stay within your perimeter.

At one million pages per month, the economics are straightforward. At any volume, the architecture is the same: no external dependency, no data exposure, no per-page cloud bill that scales against you.
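
To put rough numbers on that, the sketch below projects a monthly bill from the per-page costs in the benchmark table. It rests on assumptions: the post does not name the currency behind "ct", so the result is in unnamed currency units of 100 ct each; cost is assumed to scale linearly with page count; and the benchmark's per-page figures may not match what a given deployment actually costs. `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-page costs in ct, copied from the benchmark table.
# The currency is not stated in the post.
COST_CT_PER_PAGE = {
    "LightOnOCR-2-1B": Decimal("0.5"),
    "Azure Document Intelligence": Decimal("1"),
    "Docling (suryaocr)": Decimal("0.17"),
    "Docling (EasyOCR)": Decimal("0.11"),
    "Docling (RapidOCR)": Decimal("0.06"),
    "MinerU": Decimal("0.17"),
    "Marker": Decimal("0.11"),
    "Docling (Granite)": Decimal("1.16"),
    "Docling (Tesseract)": Decimal("0.13"),
}

CT_PER_UNIT = 100  # assumption: 100 ct make one unit of the unnamed currency


def monthly_cost(engine: str, pages_per_month: int) -> Decimal:
    """Projected monthly cost in currency units, assuming linear scaling with pages."""
    return COST_CT_PER_PAGE[engine] * pages_per_month / CT_PER_UNIT


pages = 1_000_000
# Print engines from cheapest to most expensive per page.
for engine in sorted(COST_CT_PER_PAGE, key=COST_CT_PER_PAGE.get):
    print(f"{engine:<30} {monthly_cost(engine, pages):>12,.2f}")
```

Under these assumptions, one million pages per month comes to 5,000 currency units for LightOnOCR-2-1B and 10,000 for Azure Document Intelligence. Several engines in the table are cheaper per page, which is why the cost column only means something when read next to the quality columns.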

This is what production-ready looks like.

Credit where it's due

The benchmark referenced in this post was built and published independently by Jonas Wacker. His OCR Workbench is open source, reproducible, and available for anyone to run on their own documents. We had no part in it, and that's exactly what makes it valuable.

LightOnOCR-2-1B is available on HuggingFace. For enterprise document pipelines with full data sovereignty, explore Paradigm.
