TL;DR
At small volumes, OCR quality is the only thing that matters. At scale, cost becomes just as structural a constraint. A model that reads perfectly but prices you out of production isn't a solution; it's a pilot that never ships.
We built LightOnOCR-2-1B for production.
Measured by someone else, on documents we didn't choose
A developer recently published an open source workbench to compare OCR engines on real-world documents. Four publicly available PDFs: a corporate document, a handwriting sample, a multi-column annual report, a German medical bulletin. Full methodology on GitHub.
Nine engines tested. One scored Excellent on all four documents. Handwriting is where every open source engine fails and where Azure Document Intelligence only reaches Good; LightOnOCR-2-1B is the only engine that reads it correctly. And at 0.5 cents per page, it costs half of what Azure does, with better results on three documents and equal results on the fourth.
Small model. Serious results.
LightOnOCR-2-1B runs on standard GPU hardware. It deploys inside your own infrastructure. Your documents stay within your perimeter.
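To make that concrete, here is a minimal sketch of what a self-hosted pipeline could look like: the model served on your own GPU with vLLM's OpenAI-compatible server and queried one page at a time. The model ID, port, and file name are illustrative assumptions, not a prescribed setup.

```python
# A minimal self-hosted OCR loop, assuming the model is served locally with
# vLLM's OpenAI-compatible server, e.g.:
#   vllm serve lightonai/LightOnOCR-2-1B   # model ID is an assumption
import base64

from openai import OpenAI

# Point the standard OpenAI client at your own server: no document
# ever leaves your infrastructure.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Encode one page image as a data URL (hypothetical file name).
with open("page_001.png", "rb") as f:
    page_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="lightonai/LightOnOCR-2-1B",  # assumed Hugging Face model ID
    messages=[{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{page_b64}"},
        }],
    }],
)

print(response.choices[0].message.content)  # the page, as extracted text
```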
At one million pages per month, the economics are straightforward. At any volume, the architecture is the same: no external dependency, no data exposure, no per-page cloud bill that scales against you.
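A back-of-the-envelope calculation shows how straightforward. The 0.5 cents per page is the benchmark's figure; the Azure rate below is inferred from "half the cost of Azure" above, so treat it as an approximation rather than a published price.

```python
# Monthly OCR bill at one million pages, using the rates from the benchmark.
PAGES_PER_MONTH = 1_000_000
LIGHTONOCR_USD_PER_PAGE = 0.005  # 0.5 cents, the benchmark's figure
AZURE_USD_PER_PAGE = 0.010       # assumption: twice the LightOnOCR rate

print(f"LightOnOCR-2-1B: ${PAGES_PER_MONTH * LIGHTONOCR_USD_PER_PAGE:,.0f}/month")
print(f"Azure DI:        ${PAGES_PER_MONTH * AZURE_USD_PER_PAGE:,.0f}/month")
# LightOnOCR-2-1B: $5,000/month
# Azure DI:        $10,000/month
```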
This is what production-ready looks like.
Credit where it's due
The benchmark referenced in this post was built and published independently by Jonas Wacker. His OCR Workbench is open source, reproducible, and available for anyone to run on their own documents. We had no part in it, and that's exactly what makes it valuable.
LightOnOCR-2-1B is available on HuggingFace. For enterprise document pipelines with full data sovereignty, explore Paradigm.