Production RAG without the
9-month build

SOTA on public retrieval and OCR benchmarks. Three endpoints, one API key.
Stop building. Start shipping.

# Your first call
curl -X POST https://api.lighton.ai/api/v3/search \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
       "query": "Which clauses differ between the enterprise and SMB agreements?",
       "top_k": 10,
       "workspace_id": ["42"]
       }'
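For teams calling the API from application code rather than a shell, the same request might look like the sketch below. It uses only the Python standard library and builds the request without sending it. The URL, headers, and body fields are copied from the curl call; the function name `build_search_request` is hypothetical, and the response shape is not shown on this page, so none is assumed.

```python
import json
import os
import urllib.request

# URL copied from the curl call; everything else here is an illustrative sketch.
SEARCH_URL = "https://api.lighton.ai/api/v3/search"


def build_search_request(
    query: str, top_k: int, workspace_ids: list[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"query": query, "top_k": top_k, "workspace_id": workspace_ids}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The key is read from the same environment variable as in curl.
            "Authorization": f"Bearer {os.environ.get('LIGHTON_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(
    "Which clauses differ between the enterprise and SMB agreements?", 10, ["42"]
)
# To send it: urllib.request.urlopen(request), then JSON-decode the body.
print(request.get_method(), request.full_url)
```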
   

Start for free

Read the docs

Three endpoints.
Zero pipeline maintenance.

Parse a document. Extract any field. Retrieve with citations.
The same API key on Console, the same SDK on Enterprise.

LightOnOCR-2.
State-of-the-art parsing.

Turns scans, tables, handwriting, and multi-column layouts into structured Markdown. 20+ languages natively. The parsing engine behind every retrieval workflow.

83.2

OLMOCR-BENCH (SOTA)

€0.002

Per Page

20+

Native Languages

Open

Weights on HuggingFace

$ curl https://api.lighton.ai/v3/parse \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf"}'

Define the schema.
Get JSON back.

Pull any field, entity, or key-value pair you care about. Invoice numbers, lease end dates, claim IDs, contract clauses. You define the schema; LightOn returns structured JSON.

JSON Schema

In / Out

€0.004

Per Page

Async

Via Webhooks

Cited

Per Field

$ curl https://api.lighton.ai/v3/extract \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf","schema":"<YOUR_JSON_SCHEMA>"}'
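As an illustration of what could stand in for the `<YOUR_JSON_SCHEMA>` placeholder, here is a minimal sketch of a JSON Schema covering a few of the fields mentioned above. The field names, the helper functions, and the choice to send the schema as a nested JSON object are all assumptions; the page shows only the placeholder, so check the docs for the exact encoding the endpoint expects.

```python
import json

# Hypothetical schema; field names are illustrative, taken from the examples
# in the text (invoice numbers, lease end dates, claim IDs).
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "lease_end_date": {"type": "string", "format": "date"},
        "claim_id": {"type": "string"},
    },
    "required": ["invoice_number"],
}


def build_extract_payload(document_url: str, schema: dict) -> str:
    """Serialize a body with the two keys used in the curl example.

    Assumption: the schema travels as a nested JSON object.
    """
    return json.dumps({"document": document_url, "schema": schema})


def missing_required(result: dict, schema: dict) -> list[str]:
    """Return required top-level fields that are absent from a result."""
    return [key for key in schema.get("required", []) if key not in result]


payload = build_extract_payload(
    "https://console-examples.lighton.ai/AFD-091005-062.pdf", INVOICE_SCHEMA
)
print(payload)
```

A check like `missing_required` is one way to fail fast on the client side when a returned object lacks a field your downstream code depends on.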

Grounded retrieval
with citations.

One query, three signals: dense, sparse, late-interaction. The index picks the right signal, not the developer. Every result ships with the source passage that produced it. Built on LateOn and NextPlaid, our open-source ColBERT family.

Multi-vector

Dense + Sparse + Late Interaction

€0.006

Per Query

<200ms

P50 Latency

ACL

At Chunk Level

$ curl https://api.lighton.ai/v3/search \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"query":"<YOUR_QUERY>"}'
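For readers new to the late-interaction signal, the toy sketch below shows the MaxSim scoring rule used by ColBERT-style models: each query token keeps its best match among the document's token vectors, and those maxima are summed. The vectors are made up for illustration and the function names are hypothetical; this is a sketch of the general technique, not LightOn's implementation.

```python
from math import sqrt

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def maxsim_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Sum, over query tokens, of the best cosine match among document tokens."""
    if not doc_tokens:
        return 0.0
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


# Toy per-token embeddings: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # exact match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match

ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because scoring happens token by token, the passage that produced a match can be traced back, which is what makes per-result citations natural for this family of models.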

Try it in your browser.
No install needed.

Test every endpoint in Console. Drop a file, get the response, copy the code into your project.

Open the playground

console.lighton.ai

Built to scale from first document
to enterprise deployment.

Start free with full API access, then scale with transparent usage-based pricing and plans designed for production teams and large organizations.

STARTER

For developers and builders.

Free

+ PAYG

Pure usage-based, no commitment.

Start for free

Read the docs

BUSINESS

For teams shipping to production.

€149

/mo + PAYG

Annual license, Business SLAs.

Get started

Read the docs

ENTERPRISE

For large organizations and the public sector.

Custom

Tailored deployment, Enterprise SLAs and white-glove support.

More details

BUILT ON OPEN RESEARCH

Our retrieval models are in your dependency tree.

50M

HuggingFace Downloads

916K

PyPI Installs per Month

2,345

GitHub Stars

3,845

HuggingFace Likes

Open-source models in production

Empower developers with a production-ready Multimodal Retrieval API running on your infrastructure. Integrate secure reasoning into your apps (CRM, ERP) without managing the complex AI stack.

LateOn

NextPlaid

PyLate

DenseOn

LightOnOCR-2

+8 more on HF →

RAG was built for chatbots.
LightOn is built for agents.

Agents do not ask nicely. They dump raw PDFs, garbled tables, and off-domain queries into the same thread. The retrieval layer has to handle the input it gets, not the input you wish you had.

Retrieval

Hybrid retrieval

Dense, lexical, and late-interaction signals on one query. The index picks the right signal, not the developer. Built on LateOn and NextPlaid, our open-source ColBERT family.

Trust

Grounded by default

Every answer ships with the exact passage that supports it. Retrieval and reasoning are separable. Auditable by design, not retrofitted.

Infra

LLM-agnostic

Bring your own model. Open-source, commercial, or private. No lock-in on the inference layer. Your security policy dictates where inference happens.

Protocol

MCP-native

Drop LightOn into any agent that speaks Model Context Protocol. Single agent, multi-agent system, or business application integration. Same API.

Scope

Workspaces and ACLs at chunk level

Multi-agent systems need scoped corpora. Each agent gets its own workspace, its own collections, its own permissions. Access control is enforced at the chunk, not at the document. An agent never sees a single token it should not see.
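To make the difference between chunk-level and document-level access control concrete, here is a minimal sketch in which two chunks of the same document carry different permissions. The `Chunk` type, its fields, and the `visible_chunks` helper are hypothetical names for illustration; the platform's actual data model is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """A retrievable passage carrying its own access list (hypothetical model)."""

    doc_id: str
    text: str
    allowed_workspaces: frozenset[str] = field(default_factory=frozenset)


def visible_chunks(chunks: list[Chunk], workspace_id: str) -> list[Chunk]:
    """Keep only the chunks whose ACL includes the caller's workspace.

    Filtering happens before any ranking or generation, so text outside the
    caller's scope never reaches the agent.
    """
    return [c for c in chunks if workspace_id in c.allowed_workspaces]


# Same document, different permissions per chunk.
corpus = [
    Chunk("contract-1", "Payment terms: net 30.", frozenset({"finance", "legal"})),
    Chunk("contract-1", "Salary annex.", frozenset({"hr"})),
]

print([c.text for c in visible_chunks(corpus, "legal")])
```

With document-level permissions, the legal workspace would see either the whole contract or none of it; at chunk level it sees the payment terms and nothing from the annex.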

Built for search

LightOn supports every modern enterprise search behavior

Massive RAG

Massive Multimodal RAG

Analyze more than just text. The engine ingests millions of files and understands complex formats: images, technical diagrams, tables, and handwritten notes with high precision.

Data Sync

Universal Data Synchronization

Connect all your knowledge silos. Seamlessly index and sync data from external sources (SharePoint, Drive, Confluence, File Servers). The platform keeps your knowledge base continuously up-to-date.

Access Control

Strict Governance & ACLs

Control exactly who sees what. We mirror native permissions (ACLs) from your sources and organize data into isolated workspaces. You guarantee strict data segregation and compliance across teams.

Imagine the Impact

How much faster could your teams move if they could instantly discover knowledge and surface insights?

Empower your organization to make data-driven decisions, move faster, and stay secure—without ever risking a data leak.

Counterparty Risk Analysis

Surface real financial exposure across subsidiaries and languages in minutes, with every number tied to a source document.

Search hundreds of contracts and purchase orders across entities and languages from a single question.

Separate panic estimates from contractually defensible numbers, backed by clause-level citations.

Confirm what does not exist: zero open commitments returned as evidence, not silence.

Read the story

Multi-Jurisdiction Contract Review

Identify which contracts actually hold up under legal scrutiny across languages, jurisdictions, and amendments.

Review supplier agreements across multiple legal systems and languages in a single query.

Surface clauses that look protective until exclusions or annexes reverse them.

Every conclusion tied to the source clause, with jurisdiction and amendment context.

Read the story

Regulatory Perimeter Mapping

Reconstruct the systems that fall inside a regulatory perimeter when the answer is spread across audits, inventories, and architecture diagrams that no one has reconciled.

Read pentest reports, OT matrices, security policies, and local audits as a single corpus.

Identify assets labeled "non-critical" that support critical operations in practice.

Surface gaps between official inventories and operational reality, with documented sources.

Read the story

Deploy with confidence

Strong, reliable security and administrative controls designed to protect your company’s most sensitive data.

Single sign-on (SSO) with domain management

Role-based access with detailed permission settings

Support for SCIM (System for Cross-domain Identity Management)

Secure hosting: confidentiality and integrity of your data

Cost control: flat pricing for predictable costs, with flexible plans to adapt to your usage

Support and consulting: expert guidance for successful implementation

Don’t just take our word for it

Hear from some of our amazing customers who are building faster

The expertise of their tech team and the rapid evolution of the product, such as the hybrid search feature, put them at the forefront of innovation.

Jérôme Lacaille
Emeritus Expert in Algorithms

Babbar needed an efficient SEO strategy enhancement through LLM technology to stay competitive in the dynamic SEO industry.

Sylvain Peyronnet
Co-founder & search engine specialist

LightOn responded very quickly with tools that perfectly matched our needs, enhancing our document base and onboarding users without experience.

Achille Lerpinière
Chief Information & Technology Officer

For builders

Try Console for free

Two-minute signup.
Free tier on /parse, /extract, /search. Per-page and per-query pricing when you scale.

Start for free

Read the docs


