Make Your Enterprise Knowledge
Agent-Readable
The retrieval infrastructure your teams and AI agents can trust
For builders
Ship retrieval in your product
Free tier. /parse, /extract, /search behind one API key.
# 4 lines to structured Markdown
curl https://api.lighton.ai/v3/parse \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf"}'
FOR REGULATED ORGANIZATIONS
Deploy document intelligence inside your walls
VPC, on-prem, or air-gapped. SecNumCloud-ready. SSO, SCIM, audit logs built in.
Zero outbound calls
VPC / on-prem / air-gap
Per-server pricing
predictable TCO
Native ACL mirroring
SharePoint, Drive, Confluence
Three endpoints.
Zero pipeline maintenance.
Parse a document. Extract any field. Retrieve with citations.
The same API key on Console, the same SDK on Enterprise.
LightOnOCR-2.
State-of-the-art parsing.
Turns scans, tables, handwriting, and multi-column layouts into structured Markdown. 20+ languages natively. The parsing engine behind every retrieval workflow.
83.2
OLMOCR-BENCH (SOTA)
€0.002
Per Page
20+
Native languages
Open
Weights on HuggingFace
$ curl https://api.lighton.ai/v1/parse \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf"}'
Define the schema.
Get JSON back.
Pull any field, entity, or key-value pair you care about. Invoice numbers, lease end dates, claim IDs, contract clauses. You define the schema; LightOn returns structured JSON.
JSON Schema
In / Out
€0.004
Per Page
Async
Via Webhooks
Cited
Per Field
$ curl https://api.lighton.ai/v1/extract \
    -H "Authorization: Bearer $LIGHTON_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf","schema":"<YOUR_JSON_SCHEMA>"}'
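For teams calling /extract from application code rather than a shell, the curl call above can be sketched in Python. This is a minimal sketch under stated assumptions: the endpoint URL and the "document" and "schema" body keys come from the curl example, while the schema contents (invoice_number, total_amount, due_date), the helper name build_extract_request, and the choice to send the schema as a nested JSON object are illustrative assumptions, not a documented LightOn contract.

```python
import json
import urllib.request

# Endpoint and body keys ("document", "schema") are copied from the curl example above.
EXTRACT_URL = "https://api.lighton.ai/v1/extract"

# Hypothetical schema: these field names are illustrative, not a LightOn contract.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number"],
}


def build_extract_request(document_url: str, schema: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /extract."""
    # Assumption: the schema travels as a nested JSON object in the request body.
    body = json.dumps({"document": document_url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to urllib.request.urlopen would send it. Keeping construction separate from sending makes the payload easy to inspect and unit-test before any page is billed.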
Grounded retrieval
with citations.
One query, three signals: dense, sparse, late-interaction. The index picks the right signal, not the developer. Every result ships with the source passage that produced it. Built on LateOn and NextPlaid, our open-source ColBERT family.
Multi-vector
Dense + Sparse + Late-Interaction
€0.006
Per Query
<200ms
P50 Latency
ACL
At Chunk Level
$ curl https://api.lighton.ai/v1/search \
  -H "Authorization: Bearer $LIGHTON_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"<YOUR_QUERY>"}'
RAG was built for chatbots.
LightOn is built for agents.
Agents do not ask nicely. They dump raw PDFs, garbled tables, and off-domain queries into the same thread. The retrieval layer has to handle the input it gets, not the input you wish you had.
Retrieval
Hybrid retrieval
Dense, lexical, and late-interaction signals on one query. The index picks the right signal, not the developer. Built on LateOn and NextPlaid, our open-source ColBERT family.
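To make "late-interaction" concrete, here is a minimal sketch of the MaxSim scoring rule popularized by ColBERT: each query token is matched to its best document token, and those best matches are summed. This is the generic technique, not code from LateOn or NextPlaid. The toy two-dimensional vectors and the function names are assumptions made for illustration, and how the index combines this signal with dense and lexical scores is not shown here.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two token embeddings of equal length."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # only matches the first query token

# Rank documents by MaxSim score, best first.
ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
```

Because every query token keeps its own vector, a document is rewarded only when it covers each part of the query, which is what separates late interaction from a single dense embedding per passage.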
Trust
Grounded by default
Every answer ships with the exact passage that supports it. Retrieval and reasoning are separable. Auditable by design, not retrofitted.
Infra
LLM-agnostic
Bring your own model. Open-source, commercial, or private. No lock-in on the inference layer. Your security policy dictates where inference happens.
Protocol
MCP-native
Drop LightOn into any agent that speaks Model Context Protocol. Single agent, multi-agent system, or business application integration. Same API.
Scope
Workspaces and ACLs at chunk level
Multi-agent systems need scoped corpora. Each agent gets its own workspace, its own collections, its own permissions. Access control is enforced at the chunk, not at the document. An agent never sees a single token it should not see.
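The difference between chunk-level and document-level access control can be sketched as follows. This is an illustrative model only: the Chunk type, the allowed_groups field, and the visible_chunks helper are hypothetical names, not LightOn's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical field: groups mirrored from the source system


def visible_chunks(chunks, agent_groups):
    """Keep only chunks that share at least one group with the agent."""
    agent_groups = frozenset(agent_groups)
    return [c for c in chunks if c.allowed_groups & agent_groups]


# Two chunks from the same document carry different permissions.
corpus = [
    Chunk("contract-1", "Payment terms: 30 days.", frozenset({"finance", "legal"})),
    Chunk("contract-1", "Executive bonus clause.", frozenset({"hr"})),
]

# An agent scoped to the finance group sees one chunk of the document, not both.
finance_view = visible_chunks(corpus, {"finance"})
```

With document-level checks, the finance agent would get either the whole contract or none of it. Filtering per chunk before anything reaches the model is what keeps restricted text out of the agent's context.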
Imagine the Impact
How much faster could your teams move if they could instantly discover knowledge and surface insights?
Empower your organization to make data-driven decisions, move faster, and stay secure, without risking a data leak.


Counterparty Risk Analysis
Surface real financial exposure across subsidiaries and languages in minutes, with every number tied to a source document.
Search hundreds of contracts and purchase orders across entities and languages from a single question.
Separate panic estimates from contractually defensible numbers, backed by clause-level citations.
Confirm what does not exist: zero open commitments returned as evidence, not silence.


Multi-Jurisdiction Contract Review
Identify which contracts actually hold up under legal scrutiny across languages, jurisdictions, and amendments.
Review supplier agreements across multiple legal systems and languages in a single query.
Surface clauses that look protective until exclusions or annexes reverse them.
Every conclusion tied to the source clause, with jurisdiction and amendment context.


Regulatory Perimeter Mapping
Reconstruct the systems that fall inside a regulatory perimeter when the answer is spread across audits, inventories, and architecture diagrams that no one has reconciled.
Read pentest reports, OT matrices, security policies, and local audits as a single corpus.
Identify assets labeled "non-critical" that support critical operations in practice.
Surface gaps between official inventories and operational reality, with documented sources.
Built for search
LightOn supports every modern enterprise search behavior
Chat Search


Conversational Search & Q&A
Users ask questions, not keywords. Switch instantly between retrieving a list of documents or getting a synthesized answer backed by precise, clickable citations.
Massive RAG


Massive Multimodal RAG
Analyze more than just text. The engine ingests millions of files and understands complex formats: images, technical diagrams, tables, and handwritten notes with high precision.
Tool Chaining


Agentic Reasoning Chains
Execute complex tasks, not just search. For multi-step requests (e.g., "Find info + Cross-reference HR + Generate graph"), the AI autonomously chains tools and sources to deliver a complete result.
Team Agents


Custom Specialized Agents
Create dedicated experts for every team. Empower users to build custom agents with specific prompts and restricted document scopes tailored to their role.
Data Sync


Universal Data Synchronization
Connect all your knowledge silos. Seamlessly index and sync data from external sources (SharePoint, Drive, Confluence, File Servers). The platform keeps your knowledge base continuously up-to-date.
Access Control


Strict Governance & ACLs
Control exactly who sees what. We mirror native permissions (ACLs) from your sources and organize data into isolated workspaces to support strict data segregation and compliance across teams.
Deploy with confidence
Security and administrative controls designed to protect your company’s most sensitive data.
Single sign-on (SSO) with domain management
Role-based access with detailed permission settings
Support for SCIM (System for Cross-domain Identity Management)
Secure hosting: confidentiality and integrity of your data
Cost control: flat pricing for predictable costs, with flexible plans to adapt to your usage
Support and consulting: expert guidance for successful implementation
BUILT ON OPEN RESEARCH
Our retrieval models are in your dependency tree.
Empower developers with a production-ready Multimodal RAG API running on your infrastructure. Integrate secure reasoning into your apps (CRM, ERP) without managing the complex AI stack.
50M
HuggingFace Downloads
916K
PyPI installs per month
2,345
GitHub Stars
3,845
HuggingFace Likes
Open-source models in production
Don’t just take our word for it
Hear from some of our amazing customers who are building faster

The expertise of their tech team and the rapid evolution of the product, such as the hybrid search feature, put them at the forefront of innovation.

Jérôme Lacaille
Emeritus Expert in Algorithms

Babbar needed an efficient SEO strategy enhancement through LLM technology to stay competitive in the dynamic SEO industry.

Sylvain Peyronnet
Co-founder & search engine specialist


LightOn responded very quickly with tools that perfectly matched our needs, enhancing our document base and onboarding users without experience.

Achille Lerpinière
Chief Information & Technology Officer
For builders
Try Console for free
Two-minute signup.
Free tier on /parse, /extract, /search. Per-page and per-query pricing when you scale.
For regulated organizations
Talk to an AI architect
VPC, on-prem, or air-gapped deployment. SSO, SCIM, ACL mirroring. GDPR, SOC 2, AI Act-ready.
