Research and Development
R&D Overview
Advancing Generative AI through Innovation
The R&D team at LightOn plays a pivotal role in advancing the field of generative AI through continuous innovation and development. Their expertise spans the creation and fine-tuning of large language models (LLMs) that form the backbone of the Paradigm platform, a comprehensive AI solution designed for enterprise use. The platform simplifies the integration of generative AI into business workflows, offering both on-premise and cloud options to ensure flexibility and scalability for varied business needs.
Pioneering AI with Alfred-40B-0723
One of the key achievements of LightOn's R&D team is the development of Alfred-40B-0723, an open-source LLM based on Falcon-40B. The model is fine-tuned with reinforcement learning from human feedback (RLHF), enhancing its ability to perform complex tasks such as content summarization, query answering, and prompt engineering. The team's ongoing efforts keep Alfred at the cutting edge of AI technology, providing robust support for the Paradigm platform and enabling enterprises to deploy AI solutions that are secure, scalable, and tailored to their specific requirements.
Recent Posts
FC-AMF-OCR Dataset: LightOn releases a 9.3-million-image OCR dataset to improve real-world document parsing
With over 9.3 million annotated images, this dataset offers researchers and AI developers a valuable resource for creating models adapted to real-world documents.
PyLate: Flexible Training and Retrieval for ColBERT Models
We release PyLate, a new user-friendly library for training and experimenting with ColBERT models, a family of models that exhibit strong retrieval capabilities on out-of-domain data.
Training Mamba Models on AMD MI250/MI250X GPUs with Custom Kernels
In this blog post we show how we can train a Mamba model interchangeably on NVIDIA and AMD GPUs, comparing training performance and convergence in both cases. This shows that our training stack is becoming more GPU-agnostic.
Transforming LLMs into Agents for Enterprise Automation
Developing agentic capabilities for LLMs to automate business workflows and create smart assistants.
Passing the Torch: Training a Mamba Model for Smooth Handover
We present our explorations on training language models based on the new Mamba architecture, which deviates from the traditional Transformer architecture.