Research and Development

R&D Overview

Advancing Generative AI through Innovation

The R&D team at LightOn plays a pivotal role in advancing generative AI through continuous innovation and development. The team's expertise spans the creation and fine-tuning of the large language models (LLMs) that form the backbone of the Paradigm platform, a comprehensive AI solution designed for enterprise use. Paradigm simplifies the integration of generative AI into business workflows and offers both on-premises and cloud deployment, giving businesses the flexibility and scalability to match their needs.

R&D Publications

Recent R&D Posts

Announcing BioClinical ModernBERT: a new SOTA encoder model for Medical NLP

The recent release of ModernBERT by LightOn and AnswerAI aimed to provide the best possible base model, one that can then be adapted to different industry verticals. Today, Thomas Sounack from the Dana-Farber Cancer Institute, in collaboration with researchers at Harvard University, LightOn, MIT, McGill University, Albany Medical College and Microsoft Research, built on this foundation to train a new state-of-the-art (SOTA) medical encoder named BioClinical ModernBERT.

June 13, 2025
R&D

LightOn Unlocks Agentic RAG with new SOTA Model Reason-ModernColBERT

Following the recent release of GTE-ModernColBERT, which achieved state-of-the-art results on long-context retrieval, LightOn announces a major leap forward in AI-driven knowledge discovery: Reason-ModernColBERT, an open-source multi-vector model purpose-built for Deep Research applications.

May 22, 2025
R&D

LightOn Releases GTE-ModernColBERT, the First State-of-the-Art Late-Interaction Model Trained with PyLate!

LightOn is proud to announce the release of GTE-ModernColBERT, our new state-of-the-art, open-source, multi-vector retrieval model. By leveraging the ModernBERT architecture and our innovative PyLate library, we've created a solution that sets a new milestone in the field and addresses the complex challenges of modern enterprise information retrieval.

April 30, 2025
R&D

Finally, a Replacement for BERT

This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models that improve on older-generation encoders across the board.

December 19, 2024
R&D

MonoQwen-Vision, the first visual document reranker

We introduce MonoQwen2-VL-v0.1, the first visual document reranker, designed to improve the quality of retrieved visual documents and take visual retrieval pipelines to the next level. Reranking a small number of candidates with MonoQwen2-VL-v0.1 achieves top results on the ViDoRe leaderboard.

November 7, 2024
R&D

FC-AMF-OCR Dataset: LightOn releases a 9.3-million-image OCR dataset to improve real-world document parsing

With over 9.3 million annotated images, this dataset offers researchers and AI developers a valuable resource for building models adapted to real-world documents.

September 20, 2024
R&D

PyLate: Flexible Training and Retrieval for ColBERT Models

We release PyLate, a new user-friendly library for training and experimenting with ColBERT models, a family of models that exhibit strong retrieval capabilities on out-of-domain data.

August 29, 2024
R&D

ArabicWeb24: Creating a high-quality Arabic web-only pre-training dataset

August 7, 2024
R&D

Transforming LLMs into Agents for Enterprise Automation

Developing agentic capabilities for LLMs to automate business workflows and create smart assistants.

June 25, 2024
R&D

Explore Publications by LightOn