Research And Development
Advancing Generative AI through Innovation
The R&D team at LightOn plays a pivotal role in advancing the field of generative AI through continuous innovation and development. Its expertise spans the creation and fine-tuning of the large language models (LLMs) that form the backbone of the Paradigm platform, a comprehensive AI solution designed for enterprise use. Paradigm simplifies the integration of generative AI into business workflows and offers both on-premise and cloud deployment, giving businesses the flexibility and scalability their needs require.
Recent R&D Posts

Announcing BioClinical ModernBERT: a new SOTA encoder model for Medical NLP
The recent release of ModernBERT by LightOn and AnswerAI aims to provide the best base model, one that can then be adapted to different industry verticals. Today, Thomas Sounack from the Dana-Farber Cancer Institute, in collaboration with researchers at Harvard University, LightOn, MIT, McGill University, Albany Medical College and Microsoft Research, built on this capability to train a new state-of-the-art (SOTA) medical encoder named BioClinical ModernBERT.

LightOn Unlocks Agentic RAG with new SOTA Model Reason-ModernColBERT
After the recent release of GTE-ModernColBERT, which achieved state-of-the-art results on long-context retrieval, LightOn announces a major leap forward in AI-driven knowledge discovery: Reason-ModernColBERT, an open-source multi-vector model purpose-built for Deep Research applications.

LightOn Releases GTE-ModernColBERT, First State-of-the-Art Late-Interaction Model Trained on PyLate!
LightOn is proud to announce the release of GTE-ModernColBERT, our new state-of-the-art, open-source, multi-vector retrieval model. By leveraging ModernBERT architecture and our innovative PyLate library, we've created a solution that sets a new milestone in the field and addresses the complex challenges of modern enterprise information retrieval.
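GTE-ModernColBERT and Reason-ModernColBERT are multi-vector, late-interaction retrievers. As a rough illustration of what "late interaction" means, here is a minimal sketch of the MaxSim scoring popularized by ColBERT: each query token embedding is compared with every document token embedding, the best match per query token is kept, and those maxima are summed. The toy 2-D vectors and helper names are hypothetical stand-ins for real token embeddings, and this is not PyLate's implementation.

```python
# Minimal sketch of late-interaction (MaxSim) scoring in the style of ColBERT.
# Assumption: each query and document is already encoded as one vector per token;
# the toy 2-D vectors below are hypothetical placeholders, not real model outputs.
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """For each query token, keep its best cosine similarity to any document token, then sum."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


# Hypothetical token embeddings for one query and two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "doc_b": [[1.0, 1.0]],
}

# Rank documents by MaxSim score, highest first.
ranking = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranking)  # ['doc_a', 'doc_b']
```

Because each document keeps one vector per token instead of a single pooled vector, every query term can match the specific passage that is relevant to it; the trade-off is a larger index than single-vector retrieval.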

Finally, a Replacement for BERT
This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models that improve on older-generation encoders across the board.
MonoQwen-Vision, the first visual document reranker
We introduce MonoQwen2-VL-v0.1, the first visual document reranker, designed to improve the quality of the documents returned by visual retrieval pipelines. Reranking a small number of candidates with MonoQwen2-VL-v0.1 achieves top results on the ViDoRe leaderboard.

FC-AMF-OCR Dataset: LightOn releases a 9.3-million-image OCR dataset to improve real-world document parsing
With over 9.3 million annotated images, this dataset offers researchers and AI developers a valuable resource for building models adapted to real-world documents.

PyLate: Flexible Training and Retrieval for ColBERT Models
We release PyLate, a new user-friendly library for training and experimenting with ColBERT models, a family of models that exhibit strong retrieval capabilities on out-of-domain data.
Explore Publications by LightOn
