
Research and Development

R&D Overview

Advancing Generative AI through Innovation

The R&D team at LightOn plays a pivotal role in advancing the field of generative AI through continuous innovation and development. Its expertise spans creating and fine-tuning the large language models (LLMs) that form the backbone of the Paradigm platform, a comprehensive AI solution designed for enterprise use. The platform simplifies the integration of generative AI into business workflows, offering both on-premise and cloud deployment options to provide the flexibility and scalability that different business needs require.

R&D Publications

Recent R&D Posts


DuckSearch: search through Hugging Face datasets

DuckSearch is a lightweight Python library built on DuckDB, designed for efficient search and filtering over Hugging Face datasets as well as plain document collections (a short usage sketch follows below).

October 3, 2024
R&D
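
As a rough illustration of the workflow DuckSearch targets, the sketch below indexes a few documents into a DuckDB file and runs a keyword query against them. The upload.documents / search.documents calls and their parameters reflect our reading of the DuckSearch documentation and should be treated as assumptions; refer to the project README for the exact API.

# Minimal sketch: indexing and querying documents with DuckSearch.
# Module and function names are assumptions based on DuckSearch's docs;
# check the README for the exact interface.
from ducksearch import search, upload

documents = [
    {"id": "doc-1", "title": "Paradigm", "text": "Enterprise generative AI platform."},
    {"id": "doc-2", "title": "DuckSearch", "text": "Document search built on DuckDB."},
]

# Build (or update) a DuckDB-backed index from the documents.
upload.documents(
    database="ducksearch.duckdb",
    key="id",
    fields=["title", "text"],
    documents=documents,
)

# Query the index; returns the top-k matching documents per query.
results = search.documents(
    database="ducksearch.duckdb",
    queries=["document search"],
    top_k=5,
)
print(results)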


FC-AMF-OCR Dataset: LightOn releases a 9.3-million-image OCR dataset to improve real-world document parsing

With over 9.3 million annotated images, this dataset offers researchers and AI developers a valuable resource for creating models adapted to real-world documents (a loading sketch follows below).

September 20, 2024
R&D
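
As a minimal sketch of how the dataset might be accessed, the snippet below streams a few samples with the Hugging Face datasets library. The repository ID lightonai/fc-amf-ocr and the streaming access pattern are assumptions; check the dataset card for the actual layout and download instructions.

# Sketch: streaming a few FC-AMF-OCR samples from the Hugging Face Hub.
# The repository ID and field layout are assumptions; see the dataset card.
from itertools import islice

from datasets import load_dataset

ds = load_dataset("lightonai/fc-amf-ocr", split="train", streaming=True)

# Inspect a handful of annotated pages without downloading the full corpus.
for example in islice(ds, 3):
    print(example.keys())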


PyLate: Flexible Training and Retrieval for ColBERT Models

We release PyLate, a new user-friendly library for training and experimenting with ColBERT models, a family of models that exhibit strong retrieval capabilities on out-of-domain data (a short usage sketch follows below).

August 29, 2024
R&D
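
The sketch below illustrates the late-interaction retrieval loop PyLate is built around: encode documents with a ColBERT model, add the embeddings to an index, then encode queries and retrieve the closest documents. The module, class, and parameter names follow PyLate's documentation as we recall it and should be treated as assumptions; consult the library's README for the current API.

# Sketch: late-interaction retrieval with PyLate.
# Class and parameter names are assumptions based on PyLate's documented
# models / indexes / retrieve modules; verify against the README.
from pylate import indexes, models, retrieve

# Load a ColBERT checkpoint (any ColBERT-compatible model should work).
model = models.ColBERT(model_name_or_path="lightonai/colbertv2.0")

# Build an index and add document embeddings to it.
index = indexes.Voyager(index_folder="pylate-index", index_name="demo")
documents = [
    "Paradigm is an enterprise generative AI platform.",
    "PyLate trains and serves ColBERT models.",
]
document_ids = ["doc-1", "doc-2"]
document_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=document_ids, documents_embeddings=document_embeddings)

# Encode queries and retrieve the top-k documents.
retriever = retrieve.ColBERT(index=index)
query_embeddings = model.encode(["What is Paradigm?"], is_query=True)
print(retriever.retrieve(queries_embeddings=query_embeddings, k=2))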


ArabicWeb24: Creating a high-quality Arabic web-only pre-training dataset

August 7, 2024
R&D


Training Mamba Models on AMD MI250/MI250X GPUs with Custom Kernels

In this blog post, we show how we can train a Mamba model interchangeably on NVIDIA and AMD GPUs, and we compare training performance and convergence in both cases. This demonstrates that our training stack is becoming more GPU-agnostic.

July 19, 2024
R&D


Transforming LLMs into Agents for Enterprise Automation

Developing Agentic Capabilities for LLMs to automate business workflows and create smart assistants.

June 25, 2024
R&D


Passing the Torch: Training a Mamba Model for Smooth Handover

We present our explorations on training language models based on the new Mamba architecture, which deviates from the traditional Transformer architecture.

April 10, 2024
R&D


LightOn AI Meetup: Creating a Large Dataset for Pretraining LLMs

March 22, 2024
R&D


Introducing Alfred-40B-1023: Pioneering the Future of Open-Source Language Models from LightOn

November 17, 2023
R&D


Explore Publications by LightOn