TL;DR
Efficient Continued Pre-Training, Streamlined for Medicine
One critical but lesser-known feature of ModernBERT is its learning rate schedule, which lets researchers continue pre-training seamlessly and eliminates cold restarts. Stable-phase checkpoints, combined with a fresh decay phase, let models converge efficiently on specific domains.
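To make the idea concrete, here is a minimal sketch of a warmup-stable-decay (trapezoidal) schedule in Python; the warmup length, decay shape, and peak learning rate are illustrative assumptions and may differ from ModernBERT's exact setup.

```python
def wsd_lr(step, total_steps, warmup_steps, decay_steps, peak_lr):
    """Warmup-Stable-Decay learning rate: ramp up, hold flat, then decay.

    Checkpoints saved during the flat (stable) phase can be resumed and given
    a fresh decay phase on new data, which is the trick continued pre-training
    exploits. The linear decay below is illustrative; other shapes work too.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup: linear ramp up
    if step < total_steps - decay_steps:
        return peak_lr                                # stable: constant learning rate
    remaining = total_steps - step
    return peak_lr * max(remaining, 0) / decay_steps  # decay: linear ramp down
```

Because the stable phase runs at a constant learning rate, a stable-phase checkpoint can be picked up on a new corpus without re-warming; only the decay phase needs to be re-run on the domain data.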
Leveraging this scheduling feature, Thomas Sounack continued the pre-training of ModernBERT on an extensive collection of medical texts. The result: BioClinical ModernBERT, a new model that outperforms all existing encoders on medical classification and Named Entity Recognition (NER) tasks, setting a new state of the art for medical NLP.
Optimized for the Realities of Clinical Context
Real-world medical texts can be very long, spanning full clinical notes as well as lengthy reports. Built on the ModernBERT backbone, BioClinical ModernBERT supports long contexts of up to 8,192 tokens, with hybrid (alternating global/local) attention and unpadding for fast processing, which is crucial for healthcare and clinical workflows.
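As a sketch of what this looks like in practice, the snippet below encodes a long clinical note in a single pass with the Hugging Face transformers API; the model ID is an assumption, so check the collection for the exact names.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed model ID; see the BioClinical ModernBERT collection for the exact name.
model_id = "thomas-sounack/BioClinical-ModernBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_note = "Admission note: 67-year-old patient presents with ..."  # a full clinical note

# ModernBERT's native context window is 8,192 tokens, so most notes fit in one pass.
inputs = tokenizer(long_note, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    outputs = model(**inputs)

token_embeddings = outputs.last_hidden_state  # one vector per token, e.g. for an NER head
```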
A Recipe for Continued Pre-Training
Beyond the model itself, this work provides a refined recipe for continued pre-training for domain adaptation. The approach is reproducible: BioClinical ModernBERT demonstrates robust transfer to new domains, opening the door for anyone seeking to tailor ModernBERT to their own specialized data.
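A minimal sketch of that recipe with the transformers Trainer is shown below. The corpus path, masking rate, and learning rate are assumptions to adapt, and in practice you would resume from one of ModernBERT's released stable-phase (pre-decay) checkpoints rather than the final model.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumptions: swap in a stable-phase ModernBERT checkpoint and your own domain corpus.
checkpoint = "answerdotai/ModernBERT-base"
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Masked language modeling on domain text; treat the masking rate as a starting point.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="modernbert-domain",
    learning_rate=3e-4,              # assumption: stay near the stable-phase rate
    lr_scheduler_type="constant",    # extend the stable phase; run a decay phase afterwards
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```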
Try It Out
Interested in leveraging continued pre-training or ModernBERT's long-context capabilities in a different domain? Explore the BioClinical ModernBERT collection and see how it can advance specialized NLP tasks in your field.
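As a quick start, here is a hedged sketch of loading the model with a classification head for fine-tuning; the model ID and label count are assumptions, so check the collection for the exact base/large variants.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model ID and label count; see the BioClinical ModernBERT collection.
model_id = "thomas-sounack/BioClinical-ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# From here, fine-tune with the Trainer API on your labeled clinical classification data.
```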