BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP
AI-generated summary
BioClinical ModernBERT, an encoder-based transformer model, enhances biomedical and clinical NLP through continued pretraining, long-context processing, and improvements in speed and performance across diverse datasets and tasks.
Abstract
Encoder-based transformer models are central to biomedical and clinical
Natural Language Processing (NLP), as their bidirectional self-attention makes
them well-suited for efficiently extracting structured information from
unstructured text through discriminative tasks. However, encoders have seen
slower development compared to decoder models, leading to limited domain
adaptation in biomedical and clinical settings. We introduce BioClinical
ModernBERT, a domain-adapted encoder that builds on the recent ModernBERT
release, incorporating long-context processing and substantial improvements in
speed and performance for biomedical and clinical NLP. BioClinical ModernBERT
is developed through continued pretraining on the largest biomedical and
clinical corpus to date, with over 53.5 billion tokens, and addresses a key
limitation of prior clinical encoders by leveraging 20 datasets from diverse
institutions, domains, and geographic regions, rather than relying on data from
a single source. It outperforms existing biomedical and clinical encoders on
four downstream tasks spanning a broad range of use cases. We release both base
(150M parameters) and large (396M parameters) versions of BioClinical
ModernBERT, along with training checkpoints to support further research.
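For readers who want to try the released models, here is a minimal sketch of loading the base checkpoint with the Hugging Face transformers library (a recent version with ModernBERT support) and running masked-token prediction on a short clinical sentence. The repository ID thomas-sounack/BioClinical-ModernBERT-base and the example sentence are assumptions for illustration; the official model names are listed in the project's GitHub repository (https://github.com/lindvalllab/bioclinical-modernbert).

```python
# Minimal sketch, assuming the checkpoints are published on the Hugging Face Hub
# under an ID like "thomas-sounack/BioClinical-ModernBERT-base" (assumed here;
# check the project's GitHub repository for the official model names).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thomas-sounack/BioClinical-ModernBERT-base"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Masked-token prediction on a short clinical sentence (example text is illustrative).
text = "The patient was started on [MASK] for community-acquired pneumonia."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the top prediction at the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can be loaded with AutoModel for sentence or document embeddings, or with AutoModelForSequenceClassification / AutoModelForTokenClassification for the discriminative tasks the abstract describes.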
Comments

stefan-it: @thomas-sounack By the way, did you modify any pretraining parameters compared to the original ModernBERT? I am thinking of the RoPE theta, for example.

thomas-sounack: Hi @stefan-it, thanks for the feedback! We did not modify RoPE theta. Overall, our training hyperparameters were very similar; the only thing we changed was lowering the masking ratio during the decay phase (referred to as Phase 2 in our paper). This is due to the nature of ModernBERT's WSD schedule: you can take any checkpoint and continue training on your own data without cold-restart issues, but the training parameters should stay similar.
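To make the continued-pretraining point concrete, the sketch below shows one way to resume masked-language-model training from a checkpoint with a lowered masking ratio, in the spirit of the decay phase (Phase 2) mentioned above. Everything here is an assumption for illustration: the checkpoint ID, the placeholder text corpus, the 0.15 masking probability, and the hyperparameters are not the authors' recipe, which follows ModernBERT's WSD schedule and is documented in the paper and its repository.

```python
# Rough sketch of continued MLM pretraining from a checkpoint with a lowered
# masking ratio, using a Hugging Face Trainer setup. Model ID, dataset,
# masking probability, and hyperparameters are placeholders, not the authors' recipe.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "thomas-sounack/BioClinical-ModernBERT-base"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Placeholder corpus; swap in your own biomedical/clinical text files.
dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]

def tokenize(batch):
    # ModernBERT supports long contexts (up to 8,192 tokens).
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Lowered masking ratio for the decay phase (the 0.15 value is illustrative only).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bioclinical-modernbert-continued",
    per_device_train_batch_size=8,
    learning_rate=1e-4,   # illustrative; the authors advise staying close to the original schedule
    num_train_epochs=1,
    bf16=True,            # assumes a bf16-capable GPU
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```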