arxiv:2508.01630

OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets

Published on Aug 3, 2025 · Submitted by Maziyar Panahi on Aug 7, 2025

Authors: Maziyar Panahi

Abstract

OpenMed NER, a suite of open-source transformer models using DAPT and LoRA, achieves state-of-the-art performance on diverse biomedical NER benchmarks with high efficiency and low computational cost.

AI-generated summary

Named-entity recognition (NER) is fundamental to extracting structured information from the >80% of healthcare data that resides in unstructured clinical notes and biomedical literature. Despite recent advances with large language models, achieving state-of-the-art performance across diverse entity types while maintaining computational efficiency remains a significant challenge. We introduce OpenMed NER, a suite of open-source, domain-adapted transformer models that combine lightweight domain-adaptive pre-training (DAPT) with parameter-efficient Low-Rank Adaptation (LoRA). Our approach performs cost-effective DAPT on a 350k-passage corpus compiled from ethically sourced, publicly available research repositories and de-identified clinical notes (PubMed, arXiv, and MIMIC-III) using DeBERTa-v3, PubMedBERT, and BioELECTRA backbones. This is followed by task-specific fine-tuning with LoRA, which updates less than 1.5% of model parameters. We evaluate our models on 12 established biomedical NER benchmarks spanning chemicals, diseases, genes, and species. OpenMed NER achieves new state-of-the-art micro-F1 scores on 10 of these 12 datasets, with substantial gains across diverse entity types. Our models advance the state-of-the-art on foundational disease and chemical benchmarks (e.g., BC5CDR-Disease, +2.70 pp), while delivering even larger improvements of over 5.3 and 9.7 percentage points on more specialized gene and clinical cell line corpora. This work demonstrates that strategically adapted open-source models can surpass closed-source solutions. This performance is achieved with remarkable efficiency: training completes in under 12 hours on a single GPU with a low carbon footprint (< 1.2 kg CO2e), producing permissively licensed, open-source checkpoints designed to help practitioners facilitate compliance with emerging data protection and AI regulations, such as the EU AI Act.

Community

Paper author Paper submitter • edited Aug 5, 2025

https://huggingface.co/blog/MaziyarPanahi/open-health-ai

Paper author Paper submitter • edited Aug 7, 2025

Meet OpenMed NER! The Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets! 🔥

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP (2025): https://huggingface.co/papers/2506.10896
* Towards Domain Specification of Embedding Models in Medicine (2025): https://huggingface.co/papers/2507.19407
* GeistBERT: Breathing Life into German NLP (2025): https://huggingface.co/papers/2506.11903
* The Impact of LoRA Adapters on LLMs for Clinical Text Classification Under Computational and Data Constraints (2024): https://huggingface.co/papers/2407.19299
* Effective Multi-Task Learning for Biomedical Named Entity Recognition (2025): https://huggingface.co/papers/2507.18542
* NDAI-NeuroMAP: A Neuroscience-Specific Embedding Model for Domain-Specific Retrieval (2025): https://huggingface.co/papers/2507.03329
* Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content (2025): https://huggingface.co/papers/2506.20331

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Wonderful! 🙏❤️


Models citing this paper 475


Datasets citing this paper 0

No datasets link this paper yet.

Cite arxiv.org/abs/2508.01630 in a dataset README.md to link it from this page.

Spaces citing this paper 4

Collections including this paper 5