BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
\n","updatedAt":"2025-01-14T09:52:18.511Z","author":{"_id":"60107b385ac3e86b3ea4fc34","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1627505688463-60107b385ac3e86b3ea4fc34.jpeg","fullname":"Daniel van Strien","name":"davanstrien","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":838,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5048063397407532},"editors":["davanstrien"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1627505688463-60107b385ac3e86b3ea4fc34.jpeg"],"reactions":[],"isReport":false}},{"id":"6787107b070e6f68b031a758","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2025-01-15T01:33:47.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training](https://huggingface.co/papers/2411.15207) (2024)\n* [UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities](https://huggingface.co/papers/2412.10372) (2024)\n* [BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models](https://huggingface.co/papers/2411.15232) (2024)\n* [MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives](https://huggingface.co/papers/2501.04184) (2025)\n* [DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment](https://huggingface.co/papers/2412.16334) (2024)\n* [MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation](https://huggingface.co/papers/2501.04155) (2025)\n* [OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining](https://huggingface.co/papers/2411.15421) (2024)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
\n
The following papers were recommended by the Semantic Scholar API
\n","updatedAt":"2025-03-18T04:47:08.189Z","author":{"_id":"65ac61120844d9e0d67a9f89","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65ac61120844d9e0d67a9f89/wvXAMRs44oww2M57ZJGqT.jpeg","fullname":"Min Woo Sun","name":"minwoosun","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":23,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5404133796691895},"editors":["minwoosun"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/65ac61120844d9e0d67a9f89/wvXAMRs44oww2M57ZJGqT.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2501.07171","authors":[{"_id":"678603c0c054524aa22be7e1","user":{"_id":"643c3fb526f177a3e4181e05","avatarUrl":"/avatars/4f08ca15c5fbd2f8bb3d5c24f3fcc7b8.svg","isPro":false,"fullname":"Alejandro Lozano","user":"lozanoe","type":"user"},"name":"Alejandro Lozano","status":"extracted_pending","statusLastChangedAt":"2025-01-14T06:27:14.763Z","hidden":false},{"_id":"678603c0c054524aa22be7e2","user":{"_id":"65ac61120844d9e0d67a9f89","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65ac61120844d9e0d67a9f89/wvXAMRs44oww2M57ZJGqT.jpeg","isPro":false,"fullname":"Min Woo Sun","user":"minwoosun","type":"user"},"name":"Min Woo Sun","status":"extracted_confirmed","statusLastChangedAt":"2025-01-14T06:45:53.756Z","hidden":false},{"_id":"678603c0c054524aa22be7e3","user":{"_id":"650871aeb44445e9b3625c7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/650871aeb44445e9b3625c7b/mtx3EnkuNF4z29IosnhaQ.png","isPro":false,"fullname":"James Burgess","user":"jmhb","type":"user"},"name":"James Burgess","status":"claimed_verified","statusLastChangedAt":"2025-01-15T08:49:11.461Z","hidden":false},{"_id":"678603c0c054524aa22be7e4","user":{"_id":"62b67da0f56de4396ca9e44b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1658586059273-62b67da0f56de4396ca9e44b.jpeg","isPro":false,"fullname":"Liangyu Chen","user":"liangyuch","type":"user"},"name":"Liangyu Chen","status":"claimed_verified","statusLastChangedAt":"2025-01-15T08:49:09.051Z","hidden":false},{"_id":"678603c0c054524aa22be7e5","user":{"_id":"6500e67e13f1546526bd373a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6500e67e13f1546526bd373a/o9ZzGx3tVpusD-g35snK7.jpeg","isPro":true,"fullname":"Jeff Nirschl","user":"jnirschl","type":"user"},"name":"Jeffrey J Nirschl","status":"extracted_pending","statusLastChangedAt":"2025-01-14T06:27:14.763Z","hidden":false},{"_id":"678603c0c054524aa22be7e6","name":"Jeffrey Gu","hidden":false},{"_id":"678603c0c054524aa22be7e7","user":{"_id":"653fe33e73325383c2c9f9ba","avatarUrl":"/avatars/9f8863fbbc613915c70e620b7a147e4d.svg","isPro":false,"fullname":"Ivan Lopez","user":"ivlopez","type":"user"},"name":"Ivan Lopez","status":"claimed_verified","statusLastChangedAt":"2025-01-15T08:49:06.626Z","hidden":false},{"_id":"678603c0c054524aa22be7e8","user":{"_id":"66844ab799dbd7c30a19a0fe","avatarUrl":"/avatars/5a5f8fa530e127105812bd1d0e515db8.svg","isPro":false,"fullname":"Josiah Aklilu","user":"josaklil-ai","type":"user"},"name":"Josiah Aklilu","status":"claimed_verified","statusLastChangedAt":"2025-01-16T21:55:50.124Z","hidden":false},{"_id":"678603c0c054524aa22be7e9","user":{"_id":"6786fb7ade13ae3961c03d4d","avatarUrl":"/avatars/156658b3c24283a564472001dec63e5a.svg","isPro":false,"fullname":"Austin Katzer","user":"Awkatzer","type":"user"},"name":"Austin Wolfgang 
Katzer","status":"extracted_pending","statusLastChangedAt":"2025-01-15T00:04:26.115Z","hidden":false},{"_id":"678603c0c054524aa22be7ea","name":"Collin Chiu","hidden":false},{"_id":"678603c0c054524aa22be7eb","user":{"_id":"6733c7f1d3521b36243b3c91","avatarUrl":"/avatars/a070c5296fbce770822b1897157ddbe3.svg","isPro":false,"fullname":"Anita","user":"anitar03","type":"user"},"name":"Anita Rau","status":"claimed_verified","statusLastChangedAt":"2025-02-18T09:35:17.795Z","hidden":false},{"_id":"678603c0c054524aa22be7ec","user":{"_id":"65703fab7f50602340d23704","avatarUrl":"/avatars/324c45f5fba9cd8c38a89b30427c06b4.svg","isPro":false,"fullname":"Xiaohan Wang","user":"nicholswang","type":"user"},"name":"Xiaohan Wang","status":"claimed_verified","statusLastChangedAt":"2025-01-16T08:31:48.670Z","hidden":false},{"_id":"678603c0c054524aa22be7ed","user":{"_id":"62da55164398e21bf7f0e292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62da55164398e21bf7f0e292/xjKkG8IA2IZZqCdjApSh3.jpeg","isPro":false,"fullname":"Yuhui Zhang","user":"yuhuizhang","type":"user"},"name":"Yuhui Zhang","status":"extracted_confirmed","statusLastChangedAt":"2025-01-14T06:28:22.730Z","hidden":false},{"_id":"678603c0c054524aa22be7ee","name":"Alfred Seunghoon Song","hidden":false},{"_id":"678603c0c054524aa22be7ef","user":{"_id":"69140f06e2c745db93ded9d4","avatarUrl":"/avatars/be758741fd012728185c22d26d5ce9c4.svg","isPro":false,"fullname":"Robert Tibshirani","user":"tibshirani","type":"user"},"name":"Robert Tibshirani","status":"extracted_pending","statusLastChangedAt":"2025-11-12T14:12:07.129Z","hidden":false},{"_id":"678603c0c054524aa22be7f0","user":{"_id":"677c8b2e92550a07fcad0f50","avatarUrl":"/avatars/2be26e8f25e98cfe5b1d227ee0409cd0.svg","isPro":false,"fullname":"Serena Yeung-Levy","user":"yeunglevy","type":"user"},"name":"Serena Yeung-Levy","status":"extracted_pending","statusLastChangedAt":"2025-01-14T06:27:14.763Z","hidden":false}],"publishedAt":"2025-01-13T09:58:03.000Z","submittedOnDailyAt":"2025-01-14T07:22:18.497Z","title":"BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and\n Vision-Language Models Derived from Scientific Literature","submittedOnDailyBy":{"_id":"60107b385ac3e86b3ea4fc34","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1627505688463-60107b385ac3e86b3ea4fc34.jpeg","isPro":true,"fullname":"Daniel van Strien","user":"davanstrien","type":"user"},"summary":"The development of vision-language models (VLMs) is driven by large-scale and\ndiverse multimodal datasets. However, progress toward generalist biomedical\nVLMs is limited by the lack of annotated, publicly accessible datasets across\nbiology and medicine. Existing efforts are restricted to narrow domains,\nmissing the full diversity of biomedical knowledge encoded in scientific\nliterature. To address this gap, we introduce BIOMEDICA, a scalable,\nopen-source framework to extract, annotate, and serialize the entirety of the\nPubMed Central Open Access subset into an easy-to-use, publicly accessible\ndataset.Our framework produces a comprehensive archive with over 24 million\nunique image-text pairs from over 6 million articles. Metadata and\nexpert-guided annotations are also provided. 
We demonstrate the utility and\naccessibility of our resource by releasing BMCA-CLIP, a suite of CLIP-style\nmodels continuously pre-trained on the BIOMEDICA dataset via streaming,\neliminating the need to download 27 TB of data locally.On average, our models\nachieve state-of-the-art performance across 40 tasks - spanning pathology,\nradiology, ophthalmology, dermatology, surgery, molecular biology,\nparasitology, and cell biology - excelling in zero-shot classification with a\n6.56% average improvement (as high as 29.8% and 17.5% in dermatology and\nophthalmology, respectively), and stronger image-text retrieval, all while\nusing 10x less compute. To foster reproducibility and collaboration, we release\nour codebase and dataset for the broader research community.","upvotes":55,"discussionId":"678603c2c054524aa22be8d5","projectPage":"https://minwoosun.github.io/biomedica-website/","githubRepo":"https://github.com/minwoosun/biomedica-etl","githubRepoAddedBy":"user","ai_summary":"BIOMEDICA is a scalable framework that creates a large, publicly accessible dataset of image-text pairs from PubMed Central, enabling state-of-the-art performance across various biomedical tasks with efficient CLIP-style models.","ai_keywords":["vision-language models","VLMs","multimodal datasets","BIOMEDICA","PubMed Central Open Access","image-text pairs","Metadata","expert-guided annotations","BMCA-CLIP","CLIP-style models","streaming","zero-shot classification","image-text retrieval"],"githubStars":90},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"60107b385ac3e86b3ea4fc34","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1627505688463-60107b385ac3e86b3ea4fc34.jpeg","isPro":true,"fullname":"Daniel van Strien","user":"davanstrien","type":"user"},{"_id":"650871aeb44445e9b3625c7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/650871aeb44445e9b3625c7b/mtx3EnkuNF4z29IosnhaQ.png","isPro":false,"fullname":"James Burgess","user":"jmhb","type":"user"},{"_id":"62b67da0f56de4396ca9e44b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1658586059273-62b67da0f56de4396ca9e44b.jpeg","isPro":false,"fullname":"Liangyu Chen","user":"liangyuch","type":"user"},{"_id":"648c9605565e3a44f3c9bb7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/648c9605565e3a44f3c9bb7b/W5chvk17Zol6-2QSWkFVR.jpeg","isPro":true,"fullname":"Orr Zohar","user":"orrzohar","type":"user"},{"_id":"6733c7f1d3521b36243b3c91","avatarUrl":"/avatars/a070c5296fbce770822b1897157ddbe3.svg","isPro":false,"fullname":"Anita","user":"anitar03","type":"user"},{"_id":"62da55164398e21bf7f0e292","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62da55164398e21bf7f0e292/xjKkG8IA2IZZqCdjApSh3.jpeg","isPro":false,"fullname":"Yuhui Zhang","user":"yuhuizhang","type":"user"},{"_id":"65703fab7f50602340d23704","avatarUrl":"/avatars/324c45f5fba9cd8c38a89b30427c06b4.svg","isPro":false,"fullname":"Xiaohan Wang","user":"nicholswang","type":"user"},{"_id":"65c4287b6b3715a9cf28ded9","avatarUrl":"/avatars/2af7d2c64665cb5283398084628c1701.svg","isPro":false,"fullname":"Mark Endo","user":"markendo","type":"user"},{"_id":"6786f6df73e88336f26fee47","avatarUrl":"/avatars/7a4b7a8f06dd75b696bb43c5b0d5de76.svg","isPro":false,"fullname":"Joshua Lazaro","user":"jola7","type":"user"},{"_id":"66844ab799dbd7c30a19a0fe","avatarUrl":"/avatars/5a5f8fa530e127105812bd1d0e515db8.svg","isPro":false,"fullname":"Josiah 
Aklilu","user":"josaklil-ai","type":"user"},{"_id":"6786fb7ade13ae3961c03d4d","avatarUrl":"/avatars/156658b3c24283a564472001dec63e5a.svg","isPro":false,"fullname":"Austin Katzer","user":"Awkatzer","type":"user"},{"_id":"64c1c77c245c55a21c6f5a13","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64c1c77c245c55a21c6f5a13/d9zlSksf3TxWpBbb-r0fd.jpeg","isPro":false,"fullname":"Reza Sayar","user":"Reza2kn","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
BIOMEDICA is a scalable framework that creates a large, publicly accessible dataset of image-text pairs from PubMed Central, enabling state-of-the-art performance across various biomedical tasks with efficient CLIP-style models.
The development of vision-language models (VLMs) is driven by large-scale and
diverse multimodal datasets. However, progress toward generalist biomedical
VLMs is limited by the lack of annotated, publicly accessible datasets across
biology and medicine. Existing efforts are restricted to narrow domains,
missing the full diversity of biomedical knowledge encoded in scientific
literature. To address this gap, we introduce BIOMEDICA, a scalable,
open-source framework to extract, annotate, and serialize the entirety of the
PubMed Central Open Access subset into an easy-to-use, publicly accessible
dataset. Our framework produces a comprehensive archive with over 24 million
unique image-text pairs from over 6 million articles. Metadata and
expert-guided annotations are also provided. We demonstrate the utility and
accessibility of our resource by releasing BMCA-CLIP, a suite of CLIP-style
models continuously pre-trained on the BIOMEDICA dataset via streaming,
eliminating the need to download 27 TB of data locally. On average, our models
achieve state-of-the-art performance across 40 tasks - spanning pathology,
radiology, ophthalmology, dermatology, surgery, molecular biology,
parasitology, and cell biology - excelling in zero-shot classification with a
6.56% average improvement (as high as 29.8% and 17.5% in dermatology and
ophthalmology, respectively), and stronger image-text retrieval, all while
using 10x less compute. To foster reproducibility and collaboration, we release
our codebase and dataset for the broader research community.
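
The streaming workflow described above (training on the archive without a local 27 TB download) can be illustrated with a minimal sketch using the Hugging Face `datasets` library. The dataset repository id and column names below are placeholders, not names confirmed by this page; check the project page or the GitHub repository for the released dataset's actual identifier and schema.

```python
# Minimal sketch (assumptions flagged): stream image-caption pairs instead of
# downloading the full archive locally. The dataset id and column names are
# placeholders; consult the BIOMEDICA project page / repo for the real ones.
from datasets import load_dataset

# streaming=True iterates records lazily over HTTP rather than materializing
# the dataset on disk, which is what makes the "no 27 TB download" workflow possible.
stream = load_dataset(
    "BIOMEDICA/biomedica",  # hypothetical repository id
    split="train",
    streaming=True,
)

for i, record in enumerate(stream):
    caption = record["caption"]  # assumed caption field
    image = record["image"]      # assumed image field (PIL image or bytes)
    print(i, str(caption)[:80])
    if i >= 4:  # peek at a few records, then stop
        break
```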
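The zero-shot classification results follow the standard CLIP evaluation recipe: embed a set of class-name prompts and an image, then pick the class whose text embedding is most similar to the image embedding. The sketch below uses `open_clip` with a generic pretrained checkpoint as a stand-in; the model name, weights, prompt template, class labels, and image path are all assumptions, and the released BMCA-CLIP checkpoints may require a different loading path.

```python
# Minimal zero-shot classification sketch with a CLIP-style model.
# A generic open_clip checkpoint stands in for BMCA-CLIP here (assumption);
# swap in the released weights once their name/loading path is known.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Example class names and prompt template (illustrative only).
labels = ["melanoma", "basal cell carcinoma", "benign nevus"]
prompts = [f"a dermatology image of {c}" for c in labels]

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder path
text = tokenizer(prompts)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    # Normalize embeddings so the dot product is cosine similarity.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

# The predicted class is the prompt with the highest similarity.
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```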