
arxiv:2501.07171

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Published on Jan 13, 2025 · Submitted by Daniel van Strien on Jan 14, 2025

AI-generated summary

BIOMEDICA is a scalable framework that creates a large, publicly accessible dataset of image-text pairs from PubMed Central, enabling state-of-the-art performance across various biomedical tasks with efficient CLIP-style models.

Abstract

The development of vision-language models (VLMs) is driven by large-scale and diverse multimodal datasets. However, progress toward generalist biomedical VLMs is limited by the lack of annotated, publicly accessible datasets across biology and medicine. Existing efforts are restricted to narrow domains, missing the full diversity of biomedical knowledge encoded in scientific literature. To address this gap, we introduce BIOMEDICA, a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset. Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles. Metadata and expert-guided annotations are also provided. We demonstrate the utility and accessibility of our resource by releasing BMCA-CLIP, a suite of CLIP-style models continuously pre-trained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally. On average, our models achieve state-of-the-art performance across 40 tasks - spanning pathology, radiology, ophthalmology, dermatology, surgery, molecular biology, parasitology, and cell biology - excelling in zero-shot classification with a 6.56% average improvement (as high as 29.8% and 17.5% in dermatology and ophthalmology, respectively), and stronger image-text retrieval, all while using 10x less compute. To foster reproducibility and collaboration, we release our codebase and dataset for the broader research community.
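The models were pre-trained by streaming the dataset rather than downloading all 27 TB locally. As a rough illustration of that access pattern, here is a minimal sketch using the Hugging Face datasets library in streaming mode; the repository id and column names are assumptions for illustration and may differ from the actual BIOMEDICA dataset layout (see https://huggingface.co/BIOMEDICA for the real repositories).

```python
# Minimal sketch of streaming a Hugging Face dataset instead of downloading it.
# NOTE: the repo id and column names below are assumptions for illustration;
# check https://huggingface.co/BIOMEDICA for the actual dataset repositories.
from datasets import load_dataset

ds = load_dataset(
    "BIOMEDICA/biomedica_sample",  # hypothetical repo id
    split="train",
    streaming=True,                # iterate lazily, no full local download
)

for i, example in enumerate(ds):
    # Each record is expected to hold an image and its caption (assumed field names).
    image = example.get("image")
    caption = example.get("caption")
    print(i, type(image), str(caption)[:80])
    if i >= 4:  # only peek at a few records
        break
```

Because streaming yields an iterable rather than an indexed dataset, any training loop over it would consume batches on the fly, which is the property the paper relies on to avoid local storage costs.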

Community

Paper submitter (Daniel van Strien)

[Image attachment: Screenshot 2025-01-14 at 09.52.10.png]

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training (https://huggingface.co/papers/2411.15207) (2024)
- UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities (https://huggingface.co/papers/2412.10372) (2024)
- BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models (https://huggingface.co/papers/2411.15232) (2024)
- MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives (https://huggingface.co/papers/2501.04184) (2025)
- DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment (https://huggingface.co/papers/2412.16334) (2024)
- MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation (https://huggingface.co/papers/2501.04155) (2025)
- OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining (https://huggingface.co/papers/2411.15421) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out the recommend_similar_papers Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Min Woo Sun (author)

Project page: https://minwoosun.github.io/biomedica-website/
Preprint: https://arxiv.org/pdf/2501.07171
Data: https://huggingface.co/BIOMEDICA
Data ETL repo: https://github.com/minwoosun/biomedica-etl
Training repo: https://github.com/Ale9806/open_clip_with_biomedica
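The training repo above builds on OpenCLIP, so a zero-shot classification pass with a CLIP-style checkpoint typically looks like the sketch below. The model name, pretrained tag, image path, and class prompts are placeholders rather than the released BMCA-CLIP identifiers, so treat this as an illustration of the evaluation pattern, not the paper's exact setup.

```python
# Sketch of CLIP-style zero-shot classification with open_clip.
# NOTE: model/pretrained identifiers and prompts are placeholders, not the
# released BMCA-CLIP checkpoints; see the training repo for the real setup.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="laion2b_s34b_b88k"  # placeholder backbone/weights
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

labels = ["a histopathology image", "a chest X-ray", "a retinal fundus photograph"]
text = tokenizer([f"this is {label}" for label in labels])
image = preprocess(Image.open("example.png")).unsqueeze(0)  # any local test image

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each text prompt, softmaxed into scores.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```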


Models citing this paper 0


Datasets citing this paper 3

Spaces citing this paper 0


Collections including this paper 5