biglam (BigLAM: BigScience Libraries, Archives and Museums)

Hugging Face Hub
  • 🤖 Train and release open-source models for LAM-relevant tasks
  • 🛠️ Develop tools and approaches tailored to LAM use cases

    ✨ Background

    BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.


    Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.

    📂 What You'll Find

    The BigLAM organization hosts:

    • Datasets: image, text, and tabular data from and about libraries, archives, and museums
    • Models: fine-tuned for tasks like:
      • Art/historical image classification
      • Document layout analysis and OCR
      • Metadata quality assessment
      • Named entity recognition in heritage texts
    • Spaces: tools for interactive exploration and demonstration

    🧩 Get Involved

    We welcome contributions! You can:

    • Use our datasets and models
    • Join the discussion on GitHub
    • Contribute your own tools or data
    • Share your work using BigLAM resources
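
    As a starting point for using the datasets, here is a minimal sketch of streaming one record from a BigLAM dataset. It assumes the Hugging Face `datasets` package is installed and that the dataset loads with its default configuration. The helper names `biglam_repo_id` and `first_record` are hypothetical, introduced only for illustration, and `european_art` is just one of the datasets hosted under the organization.

```python
from typing import Any, Dict

# Organization name on the Hugging Face Hub, taken from the page title.
BIGLAM_ORG = "biglam"


def biglam_repo_id(dataset_name: str) -> str:
    """Build the Hub repository id for a dataset under the BigLAM organization."""
    return f"{BIGLAM_ORG}/{dataset_name}"


def first_record(dataset_name: str) -> Dict[str, Any]:
    """Stream a BigLAM dataset and return its first record (hypothetical helper)."""
    # Imported lazily so biglam_repo_id works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields records on demand instead of downloading every file.
    # Assumption: the dataset needs no explicit configuration name.
    splits = load_dataset(biglam_repo_id(dataset_name), streaming=True)
    # Split names vary between datasets, so take whichever split is listed first.
    first_split = next(iter(splits.values()))
    return next(iter(first_split))


# Example call (requires network access):
# record = first_record("european_art")
# print(sorted(record.keys()))
```

    Streaming is used here because several of the hosted datasets contain millions of rows, so inspecting a single record first is usually cheaper than a full download.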

    🌍 Why It Matters


    Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:

    • Supporting inclusive and responsible AI
    • Helping institutions experiment with ML for access, discovery, and preservation
    • Ensuring that ML systems reflect diverse human knowledge and expression
    • Developing tools and methods that work well with the unique formats, values, and needs of LAMs
Cultures","publishedAt":"2025-10-28T05:46:25.000Z","upvotes":19,"isUpvotedByUser":true}}],"acceptLanguages":["*"],"canReadRepos":false,"canReadSpaces":false,"blogPosts":[],"currentRepoPage":0,"filters":{},"paperView":false}">

    AI & ML interests

    🤗 Hugging Face x 🌸 BigScience initiative to create open-source community resources for LAMs.

    📚 BigLAM: Machine Learning for Libraries, Archives, and Museums

    BigLAM is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for Libraries, Archives, and Museums (LAMs).

    We aim to:

    • 🗃️ Share machine-learning-ready datasets from LAMs via the Hugging Face Hub
    • 🤖 Train and release open-source models for LAM-relevant tasks
    • 🛠️ Develop tools and approaches tailored to LAM use cases

    ✨ Background

    BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.

    Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.

    📂 What You'll Find

    The BigLAM organization hosts:

    • Datasets: image, text, and tabular data from and about libraries, archives, and museums
    • Models: fine-tuned for tasks like:
      • Art/historical image classification
      • Document layout analysis and OCR
      • Metadata quality assessment
      • Named entity recognition in heritage texts
    • Spaces: tools for interactive exploration and demonstration

    🧩 Get Involved

    We welcome contributions! You can:

    • Use our datasets and models
    • Join the discussion on GitHub
    • Contribute your own tools or data
    • Share your work using BigLAM resources
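
    As a starting point for the first item, here is a minimal sketch of pulling a few rows from a BigLAM dataset and running a BigLAM object-detection model. The repository IDs are taken from the organization's listing; everything else is an assumption for illustration: the helper names (`hub_url`, `peek`, `keep_confident`, `detect`) are hypothetical, the dataset is assumed to expose a `train` split, and the model is assumed to load through the standard `transformers` object-detection pipeline.

```python
"""Sketch: browse a BigLAM dataset and try a BigLAM model (assumptions noted inline)."""

# Repository IDs as listed on the BigLAM organization page.
DATASET_ID = "biglam/european_art"
MODEL_ID = "biglam/detr-resnet-50_fine_tuned_nls_chapbooks"

# Hub web URLs place datasets and Spaces under a path prefix; models have none.
_URL_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Build the Hub web URL for a repository of the given type."""
    return f"https://huggingface.co/{_URL_PREFIX[repo_type]}{repo_id}"


def peek(repo_id: str = DATASET_ID, n: int = 3, split: str = "train") -> list:
    """Stream the first n rows instead of downloading the full dataset.

    Assumption: the dataset has a split named `split`.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(stream.take(n))


def keep_confident(detections: list, min_score: float = 0.5) -> list:
    """Keep detection results (dicts with a "score" key) at or above min_score."""
    return [d for d in detections if d["score"] >= min_score]


def detect(image_path: str, model_id: str = MODEL_ID) -> list:
    """Run object detection on one image and drop low-confidence boxes.

    Assumption: the model works with the generic object-detection pipeline;
    it likely also needs torch, timm, and pillow installed.
    """
    from transformers import pipeline  # third-party: pip install transformers

    detector = pipeline("object-detection", model=model_id)
    return keep_confident(detector(image_path))


if __name__ == "__main__":
    # No network access needed here: only print where the repositories live.
    print(hub_url(DATASET_ID, "dataset"))
    print(hub_url(MODEL_ID))
```

    Calling `peek()` returns the first rows as dictionaries, which is a quick way to inspect a dataset's columns before committing to a full download. `detect("page.jpg")` (a hypothetical local file) returns the pipeline's detections filtered by score; the 0.5 threshold is an arbitrary default, not a recommended value.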

    🌍 Why It Matters

    Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:

    • Supporting inclusive and responsible AI
    • Helping institutions experiment with ML for access, discovery, and preservation
    • Ensuring that ML systems reflect diverse human knowledge and expression
    • Developing tools and methods that work well with the unique formats, values, and needs of LAMs