AI & ML interests
Pretraining Historical Multilingual Language Models
hmBERT
Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:
- English (British Library Corpus - Books)
- German (Europeana Newspaper)
- French (Europeana Newspaper)
- Finnish (Europeana Newspaper)
- Swedish (Europeana Newspaper)
More details can be found in our GitHub repository and in our hmBERT paper.
Leaderboard
We evaluate our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana. The following table gives an overview of the datasets used:
| Language | Datasets |
|---|---|
| English | AjMC - TopRes19th |
| German | AjMC - NewsEye - HIPE-2020 |
| French | AjMC - ICDAR-Europeana - LeTemps - NewsEye - HIPE-2020 |
| Finnish | NewsEye |
| Swedish | NewsEye |
| Dutch | ICDAR-Europeana |
All results can be found in the hmLeaderboard.
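For readers who want to iterate over the evaluation setup programmatically, the dataset table above can be written as a plain mapping. This is only a minimal sketch: the names `EVAL_DATASETS`, `evaluation_pairs` and `languages_for` are hypothetical helpers introduced here for illustration and are not part of the hmBERT code base or the hmLeaderboard.

```python
# Language -> evaluation datasets, copied from the table above.
# The variable and function names are illustrative assumptions only.
EVAL_DATASETS = {
    "English": ["AjMC", "TopRes19th"],
    "German": ["AjMC", "NewsEye", "HIPE-2020"],
    "French": ["AjMC", "ICDAR-Europeana", "LeTemps", "NewsEye", "HIPE-2020"],
    "Finnish": ["NewsEye"],
    "Swedish": ["NewsEye"],
    "Dutch": ["ICDAR-Europeana"],
}


def evaluation_pairs(datasets=EVAL_DATASETS):
    """Return every (language, dataset) combination as a flat list."""
    return [(lang, name) for lang, names in datasets.items() for name in names]


def languages_for(dataset, datasets=EVAL_DATASETS):
    """Return the alphabetically sorted languages evaluated on a dataset."""
    return sorted(lang for lang, names in datasets.items() if dataset in names)


if __name__ == "__main__":
    for lang, name in evaluation_pairs():
        print(f"{lang}: {name}")
```

Calling `languages_for("NewsEye")` on this mapping returns the four languages listed with NewsEye in the table (Finnish, French, German and Swedish).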
Acknowledgements
We thank Luisa März, Katharina Schmid and Erion Çano for their fruitful discussions about Historical Language Models.
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️