Paper page - Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

very cool work @NohTow

@NohTow Do you have a rough estimate of how many dollars the pre-training cost?

Oh cool, so I can fine-tune it for NER and see how it turns out, since CoNLL03 is not a part of the training data!

I don't know if it's in the training data or not, but you can definitely finetune it for NER, indeed!

Here are some experiments I've made with ModernBERT on CoNLL-2003 so far: https://github.com/stefan-it/modern-bert-ner :)

Great! I'll play around with it, thanks :)

arxiv:2412.13663

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Published on Dec 18, 2024 · Submitted by Jeremy Howard on Dec 19, 2024
Authors: Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
#1 Paper of the day

Abstract

AI-generated summary: ModernBERT, an optimized encoder-only transformer model, demonstrates superior performance and efficiency in diverse classification and retrieval tasks compared to previous models.

Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length, ModernBERT models exhibit state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single and multi-vector retrieval on different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed and memory efficient encoder and is designed for inference on common GPUs.

Community

Paper author Paper submitter

We're very excited about the release of ModernBERT -- it feels like it could be the basis of all kinds of interesting new startups and research projects.

In fact, the stuff mentioned in the paper and blog post is only the tip of the iceberg. There are a lot of opportunities to fine-tune the model in all kinds of ways, which I expect will go far beyond what we've managed to achieve in our limited exploration so far.

"We remove the Next-Sentence Prediction objective which introduces noticeable overhead for no performance improvement"

But this is only half of the truth and mainly copied from the RoBERTa paper.

The other half: the ALBERT paper (see Table 5) shows an improvement (NSP over None), not on the SQuAD datasets but on average. Additionally, their approach of introducing a sentence order prediction loss boosts performance on various downstream tasks.
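To make the two objectives under discussion concrete, here is a minimal sketch of how training pairs are typically built for Next-Sentence Prediction (as in BERT) and for sentence order prediction (as in ALBERT). The function names, the toy corpus, and the 50/50 sampling are illustrative assumptions, not code from either paper or from ModernBERT.

```python
import random

def make_nsp_pair(docs, doc_idx, seg_idx, rng):
    """NSP: label 1 if B truly follows A, label 0 if B comes from another document."""
    seg_a = docs[doc_idx][seg_idx]
    if rng.random() < 0.5:
        return seg_a, docs[doc_idx][seg_idx + 1], 1
    # Negative example: pick a segment from a different document.
    other = rng.choice([i for i in range(len(docs)) if i != doc_idx])
    return seg_a, rng.choice(docs[other]), 0

def make_sop_pair(docs, doc_idx, seg_idx, rng):
    """SOP: both segments are consecutive; label 1 if in order, 0 if swapped."""
    seg_a = docs[doc_idx][seg_idx]
    seg_b = docs[doc_idx][seg_idx + 1]
    if rng.random() < 0.5:
        return seg_a, seg_b, 1
    return seg_b, seg_a, 0

# Toy corpus (hypothetical): each document is a list of consecutive segments.
docs = [
    ["The encoder reads the whole input.", "It then produces one vector per token."],
    ["Retrieval needs fast models.", "Small encoders are a common choice."],
]
rng = random.Random(0)
nsp_example = make_nsp_pair(docs, 0, 0, rng)
sop_example = make_sop_pair(docs, 0, 0, rng)
print(nsp_example)
print(sop_example)
```

The difference is in the negatives: NSP negatives can often be spotted from a topic change alone, while SOP negatives share the same topic and differ only in order.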

I would be interested in how much hardware was involved in pretraining the base and large models, including the pretraining time :)

Paper author

Hello,

Everything is included in Table 3 of the paper (Appendix A).
Hope it helps!

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

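The following papers were recommended by the Semantic Scholar API:

* Two are better than one: Context window extension with multi-grained self-injection (2024): https://huggingface.co/papers/2410.19318
* Why Does the Effective Context Length of LLMs Fall Short? (2024): https://huggingface.co/papers/2410.18745
* Are Decoder-Only Large Language Models the Silver Bullet for Code Search? (2024): https://huggingface.co/papers/2410.22240
* Sparse Upcycling: Inference Inefficient Finetuning (2024): https://huggingface.co/papers/2411.08968
* MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning (2024): https://huggingface.co/papers/2410.18035
* A Survey of Small Language Models (2024): https://huggingface.co/papers/2410.20011
* MrT5: Dynamic Token Merging for Efficient Byte-level Language Models (2024): https://huggingface.co/papers/2410.20771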

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Great work, especially for most industry tasks

Thanks for this very welcome modernisation of the 'good old' BERT architecture ;)
However, a big part of the appeal of recent LLM/decoder-only models for a lot of us is their multilingual capability. Would love to see a variant pretrained on more natural languages (instead of code, to keep the same training budget, and as the two would be complementary, i.e. used for different downstream applications). :)

It would be very interesting to see a training loss curve! Does a 150/300M model really need almost 2T tokens?

Thanks :)
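For a sense of scale behind the question above, here is a small arithmetic sketch of the tokens-per-parameter ratio. The model sizes are the commenter's round figures (150M and 300M) and the token count is the 2 trillion stated in the abstract. The reference value of about 20 tokens per parameter is an assumption: it is the commonly cited compute-optimal rule of thumb for decoder language models and is included only for comparison.

```python
def tokens_per_parameter(train_tokens: float, n_params: float) -> float:
    """Ratio of training tokens to model parameters."""
    return train_tokens / n_params

TRAIN_TOKENS = 2e12  # "2 trillion tokens" from the abstract
MODEL_SIZES = {"~150M": 150e6, "~300M": 300e6}  # round figures from the comment
REFERENCE_RATIO = 20  # assumption: commonly cited compute-optimal heuristic

ratios = {
    name: tokens_per_parameter(TRAIN_TOKENS, n_params)
    for name, n_params in MODEL_SIZES.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} tokens per parameter, "
          f"{ratio / REFERENCE_RATIO:,.0f}x the reference ratio")
```

Under these round numbers the ratios are roughly 13,000 and 6,700 tokens per parameter, far past the compute-optimal heuristic. Whether the extra tokens still improve the model is exactly what a loss curve would show.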

What is the performance of ModernBERT on NER tasks, for example on the CoNLL-2003 dataset? Happy to see a modernized BERT after so long :)

Paper author

We haven't evaluated on NER, I believe, but @stefan-it may have actually run some tests on CoNLL03.
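For anyone trying NER fine-tuning themselves, one step that usually comes up is aligning word-level labels (as in CoNLL-2003) with sub-word tokens. The sketch below assumes you already have a per-token list of word indices, such as the one returned by `word_ids()` on a Hugging Face fast tokenizer's output. The example tokenization and label ids are hypothetical, and this is not taken from any ModernBERT evaluation code.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Map word-level labels onto sub-word tokens.

    word_ids: one entry per token, None for special tokens,
              otherwise the index of the word the token belongs to.
    word_labels: one label id per word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens carry no label.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First sub-token of a word gets the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation sub-tokens are ignored unless requested otherwise.
            aligned.append(word_labels[wid] if label_all_subwords else IGNORE_INDEX)
        previous = wid
    return aligned

# Hypothetical tokenization of ["Anna", "visited", "Heidelberg"],
# where "Heidelberg" is split into three sub-tokens.
word_ids = [None, 0, 1, 2, 2, 2, None]
word_labels = [1, 0, 5]  # illustrative ids, e.g. 1 = B-PER, 0 = O, 5 = B-LOC
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

Labelling only the first sub-token keeps the evaluation at word level, which matches how CoNLL-style F1 is normally reported. Setting `label_all_subwords=True` is the other common choice.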

Models citing this paper 73

Datasets citing this paper 0

Spaces citing this paper 104

Collections including this paper 46