
Papers
arxiv:2311.00522

Text Rendering Strategies for Pixel Language Models

Published on Nov 1, 2023 · Submitted by AK on Nov 2, 2023

Abstract

Character bigram rendering improves performance in pixel-based language models, enabling more compact models while maintaining equivalent performance across sentence, token, and multilingual tasks.

AI-generated summary

Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.
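To make the bigram rendering idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' renderer from the PIXEL codebase): it assumes square 16x16-pixel patches and simply draws two characters into each patch, so every patch boundary falls on a character-pair boundary instead of wherever a continuous renderer happens to cut the text. The patch size, font, and function name are assumptions chosen for the example.

```python
# Illustrative sketch only: NOT the PIXEL renderer, just the character-bigram idea,
# where each fixed-size image patch contains exactly two characters.
from PIL import Image, ImageDraw, ImageFont

PATCH = 16  # assumed square patch size in pixels


def render_bigrams(text, font):
    """Render text so that each 16x16 patch holds one character bigram."""
    bigrams = [text[i:i + 2] for i in range(0, len(text), 2)]
    canvas = Image.new("L", (PATCH * len(bigrams), PATCH), color=255)
    draw = ImageDraw.Draw(canvas)
    for col, bigram in enumerate(bigrams):
        # Each bigram is drawn inside its own patch, so identical bigrams
        # always produce identical patches.
        draw.text((col * PATCH + 1, 2), bigram, fill=0, font=font)
    return canvas


if __name__ == "__main__":
    img = render_bigrams("Pixel language models", ImageFont.load_default())
    print(img.size)  # (16 * number_of_bigrams, 16)
```

Because identical bigrams always map to identical patches, the patch inventory behaves somewhat like a discrete vocabulary, which is consistent with the frequency-driven anisotropy of the patch embedding space described in the abstract.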

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2311.00522 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2311.00522 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2311.00522 in a Space README.md to link it from this page.

Collections including this paper 2