KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
\n","updatedAt":"2025-02-24T10:43:47.775Z","author":{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","fullname":"Ahmed Heakl","name":"ahmedheakl","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":61,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8175894021987915},"editors":["ahmedheakl"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg"],"reactions":[],"isReport":false}},{"id":"67bd1eb835355c74f90fb602","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2025-02-25T01:36:56.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments](https://huggingface.co/papers/2502.06445) (2025)\n* [Ocean-OCR: Towards General OCR Application via a Vision-Language Model](https://huggingface.co/papers/2501.15558) (2025)\n* [Do Current Video LLMs Have Strong OCR Abilities? 
A Preliminary Study](https://huggingface.co/papers/2412.20613) (2024)\n* [OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning](https://huggingface.co/papers/2501.00321) (2024)\n* [Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway](https://huggingface.co/papers/2501.07300) (2025)\n* [OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches](https://huggingface.co/papers/2502.01205) (2025)\n* [AIN: The Arabic INclusive Large Multimodal Model](https://huggingface.co/papers/2502.00094) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
\n
The following papers were recommended by the Semantic Scholar API
Please give a thumbs up to this comment if you found it helpful!
\n
If you want recommendations for any Paper on Hugging Face checkout this Space
\n
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend
\n","updatedAt":"2025-02-25T01:36:56.489Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7369399666786194},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2502.14949","authors":[{"_id":"67bc4ced7727595ca5b108f1","user":{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},"name":"Ahmed Heakl","status":"claimed_verified","statusLastChangedAt":"2025-02-24T10:58:23.973Z","hidden":false},{"_id":"67bc4ced7727595ca5b108f2","user":{"_id":"672e4574b60c3a27d783a1ac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/aut4W4hJcOT8jvQnlWs-y.png","isPro":false,"fullname":"Muhammad Abdullah","user":"mabdullahsohail","type":"user"},"name":"Abdullah Sohail","status":"claimed_verified","statusLastChangedAt":"2025-02-24T13:05:04.733Z","hidden":false},{"_id":"67bc4ced7727595ca5b108f3","user":{"_id":"65262a396b41932089fd7bae","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65262a396b41932089fd7bae/6YIEoAfJojuTW1UOKlwZT.png","isPro":false,"fullname":"Mukul Ranjan","user":"mukul54","type":"user"},"name":"Mukul Ranjan","status":"claimed_verified","statusLastChangedAt":"2025-02-24T11:54:56.690Z","hidden":false},{"_id":"67bc4ced7727595ca5b108f4","name":"Rania Hossam","hidden":false},{"_id":"67bc4ced7727595ca5b108f5","name":"Ghazi 
Ahmed","hidden":false},{"_id":"67bc4ced7727595ca5b108f6","user":{"_id":"5e796e1230dc073f817a2b92","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5e796e1230dc073f817a2b92/3i3MVu2XezcCpEOOB-OQH.jpeg","isPro":true,"fullname":"Mohamed El-Geish","user":"elgeish","type":"user"},"name":"Mohamed El-Geish","status":"claimed_verified","statusLastChangedAt":"2025-04-22T10:16:22.469Z","hidden":false},{"_id":"67bc4ced7727595ca5b108f7","name":"Omar Maher","hidden":false},{"_id":"67bc4ced7727595ca5b108f8","name":"Zhiqiang Shen","hidden":false},{"_id":"67bc4ced7727595ca5b108f9","name":"Fahad Khan","hidden":false},{"_id":"67bc4ced7727595ca5b108fa","name":"Salman Khan","hidden":false}],"publishedAt":"2025-02-20T18:41:23.000Z","submittedOnDailyAt":"2025-02-24T08:13:47.767Z","title":"KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and\n Document Understanding","submittedOnDailyBy":{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},"summary":"With the growing adoption of Retrieval-Augmented Generation (RAG) in document\nprocessing, robust text recognition has become increasingly critical for\nknowledge extraction. While OCR (Optical Character Recognition) for English and\nother languages benefits from large datasets and well-established benchmarks,\nArabic OCR faces unique challenges due to its cursive script, right-to-left\ntext flow, and complex typographic and calligraphic features. We present\nKITAB-Bench, a comprehensive Arabic OCR benchmark that fills the gaps in\ncurrent evaluation systems. Our benchmark comprises 8,809 samples across 9\nmajor domains and 36 sub-domains, encompassing diverse document types including\nhandwritten text, structured tables, and specialized coverage of 21 chart types\nfor business intelligence. 
Our findings show that modern vision-language models\n(such as GPT-4, Gemini, and Qwen) outperform traditional OCR approaches (like\nEasyOCR, PaddleOCR, and Surya) by an average of 60% in Character Error Rate\n(CER). Furthermore, we highlight significant limitations of current Arabic OCR\nmodels, particularly in PDF-to-Markdown conversion, where the best model\nGemini-2.0-Flash achieves only 65% accuracy. This underscores the challenges in\naccurately recognizing Arabic text, including issues with complex fonts,\nnumeral recognition errors, word elongation, and table structure detection.\nThis work establishes a rigorous evaluation framework that can drive\nimprovements in Arabic document analysis methods and bridge the performance gap\nwith English OCR technologies.","upvotes":9,"discussionId":"67bc4cee7727595ca5b10967","projectPage":"https://mbzuai-oryx.github.io/KITAB-Bench/","githubRepo":"https://github.com/mbzuai-oryx/KITAB-Bench","githubRepoAddedBy":"user","ai_summary":"KITAB-Bench, a comprehensive Arabic OCR benchmark, highlights the strengths and limitations of modern vision-language models in recognizing Arabic text, especially in complex scenarios like PDF-to-Markdown conversion.","ai_keywords":["Retrieval-Augmented Generation","RAG","text recognition","KITAB-Bench","OCR","Arabic OCR","cursive script","right-to-left text","character error rate","CER","GPT-4","Gemini","Qwen","EasyOCR","PaddleOCR","Surya","vision-language models","PDF-to-Markdown","complex fonts","numeral recognition","word elongation","table structure detection"],"githubStars":63,"organization":{"_id":"61fb9e24dc607a42af5f193f","name":"MBZUAI","fullname":"Mohamed Bin Zayed University of Artificial 
Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1643879908583-603ab5664a944b99e81476e8.jpeg"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},{"_id":"65262a396b41932089fd7bae","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65262a396b41932089fd7bae/6YIEoAfJojuTW1UOKlwZT.png","isPro":false,"fullname":"Mukul Ranjan","user":"mukul54","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"672e4574b60c3a27d783a1ac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/aut4W4hJcOT8jvQnlWs-y.png","isPro":false,"fullname":"Muhammad Abdullah","user":"mabdullahsohail","type":"user"},{"_id":"62676a94dacab364889bb36c","avatarUrl":"/avatars/0ead41b44957eb30564ea685ed22781a.svg","isPro":false,"fullname":"SARIM HASHMI","user":"Sarim-Hash","type":"user"},{"_id":"65decc75beffeb39ba679eba","avatarUrl":"/avatars/735b678bd5863a0c1b1bdd3bbf8858fa.svg","isPro":true,"fullname":"r","user":"oceansweep","type":"user"},{"_id":"6359e1a37523ad34680577a8","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6359e1a37523ad34680577a8/cZXMNXVot2dqCRKLwdrbO.jpeg","isPro":false,"fullname":"Youssef Hosni","user":"YoussefHosni","type":"user"},{"_id":"5e796e1230dc073f817a2b92","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5e796e1230dc073f817a2b92/3i3MVu2XezcCpEOOB-OQH.jpeg","isPro":true,"fullname":"Mohamed 
El-Geish","user":"elgeish","type":"user"},{"_id":"6194cd85f7226159c230ce1d","avatarUrl":"/avatars/1ec3f909ce8d3223d76506d0bd9ade9d.svg","isPro":false,"fullname":"Loay Amin","user":"loay","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"61fb9e24dc607a42af5f193f","name":"MBZUAI","fullname":"Mohamed Bin Zayed University of Artificial Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1643879908583-603ab5664a944b99e81476e8.jpeg"}}">
AI-generated summary

KITAB-Bench, a comprehensive Arabic OCR benchmark, highlights the strengths and limitations of modern vision-language models in recognizing Arabic text, especially in complex scenarios like PDF-to-Markdown conversion.

Abstract
With the growing adoption of Retrieval-Augmented Generation (RAG) in document
processing, robust text recognition has become increasingly critical for
knowledge extraction. While OCR (Optical Character Recognition) for English and
other languages benefits from large datasets and well-established benchmarks,
Arabic OCR faces unique challenges due to its cursive script, right-to-left
text flow, and complex typographic and calligraphic features. We present
KITAB-Bench, a comprehensive Arabic OCR benchmark that fills the gaps in
current evaluation systems. Our benchmark comprises 8,809 samples across 9
major domains and 36 sub-domains, encompassing diverse document types including
handwritten text, structured tables, and specialized coverage of 21 chart types
for business intelligence. Our findings show that modern vision-language models
(such as GPT-4, Gemini, and Qwen) outperform traditional OCR approaches (like
EasyOCR, PaddleOCR, and Surya) by an average of 60% in Character Error Rate
(CER). Furthermore, we highlight significant limitations of current Arabic OCR
models, particularly in PDF-to-Markdown conversion, where the best-performing
model, Gemini-2.0-Flash, achieves only 65% accuracy. This underscores the challenges in
accurately recognizing Arabic text, including issues with complex fonts,
numeral recognition errors, word elongation, and table structure detection.
This work establishes a rigorous evaluation framework that can drive
improvements in Arabic document analysis methods and bridge the performance gap
with English OCR technologies.
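The Character Error Rate (CER) cited above is conventionally computed as the character-level Levenshtein (edit) distance between a model's output and the reference transcription, normalized by the reference length. A minimal sketch of that metric is below; the benchmark's actual implementation may differ in details such as Unicode normalization or handling of Arabic diacritics, which this sketch does not address.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)
    computed with the standard two-row dynamic-programming recurrence."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,              # deletion from hyp
                curr[j - 1] + 1,          # insertion into hyp
                prev[j - 1] + (r != h),   # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length.
    Guard against empty references to avoid division by zero."""
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Note that CER can exceed 1.0 when the hypothesis contains many insertions, which is why lower is better and why averaging CER across samples (as in the reported 60% gap) is a relative, not absolute, comparison.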