Bridging Language Barriers in Healthcare: A Study on Arabic LLMs
\n","updatedAt":"2025-01-21T01:33:52.432Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7159744501113892},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2501.09825","authors":[{"_id":"678e3e0c8aeb001443af5cb1","user":{"_id":"66bb35988b09ede0b7b92313","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bb35988b09ede0b7b92313/M06mQ3ifyRwuladTNwMS2.png","isPro":false,"fullname":"Nada Saadi","user":"Nadas31","type":"user"},"name":"Nada Saadi","status":"admin_assigned","statusLastChangedAt":"2025-01-20T14:12:06.609Z","hidden":false},{"_id":"678e3e0c8aeb001443af5cb2","user":{"_id":"5f5f6c113c67af20d9945afb","avatarUrl":"/avatars/06b2eb3a5d27864280d4d02e6d00d782.svg","isPro":false,"fullname":"Tathagata Raha","user":"tathagataraha","type":"user"},"name":"Tathagata Raha","status":"admin_assigned","statusLastChangedAt":"2025-01-20T14:12:13.246Z","hidden":false},{"_id":"678e3e0c8aeb001443af5cb3","user":{"_id":"628e39f4b1596566033b8d7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/628e39f4b1596566033b8d7b/-Y807up1cgMmAQsczdOPn.jpeg","isPro":false,"fullname":"Clément Christophe","user":"cchristophe","type":"user"},"name":"Clément Christophe","status":"admin_assigned","statusLastChangedAt":"2025-01-20T14:12:19.413Z","hidden":false},{"_id":"678e3e0c8aeb001443af5cb4","name":"Marco AF Pimentel","hidden":false},{"_id":"678e3e0c8aeb001443af5cb5","user":{"_id":"65281d6ef61ca80b9c2ee707","avatarUrl":"/avatars/090ea7210a4bb6549b0f7fee71525625.svg","isPro":false,"fullname":"Ronnie Rajan","user":"ronnierajan","type":"user"},"name":"Ronnie Rajan","status":"admin_assigned","statusLastChangedAt":"2025-01-20T14:12:33.526Z","hidden":false},{"_id":"678e3e0c8aeb001443af5cb6","user":{"_id":"65280984b794fe3d06544d77","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65280984b794fe3d06544d77/tyrxbxtDG02On1uiRaVbL.jpeg","isPro":false,"fullname":"Praveenkumar","user":"pkanithi","type":"user"},"name":"Praveen K Kanithi","status":"claimed_verified","statusLastChangedAt":"2025-01-20T14:07:22.890Z","hidden":false}],"publishedAt":"2025-01-16T20:24:56.000Z","submittedOnDailyAt":"2025-01-20T09:44:47.264Z","title":"Bridging Language Barriers in Healthcare: A Study on Arabic LLMs","submittedOnDailyBy":{"_id":"628e39f4b1596566033b8d7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/628e39f4b1596566033b8d7b/-Y807up1cgMmAQsczdOPn.jpeg","isPro":false,"fullname":"Clément Christophe","user":"cchristophe","type":"user"},"summary":"This paper investigates the challenges of developing large language models\n(LLMs) proficient in both multilingual understanding and medical knowledge. We\ndemonstrate that simply translating medical data does not guarantee strong\nperformance on clinical tasks in the target language. Our experiments reveal\nthat the optimal language mix in training data varies significantly across\ndifferent medical tasks. 
We find that larger models with carefully calibrated\nlanguage ratios achieve superior performance on native-language clinical tasks.\nFurthermore, our results suggest that relying solely on fine-tuning may not be\nthe most effective approach for incorporating new language knowledge into LLMs.\nInstead, data and computationally intensive pretraining methods may still be\nnecessary to achieve optimal performance in multilingual medical settings.\nThese findings provide valuable guidance for building effective and inclusive\nmedical AI systems for diverse linguistic communities.","upvotes":14,"discussionId":"678e3e0d8aeb001443af5cf4","ai_summary":"Training large language models with carefully calibrated language ratios in multilingual medical data improves performance on clinical tasks more effectively than fine-tuning alone.","ai_keywords":["large language models","multilingual understanding","medical knowledge","clinical tasks","language mix","language ratios","pretraining methods","fine-tuning","medical AI systems"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"628e39f4b1596566033b8d7b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/628e39f4b1596566033b8d7b/-Y807up1cgMmAQsczdOPn.jpeg","isPro":false,"fullname":"Clément Christophe","user":"cchristophe","type":"user"},{"_id":"65281d6ef61ca80b9c2ee707","avatarUrl":"/avatars/090ea7210a4bb6549b0f7fee71525625.svg","isPro":false,"fullname":"Ronnie Rajan","user":"ronnierajan","type":"user"},{"_id":"6506cfafd55dd4e15caeea09","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/uTkC6G8Cj51av4i7kAaI8.png","isPro":false,"fullname":"Svetlana Maslenkova","user":"maslenkovas","type":"user"},{"_id":"66bb35988b09ede0b7b92313","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bb35988b09ede0b7b92313/M06mQ3ifyRwuladTNwMS2.png","isPro":false,"fullname":"Nada Saadi","user":"Nadas31","type":"user"},{"_id":"5f5f6c113c67af20d9945afb","avatarUrl":"/avatars/06b2eb3a5d27864280d4d02e6d00d782.svg","isPro":false,"fullname":"Tathagata Raha","user":"tathagataraha","type":"user"},{"_id":"65280984b794fe3d06544d77","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65280984b794fe3d06544d77/tyrxbxtDG02On1uiRaVbL.jpeg","isPro":false,"fullname":"Praveenkumar","user":"pkanithi","type":"user"},{"_id":"6767b9bdfe8020a5347fbe95","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/h_DNzkFz9ZqM2SMYk4yuT.png","isPro":false,"fullname":"Raneem Mohammed","user":"RaneemM55","type":"user"},{"_id":"6270324ebecab9e2dcf245de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6270324ebecab9e2dcf245de/cMbtWSasyNlYc9hvsEEzt.jpeg","isPro":false,"fullname":"Kye Gomez","user":"kye","type":"user"},{"_id":"647437a26a972f252de6b0ce","avatarUrl":"/avatars/02e6bed173eee14a18e30e0d247b8aa1.svg","isPro":false,"fullname":"Nasir Hayat","user":"nasirhayat","type":"user"},{"_id":"6418354aedc5a69a66963935","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6418354aedc5a69a66963935/AUZRKKvqPaDJUJrQgeA9L.jpeg","isPro":false,"fullname":"Pavan Kumar Balijepalli","user":"pavankumarbalijepalli","type":"user"},{"_id":"663ccbff3a74a20189d4aa2e","avatarUrl":"/avatars/83a54455e0157480f65c498cd9057cf2.svg","isPro":false,"fullname":"Nguyen Van 
Thanh","user":"NguyenVanThanhHust","type":"user"},{"_id":"6776340dd3ceb4493fda0c6e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6776340dd3ceb4493fda0c6e/JzUAaFFPICKhZLgJR3pgP.png","isPro":false,"fullname":"Ruben Roy","user":"rubenroy","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
Training large language models with carefully calibrated language ratios in multilingual medical data improves performance on clinical tasks more effectively than fine-tuning alone.
Abstract
This paper investigates the challenges of developing large language models (LLMs) proficient in both multilingual understanding and medical knowledge. We demonstrate that simply translating medical data does not guarantee strong performance on clinical tasks in the target language. Our experiments reveal that the optimal language mix in training data varies significantly across different medical tasks. We find that larger models with carefully calibrated language ratios achieve superior performance on native-language clinical tasks. Furthermore, our results suggest that relying solely on fine-tuning may not be the most effective approach for incorporating new language knowledge into LLMs. Instead, data- and computationally-intensive pretraining methods may still be necessary to achieve optimal performance in multilingual medical settings. These findings provide valuable guidance for building effective and inclusive medical AI systems for diverse linguistic communities.
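
To make the "language mix" and "calibrated language ratios" discussed in the abstract concrete, the sketch below shows how one might assemble a fine-tuning set with a configurable fraction of Arabic versus English examples. This is a hypothetical illustration, not the authors' pipeline; the function name, data layout, and the 70/30 ratio are assumptions chosen only for demonstration.

# Hypothetical sketch: sampling a training mixture with a target Arabic/English ratio.
# Nothing here is taken from the paper's actual data pipeline.
import random

def mix_language_data(arabic_examples, english_examples, arabic_ratio, total_size, seed=0):
    """Sample a training mixture in which roughly `arabic_ratio` of examples are Arabic."""
    rng = random.Random(seed)
    n_arabic = int(round(arabic_ratio * total_size))
    n_english = total_size - n_arabic
    mixture = (rng.sample(arabic_examples, min(n_arabic, len(arabic_examples)))
               + rng.sample(english_examples, min(n_english, len(english_examples))))
    rng.shuffle(mixture)  # interleave languages so batches are not language-segregated
    return mixture

# Usage example with placeholder QA records (fields are assumptions for illustration):
arabic_qa = [{"lang": "ar", "question": "...", "answer": "..."} for _ in range(1000)]
english_qa = [{"lang": "en", "question": "...", "answer": "..."} for _ in range(1000)]
train_set = mix_language_data(arabic_qa, english_qa, arabic_ratio=0.7, total_size=1000)

In practice, the abstract's point is that the best value of such a ratio is task-dependent, so one would sweep it per clinical task rather than fix a single mixture for all of them.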