IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Tingyu Song, Guo Gan, Mingsheng Shang, Yilun Zhao
Published 2025-03-06
arXiv: 2503.04644
GitHub: https://github.com/SighingSnow/IFIR
A benchmark and evaluation method for instruction-following information retrieval in expert domains reveal challenges for current models in handling complex, domain-specific instructions.
AI-generated summary
We introduce IFIR, the first comprehensive benchmark designed to evaluate
instruction-following information retrieval (IR) in expert domains. IFIR
includes 2,426 high-quality examples and covers eight subsets across four
specialized domains: finance, law, healthcare, and science literature. Each
subset addresses one or more domain-specific retrieval tasks, replicating
real-world scenarios where customized instructions are critical. IFIR enables a
detailed analysis of instruction-following retrieval capabilities by
incorporating instructions at different levels of complexity. We also propose a
novel LLM-based evaluation method to provide a more precise and reliable
assessment of model performance in following instructions. Through extensive
experiments on 15 frontier retrieval models, including those based on LLMs, our
results reveal that current models face significant challenges in effectively
following complex, domain-specific instructions. We further provide in-depth
analyses to highlight these limitations, offering valuable insights to guide
future advancements in retriever development.