Paper page - DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation
Comments

stereoplegic (2024-05-20): `@librarian-bot recommend`

librarian-bot (2024-05-20), in reply:
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:

* [Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models](https://huggingface.co/papers/2404.04522) (2024)
* [Better Synthetic Data by Retrieving and Transforming Existing Datasets](https://huggingface.co/papers/2404.14361) (2024)
* [Prompting-based Synthetic Data Generation for Few-Shot Question Answering](https://huggingface.co/papers/2405.09335) (2024)
* [Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data](https://huggingface.co/papers/2404.16637) (2024)
* [GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning](https://huggingface.co/papers/2404.09163) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
Paper: arXiv 2404.02489, published 2024-04-03
Authors: Ramraj Chandradevan, Kaustubh D. Dhole, Eugene Agichtein
Code: https://github.com/emory-irlab/duqgen
AI-generated summary

DUQGen, an unsupervised domain adaptation approach, generates effective and diverse synthetic data to enhance neural rankers' performance in new domains, outperforming zero-shot and state-of-the-art baselines.
Abstract

State-of-the-art neural rankers pre-trained on large task-specific training data, such as MS-MARCO, have been shown to exhibit strong performance on various ranking tasks without domain adaptation, a setting also called zero-shot. However, zero-shot neural ranking may be sub-optimal, as it does not take advantage of target-domain information. Unfortunately, acquiring sufficiently large and high-quality target training data to improve a modern neural ranker can be costly and time-consuming. To address this problem, we propose a new approach to unsupervised domain adaptation for ranking, DUQGen, which addresses a critical gap in the prior literature, namely how to automatically generate both effective and diverse synthetic training data to fine-tune a modern neural ranker for a new domain. Specifically, DUQGen produces a more effective representation of the target domain by identifying clusters of similar documents, and generates a more diverse training dataset by probabilistic sampling over the resulting document clusters. Our extensive experiments over the standard BEIR collection demonstrate that DUQGen consistently outperforms all zero-shot baselines and substantially outperforms the SOTA baselines on 16 out of 18 datasets, for an average 4% relative improvement across all datasets. We complement our results with a thorough analysis for a more in-depth understanding of the proposed method's performance and to identify promising areas for further improvement.
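The pipeline the abstract describes (cluster the target-domain documents, then sample documents probabilistically across the clusters so that every sub-topic can contribute a synthetic query) can be sketched as below. This is a minimal illustration, not the paper's implementation: the plain k-means routine, random stand-in embeddings, cluster count, and size-proportional cluster sampling are all assumptions chosen for brevity; the paper's exact embedding model and sampling distribution may differ.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means over document embeddings -- an illustrative
    stand-in for whatever clustering DUQGen actually uses."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each document to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned documents.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def sample_documents(labels, n_samples, rng):
    """Probabilistic sampling over clusters: first pick a cluster
    (here in proportion to its size -- an assumed scheme), then a
    random document within it, so selected documents spread across
    the target domain's sub-topics rather than one dense region."""
    clusters = np.unique(labels)
    sizes = np.array([(labels == c).sum() for c in clusters], dtype=float)
    probs = sizes / sizes.sum()
    picked = []
    for _ in range(n_samples):
        c = rng.choice(clusters, p=probs)
        members = np.flatnonzero(labels == c)
        picked.append(int(rng.choice(members)))
    return picked

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # stand-in document embeddings
labels = kmeans(X, k=5)
doc_ids = sample_documents(labels, n_samples=10, rng=rng)
# Each picked document would then be passed to a query-generation
# model to produce one synthetic (query, document) training pair
# for fine-tuning the ranker.
print(doc_ids)
```

In this sketch, diversity comes from the two-stage draw: selecting the cluster first prevents the sample from collapsing onto the corpus's largest topic, which is the failure mode naive uniform sampling of documents can exhibit.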