Paper page - Prompting-based Synthetic Data Generation for Few-Shot Question Answering
@librarian-bot recommend

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning (https://huggingface.co/papers/2404.09163) (2024)
* Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension (https://huggingface.co/papers/2404.17991) (2024)
* Asking and Answering Questions to Extract Event-Argument Structures (https://huggingface.co/papers/2404.16413) (2024)
* Better Synthetic Data by Retrieving and Transforming Existing Datasets (https://huggingface.co/papers/2404.14361) (2024)
* Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation (https://huggingface.co/papers/2406.03703) (2024)
Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
\n","updatedAt":"2024-06-12T02:57:56.308Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"66690eaf8efc279ea9be0cfc"}}]}],"primaryEmailConfirmed":false,"paper":{"id":"2405.09335","authors":[{"_id":"664bce1ea8f6e1c876fc970b","name":"Maximilian Schmidt","hidden":false},{"_id":"664bce1ea8f6e1c876fc970c","name":"Andrea Bartezzaghi","hidden":false},{"_id":"664bce1ea8f6e1c876fc970d","name":"Ngoc Thang Vu","hidden":false}],"publishedAt":"2024-05-15T13:36:43.000Z","title":"Prompting-based Synthetic Data Generation for Few-Shot Question\n Answering","summary":"Although language models (LMs) have boosted the performance of Question\nAnswering, they still need plenty of data. Data annotation, in contrast, is a\ntime-consuming process. This especially applies to Question Answering, where\npossibly large documents have to be parsed and annotated with questions and\ntheir corresponding answers. Furthermore, Question Answering models often only\nwork well for the domain they were trained on. Since annotation is costly, we\nargue that domain-agnostic knowledge from LMs, such as linguistic\nunderstanding, is sufficient to create a well-curated dataset. With this\nmotivation, we show that using large language models can improve Question\nAnswering performance on various datasets in the few-shot setting compared to\nstate-of-the-art approaches. For this, we perform data generation leveraging\nthe Prompting framework, suggesting that language models contain valuable\ntask-agnostic knowledge that can be used beyond the common\npre-training/fine-tuning scheme. As a result, we consistently outperform\nprevious approaches on few-shot Question Answering.","upvotes":0,"discussionId":"664bce1ea8f6e1c876fc972b","githubRepo":"https://github.com/mxschmdt/mrqa-prompting-gen","githubRepoAddedBy":"auto","ai_summary":"Using large language models for prompt-based data generation enhances few-shot Question Answering performance beyond traditional methods.","ai_keywords":["large language models","data generation","Prompting framework","few-shot Question Answering","domain-agnostic knowledge","task-agnostic knowledge"],"githubStars":2},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[],"acceptLanguages":["*"]}">
AI-generated summary
Using large language models for prompt-based data generation enhances few-shot Question Answering performance beyond traditional methods.

Abstract
Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.
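To make the idea concrete, below is a minimal sketch of what prompting-based synthetic QA data generation can look like: a few annotated context/question/answer examples are placed in a prompt, and an instruction-tuned language model is asked to produce a new question-answer pair for an unlabeled passage from the target domain. This is not the authors' implementation (their code is in the repository linked above); the model name, prompt format, decoding settings, and parsing logic are illustrative assumptions.

```python
# Minimal sketch of prompting-based synthetic QA data generation.
# Assumptions (not from the paper): the model choice, the prompt template,
# and the naive output parsing below are all illustrative.
from transformers import pipeline

# Hypothetical model choice; any instruction-following causal LM would do.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# A handful of annotated examples (the "few shots") used to condition the LM.
few_shot_examples = [
    {
        "context": "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
        "question": "When was the Eiffel Tower completed?",
        "answer": "1889",
    },
]

def build_prompt(examples, new_context):
    """Concatenate few-shot QA demonstrations, then ask for a new pair."""
    parts = []
    for ex in examples:
        parts.append(
            f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer: {ex['answer']}\n"
        )
    parts.append(f"Context: {new_context}\nQuestion:")
    return "\n".join(parts)

# Unlabeled passage from the target domain; the LM generates a synthetic QA pair.
unlabeled_context = "Mount Everest, at 8,849 metres, is Earth's highest mountain above sea level."
prompt = build_prompt(few_shot_examples, unlabeled_context)

output = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
completion = output[0]["generated_text"][len(prompt):]

# Naive parsing of "<question> ... Answer: <answer>" from the completion.
question, _, answer = completion.partition("Answer:")
synthetic_example = {
    "context": unlabeled_context,
    "question": question.strip(),
    "answer": answer.strip().splitlines()[0] if answer.strip() else "",
}
print(synthetic_example)
```

Synthetic pairs produced this way would then be used as training data for fine-tuning a QA model on the target domain, which is how the few-shot gains described in the abstract are obtained; the exact prompting and filtering details should be taken from the authors' repository.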