arxiv:2405.09335

Prompting-based Synthetic Data Generation for Few-Shot Question Answering

Published on May 15, 2024
Authors: Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

Abstract

Using large language models for prompt-based data generation enhances few-shot Question Answering performance beyond traditional methods.

AI-generated summary

Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.
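To make the abstract's pipeline concrete, here is a minimal, hypothetical sketch of prompting-based synthetic QA data generation: a few labeled (context, question, answer) demonstrations are concatenated into a prompt, an unlabeled context is appended, and the language model's completion is parsed back into a new QA pair. The prompt template, the parsing format, and the stubbed model call are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of prompting-based synthetic QA data generation.
# The prompt template and the stubbed LM call below are assumptions for
# illustration; the paper's actual Prompting framework may differ.

def build_fewshot_prompt(examples, new_context):
    """Concatenate a few (context, question, answer) demonstrations and
    append the unlabeled context for which the LM should generate a pair."""
    blocks = [
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}"
        for c, q, a in examples
    ]
    blocks.append(f"Context: {new_context}\nQuestion:")
    return "\n\n".join(blocks)

def parse_qa(completion):
    """Split an LM completion of the form ' <question>\nAnswer: <answer>'
    back into a (question, answer) pair; return None if it cannot."""
    if "\nAnswer:" not in completion:
        return None
    question, answer = completion.split("\nAnswer:", 1)
    return question.strip(), answer.strip()

# Stub standing in for a real LM call (e.g. a text-generation model).
def fake_lm(prompt):
    return " Who wrote the play?\nAnswer: Shakespeare"

examples = [("Paris is the capital of France.",
             "What is the capital of France?", "Paris")]
prompt = build_fewshot_prompt(examples, "Hamlet was written by Shakespeare.")
qa = parse_qa(fake_lm(prompt))
print(qa)  # ('Who wrote the play?', 'Shakespeare')
```

Pairs generated this way can then be filtered and used to fine-tune a downstream QA model on the target domain, which is the few-shot transfer the abstract describes.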

Community

@librarian-bot recommend


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning (2024) — https://huggingface.co/papers/2404.09163
- Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension (2024) — https://huggingface.co/papers/2404.17991
- Asking and Answering Questions to Extract Event-Argument Structures (2024) — https://huggingface.co/papers/2404.16413
- Better Synthetic Data by Retrieving and Transforming Existing Datasets (2024) — https://huggingface.co/papers/2404.14361
- Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation (2024) — https://huggingface.co/papers/2406.03703

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2405.09335 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2405.09335 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2405.09335 in a Space README.md to link it from this page.

Collections including this paper 3