Paper page - Textbooks Are All You Need II: phi-1.5 technical report
Comment from librarian-bot (Feb 7, 2024):

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (2024) - https://huggingface.co/papers/2401.16380
* OMPGPT: A Generative Pre-trained Transformer Model for OpenMP (2024) - https://huggingface.co/papers/2401.16445
* Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 (2023) - https://huggingface.co/papers/2312.16171
* Language Resources for Dutch Large Language Modelling (2023) - https://huggingface.co/papers/2312.12852
* TinyGSM: achieving >80% on GSM8k with small language models (2023) - https://huggingface.co/papers/2312.09241

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out the librarian-bots Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
\n","updatedAt":"2024-02-07T09:12:37.476Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7639641165733337},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2309.05463","authors":[{"_id":"64ffd61085a884a964aec59a","user":{"_id":"641d5a313f04f9bf2da17e8c","avatarUrl":"/avatars/4035f73c69ddb045573ad96db4f13f04.svg","isPro":false,"fullname":"Yuanzhi Li","user":"Uushizhu1234","type":"user"},"name":"Yuanzhi Li","status":"admin_assigned","statusLastChangedAt":"2023-09-12T10:58:57.017Z","hidden":false},{"_id":"64ffd61085a884a964aec59b","user":{"_id":"64557408c9c0dcc8c24a7a92","avatarUrl":"/avatars/911ba8fc3c224578d5946dddd94bf4c0.svg","isPro":false,"fullname":"Sebastien Bubeck","user":"sebubeck","type":"user"},"name":"Sรฉbastien Bubeck","status":"admin_assigned","statusLastChangedAt":"2023-09-12T11:00:18.756Z","hidden":false},{"_id":"64ffd61085a884a964aec59c","user":{"_id":"633723a80267ebcf0264c06b","avatarUrl":"/avatars/22bb971597e9f3abfa343280a9d0f65f.svg","isPro":false,"fullname":"Ronen Eldan","user":"roneneldan","type":"user"},"name":"Ronen Eldan","status":"admin_assigned","statusLastChangedAt":"2023-09-12T10:59:17.424Z","hidden":false},{"_id":"64ffd61085a884a964aec59d","user":{"_id":"63f5562471a5d395c721cd8e","avatarUrl":"/avatars/ac296b6017e86ea04c73803fe2c44433.svg","isPro":false,"fullname":"Allie Del Giorno","user":"microallie","type":"user"},"name":"Allie Del Giorno","status":"admin_assigned","statusLastChangedAt":"2023-09-12T10:59:39.427Z","hidden":false},{"_id":"64ffd61085a884a964aec59e","user":{"_id":"63e1b4f77fbb6ae4d4f36aa4","avatarUrl":"/avatars/8e03bf9143be5e6456cc8e732ed3daaf.svg","isPro":false,"fullname":"Suriya Gunasekar","user":"suriyagunasekar","type":"user"},"name":"Suriya Gunasekar","status":"admin_assigned","statusLastChangedAt":"2023-09-12T10:59:54.922Z","hidden":false},{"_id":"64ffd61085a884a964aec59f","user":{"_id":"6303297604c75db08b8972e2","avatarUrl":"/avatars/693b2dada2244ba5d97df9518f473ccb.svg","isPro":false,"fullname":"Yin Tat Lee","user":"yintat","type":"user"},"name":"Yin Tat Lee","status":"admin_assigned","statusLastChangedAt":"2023-09-12T11:00:03.159Z","hidden":false}],"publishedAt":"2023-09-11T14:01:45.000Z","submittedOnDailyAt":"2023-09-12T01:38:01.010Z","title":"Textbooks Are All You Need II: phi-1.5 technical report","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"We continue the investigation into the power of smaller Transformer-based\nlanguage models as initiated by TinyStories -- a 10 million parameter\nmodel that can produce coherent English -- and the follow-up work on\nphi-1, a 1.3 billion parameter model with Python coding performance\nclose to the state-of-the-art. 
The latter work proposed to use existing Large\nLanguage Models (LLMs) to generate ``textbook quality\" data as a way to enhance\nthe learning process compared to traditional web data. We follow the\n``Textbooks Are All You Need\" approach, focusing this time on common sense\nreasoning in natural language, and create a new 1.3 billion parameter model\nnamed phi-1.5, with performance on natural language tasks comparable\nto models 5x larger, and surpassing most non-frontier LLMs on more complex\nreasoning tasks such as grade-school mathematics and basic coding. More\ngenerally, phi-1.5 exhibits many of the traits of much larger LLMs,\nboth good -- such as the ability to ``think step by step\" or perform some\nrudimentary in-context learning -- and bad, including hallucinations and the\npotential for toxic and biased generations -- encouragingly though, we are\nseeing improvement on that front thanks to the absence of web data. We\nopen-source phi-1.5 to promote further research on these urgent\ntopics.","upvotes":89,"discussionId":"64ffd61085a884a964aec5ab","ai_summary":"A new 1.3 billion parameter Transformer-based language model, phi-1.5, demonstrates comparable performance to much larger models on common sense reasoning and complex tasks despite the absence of web data.","ai_keywords":["Transformer-based language models","TinyStories","phi-1","Large Language Models (LLMs)","textbook quality data","natural language tasks","common sense reasoning","grade-school mathematics","basic coding","in-context learning","hallucinations","toxic and biased generations"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"638f5d3b7879bae278a82f14","avatarUrl":"/avatars/91be1f19336fef0c8671ea936df78c5d.svg","isPro":false,"fullname":"Charan","user":"aiscientist","type":"user"},{"_id":"6307adee161cfe1383a234f6","avatarUrl":"/avatars/c7334107e736282d9a18fa1f19659a13.svg","isPro":false,"fullname":"Emmanuel Kahembwe","user":"mannykayy","type":"user"},{"_id":"64ba8cfcc0f19c90256cb56f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64ba8cfcc0f19c90256cb56f/2zQ5BjDl8zpiIkR2A8Fhv.jpeg","isPro":false,"fullname":"Seungyoo Lee","user":"DopeorNope","type":"user"},{"_id":"64d10cad86e19d5db1a5fa1f","avatarUrl":"/avatars/c3e0187d9644c24caaa929f1e1ea6612.svg","isPro":false,"fullname":"Richard Shoemake","user":"rshoemake-launch","type":"user"},{"_id":"63358001686c20e55973298d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1665133631770-63358001686c20e55973298d.png","isPro":false,"fullname":"Mathias Nielsen","user":"mathiasn1","type":"user"},{"_id":"63895b328a5dbe2f3dcba4ac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1669946101720-noauth.png","isPro":false,"fullname":"Cosmo","user":"cosmojg","type":"user"},{"_id":"64635dfcefb4e855048516ca","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64635dfcefb4e855048516ca/x4K3hyGMIuPZEQElBipWW.png","isPro":false,"fullname":"Mastane Achab","user":"Mastane","type":"user"},{"_id":"6362a0bad3be91534c2e4da9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1669909319302-6362a0bad3be91534c2e4da9.jpeg","isPro":false,"fullname":"Georg","user":"waltherg","type":"user"},{"_id":"64f3d15b77b0eb97ea1ec8b2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64f3d15b77b0eb97ea1ec8b2/y_3DjdOr5reXzTvHwn-xT.jpeg","isPro":false,"fullname":"Christopher 
Snyder","user":"csnyder","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"609653c1146ef3bfe2fc7392","avatarUrl":"/avatars/1639b6552a419209ae67b6562183bc2f.svg","isPro":false,"fullname":"Inui","user":"Norm","type":"user"},{"_id":"63ef22b2bfe4ead22ca9e1e4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1676616348535-noauth.jpeg","isPro":false,"fullname":"Phรบ Vรต","user":"phuvo","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1}">
AI-generated summary
A new 1.3 billion parameter Transformer-based language model, phi-1.5, demonstrates comparable performance to much larger models on common sense reasoning and complex tasks despite the absence of web data.

Abstract
We continue the investigation into the power of smaller Transformer-based
language models as initiated by TinyStories -- a 10 million parameter
model that can produce coherent English -- and the follow-up work on
phi-1, a 1.3 billion parameter model with Python coding performance
close to the state-of-the-art. The latter work proposed to use existing Large
Language Models (LLMs) to generate "textbook quality" data as a way to enhance
the learning process compared to traditional web data. We follow the
"Textbooks Are All You Need" approach, focusing this time on common sense
reasoning in natural language, and create a new 1.3 billion parameter model
named phi-1.5, with performance on natural language tasks comparable
to models 5x larger, and surpassing most non-frontier LLMs on more complex
reasoning tasks such as grade-school mathematics and basic coding. More
generally, phi-1.5 exhibits many of the traits of much larger LLMs,
both good -- such as the ability to "think step by step" or perform some
rudimentary in-context learning -- and bad, including hallucinations and the
potential for toxic and biased generations -- encouragingly though, we are
seeing improvement on that front thanks to the absence of web data. We
open-source phi-1.5 to promote further research on these urgent
topics.
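
Since the phi-1.5 checkpoint is open-sourced on Hugging Face, a minimal usage sketch follows. It assumes the checkpoint is available under the microsoft/phi-1_5 repository id and uses the standard transformers causal-LM API; the prompt and generation settings are illustrative, not the evaluation setup from the report.

```python
# Minimal sketch: load the open-sourced phi-1.5 checkpoint and generate a
# short answer. Assumes the "microsoft/phi-1_5" repository id and the
# standard Hugging Face Transformers causal-LM API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed repo id for the released 1.3B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Grade-school style prompt; the report notes phi-1.5 can "think step by step".
prompt = "Question: Alice has 3 apples and buys 2 more. How many apples does she have?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) is used here only to make the sketch deterministic; sampling parameters can be adjusted for open-ended generation.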