arxiv:2404.01744

Octopus v2: On-device language model for super agent

Published on Apr 2, 2024 Β· Submitted by AK on Apr 3, 2024 Β· #1 Paper of the day
Authors: Wei Chen, Zhiyuan Li

Abstract

A new on-device method enables a 2-billion-parameter language model to surpass GPT-4 in function-calling accuracy and latency, while sharply reducing context length and enabling deployment on edge devices.

AI-generated summary

Language models have shown effectiveness in a variety of software applications, particularly in tasks related to automatic workflow. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, and decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method reduces latency 35-fold. This method brings latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requisites for real-world applications.
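
To make the mechanism concrete: the approach assigns each callable function its own token, so the model routes a query by emitting a single token instead of generating a function name and reading API descriptions from a long prompt, which is where the context-length and latency savings come from. The sketch below illustrates only the dispatch idea; the token strings and function table are assumptions for the example, not the paper's exact vocabulary.

```python
# Minimal sketch of the "functional token" idea: each callable function is
# assigned one dedicated token, so the model selects a function by emitting
# a single token instead of spelling out its name. The token strings and
# the function table below are illustrative assumptions.

FUNCTIONAL_TOKENS = {
    "<fn_0>": "take_a_photo",
    "<fn_1>": "send_text_message",
}

def dispatch(generated: str) -> str:
    """Map a generated functional token to its underlying function call."""
    for token, fn_name in FUNCTIONAL_TOKENS.items():
        if generated.startswith(token):
            args = generated[len(token):].strip()  # remainder carries arguments
            return f"call {fn_name} with args {args!r}"
    return "no function selected"

# Example: the model emits one routing token plus its arguments.
print(dispatch("<fn_0> (camera='front')"))
```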

Community

Julien Chaumond (julien-c):

https://twitter.com/arankomatsuzaki/status/1775354511252459782

Michael Barry (MichaelBarryUK):

Very interesting πŸ‘

The single-symbol function name is a neat little trick; it feels so obvious in hindsight.

It would be very interesting to see just how much information we can encapsulate in a single symbol, and then re-use those symbols to increase throughput.

A kind of meta-language that represents a topology of layers of abstraction, on top of layers of abstraction.

Each time a new concept is learned, it gets a symbol, and that symbol is then used to further train the model. This new alphabet would effectively represent a map of the knowledge in the model.

Kind of like database normalisation for embeddings.

Julien Chaumond (julien-c) replied:

🀯

Alex Chen (alexchen4ai) replied:

Hi Michael, thanks for sharing. I also agree that a single symbol could be powerful. Previously, we've seen many special tokens like the `EOS` token or `<image>` in LLaVA. In our further experiments, representing a function as a token proved quite effective when we want the model to perform function calls well. We can also let the model get feedback from the functions to be executed.

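To ground the "function as a token" discussion above, here is a minimal sketch of registering single-symbol function tokens with the Hugging Face transformers API; the base checkpoint and token strings are placeholder assumptions, not the paper's exact identifiers.

```python
# Sketch of the "one symbol per function" trick with the Hugging Face
# transformers API. Checkpoint and token strings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "google/gemma-2b"  # stand-in for the paper's 2B-parameter base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Each function gets exactly one new token, so choosing a function costs
# a single decoding step rather than many tokens of generated text.
new_tokens = ["<fn_take_photo>", "<fn_send_sms>", "<fn_set_alarm>"]
added = tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix

# After fine-tuning on (query -> functional token + arguments) pairs,
# the model can route a query with one token.
print(added, "tokens added:", tokenizer.convert_tokens_to_ids(new_tokens))
```
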
Marcus Gawronsky (marcusinthesky):

I think the idea of using special tokens makes a lot of sense. I think we underappreciate the power and expressiveness of token-space in LLMs.

If you look at techniques like LLaVA, registers for ViTs, and prompt fine-tuning, all of these effectively hack the expressiveness of token-space. With long context, the opportunity to use token-space is larger. In models like BERT, almost 30% of the weights sit in the embedding matrix, yet unlike most layers, adding just a single new token can be extremely efficient and extremely powerful. I think as a research community there is a lot of exciting stuff on the horizon here.
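
As a concrete illustration of the point about single new tokens being cheap, the sketch below freezes an off-the-shelf BERT checkpoint and lets gradients flow only to the newly added embedding row, in the spirit of prompt tuning; the checkpoint and token name are assumptions for the example.

```python
# Why adding one token is cheap: freeze the whole network and train only
# the newly added embedding row. Checkpoint and token name are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

old_vocab = len(tokenizer)
tokenizer.add_special_tokens({"additional_special_tokens": ["<concept_0>"]})
model.resize_token_embeddings(len(tokenizer))

for p in model.parameters():          # freeze everything ...
    p.requires_grad = False
emb = model.get_input_embeddings().weight
emb.requires_grad = True              # ... except the embedding matrix

def keep_only_new_rows(grad: torch.Tensor) -> torch.Tensor:
    # Zero out gradients for the pre-existing vocabulary so only the new
    # token's row receives updates.
    mask = torch.zeros_like(grad)
    mask[old_vocab:] = 1.0
    return grad * mask

emb.register_hook(keep_only_new_rows)
print(f"effective trainable parameters: {emb.shape[1]} (one embedding row)")
```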

Librarian Bot:

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following (https://huggingface.co/papers/2403.03129, 2024)
* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (https://huggingface.co/papers/2402.14905, 2024)
* AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls (https://huggingface.co/papers/2402.04253, 2024)
* SwissNYF: Tool Grounded LLM Agents for Black Box Setting (https://huggingface.co/papers/2402.10051, 2024)
* A Survey of using Large Language Models for Generating Infrastructure as Code (https://huggingface.co/papers/2404.00227, 2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

wing lian (winglian):

Will the dataset for this be released in full as open source?

Julien BLANCHON (blanchon):

Octopus v2: Revolutionizing On-Device AI for Super Agents!

Video: https://cdn-uploads.huggingface.co/production/uploads/6186ddf6a7717cb375090c01/Ef76SslELhzhJE4bVVAw5.mp4

Links πŸ”—:
πŸ‘‰ Subscribe: https://www.youtube.com/@Arxflix
πŸ‘‰ Twitter: https://x.com/arxflix
πŸ‘‰ LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper: 7

Datasets citing this paper: 0

No datasets link this paper. Cite arxiv.org/abs/2404.01744 in a dataset README.md to link it from this page.

Spaces citing this paper: 22

Collections including this paper: 22