arxiv:2403.12596

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma

Published on Mar 19, 2024 · Submitted by AK on Mar 20, 2024

Abstract

AI-generated summary

A method transfers reasoning capabilities from large language models to vision-language models, achieving state-of-the-art performance on ChartQA and superior performance on PlotQA and FigureQA, by enhancing the chart representation and training on a larger, synthesized dataset with a multitask loss.

Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited, particularly for smaller VLMs, while those of large language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA benchmark, our method obtains state-of-the-art performance when applied to the PaLI3-5B VLM of chen2023pali3, while also enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage with an improved version of the chart-to-table translation task of liu2023deplot. We then construct a dataset 20x larger than the original training set. To improve general reasoning capabilities and numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, the model is fine-tuned using the multitask loss introduced by hsieh2023distilling. Our variant, ChartPaLI-5B, outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant relative to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt (chen2023program), our model outperforms the recently introduced Gemini Ultra and GPT-4V.
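To make the fine-tuning step concrete, below is a minimal sketch of how a Distilling Step-by-Step style multitask loss (hsieh2023distilling) combines answer prediction with rationale generation. The batch field names, the `rationale_weight` hyperparameter, and the HuggingFace-style model interface are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a Distilling Step-by-Step style multitask loss.
# Assumptions: a seq2seq model with a HuggingFace-style forward pass
# (returns .loss when `labels` is given), and batches carrying two
# tokenized views per example -- one targeting the final answer, one
# targeting the synthesized reasoning trace (rationale).
import torch


def multitask_loss(model, batch, rationale_weight: float = 1.0) -> torch.Tensor:
    # Next-token cross-entropy on the answer targets.
    answer_loss = model(
        input_ids=batch["answer_input_ids"],
        labels=batch["answer_labels"],
    ).loss
    # The same objective on the rationale targets, so the model learns
    # to produce the reasoning trace as well as the final answer.
    rationale_loss = model(
        input_ids=batch["rationale_input_ids"],
        labels=batch["rationale_labels"],
    ).loss
    # Weighted sum of the two terms; the trade-off weight is a
    # hyperparameter, not a value specified in the abstract.
    return answer_loss + rationale_weight * rationale_loss
```

The program-of-thought refinement (chen2023program) mentioned at the end of the abstract works in a similar spirit at inference time: instead of stating numeric steps in free text, the rationale emits short executable Python, so arithmetic is computed by an interpreter rather than generated token by token.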

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2403.12596 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.12596 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.12596 in a Space README.md to link it from this page.

Collections including this paper 7