
arxiv:2410.21157

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Published on Oct 28, 2024
Submitted by Zekun Moore Wang on Nov 4, 2024
Authors: Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng
Abstract

A new multilingual benchmark and instruction dataset are introduced to evaluate and improve repository-level code completion for Large Language Models across 18 programming languages.

AI-generated summary

Repository-level code completion has drawn great attention in software engineering, and several benchmark datasets have been introduced. However, existing repository-level code completion benchmarks usually cover only a small number of languages (fewer than five), so they cannot evaluate the general code intelligence of existing code Large Language Models (LLMs) across different languages. Moreover, existing benchmarks usually report overall average scores across languages, ignoring fine-grained abilities in different completion scenarios. Therefore, to facilitate research on code LLMs in multilingual scenarios, we propose M2RC-EVAL, a massively multilingual repository-level code completion benchmark covering 18 programming languages, together with two types of fine-grained annotations (i.e., bucket-level and semantic-level) for different completion scenarios, which we obtain from the parsed abstract syntax tree. Moreover, we also curate M2RC-INSTRUCT, a massively multilingual instruction corpus, to improve the repository-level code completion abilities of existing code LLMs. Comprehensive experimental results demonstrate the effectiveness of our M2RC-EVAL and M2RC-INSTRUCT.
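The abstract only sketches how the fine-grained annotations are derived from the parsed abstract syntax tree. As an illustrative sketch only (not the paper's actual pipeline), the idea of a semantic-level label can be demonstrated with Python's built-in `ast` module: a completion position is labeled by the kind of syntax-tree node it falls inside. The function name `semantic_label` and the choice of node-type names as labels are assumptions for illustration.

```python
import ast

def semantic_label(source: str, lineno: int) -> str:
    """Return the type name of the innermost statement enclosing a line.

    Hypothetical helper: labels a completion position by the AST statement
    kind it falls inside, loosely mirroring the paper's semantic-level
    annotations (which are derived from parsed syntax trees).
    """
    tree = ast.parse(source)
    label = "Module"  # fallback when no enclosing statement is found
    # ast.walk is breadth-first, so nested (innermost) statements are
    # visited after their enclosing statements; the last match wins.
    for node in ast.walk(tree):
        if not isinstance(node, ast.stmt):
            continue
        start, end = node.lineno, node.end_lineno
        if start is not None and end is not None and start <= lineno <= end:
            label = type(node).__name__
    return label

code = "def add(a, b):\n    return a + b\n"
print(semantic_label(code, 1))  # -> 'FunctionDef'
print(semantic_label(code, 2))  # -> 'Return'
```

In the benchmark itself the labels come from language-agnostic parsers across all 18 languages; this stdlib-only sketch shows the underlying idea for Python code alone.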

Community

Paper submitter


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.21157 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.21157 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.21157 in a Space README.md to link it from this page.

Collections including this paper 2