

arxiv:2410.14059

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Published on Oct 17, 2024

Submitted by Yifei Zhang on Oct 21, 2024

#1 Paper of the day


Authors: Yuzhe Yang, Yifei Zhang, Yan Hu, Yilin Guo, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, Benyou Wang

Abstract

AI-generated summary: A user-centric framework evaluates LLMs in financial tasks through human feedback and task-specific interactions, achieving high correlation with human preferences.

This paper introduces the User-Centric Financial Expertise (UCFE) benchmark, a framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants and collected their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 12 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial sector but also provides a robust framework for assessing their performance and user satisfaction. The benchmark dataset and evaluation code are available.
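The abstract reports agreement between benchmark scores and human preferences as a Pearson correlation coefficient. As a minimal sketch of how such a coefficient is computed in general, the snippet below implements the standard formula in plain Python. The function name `pearson` and the two score lists are illustrative assumptions, not taken from the UCFE code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-model values, made up for illustration only;
    # they are NOT results from the paper.
    benchmark_scores = [0.52, 0.61, 0.47, 0.70, 0.66]
    human_preferences = [0.50, 0.58, 0.49, 0.73, 0.60]
    print(f"r = {pearson(benchmark_scores, human_preferences):.2f}")
```

A value near 1 means models that score higher on the benchmark also tend to be preferred by human raters; a value near 0 means the two rankings are largely unrelated.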