
arxiv:2510.02209

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Published on Oct 2, 2025 · Submitted by taesiri on Oct 3, 2025
Authors: Yanxu Chen, Zijun Yao, Yantao Liu, Jin Ye, Jianing Yu, Lei Hou, Juanzi Li

Project page: https://stockbench.github.io/
Code: https://github.com/ChenYXxxx/stockbench

Abstract

AI-generated summary: StockBench evaluates large language models in realistic stock trading environments, revealing challenges and opportunities in developing LLM-powered financial agents.

Large language models (LLMs) have recently demonstrated strong capabilities as autonomous agents, showing promise in reasoning, tool use, and sequential decision-making. While prior benchmarks have evaluated LLM agents in domains such as software engineering and scientific discovery, the finance domain remains underexplored, despite its direct relevance to economic value and high-stakes decision-making. Existing financial benchmarks primarily test static knowledge through question answering, but they fall short of capturing the dynamic and iterative nature of trading. To address this gap, we introduce StockBench, a contamination-free benchmark designed to evaluate LLM agents in realistic, multi-month stock trading environments. Agents receive daily market signals -- including prices, fundamentals, and news -- and must make sequential buy, sell, or hold decisions. Performance is assessed using financial metrics such as cumulative return, maximum drawdown, and the Sortino ratio. Our evaluation of state-of-the-art proprietary (e.g., GPT-5, Claude-4) and open-weight (e.g., Qwen3, Kimi-K2, GLM-4.5) models shows that while most LLM agents struggle to outperform the simple buy-and-hold baseline, several models demonstrate the potential to deliver higher returns and manage risk more effectively. These findings highlight both the challenges and opportunities in developing LLM-powered financial agents, showing that excelling at static financial knowledge tasks does not necessarily translate into successful trading strategies. We release StockBench as an open-source resource to support reproducibility and advance future research in this domain.
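For readers who want to sanity-check the reported numbers, here is a minimal, self-contained sketch of how cumulative return, maximum drawdown, and the Sortino ratio can be computed from the daily equity curve produced by an agent's buy, sell, or hold decisions. This is not the authors' implementation, and conventions such as the return target, risk-free rate, and annualization factor may differ from what the paper uses.

```python
import numpy as np

def cumulative_return(equity):
    """Total return over the episode: final equity relative to starting equity."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity):
    """Largest peak-to-trough decline of the equity curve, as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (running_peak - equity) / running_peak
    return drawdowns.max()

def sortino_ratio(daily_returns, target=0.0, periods_per_year=252):
    """Mean excess return over a target, divided by downside deviation.
    Assumes a 0% daily target and sqrt(252) annualization; other conventions exist."""
    r = np.asarray(daily_returns, dtype=float)
    excess = r - target
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt(np.mean(downside ** 2))
    if downside_dev == 0:
        return np.inf
    return np.sqrt(periods_per_year) * excess.mean() / downside_dev

# Toy example: a short buy-and-hold equity curve (hypothetical values).
equity = np.array([100_000, 101_200, 100_500, 102_300, 101_800], dtype=float)
daily_returns = equity[1:] / equity[:-1] - 1.0
print(f"cumulative return: {cumulative_return(equity):+.2%}")
print(f"max drawdown:      {max_drawdown(equity):.2%}")
print(f"Sortino ratio:     {sortino_ratio(daily_returns):.2f}")
```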

Community



really cool paper!

fascinating!


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 17