Paper page - LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

\"longmemeval_dailypaper.png\"

\n","updatedAt":"2024-10-15T04:31:29.127Z","author":{"_id":"639bf367445b133a4e97ef9c","avatarUrl":"/avatars/51b59f4616a01796e07c05c9aa5286f8.svg","fullname":"Di Wu","name":"xiaowu0162","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":7,"isUserFollowing":false}},"numEdits":0,"editors":["xiaowu0162"],"editorAvatarUrls":["/avatars/51b59f4616a01796e07c05c9aa5286f8.svg"],"reactions":[],"isReport":false}},{"id":"670f17dc97bbe79a502b0a32","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2024-10-16T01:33:16.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. 
\n\nThe following papers were recommended by the Semantic Scholar API \n\n* [LongGenBench: Long-context Generation Benchmark](https://huggingface.co/papers/2410.04199) (2024)\n* [MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery](https://huggingface.co/papers/2409.05591) (2024)\n* [SEGMENT+: Long Text Processing with Short-Context Language Models](https://huggingface.co/papers/2410.06519) (2024)\n* [ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering](https://huggingface.co/papers/2410.03227) (2024)\n* [MemLong: Memory-Augmented Retrieval for Long Text Modeling](https://huggingface.co/papers/2408.16967) (2024)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2024-10-16T01:33:16.993Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2410.10813","authors":[{"_id":"670defb480363ae01368542c","user":{"_id":"639bf367445b133a4e97ef9c","avatarUrl":"/avatars/51b59f4616a01796e07c05c9aa5286f8.svg","isPro":false,"fullname":"Di Wu","user":"xiaowu0162","type":"user"},"name":"Di Wu","status":"claimed_verified","statusLastChangedAt":"2024-10-15T08:07:50.570Z","hidden":false},{"_id":"670defb480363ae01368542d","name":"Hongwei Wang","hidden":false},{"_id":"670defb480363ae01368542e","user":{"_id":"5feab3a28a3201f8e554c969","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1660795228685-5feab3a28a3201f8e554c969.png","isPro":false,"fullname":"Wenhao Yu","user":"wyu1","type":"user"},"name":"Wenhao Yu","status":"admin_assigned","statusLastChangedAt":"2024-10-15T16:41:35.824Z","hidden":false},{"_id":"670defb480363ae01368542f","user":{"_id":"663187c1a2354b0f50ab10a0","avatarUrl":"/avatars/bef0fd9d2afa6a4990bc32bd55cbe163.svg","isPro":false,"fullname":"Yuwei Zhang","user":"YWZBrandon","type":"user"},"name":"Yuwei Zhang","status":"claimed_verified","statusLastChangedAt":"2024-10-15T21:17:57.174Z","hidden":false},{"_id":"670defb480363ae013685430","user":{"_id":"60b7b9d71b90c5d07c23fbd0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1622653364258-noauth.jpeg","isPro":false,"fullname":"Kai-Wei Chang","user":"kaiweichang","type":"user"},"name":"Kai-Wei 
Chang","status":"admin_assigned","statusLastChangedAt":"2024-10-15T16:42:29.388Z","hidden":false},{"_id":"670defb480363ae013685431","name":"Dong Yu","hidden":false}],"publishedAt":"2024-10-14T17:59:44.000Z","submittedOnDailyAt":"2024-10-15T03:01:29.103Z","title":"LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive\n Memory","submittedOnDailyBy":{"_id":"639bf367445b133a4e97ef9c","avatarUrl":"/avatars/51b59f4616a01796e07c05c9aa5286f8.svg","isPro":false,"fullname":"Di Wu","user":"xiaowu0162","type":"user"},"summary":"Recent large language model (LLM)-driven chat assistant systems have\nintegrated memory components to track user-assistant chat histories, enabling\nmore accurate and personalized responses. However, their long-term memory\ncapabilities in sustained interactions remain underexplored. This paper\nintroduces LongMemEval, a comprehensive benchmark designed to evaluate five\ncore long-term memory abilities of chat assistants: information extraction,\nmulti-session reasoning, temporal reasoning, knowledge updates, and abstention.\nWith 500 meticulously curated questions embedded within freely scalable\nuser-assistant chat histories, LongMemEval presents a significant challenge to\nexisting long-term memory systems, with commercial chat assistants and\nlong-context LLMs showing 30% accuracy drop on memorizing information across\nsustained interactions. We then present a unified framework that breaks down\nthe long-term memory design into four design choices across the indexing,\nretrieval, and reading stages. Built upon key experimental insights, we propose\nseveral memory designs including session decomposition for optimizing value\ngranularity, fact-augmented key expansion for enhancing the index structure,\nand time-aware query expansion for refining the search scope. Experiment\nresults show that these optimizations greatly improve both memory recall and\ndownstream question answering on LongMemEval. 
Overall, our study provides\nvaluable resources and guidance for advancing the long-term memory capabilities\nof LLM-based chat assistants, paving the way toward more personalized and\nreliable conversational AI.","upvotes":14,"discussionId":"670defb680363ae0136854c2","githubRepo":"https://github.com/xiaowu0162/longmemeval","githubRepoAddedBy":"auto","ai_summary":"LongMemEval assesses long-term memory in chat assistants through five core abilities, identifying gaps and proposing memory design optimizations that enhance recall and question answering.","ai_keywords":["long-term memory","LongMemEval","information extraction","multi-session reasoning","temporal reasoning","knowledge updates","abstention","indexing","retrieval","reading stages","session decomposition","fact-augmented key expansion","time-aware query expansion"],"githubStars":407},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"639bf367445b133a4e97ef9c","avatarUrl":"/avatars/51b59f4616a01796e07c05c9aa5286f8.svg","isPro":false,"fullname":"Di Wu","user":"xiaowu0162","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"643b19f8a856622f978df30f","avatarUrl":"/avatars/c82779fdf94f80cdb5020504f83c818b.svg","isPro":false,"fullname":"Yatharth Sharma","user":"YaTharThShaRma999","type":"user"},{"_id":"65025370b6595dc45c397340","avatarUrl":"/avatars/9469599b176034548042922c0afa7051.svg","isPro":false,"fullname":"J C","user":"dark-pen","type":"user"},{"_id":"663187c1a2354b0f50ab10a0","avatarUrl":"/avatars/bef0fd9d2afa6a4990bc32bd55cbe163.svg","isPro":false,"fullname":"Yuwei Zhang","user":"YWZBrandon","type":"user"},{"_id":"64c8b2c5c547ed5243e14a6e","avatarUrl":"/avatars/96d4a9010f96001c8cff235915926390.svg","isPro":false,"fullname":"Feng 
Yao","user":"fengyao1909","type":"user"},{"_id":"64ae4f6280f308a395fd7c19","avatarUrl":"/avatars/5f1330f8187cd5e66aa517303659f110.svg","isPro":false,"fullname":"Kaixin Ma","user":"kaixinm","type":"user"},{"_id":"65decc75beffeb39ba679eba","avatarUrl":"/avatars/735b678bd5863a0c1b1bdd3bbf8858fa.svg","isPro":true,"fullname":"r","user":"oceansweep","type":"user"},{"_id":"5f32b2367e583543386214d9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1635314457124-5f32b2367e583543386214d9.jpeg","isPro":false,"fullname":"Sergei Averkiev","user":"averoo","type":"user"},{"_id":"67b19419ed70237ba49d29da","avatarUrl":"/avatars/10231935b867dd425c0c2f2969448a63.svg","isPro":false,"fullname":"weixuchen","user":"KageXu","type":"user"},{"_id":"660a87a6e5a164f3f53d9025","avatarUrl":"/avatars/d7918a03238abaed397a90e2ae62f9be.svg","isPro":false,"fullname":"Đoàn Ngọc Cường","user":"doanngoccuong","type":"user"},{"_id":"637ecb6c6df7e8f7df7694ba","avatarUrl":"/avatars/9f4d532e7bae2467c8839e102d02e3a9.svg","isPro":false,"fullname":"Geoff","user":"xbno","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
Papers
arxiv:2410.10813

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Published on Oct 14, 2024
· Submitted by Di Wu on Oct 15, 2024
GitHub: https://github.com/xiaowu0162/longmemeval
Authors:
Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu

AI-generated summary

LongMemEval assesses long-term memory in chat assistants through five core abilities, identifying gaps and proposing memory design optimizations that enhance recall and question answering.

Abstract

Recent large language model (LLM)-driven chat assistant systems have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses. However, their long-term memory capabilities in sustained interactions remain underexplored. This paper introduces LongMemEval, a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. With 500 meticulously curated questions embedded within freely scalable user-assistant chat histories, LongMemEval presents a significant challenge to existing long-term memory systems, with commercial chat assistants and long-context LLMs showing a 30% accuracy drop in memorizing information across sustained interactions. We then present a unified framework that breaks down long-term memory design into four design choices across the indexing, retrieval, and reading stages. Built upon key experimental insights, we propose several memory designs, including session decomposition for optimizing value granularity, fact-augmented key expansion for enhancing the index structure, and time-aware query expansion for refining the search scope. Experimental results show that these optimizations greatly improve both memory recall and downstream question answering on LongMemEval. Overall, our study provides valuable resources and guidance for advancing the long-term memory capabilities of LLM-based chat assistants, paving the way toward more personalized and reliable conversational AI.
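The indexing / retrieval / reading framework in the abstract can be sketched as a toy pipeline. Everything below is an illustrative assumption, not the authors' implementation: the function names, the keyword-overlap scorer, and the hand-written facts stand in for the paper's learned retrievers and LLM-based fact extraction.

```python
import re
from collections import Counter

def tokens(text):
    # Crude tokenizer so keyword matching ignores case and punctuation.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def decompose(history):
    # Session decomposition: store each session as one memory value
    # instead of indexing the whole chat history as a single blob.
    return [" ".join(turns) for turns in history]

def expand_keys(value, facts):
    # Fact-augmented key expansion: index a value under its own tokens
    # plus tokens from short extracted facts (hand-written here; the
    # paper extracts them automatically).
    return tokens(value) + tokens(" ".join(facts))

def retrieve(index, query, k=1):
    # Retrieval stage: rank values by keyword overlap with the query.
    # A time-aware variant would also append the question's date
    # constraint to the query before matching.
    q = tokens(query)
    ranked = sorted(index, key=lambda kv: -sum((q & kv[0]).values()))
    return [value for _, value in ranked[:k]]

history = [
    ["I adopted a cat named Miso in March.", "Congrats on adopting Miso!"],
    ["I switched jobs to a robotics startup.", "Good luck at the new job!"],
]
facts = [["user has a cat named Miso"], ["user works at a robotics startup"]]

values = decompose(history)
index = [(expand_keys(v, f), v) for v, f in zip(values, facts)]
# The top-ranked session would then be passed to the reading stage.
print(retrieve(index, "When did I adopt my cat?"))
```

With the fact-augmented keys, the cat question matches the first session even though the query shares few surface tokens with the raw dialogue.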

Community

Paper author · Paper submitter

We introduce LongMemEval, a comprehensive, challenging, and scalable benchmark for testing the long-term memory of chat assistants.

longmemeval_dailypaper.png

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* LongGenBench: Long-context Generation Benchmark (2024)
* MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery (2024)
* SEGMENT+: Long Text Processing with Short-Context Language Models (2024)
* ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering (2024)
* MemLong: Memory-Augmented Retrieval for Long Text Modeling (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper: 0
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 4