arxiv:2403.06977

VideoMamba: State Space Model for Efficient Video Understanding

Published on Mar 11, 2024 · Submitted by AK on Mar 12, 2024

Authors: Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao
Abstract

AI-generated summary

VideoMamba, an adaptation of Mamba, addresses local and global challenges in video understanding through a linear-complexity operator, self-distillation, and compatibility with multiple modalities, setting a new benchmark.

Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain. The proposed VideoMamba overcomes the limitations of existing 3D convolution neural networks and video transformers. Its linear-complexity operator enables efficient long-term modeling, which is crucial for high-resolution long video understanding. Extensive evaluations reveal VideoMamba's four core abilities: (1) Scalability in the visual domain without extensive dataset pretraining, thanks to a novel self-distillation technique; (2) Sensitivity for recognizing short-term actions even with fine-grained motion differences; (3) Superiority in long-term video understanding, showcasing significant advancements over traditional feature-based models; and (4) Compatibility with other modalities, demonstrating robustness in multi-modal contexts. Through these distinct advantages, VideoMamba sets a new benchmark for video understanding, offering a scalable and efficient solution for comprehensive video understanding. All the code and models are available at https://github.com/OpenGVLab/VideoMamba.
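To make the abstract's "linear-complexity operator" concrete, below is a minimal sketch of a bidirectional state-space scan over flattened video patch tokens. This is not the authors' implementation: the released VideoMamba builds on Mamba's input-dependent selective scan with fused GPU kernels (see the repository linked above), and the function name, tensor shapes, and diagonal parameterization here are illustrative assumptions only.

```python
# Minimal sketch (not the authors' code) of the idea behind VideoMamba:
# flatten a video into a 1-D token sequence and process it with a
# linear-complexity state-space recurrence instead of quadratic attention.
import torch

def ssm_scan(x, A, B, C):
    """Sequential linear state-space scan: h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.

    x: (L, D) token sequence; A, B, C: (D, N) per-channel diagonal parameters.
    Cost grows linearly in sequence length L, unlike O(L^2) self-attention.
    """
    L, D = x.shape
    h = torch.zeros(D, A.shape[1])
    ys = []
    for t in range(L):
        h = A * h + B * x[t].unsqueeze(-1)  # update N-dim state per channel
        ys.append((h * C).sum(dim=-1))      # read out y_t from the state
    return torch.stack(ys)                  # (L, D)

# Toy video: 8 frames of 14x14 patches with 192-dim embeddings,
# flattened spatial-first into one token sequence (shapes are assumptions).
T, HW, D, N = 8, 14 * 14, 192, 16
tokens = torch.randn(T * HW, D)
A = torch.rand(D, N) * 0.9                  # decay-like transition (illustrative)
B, C = torch.randn(D, N), torch.randn(D, N)

# Bidirectional scan, mirroring how Vision/VideoMamba-style blocks combine
# forward and backward passes over the token sequence.
y = ssm_scan(tokens, A, B, C) + ssm_scan(tokens.flip(0), A, B, C).flip(0)
print(y.shape)  # torch.Size([1568, 192])
```

The point of the sketch is the cost model: each token performs one fixed-size state update, so a clip of L tokens costs O(L) work rather than the O(L^2) of full self-attention, which is what makes long, high-resolution video modeling tractable.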

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

VideoMamba Unleashed: Next-Gen State Space Model for Video Mastery

https://cdn-uploads.huggingface.co/production/uploads/6186ddf6a7717cb375090c01/wJsDVn8BwAMuDj7SgYvjj.mp4

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 2

Datasets citing this paper 0


Spaces citing this paper 2

Collections including this paper 14