
Project Page: https://rookiexiong7.github.io/projects/SeC/
Code: https://github.com/OpenIXCLab/SeC
Dataset: https://huggingface.co/datasets/OpenIXCLab/SeCVOS

arxiv:2507.15852

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Published on Jul 21, 2025 · Submitted by Jiaqi Wang on Jul 22, 2025
Authors: Zhixiong Zhang, Shuangrui Ding, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

Abstract

AI-generated summary: A concept-driven segmentation framework using Large Vision-Language Models improves video object segmentation by integrating high-level semantic reasoning and adapting to scene complexity.

Video Object Segmentation (VOS) is a core task in computer vision, requiring models to track and segment target objects across video frames. Despite notable advances with recent efforts, current techniques still lag behind human capabilities in handling drastic visual variations, occlusions, and complex scene changes. This limitation arises from their reliance on appearance matching, neglecting the human-like conceptual understanding of objects that enables robust identification across temporal dynamics. Motivated by this gap, we propose Segment Concept (SeC), a concept-driven segmentation framework that shifts from conventional feature matching to the progressive construction and utilization of high-level, object-centric representations. SeC employs Large Vision-Language Models (LVLMs) to integrate visual cues across diverse frames, constructing robust conceptual priors. During inference, SeC forms a comprehensive semantic representation of the target based on processed frames, realizing robust segmentation of follow-up frames. Furthermore, SeC adaptively balances LVLM-based semantic reasoning with enhanced feature matching, dynamically adjusting computational efforts based on scene complexity. To rigorously assess VOS methods in scenarios demanding high-level conceptual reasoning and robust semantic understanding, we introduce the Semantic Complex Scenarios Video Object Segmentation benchmark (SeCVOS). SeCVOS comprises 160 manually annotated multi-scenario videos designed to challenge models with substantial appearance variations and dynamic scene transformations. In particular, SeC achieves an 11.8-point improvement over SAM 2.1 on SeCVOS, establishing a new state-of-the-art in concept-aware video object segmentation.
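
The adaptive balancing described in the abstract — cheap feature matching for stable scenes, heavier LVLM-based semantic reasoning when the scene changes sharply — can be sketched as simple complexity-based routing. This is a minimal illustration, not the paper's actual mechanism: the pixel-difference score, the 0.3 threshold, and the frame representation are all made-up assumptions for the sketch.

```python
# Illustrative sketch (not SeC's real code) of complexity-based routing:
# stable frames take a lightweight appearance-matching path, while abrupt
# scene changes trigger the expensive concept-level semantic path.
# Frames are stubbed as flat lists of pixel intensities in [0, 1].

def scene_change_score(prev_frame, frame):
    """Mean absolute pixel difference between consecutive frames."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)

def route_frames(frames, threshold=0.3):
    """Assign each frame to a 'semantic' or 'matching' path by complexity."""
    routes = []
    for i, frame in enumerate(frames):
        if i == 0 or scene_change_score(frames[i - 1], frame) > threshold:
            routes.append("semantic")   # heavy concept-level reasoning
        else:
            routes.append("matching")   # lightweight appearance matching
    return routes

# A stable scene followed by an abrupt cut:
frames = [[0.1] * 4, [0.12] * 4, [0.9] * 4]
print(route_frames(frames))  # ['semantic', 'matching', 'semantic']
```

The design point mirrors the abstract: computation scales with scene complexity, so the expensive semantic path runs only on the first frame and on hard transitions rather than on every frame.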


Models citing this paper: 3
Datasets citing this paper: 1
Spaces citing this paper: 1
Collections including this paper: 6