Paper page - NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
Authors: Zhixi Cai, Fucai Ke, Simindokht Jahangard, Maria Garcia de la Banda, Reza Haffari, Peter J. Stuckey, Hamid Rezatofighi

Published: 2025-02-01 (arXiv 2502.00372)

Code: https://github.com/ControlNet/NAVER

Similar papers recommended by the Semantic Scholar API (via Librarian Bot):

* Mind with Eyes: from Language Reasoning to Multimodal Reasoning (https://huggingface.co/papers/2503.18071) (2025)
* Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering (https://huggingface.co/papers/2503.14957) (2025)
* SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs (https://huggingface.co/papers/2502.03283) (2025)
* R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization (https://huggingface.co/papers/2503.10615) (2025)
* From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning (https://huggingface.co/papers/2502.05843) (2025)
* Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement (https://huggingface.co/papers/2503.06520) (2025)
* VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework (https://huggingface.co/papers/2502.00711) (2025)
AI-generated summary

NAVER integrates probabilistic logic reasoning into a compositional visual grounding method, improving robustness and interpretability in complex reasoning tasks compared to existing baselines.

Abstract
Visual Grounding (VG) tasks, such as referring expression detection and segmentation, are important for linking visual entities to context, especially in complex reasoning tasks that require detailed query interpretation. This paper explores VG beyond basic perception, highlighting challenges for methods that require reasoning like human cognition. Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have improved abilities for visual comprehension, contextual understanding, and reasoning. These methods are mainly split into end-to-end and compositional methods, with the latter offering more flexibility. Compositional approaches that integrate LLMs and foundation models show promising performance, but still struggle with complex reasoning over language-based logical representations. To address these limitations, we propose NAVER, a compositional visual grounding method that integrates explicit probabilistic logic reasoning within a finite-state automaton, equipped with a self-correcting mechanism. This design improves robustness and interpretability in inference through explicit logic reasoning. Our results show that NAVER achieves SoTA performance compared to recent end-to-end and compositional baselines. The code is available at https://github.com/ControlNet/NAVER.
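To make the abstract's design concrete, below is a minimal sketch of a finite-state automaton driving a compositional pipeline with a bounded self-correction loop. The state names (`PARSE`, `PERCEIVE`, `REASON`, `CORRECT`) and the stub components are hypothetical, chosen only to illustrate the control flow; the actual NAVER states, logic representation, and models are defined in the linked repository and may differ.

```python
from enum import Enum, auto

# Hypothetical pipeline states -- NOT the actual NAVER state set.
class State(Enum):
    PARSE = auto()      # LLM turns the query into logic predicates
    PERCEIVE = auto()   # foundation models propose candidate entities
    REASON = auto()     # probabilistic logic inference over candidates
    CORRECT = auto()    # self-correction entered when reasoning fails
    DONE = auto()

# Stub components standing in for the LLM, detector, and logic engine.
def parse_to_logic(query):
    return [("left_of", "cat", "sofa")]

def detect_entities(predicates):
    return ["box_1", "box_2"]

def logic_infer(predicates, candidates):
    # A real engine would score candidates under the logic program;
    # here we just pick the first candidate, or fail on an empty set.
    return candidates[0] if candidates else None

def run_automaton(query, max_retries=2):
    """Drive the stages as a finite-state automaton with self-correction."""
    state, retries, result = State.PARSE, 0, None
    while state is not State.DONE:
        if state is State.PARSE:
            predicates = parse_to_logic(query)
            state = State.PERCEIVE
        elif state is State.PERCEIVE:
            candidates = detect_entities(predicates)
            state = State.REASON
        elif state is State.REASON:
            result = logic_infer(predicates, candidates)
            state = State.DONE if result is not None else State.CORRECT
        elif state is State.CORRECT:
            if retries >= max_retries:
                state = State.DONE   # give up after bounded retries
            else:
                retries += 1
                state = State.PARSE  # retry from parsing with feedback
    return result
```

The automaton makes each transition explicit and inspectable, which is the interpretability property the abstract claims; the `CORRECT` state bounds how many times the pipeline may re-enter earlier stages before terminating.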