Paper page - Exploring the Evolution of Physics Cognition in Video Generation: A Survey
https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [Advances in 4D Generation: A Survey](https://huggingface.co/papers/2503.14501) (2025)
* [Simulating the Real World: A Unified Survey of Multimodal Generative Models](https://huggingface.co/papers/2503.04641) (2025)
* [A Survey on Human Interaction Motion Generation](https://huggingface.co/papers/2503.12763) (2025)
* [Generative Artificial Intelligence in Robotic Manipulation: A Survey](https://huggingface.co/papers/2503.03464) (2025)
* [3D Human Interaction Generation: A Survey](https://huggingface.co/papers/2503.13120) (2025)
* [Human Motion Prediction, Reconstruction, and Generation](https://huggingface.co/papers/2502.15956) (2025)
* [VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation](https://huggingface.co/papers/2502.07531) (2025)
AI-generated summary

This survey reviews the integration of physical cognition into video generation to improve realism while maintaining physical consistency, proposing a taxonomy for understanding the evolutionary process and challenges in this domain.
Recent years have seen significant progress in video generation, especially with the rapid advancement of diffusion models. Nevertheless, deficiencies in physical cognition have drawn increasing attention: generated content often violates the fundamental laws of physics, falling into the dilemma of "visual realism but physical absurdity". Researchers have increasingly recognized the importance of physical fidelity in video generation and have attempted to integrate heuristic physical cognition, such as motion representations and physical knowledge, into generative systems to simulate real-world dynamic scenarios. Given the lack of a systematic overview of this field, this survey provides a comprehensive summary of architecture designs and their applications to fill the gap. Specifically, we discuss and organize the evolutionary process of physical cognition in video generation from a cognitive-science perspective, proposing a three-tier taxonomy: 1) basic schema perception for generation, 2) passive cognition of physical knowledge for generation, and 3) active cognition for world simulation, encompassing state-of-the-art methods, classical paradigms, and benchmarks. We then highlight the key challenges inherent in this domain and delineate potential pathways for future research, contributing to the frontiers of discussion in both academia and industry. Through structured review and interdisciplinary analysis, this survey aims to provide directional guidance for developing interpretable, controllable, and physically consistent video generation paradigms, thereby propelling generative models from the stage of "visual mimicry" towards a new phase of "human-like physical comprehension".
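To make the "visual realism but physical absurdity" failure mode concrete, here is a minimal, hypothetical sketch (not taken from the survey or any specific benchmark it covers) of the kind of check that physics-consistency benchmarks formalize: given an object's vertical positions extracted from generated frames, measure how far the trajectory deviates from ideal free fall, y(t) = y0 + v0·t − ½·g·t². The function name and thresholds are illustrative assumptions.

```python
import numpy as np

def free_fall_error(y_obs, dt, g=9.81):
    """Mean absolute deviation of observed heights from the best-fit
    free-fall parabola whose quadratic coefficient is fixed by gravity.

    y_obs : sequence of vertical positions, one per frame
    dt    : time step between frames (seconds)
    """
    t = np.arange(len(y_obs)) * dt
    y_obs = np.asarray(y_obs, dtype=float)
    # Fit only y0 and v0; adding back 0.5*g*t^2 leaves a line v0*t + y0.
    residual = y_obs + 0.5 * g * t**2
    coeffs = np.polyfit(t, residual, 1)
    y_fit = np.polyval(coeffs, t) - 0.5 * g * t**2
    return float(np.mean(np.abs(y_obs - y_fit)))

# A physically plausible trajectory scores near zero...
t = np.arange(10) * 0.1
good = 2.0 + 1.0 * t - 0.5 * 9.81 * t**2
# ...while a visually smooth but physically absurd one (an object
# "falling" at constant speed) shows a clear residual.
bad = 2.0 - 0.05 * np.arange(10)
print(free_fall_error(good, 0.1))
print(free_fall_error(bad, 0.1))
```

Real benchmarks in this space are of course far richer (collisions, fluids, occlusion-aware tracking), but they share this structure: extract motion from generated frames, then score it against an analytic or simulated physical reference.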