PhyCritic: Multimodal Critic Models for Physical AI
Authors: Tianyi Xiong, Shihao Wang, Guilin Liu, Yi Dong, Ming Li, Heng Huang, Jan Kautz, Zhiding Yu
Organization: NVIDIA
Published: 2026-02-11
Paper ID: 2602.11124
Project page: https://research.nvidia.com/labs/lpr/phycritic
AI-generated summary: PhyCritic is a multimodal critic model for physical AI tasks, trained with a two-stage RLVR pipeline that enhances perception and reasoning capabilities.
With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and physical correctness. Across both physical and general-purpose multimodal judge benchmarks, PhyCritic achieves strong performance gains over open-source baselines and, when applied as a policy model, further improves perception and reasoning in physically grounded tasks.
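To make the self-referential idea concrete, here is a minimal sketch of what a verifiable reward for such a critic might look like. It is not the authors' implementation: the names `CriticOutput`, `critic_reward`, and `self_weight`, the exact-match answer check, and the additive reward shape are all assumptions made for illustration. The only part taken from the abstract is that the critic first produces its own prediction and then judges the candidate responses.

```python
from dataclasses import dataclass


@dataclass
class CriticOutput:
    """Hypothetical parsed critic output (field names are assumptions)."""
    self_prediction: str  # the critic's own answer, produced before judging
    preferred: str        # which candidate the critic prefers: "A" or "B"


def normalize(answer: str) -> str:
    """Crude normalization so that trivially different strings still match."""
    return answer.strip().lower()


def critic_reward(
    out: CriticOutput,
    gold_answer: str,
    gold_preference: str,
    self_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    """Illustrative verifiable reward for a self-referential critic.

    Both terms can be checked automatically against labels:
    1.0 if the pairwise preference matches the labeled one, plus
    `self_weight` if the critic's own prediction matches the reference answer.
    """
    preference_correct = out.preferred == gold_preference
    self_correct = normalize(out.self_prediction) == normalize(gold_answer)
    return float(preference_correct) + self_weight * float(self_correct)


if __name__ == "__main__":
    out = CriticOutput(self_prediction="The cup tips over", preferred="B")
    print(critic_reward(out, gold_answer="the cup tips over", gold_preference="B"))
```

Under these assumptions, the second term rewards the critic for getting the underlying physical question right on its own, not just for picking the labeled winner. This is one plausible reading of how an internal reference could improve judgment stability.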