arxiv:2506.18890

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

Published on Jun 23, 2025 · Submitted by Yu on Jun 24, 2025
Project page: https://4dlrm.github.io/ · Code: https://github.com/Mars-tin/4D-LRM
Authors: Ziqiao Ma, Xuweiyi Chen, Shoubin Yu, Sai Bi, Kai Zhang, Chen Ziwen, Sihan Xu, Jianing Yang, Zexiang Xu, Kalyan Sunkavalli, Mohit Bansal, Joyce Chai, Hao Tan

Abstract

AI-generated summary: 4D-LRM is a large-scale model that efficiently reconstructs objects from multiple views and times into any view-time combination using space-time representations and Gaussian primitives.

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at some times to any view at any time? We provide an affirmative answer with 4D-LRM, the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary novel view-time combinations. Unlike prior 4D approaches (e.g., optimization-based, geometry-based, or generative) that struggle with efficiency, generalization, or faithfulness, 4D-LRM learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time, enabling fast, high-quality rendering at, in principle, infinite frame rate. Our results demonstrate that scaling spatiotemporal pretraining enables accurate and efficient 4D reconstruction. We show that 4D-LRM generalizes to novel objects, interpolates across time, and handles diverse camera setups. It reconstructs a 24-frame sequence in one forward pass in under 1.5 seconds on a single A100 GPU.
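To make the abstract's description concrete, here is a minimal, hypothetical PyTorch sketch of the kind of pipeline it outlines: a transformer over posed image tokens decodes per-pixel 4D Gaussian parameters, and each primitive can then be queried at an arbitrary timestamp. The patch size, the Gaussian parameter layout, the module sizes, and the temporal-opacity slicing below are all assumptions chosen for illustration; this is not the authors' architecture or released code.

```python
# Illustrative sketch only -- NOT the 4D-LRM implementation.
# Shows the general idea: posed image tokens -> per-pixel 4D Gaussians,
# which can be evaluated at any query time. All sizes and the parameter
# layout are assumptions made for this example.
import torch
import torch.nn as nn

PATCH = 8  # assumed patch size: each token decodes PATCH*PATCH pixels
# Assumed per-pixel 4D Gaussian parameterization (16 values):
#   xyz (3) + time center (1) + spatial scale (3) + temporal scale (1)
#   + rotation quaternion (4) + opacity (1) + RGB (3)
GAUSS_DIM = 3 + 1 + 3 + 1 + 4 + 1 + 3

class TinySpaceTimeLRM(nn.Module):
    """Toy stand-in for a large space-time reconstruction model."""
    def __init__(self, token_dim=256, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(token_dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        # Per-token head decoding one patch worth of per-pixel Gaussians.
        self.head = nn.Linear(token_dim, PATCH * PATCH * GAUSS_DIM)

    def forward(self, tokens):
        # tokens: (B, N, token_dim) -- posed image tokens, i.e. patch features
        # assumed to be fused with camera-ray and timestamp embeddings upstream.
        feats = self.backbone(tokens)
        gauss = self.head(feats)                                   # (B, N, P*P*GAUSS_DIM)
        return gauss.reshape(tokens.shape[0], -1, GAUSS_DIM)       # per-pixel 4D Gaussians

def slice_at_time(gaussians, t):
    """Evaluate 4D Gaussians at query time t: keep spatial parameters and
    reweight opacity by a 1D Gaussian in time (one simple way to realize
    rendering at an arbitrary timestamp)."""
    xyz, mu_t = gaussians[..., :3], gaussians[..., 3]
    sigma_t = gaussians[..., 7].abs() + 1e-3           # temporal scale
    opacity = torch.sigmoid(gaussians[..., 12])
    w = torch.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)  # temporal falloff
    return xyz, opacity * w                            # positions + time-weighted opacity

# Usage: 4 input views * 36 tokens each -> per-pixel 4D Gaussians, queried at t=0.4.
model = TinySpaceTimeLRM()
tokens = torch.randn(1, 4 * 36, 256)
g = model(tokens)
xyz, alpha = slice_at_time(g, t=0.4)
print(g.shape, xyz.shape, alpha.shape)
```

A full renderer would splat the time-sliced Gaussians with a differentiable rasterizer; the toy `slice_at_time` above only illustrates how a query timestamp can modulate each primitive to realize "any view at any time".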

Community

Paper submitter

project page: https://4dlrm.github.io/

