DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Demo: https://huggingface.co/spaces/Doubiiu/DynamiCrafter
Project page: https://doubiiu.github.io/projects/DynamiCrafter/
Code: https://github.com/Doubiiu/DynamiCrafter
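For a quick programmatic look at the hosted demo, one can connect to the Space with the official gradio_client package. Since the Space's endpoint names and argument order are not documented on this page, the sketch below only inspects the exposed API rather than assuming a call signature.

```python
# Probe the DynamiCrafter demo Space with gradio_client (pip install gradio_client).
# We inspect the API instead of guessing endpoint names or parameters.
from gradio_client import Client

client = Client("Doubiiu/DynamiCrafter")  # connect to the public Space
client.view_api()                         # print available endpoints and their parameters
```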

\n","updatedAt":"2023-12-06T05:47:02.046Z","author":{"_id":"64770e86d7cf39f2e937ae9a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64770e86d7cf39f2e937ae9a/pLqGg2z1KzQxCGpMwds-9.jpeg","fullname":"Jinbo Xing","name":"Doubiiu","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":162,"isUserFollowing":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.5951587557792664},"editors":["Doubiiu"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/64770e86d7cf39f2e937ae9a/pLqGg2z1KzQxCGpMwds-9.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2310.12190","authors":[{"_id":"656af0bdd848a6683a97a654","user":{"_id":"64770e86d7cf39f2e937ae9a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64770e86d7cf39f2e937ae9a/pLqGg2z1KzQxCGpMwds-9.jpeg","isPro":false,"fullname":"Jinbo Xing","user":"Doubiiu","type":"user"},"name":"Jinbo Xing","status":"claimed_verified","statusLastChangedAt":"2023-12-04T08:02:31.199Z","hidden":false},{"_id":"656af0bdd848a6683a97a655","user":{"_id":"63401c89f81b9d101361f712","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1665146415483-63401c89f81b9d101361f712.png","isPro":false,"fullname":"Richard","user":"menghanxia","type":"user"},"name":"Menghan Xia","status":"claimed_verified","statusLastChangedAt":"2024-02-12T08:40:09.471Z","hidden":false},{"_id":"656af0bdd848a6683a97a656","name":"Yong Zhang","hidden":false},{"_id":"656af0bdd848a6683a97a657","user":{"_id":"6305cb2f435ec751b723d80a","avatarUrl":"/avatars/5e8519d9814038726bf98f4104f74568.svg","isPro":true,"fullname":"Haoxin Chen","user":"paulai","type":"user"},"name":"Haoxin Chen","status":"claimed_verified","statusLastChangedAt":"2024-01-19T09:23:29.984Z","hidden":false},{"_id":"656af0bdd848a6683a97a658","user":{"_id":"63f095be6309c84d5f48848a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63f095be6309c84d5f48848a/pL2CKi-r-0mMfIGhYSAsm.jpeg","isPro":false,"fullname":"Wangbo Yu","user":"Drexubery","type":"user"},"name":"Wangbo Yu","status":"claimed_verified","statusLastChangedAt":"2024-08-29T07:25:39.967Z","hidden":false},{"_id":"656af0bdd848a6683a97a659","name":"Hanyuan Liu","hidden":false},{"_id":"656af0bdd848a6683a97a65a","user":{"_id":"60e272ca6c78a8c122b12127","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/60e272ca6c78a8c122b12127/xldEGBzGrU-bX6IwAw0Ie.jpeg","isPro":false,"fullname":"Xintao Wang","user":"Xintao","type":"user"},"name":"Xintao Wang","status":"claimed_verified","statusLastChangedAt":"2024-01-18T08:20:59.354Z","hidden":false},{"_id":"656af0bdd848a6683a97a65b","name":"Tien-Tsin Wong","hidden":false},{"_id":"656af0bdd848a6683a97a65c","name":"Ying Shan","hidden":false}],"publishedAt":"2023-10-18T14:42:16.000Z","title":"DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors","summary":"Animating a still image offers an engaging visual experience. Traditional\nimage animation techniques mainly focus on animating natural scenes with\nstochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g.\nhuman hair or body motions), and thus limits their applicability to more\ngeneral visual content. To overcome this limitation, we explore the synthesis\nof dynamic content for open-domain images, converting them into animated\nvideos. 
The key idea is to utilize the motion prior of text-to-video diffusion\nmodels by incorporating the image into the generative process as guidance.\nGiven an image, we first project it into a text-aligned rich context\nrepresentation space using a query transformer, which facilitates the video\nmodel to digest the image content in a compatible fashion. However, some visual\ndetails still struggle to be preserved in the resultant videos. To supplement\nwith more precise image information, we further feed the full image to the\ndiffusion model by concatenating it with the initial noises. Experimental\nresults show that our proposed method can produce visually convincing and more\nlogical & natural motions, as well as higher conformity to the input image.\nComparative evaluation demonstrates the notable superiority of our approach\nover existing competitors.","upvotes":13,"discussionId":"656af0c1d848a6683a97a763","githubRepo":"https://github.com/Doubiiu/DynamiCrafter","githubRepoAddedBy":"auto","ai_summary":"A method using text-to-video diffusion models and query transformers allows the animation of general still images into dynamic videos while preserving visual details and producing natural motions.","ai_keywords":["motion prior","text-to-video diffusion models","query transformer","context representation space","diffusion model","visual details","visual convincing","logical motions","natural motions","conformity to input image"],"githubStars":2997},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"63d4c8ce13ae45b780792f32","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63d4c8ce13ae45b780792f32/QasegimoxBqfZwDzorukz.png","isPro":false,"fullname":"Ohenenoo","user":"PeepDaSlan9","type":"user"},{"_id":"64770e86d7cf39f2e937ae9a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64770e86d7cf39f2e937ae9a/pLqGg2z1KzQxCGpMwds-9.jpeg","isPro":false,"fullname":"Jinbo Xing","user":"Doubiiu","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"656059c64f5318d10ca340b9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/54UxR4VouJMOuRmzgburi.png","isPro":false,"fullname":"Edson Jr","user":"EdsonJr","type":"user"},{"_id":"650c8bfb3d3542884da1a845","avatarUrl":"/avatars/863a5deebf2ac6d4faedc4dd368e0561.svg","isPro":false,"fullname":"Adhurim ","user":"Limi07","type":"user"},{"_id":"63f77b6226bb222ab60c6c7e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63f77b6226bb222ab60c6c7e/lftdzl7omjn0W7hVGVwwl.jpeg","isPro":false,"fullname":"Salwa Zeitoun","user":"Salwa-Zeitoun","type":"user"},{"_id":"62fa15d28cd542e895b62035","avatarUrl":"/avatars/b4260b60fece4fadb13ea05fb18cd6c5.svg","isPro":false,"fullname":"rfs","user":"rfs","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"60c8d264224e250fb0178f77","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/60c8d264224e250fb0178f77/i8fbkBVcoFeJRmkQ9kYAE.png","isPro":false,"fullname":"Adam 
Lee","user":"Abecid","type":"user"},{"_id":"6420a45ff837b31c1cfdc092","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6420a45ff837b31c1cfdc092/5oKn9EtYPfv5VFpfENFpD.png","isPro":false,"fullname":"Vikramjeet Singh","user":"VikramSingh178","type":"user"},{"_id":"66b61df179fdd9200646a6e1","avatarUrl":"/avatars/35b72db039f1037cbb58dc44234d877a.svg","isPro":false,"fullname":"YuTingshen","user":"yuting89830","type":"user"},{"_id":"639a8f29b2740bf1474e82c1","avatarUrl":"/avatars/306ac149819c80b66386e4a719662130.svg","isPro":false,"fullname":"Hongbo Wang","user":"Larer","type":"user"}],"acceptLanguages":["*"]}">
Papers
arxiv:2310.12190


Published on Oct 18, 2023
Authors: Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan

AI-generated summary

A method that uses text-to-video diffusion models and a query transformer to animate general still images into dynamic videos while preserving visual details and producing natural motions.

Abstract

Animating a still image offers an engaging visual experience. Traditional image animation techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds and fluid) or domain-specific motions (e.g., human hair or body motions), which limits their applicability to more general visual content. To overcome this limitation, we explore the synthesis of dynamic content for open-domain images, converting them into animated videos. The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance. Given an image, we first project it into a text-aligned rich context representation space using a query transformer, which helps the video model digest the image content in a compatible fashion. However, some visual details are still difficult to preserve in the resulting videos. To supply more precise image information, we further feed the full image to the diffusion model by concatenating it with the initial noise. Experimental results show that our proposed method produces visually convincing videos with more logical and natural motions, as well as higher conformity to the input image. Comparative evaluation demonstrates the notable superiority of our approach over existing competitors.
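As a reading aid, here is a minimal, self-contained PyTorch sketch of the two conditioning streams the abstract describes: a query transformer that projects image features into a context representation, and a second stream that concatenates the image latent with the initial noise. All module names, layer counts, and tensor dimensions here are illustrative assumptions; the paper's actual architecture (query transformer design, encoders, and video UNet) should be taken from the official code linked above.

```python
# Hedged sketch of DynamiCrafter-style dual-stream image conditioning.
# Names and shapes are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class QueryTransformer(nn.Module):
    """Projects image patch features into a text-aligned context space via
    learnable queries cross-attending to the image tokens (assumed design)."""
    def __init__(self, num_queries=16, dim=1024, num_layers=4, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        ])

    def forward(self, image_tokens):              # (B, N, dim) image patch features
        b = image_tokens.size(0)
        ctx = self.queries.unsqueeze(0).expand(b, -1, -1)
        for attn in self.layers:
            out, _ = attn(ctx, image_tokens, image_tokens)  # queries attend to image
            ctx = ctx + out                        # residual update
        return ctx                                 # (B, num_queries, dim) context tokens

def prepare_unet_inputs(image_latent, noise):
    """Second stream: concatenate the encoded image latent with the initial
    noise along the channel axis, repeated across all video frames."""
    # noise: (B, C, T, H, W); image_latent: (B, C, H, W)
    img = image_latent.unsqueeze(2).expand(-1, -1, noise.size(2), -1, -1)
    return torch.cat([noise, img], dim=1)          # (B, 2C, T, H, W) UNet input

# Usage: the context tokens condition the video UNet's cross-attention layers
# (alongside text embeddings), while the concatenated latent supplies precise detail.
b, n, dim = 2, 256, 1024
ctx = QueryTransformer()(torch.randn(b, n, dim))
unet_in = prepare_unet_inputs(torch.randn(b, 4, 32, 32), torch.randn(b, 4, 16, 32, 32))
print(ctx.shape, unet_in.shape)  # (2, 16, 1024) and (2, 8, 16, 32, 32)
```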


Models citing this paper: 8

Datasets citing this paper: 0

No datasets cite this paper yet. Cite arxiv.org/abs/2310.12190 in a dataset README.md to link it from this page.

Spaces citing this paper: 44

Collections including this paper: 4