Paper page - CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

arxiv:2408.14765

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

Published on Aug 27, 2024
· Submitted by Baichuan Zhou on Sep 2, 2024
Authors:
Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He

Abstract

CrossViewDiff, a cross-view diffusion model, enhances satellite-to-street view synthesis by incorporating structural and textural controls, achieving superior results across various datasets.

AI-generated summary

Satellite-to-street view synthesis aims at generating a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis task. In this work, we propose CrossViewDiff, a cross-view diffusion model for satellite-to-street view synthesis. To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules to construct the structural and textural controls for street-view image synthesis. We further design a cross-view control guided denoising process that incorporates the above controls via an enhanced cross-view attention module. To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method as a supplement to standard evaluation metrics. We also explore the effect of different data sources (e.g., text, maps, building heights, and multi-temporal satellite imagery) on this task. Results on three public cross-view datasets show that CrossViewDiff outperforms the current state of the art on both standard and GPT-based evaluation metrics, generating high-quality street-view panoramas with more realistic structures and textures across rural, suburban, and urban scenes. The code and models of this work will be released at https://opendatalab.github.io/CrossViewDiff/.
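The cross-view attention module mentioned in the abstract can be pictured as standard scaled dot-product attention in which street-view tokens query satellite-view features, so the denoised street image is conditioned on the overhead view. Below is a minimal NumPy sketch of that idea; the shapes, the random projections, and the `cross_view_attention` helper are illustrative assumptions, not the authors' implementation (see the project page for the released code).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(street_feats, sat_feats, Wq, Wk, Wv):
    """Attend from street-view queries to satellite-view keys/values.

    street_feats: (N, d) flattened street-view feature tokens
    sat_feats:    (M, d) flattened satellite-view feature tokens
    Wq, Wk, Wv:   (d, d) learned projections (random here for illustration)
    """
    q = street_feats @ Wq                    # (N, d) queries from street view
    k = sat_feats @ Wk                       # (M, d) keys from satellite view
    v = sat_feats @ Wv                       # (M, d) values from satellite view
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) cross-view affinities
    attn = softmax(scores, axis=-1)          # each street token attends over satellite tokens
    return attn @ v                          # (N, d) satellite-conditioned street features

rng = np.random.default_rng(0)
d = 16
street = rng.standard_normal((8, d))         # 8 street-view tokens
sat = rng.standard_normal((32, d))           # 32 satellite-view tokens
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out = cross_view_attention(street, sat, Wq, Wk, Wv)
print(out.shape)  # (8, 16)
```

In the paper's setting this conditioning is one of several controls (alongside the structure and texture maps) injected into the denoising process, rather than a standalone layer.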

Community

Paper submitter

Our newest paper (https://arxiv.org/abs/2408.14765) presents CrossViewDiff. We design the satellite scene structure estimation and cross-view texture mapping modules to construct the structural and textural controls for street-view image synthesis.

Project page: https://opendatalab.github.io/CrossViewDiff/

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2408.14765 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2408.14765 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.14765 in a Space README.md to link it from this page.

Collections including this paper 1