Paper page - PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

https://picoaudio.github.io/


Hello @AdinaY, our demo is available at https://huggingface.co/spaces/amphion/PicoAudio


Awesome!! Thanks for sharing 🤗

arxiv:2407.02869

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Published on Jul 3, 2024 · Submitted by AK on Jul 4, 2024
Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

Abstract

PicoAudio, a temporal-controlled audio generation framework, enhances timestamp and occurrence frequency controllability through tailored model design and fine-grained audio-text data.

AI-generated summary

Recently, audio generation tasks have attracted considerable research interest. Precise temporal controllability is essential to integrate audio generation with real applications. In this work, we propose a temporally controlled audio generation framework, PicoAudio. PicoAudio integrates temporal information to guide audio generation through tailored model design. It leverages data crawling, segmentation, filtering, and simulation of fine-grained temporally-aligned audio-text data. Both subjective and objective evaluations demonstrate that PicoAudio dramatically surpasses current state-of-the-art generation models in terms of timestamp and occurrence frequency controllability. The generated samples are available on the demo website https://PicoAudio.github.io.
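To make the two control targets in the abstract concrete, here is a minimal sketch of how timestamp and occurrence-frequency information for audio events could be represented. This is not PicoAudio's actual conditioning format, which the abstract does not specify. The input layout, the function names `to_frame_matrix` and `occurrence_counts`, and the frame rate are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not taken from the paper):
# event name -> list of (onset, offset) pairs in seconds.
Events = Dict[str, List[Tuple[float, float]]]


def to_frame_matrix(events: Events, duration: float, frame_rate: int) -> Dict[str, List[int]]:
    """Build one 0/1 row per event; a frame is 1 while the event is active."""
    n_frames = round(duration * frame_rate)
    matrix: Dict[str, List[int]] = {}
    for name, intervals in events.items():
        row = [0] * n_frames
        for onset, offset in intervals:
            # Clamp the frame range so out-of-bounds timestamps are ignored.
            first = max(0, round(onset * frame_rate))
            last = min(n_frames, round(offset * frame_rate))
            for i in range(first, last):
                row[i] = 1
        matrix[name] = row
    return matrix


def occurrence_counts(events: Events) -> Dict[str, int]:
    """Occurrence frequency: how many separate times each event happens."""
    return {name: len(intervals) for name, intervals in events.items()}


# Illustrative data only.
events = {
    "dog barking": [(0.5, 1.0), (2.0, 3.0)],
    "door slam": [(1.5, 2.0)],
}
matrix = to_frame_matrix(events, duration=4.0, frame_rate=2)
counts = occurrence_counts(events)
```

In this sketch, timestamp control corresponds to the positions of the 1s in each row, and frequency control corresponds to the number of separate intervals per event.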

Community

Paper submitter

Really cool 🔥 Do you have plans to open source this work?

·
Paper author

Sure, it is currently in progress. A huggingface demo will be available soon.


Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 10