Paper page - Flow-GRPO: Training Flow Matching Models via Online RL
Code: https://github.com/yifan123/flow_grpo

Authors: Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang
Published on May 8, 2025

Community

Librarian Bot (automated): I found the following papers similar to this paper. The following papers were recommended by the Semantic Scholar API:

* Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing (https://huggingface.co/papers/2503.19385) (2025)
* Gaussian Mixture Flow Matching Models (https://huggingface.co/papers/2504.05304) (2025)
* CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models (https://huggingface.co/papers/2503.18886) (2025)
* InstructEngine: Instruction-driven Text-to-Image Alignment (https://huggingface.co/papers/2504.10329) (2025)
* Consistency Trajectory Matching for One-Step Generative Super-Resolution (https://huggingface.co/papers/2503.20349) (2025)
* T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (https://huggingface.co/papers/2505.00703) (2025)
* Deeply Supervised Flow-Based Generative Models (https://huggingface.co/papers/2503.14494) (2025)

Lev Novitskiy (leffff): We have implemented Flow-GRPO here: https://github.com/leffff/Diffusion-Reward-Modeling-for-Text-Rendering. We tested the method for the task of text rendering, but it did not turn out to work as well as Flow-DPO. Any advice?
AI-generated summary: Flow-GRPO combines online reinforcement learning with flow matching models through an ODE-to-SDE conversion and denoising reduction, improving sampling efficiency and performance across text-to-image tasks.

Abstract
We propose Flow-GRPO, the first method integrating online reinforcement
learning (RL) into flow matching models. Our approach uses two key strategies:
(1) an ODE-to-SDE conversion that transforms a deterministic Ordinary
Differential Equation (ODE) into an equivalent Stochastic Differential Equation
(SDE) that matches the original model's marginal distribution at all timesteps,
enabling statistical sampling for RL exploration; and (2) a Denoising Reduction
strategy that reduces training denoising steps while retaining the original
inference timestep number, significantly improving sampling efficiency without
performance degradation. Empirically, Flow-GRPO is effective across multiple
text-to-image tasks. For complex compositions, RL-tuned SD3.5 generates nearly
perfect object counts, spatial relations, and fine-grained attributes, boosting
GenEval accuracy from 63% to 95%. In visual text rendering, its accuracy
improves from 59% to 92%, significantly enhancing text generation.
Flow-GRPO also achieves substantial gains in human preference alignment.
Notably, little to no reward hacking occurred, meaning rewards did not increase
at the cost of image quality or diversity, and both remained stable in our
experiments.
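To make strategy (1) concrete, here is a minimal sketch of how a deterministic rectified-flow ODE step can be replaced by a stochastic Euler-Maruyama step that targets the same marginals. It assumes PyTorch, a linear interpolation path x_t = (1 - t) * x0 + t * noise, and a freely chosen noise scale sigma; the `velocity` function is a hypothetical stand-in for the trained model, and the coefficients follow the generic marginal-preserving construction rather than the paper's exact schedule or the repository's API.

```python
# Illustrative sketch only (assumed names and coefficients, not the paper's exact derivation).
import torch

def velocity(x: torch.Tensor, t: float) -> torch.Tensor:
    # Hypothetical stand-in for the trained flow-matching model's velocity prediction.
    return -x

def sde_step(x: torch.Tensor, t: float, t_next: float, sigma: float) -> torch.Tensor:
    """One stochastic step whose marginal distribution matches the deterministic ODE step."""
    v = velocity(x, t)
    # Score of p_t recovered from the velocity under the linear path (valid for t > 0).
    score = -(x + (1.0 - t) * v) / t
    # Inject noise and compensate in the drift so the marginals are preserved.
    drift = v - 0.5 * sigma ** 2 * score
    dt = t_next - t  # negative: stepping from noise (t = 1) toward data (t = 0)
    noise = sigma * abs(dt) ** 0.5 * torch.randn_like(x)
    return x + drift * dt + noise  # the plain ODE step would be x + v * dt

# Toy rollout from t = 1 toward t = 0 (stopping above 0 to avoid dividing by t = 0).
x = torch.randn(2, 4)
ts = torch.linspace(1.0, 0.05, 20).tolist()
for t_cur, t_nxt in zip(ts[:-1], ts[1:]):
    x = sde_step(x, t_cur, t_nxt, sigma=0.3)
```

Because every step is now a Gaussian transition, its log-probability can be evaluated, which is what makes a PPO/GRPO-style policy update possible.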
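For the RL update itself, a hedged sketch of the GRPO ingredients: rewards from a group of images generated for the same prompt are normalized into group-relative advantages, and a PPO-style clipped surrogate is applied over the per-step log-probabilities of the sampled SDE transitions. All names, shapes, and the clip range below are illustrative assumptions, not the repository's implementation; `num_train_steps` being much smaller than the inference step count is where Denoising Reduction would enter.

```python
# Illustrative sketch of group-relative advantages and a clipped policy loss.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one scalar reward per sampled image."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # normalize within each prompt group

def clipped_policy_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                        advantages: torch.Tensor, clip_range: float = 0.2) -> torch.Tensor:
    """logp_*: (batch, num_train_steps) per-step log-probs of the SDE transitions."""
    ratio = torch.exp(logp_new - logp_old)          # importance ratio per denoising step
    adv = advantages.unsqueeze(1)                   # broadcast one advantage over all steps
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * adv
    return -torch.minimum(unclipped, clipped).mean()

# Toy usage: 2 prompts x 4 images per group, trained with only 10 denoising steps
# while inference would still use the model's full step count (Denoising Reduction).
rewards = torch.rand(2, 4)
adv = group_relative_advantages(rewards).reshape(-1)   # (8,)
logp_old = torch.randn(8, 10)
logp_new = logp_old + 0.01 * torch.randn(8, 10)
print(float(clipped_policy_loss(logp_new, logp_old, adv)))
```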