Switch EMA: A Free Lunch for Better Flatness and Sharpness
Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li
Published: 2024-02-14 (arXiv:2402.09240)
Code: https://github.com/Westlake-AI/SEMA
AI-generated summary: Switch EMA (SEMA) enhances deep neural network generalization by improving the trade-off between flatness and sharpness without additional computational cost, demonstrating superior performance and faster convergence across various tasks.
Exponential Moving Average (EMA) is a widely used weight averaging (WA)
regularization that learns flat optima for better generalization, at no extra
cost, in deep neural network (DNN) optimization. Despite achieving better
flatness, existing WA methods may suffer worse final performance or require
extra test-time computation. This work unveils the full potential of EMA with
a single line of modification: switching the EMA parameters to the original
model after each epoch, dubbed Switch EMA (SEMA). From both theoretical and
empirical perspectives, we demonstrate that SEMA helps DNNs reach
generalization optima that better trade off flatness and sharpness. To verify
the effectiveness of SEMA, we conduct comparison experiments on
discriminative, generative, and regression tasks over vision and language
datasets, including image classification, self-supervised learning, object
detection and segmentation, image generation, video prediction, attribute
regression, and language modeling. Comprehensive results with popular
optimizers and networks show that SEMA is a free lunch for DNN training,
improving performance and boosting convergence speed.
💡 Highlights: just ONE line of code change that strategically switches between EMA and online SGD, combining both flatness and sharpness in the loss landscape. 🎯 Pluggable into any DL optimizer, yielding performance gains and faster convergence without extra cost. 💻 Code: https://github.com/Westlake-AI/SEMA
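The one-line switch can be sketched in framework-agnostic Python. This is a minimal illustration, not the authors' implementation: plain floats stand in for model tensors, and `train_one_epoch` is a hypothetical placeholder for a real optimizer step.

```python
# Minimal sketch of Switch EMA (SEMA). Plain Python floats stand in for
# model parameters; a real training loop would operate on framework tensors.

def ema_update(ema, params, decay=0.999):
    """Standard EMA: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]

def train_one_epoch(params, grads, lr=0.1):
    """Hypothetical placeholder optimizer step (plain SGD) for illustration."""
    return [p - lr * g for p, g in zip(params, grads)]

params = [1.0, -2.0]        # online model weights
ema = list(params)          # EMA copy, initialized from the model

for epoch in range(3):
    grads = [0.5, -0.5]                      # stand-in gradients
    params = train_one_epoch(params, grads)  # usual optimizer update
    ema = ema_update(ema, params)            # usual EMA tracking
    params = list(ema)                       # SEMA: the one-line switch
```

The last line is the entire method: after each epoch the online model is overwritten with the EMA weights, so subsequent optimization resumes from the averaged (flatter) point rather than only using the EMA copy at test time.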