
arxiv:2602.03510

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Published on Feb 3 · Submitted by Bozhou Li on Feb 5

Authors: Bozhou Li, Yushuo Guan, Haolin Li, Bohan Zeng, Yiyan Ji, Yue Ding, Pengfei Wan, Kun Gai, Yuanxing Zhang, Wentao Zhang
Abstract

Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically degrade visual generation fidelity. We attribute this to a train-inference trajectory mismatch: under classifier-free guidance, nominal timesteps fail to track the effective SNR, causing semantically mistimed feature injection during inference. Overall, our results position depth-wise routing as a strong and effective baseline and highlight the critical need for trajectory-aware signals to enable robust time-dependent conditioning.

AI-generated summary

Text conditioning in DiT-based models is enhanced through a unified normalized convex fusion framework that optimizes multi-layer LLM hidden states via depth-wise semantic routing, improving text-image alignment and compositional generation.
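To make the fusion mechanism concrete, here is a minimal PyTorch sketch of depth-wise semantic routing as the abstract describes it. This is an illustration, not the paper's implementation: the class and parameter names, gate design, and tensor shapes are all assumptions; the only properties taken from the abstract are that the gates are lightweight and that the per-layer weights form a normalized convex combination of multi-layer LLM hidden states.

```python
import torch
import torch.nn as nn

class DepthwiseSemanticRouter(nn.Module):
    """Hypothetical sketch of depth-wise semantic routing: each DiT block
    receives its own normalized convex combination of the text encoder's
    per-layer hidden states."""

    def __init__(self, num_llm_layers: int, num_dit_blocks: int, dim: int):
        super().__init__()
        # One lightweight gate per DiT block: learnable logits over LLM layers.
        self.gate_logits = nn.Parameter(torch.zeros(num_dit_blocks, num_llm_layers))
        self.norm = nn.LayerNorm(dim)

    def forward(self, llm_states: torch.Tensor, block_idx: int) -> torch.Tensor:
        # llm_states: (L, B, S, D) = stacked hidden states from L LLM layers.
        # Softmax makes the weights non-negative and sum to one, so the fused
        # feature is a normalized convex combination of the layer features.
        w = torch.softmax(self.gate_logits[block_idx], dim=-1)    # (L,)
        fused = torch.einsum("l,lbsd->bsd", w, llm_states)        # (B, S, D)
        return self.norm(fused)

# Toy usage: route 28 LLM layers into block 5 of a 24-block DiT.
router = DepthwiseSemanticRouter(num_llm_layers=28, num_dit_blocks=24, dim=4096)
states = torch.randn(28, 2, 77, 4096)   # (layers, batch, tokens, dim)
cond = router(states, block_idx=5)      # conditioning for that block
```

A time-wise variant would condition the same kind of gate on the (embedded) diffusion timestep rather than the block index; per the abstract, that is exactly where the train-inference mismatch arises, since under classifier-free guidance the nominal timestep no longer tracks the effective SNR of the guided trajectory.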

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/semantic-routing-exploring-multi-layer-llm-feature-weighting-for-diffusion-transformers-115-95f137b5

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03510 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03510 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03510 in a Space README.md to link it from this page.

Collections including this paper 2