FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching Paper • 2604.06757 • Published 12 days ago • 10
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published Feb 10 • 27
Glance: Accelerating Diffusion Models with 1 Sample Paper • 2512.02899 • Published Dec 2, 2025 • 30
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published Nov 14, 2025 • 47
Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs • 32 items • Updated Mar 2 • 30
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 103
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published Nov 3, 2025 • 38
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20, 2025 • 8
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 120