The MHA2MLA-VLM models published in the paper "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models".
Xiaoran Fan
cnxup
AI & ML interests: NLP, CV, LLM
Models (9)
cnxup/LLaVA-NeXT-8B-MLA-stage1-rope32 (8B)
cnxup/LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_32 (8B)
cnxup/LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_64 (8B)
cnxup/LLaVA-NeXT-8B-MLA-stage2-rope32-d_kv_128 (9B)
cnxup/SVD-Init
cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_128 (8B)
cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_64 (8B)
cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_32 (8B)
cnxup/Qwen2.5-VL-7B-MLA-stage1-rope32 (8B)