wenhua cheng (wenhuach)
AI & ML interests: Model Compression, CV
Recent Activity
- new activity about 23 hours ago on Intel/gemma-4-31B-it-int4-AutoRound: "Fails to load on Ampere (sm_86) at TP=2: Marlin kernel rejects 32-dim weight slice"
- liked a model 2 days ago: Intel/Qwen3.6-35B-A3B-int4-AutoRound
- new activity 7 days ago on Intel/GLM-4.7-Flash-int4-AutoRound: "MTP 0 accept rate"
Discussions
- "Fails to load on Ampere (sm_86) at TP=2: Marlin kernel rejects 32-dim weight slice" · 2 comments · #3 opened about 24 hours ago by wasifb
- "MTP 0 accept rate" · 2 comments · #4 opened 7 days ago by AMUN-RA1
- "Installation Video and Testing - Step by Step" · 3 reactions · 4 comments · #1 opened 11 days ago by fahdmirzac
- "GGUF version" · 1 reaction · 1 comment · #1 opened 11 days ago by limcheekin
- "Performance indicators" · 2 reactions · 4 comments · #1 opened 28 days ago by dehnhaide
- "This model always predicts some few nonsense sequences" · 8 comments · #1 opened about 2 months ago by CharlesChen2023
- "Does the A100 work?" · 12 comments · #1 opened about 2 months ago by xz123321
- "Thanks! And MTP key question" · 11 comments · #1 opened about 2 months ago by seanthomaswilliams
- "Convert to gguf-q2ks-mixed-AutoRound?" · 2 reactions · 4 comments · #2 opened 3 months ago by limcheekin
- "Qwen/Qwen3-Next-80B-A3B-Thinking has MMLU_PRO 82.7 but you guys get 0.7271" · 3 comments · #2 opened 7 months ago by hlxxxxxx
- "AutoRound request: GLM-4.5-Air" · 1 comment · #1 opened 4 months ago by babytifa
- "2507 Thinking model release" · 11 comments · #4 opened 7 months ago by anjeysapkovski
- "How to use this kernel" · #1 opened 4 months ago by wenhuach
- "Thinking version has been deleted?" · 1 comment · #2 opened 4 months ago by reswewr
- "Improve model card: Add pipeline tag, library name, and update paper/citation" · 1 reaction · #1 opened 5 months ago by nielsr
- "Could we get more w2a16 w3a16 and w4a16 Autoround" · 1 reaction · 1 comment · #1 opened 5 months ago by twhitworth
- "Practical performance feedback" · 1 comment · #2 opened 6 months ago by maigonis
- "Works good with vLLM, just no tool calling" · 1 comment · #1 opened 8 months ago by Ununnilium
- "Inference with llama.cpp + Open WebUI gives repeating `?`" · 4 comments · #1 opened 6 months ago by whoisjeremylam