wenhua cheng (wenhuach)
AI & ML interests: Model Compression, CV
Recent Activity
- new activity about 23 hours ago on Intel/gemma-4-31B-it-int4-AutoRound: "Fails to load on Ampere (sm_86) at TP=2: Marlin kernel rejects 32-dim weight slice"
- liked a model 2 days ago: Intel/Qwen3.6-35B-A3B-int4-AutoRound
- new activity 7 days ago on Intel/GLM-4.7-Flash-int4-AutoRound: "MTP 0 accept rate"
Discussions
- "Fails to load on Ampere (sm_86) at TP=2: Marlin kernel rejects 32-dim weight slice" · 2 comments · #3 opened about 24 hours ago by wasifb
- "MTP 0 accept rate" · 2 comments · #4 opened 7 days ago by AMUN-RA1
- "Installation Video and Testing - Step by Step" · 3 reactions · 4 comments · #1 opened 11 days ago by fahdmirzac
- "GGUF version" · 1 reaction · 1 comment · #1 opened 11 days ago by limcheekin
- "Performance indicators" · 2 reactions · 4 comments · #1 opened 28 days ago by dehnhaide
- "This model always predicts some few nonsense sequences" · 8 comments · #1 opened about 2 months ago by CharlesChen2023
- "Does the A100 work?" · 12 comments · #1 opened about 2 months ago by xz123321
- "Thanks! And MTP key question" · 11 comments · #1 opened about 2 months ago by seanthomaswilliams
- "Convert to gguf-q2ks-mixed-AutoRound?" · 2 reactions · 4 comments · #2 opened 3 months ago by limcheekin
- "Qwen/Qwen3-Next-80B-A3B-Thinking has MMLU_PRO 82.7 but you guys get 0.7271" · 3 comments · #2 opened 7 months ago by hlxxxxxx
- "AutoRound request: GLM-4.5-Air" · 1 comment · #1 opened 4 months ago by babytifa
- "2507 Thinking model release" · 11 comments · #4 opened 7 months ago by anjeysapkovski
- "How to use this kernel" · #1 opened 4 months ago by wenhuach
- "Thinking version has been deleted?" · 1 comment · #2 opened 4 months ago by reswewr
- "Improve model card: Add pipeline tag, library name, and update paper/citation" · 1 reaction · #1 opened 5 months ago by nielsr
- "Could we get more w2a16 w3a16 and w4a16 Autoround" · 1 reaction · 1 comment · #1 opened 5 months ago by twhitworth
- "Practical performance feedback" · 1 comment · #2 opened 6 months ago by maigonis
- "Works good with vLLM, just no tool calling" · 1 comment · #1 opened 8 months ago by Ununnilium
- "Inference with llama.cpp + Open WebUI gives repeating `?`" · 4 comments · #1 opened 6 months ago by whoisjeremylam