# Qwen3.5-9B-Sculpt-Experimental
18% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard `transformers` library. This is the **Experimental** tier of Qwen3.5-9B.

**Use case:** Local — maximum compression (1.27x prefill)
## Benchmark Results (lm_eval)
| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
## This Model vs Baseline
| Benchmark | Experimental | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 53.6 | 55.6 | -2.0 |
| gsm8k | 54.7 | 87.3 | -32.6 |
| hellaswag | 70.7 | 78.1 | -7.4 |
| mmlu | 70.2 | 78.7 | -8.5 |
| mmlu_abstract_algebra | 48.0 | 66.0 | -18.0 |
| mmlu_anatomy | 57.8 | 77.8 | -20.0 |
| mmlu_astronomy | 79.6 | 92.8 | -13.2 |
| mmlu_business_ethics | 70.0 | 82.0 | -12.0 |
| mmlu_clinical_knowledge | 74.0 | 86.8 | -12.8 |
| mmlu_college_biology | 80.6 | 93.1 | -12.5 |
| mmlu_college_chemistry | 53.0 | 59.0 | -6.0 |
| mmlu_college_computer_science | 64.0 | 82.0 | -18.0 |
| mmlu_college_mathematics | 46.0 | 64.0 | -18.0 |
| mmlu_college_medicine | 70.5 | 81.5 | -11.0 |
| mmlu_college_physics | 50.0 | 64.7 | -14.7 |
| mmlu_computer_security | 78.0 | 83.0 | -5.0 |
| mmlu_conceptual_physics | 80.0 | 90.2 | -10.2 |
| mmlu_econometrics | 64.0 | 73.7 | -9.7 |
| mmlu_electrical_engineering | 68.3 | 82.1 | -13.8 |
| mmlu_elementary_mathematics | 66.4 | 80.7 | -14.3 |
| mmlu_formal_logic | 59.5 | 65.9 | -6.4 |
| mmlu_global_facts | 38.0 | 50.0 | -12.0 |
| mmlu_high_school_biology | 85.5 | 93.5 | -8.0 |
| mmlu_high_school_chemistry | 69.0 | 77.8 | -8.8 |
| mmlu_high_school_computer_science | 77.0 | 88.0 | -11.0 |
| mmlu_high_school_european_history | 78.2 | 87.3 | -9.1 |
| mmlu_high_school_geography | 89.4 | 92.4 | -3.0 |
| mmlu_high_school_government_and_politics | 90.7 | 96.9 | -6.2 |
| mmlu_high_school_macroeconomics | 74.6 | 85.9 | -11.3 |
| mmlu_high_school_mathematics | 42.2 | 53.3 | -11.1 |
| mmlu_high_school_microeconomics | 82.4 | 93.3 | -10.9 |
| mmlu_high_school_physics | 55.6 | 72.8 | -17.2 |
| mmlu_high_school_psychology | 89.0 | 93.2 | -4.2 |
| mmlu_high_school_statistics | 73.1 | 78.7 | -5.6 |
| mmlu_high_school_us_history | 82.8 | 90.2 | -7.4 |
| mmlu_high_school_world_history | 85.7 | 89.9 | -4.2 |
| mmlu_human_aging | 69.5 | 78.9 | -9.4 |
| mmlu_human_sexuality | 78.6 | 86.3 | -7.7 |
| mmlu_humanities | 64.4 | 70.5 | -6.1 |
| mmlu_international_law | 83.5 | 90.1 | -6.6 |
| mmlu_jurisprudence | 77.8 | 84.3 | -6.5 |
| mmlu_logical_fallacies | 76.1 | 84.7 | -8.6 |
| mmlu_machine_learning | 59.8 | 66.1 | -6.3 |
| mmlu_management | 80.6 | 86.4 | -5.8 |
| mmlu_marketing | 89.7 | 95.7 | -6.0 |
| mmlu_medical_genetics | 78.0 | 91.0 | -13.0 |
| mmlu_miscellaneous | 80.5 | 90.3 | -9.8 |
| mmlu_moral_disputes | 74.3 | 81.2 | -6.9 |
| mmlu_moral_scenarios | 48.6 | 53.3 | -4.7 |
| mmlu_nutrition | 75.2 | 86.3 | -11.1 |
| mmlu_other | 72.9 | 83.1 | -10.2 |
| mmlu_philosophy | 76.2 | 80.4 | -4.2 |
| mmlu_prehistory | 75.0 | 84.3 | -9.3 |
| mmlu_professional_accounting | 56.0 | 65.6 | -9.6 |
| mmlu_professional_law | 54.9 | 60.3 | -5.4 |
| mmlu_professional_medicine | 76.5 | 91.5 | -15.0 |
| mmlu_professional_psychology | 72.7 | 82.8 | -10.1 |
| mmlu_public_relations | 68.2 | 73.6 | -5.4 |
| mmlu_security_studies | 75.5 | 76.7 | -1.2 |
| mmlu_social_sciences | 79.9 | 87.0 | -7.1 |
| mmlu_sociology | 85.1 | 89.1 | -4.0 |
| mmlu_stem | 66.5 | 78.3 | -11.8 |
| mmlu_us_foreign_policy | 84.0 | 90.0 | -6.0 |
| mmlu_virology | 51.8 | 56.6 | -4.8 |
| mmlu_world_religions | 77.8 | 86.5 | -8.7 |
| truthfulqa_mc2 | 47.6 | 53.7 | -6.1 |
| winogrande | 66.6 | 73.0 | -6.4 |
## Performance
| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.1 GB | 16.7 GB | -9.6% |
| Parameters | 8,098,165,248 | — | — |
| Prefill throughput | 5,803 tok/s | 4,566 tok/s | +27% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |
KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
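Because only FFN weights are removed, the headline numbers let you back out roughly what share of the model's parameters sit in the FFN blocks. A quick sanity check (this derivation is ours, not from Dystrio):

```python
# Back-of-envelope check, assuming only FFN weights were removed:
# an 18% FFN cut that shrinks the whole model by ~9.6% implies the
# FFN blocks hold roughly 9.6 / 18 ≈ 53% of the weights.
ffn_cut = 0.18                   # fraction of FFN neurons removed (kf=0.82)
total_cut = 1 - 15.1 / 16.7      # overall size reduction from the table above
ffn_share = total_cut / ffn_cut
print(f"FFN share of weights ≈ {ffn_share:.0%}")  # → FFN share of weights ≈ 53%
```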
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads like any dense HF model — no custom kernels or runtime changes needed.
model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Experimental",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Experimental")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## All Sculpt Tiers
| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |
## Technical Details
- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
- Keep fraction: 0.82 (18% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
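The pruning step can be pictured as follows. This is an illustrative NumPy sketch using a simple norm-product importance proxy, not the actual Sculpt selection criterion; the function name, shapes, and scoring rule are our own assumptions for a gated FFN:

```python
import numpy as np

def prune_ffn(w_gate, w_up, w_down, keep_frac=0.82):
    """Drop the lowest-importance FFN neurons (illustrative sketch only).

    Shapes follow a gated FFN: w_gate and w_up are (d_ff, d_model),
    w_down is (d_model, d_ff). The importance score here — the product
    of each neuron's input and output weight norms — is a common proxy,
    not the actual Sculpt criterion.
    """
    importance = (np.linalg.norm(w_up, axis=1) *
                  np.linalg.norm(w_down, axis=0))
    n_keep = int(round(keep_frac * w_up.shape[0]))
    keep = np.sort(np.argsort(importance)[-n_keep:])  # top neurons, original order
    return w_gate[keep], w_up[keep], w_down[:, keep]

# Toy example: d_model=8, d_ff=100 → 82 neurons survive at keep_frac=0.82
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 8))
u = rng.normal(size=(100, 8))
d = rng.normal(size=(8, 100))
g2, u2, d2 = prune_ffn(g, u, d, keep_frac=0.82)
print(u2.shape, d2.shape)  # (82, 8) (8, 82)
```

The output stays a standard dense FFN, just narrower, which is why the result loads without custom kernels.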
## Compatibility
- HuggingFace Transformers
- vLLM
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
## Citation
```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```