# Qwen3.5-9B-Sculpt-Experimental
18% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard `transformers` library. This is the **Experimental** tier of Qwen3.5-9B.

**Use case:** Local — maximum compression (1.27x prefill)
## Benchmark Results (lm_eval)
| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
## This Model vs Baseline
| Benchmark | Experimental | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 53.6 | 55.6 | -2.0 |
| gsm8k | 54.7 | 87.3 | -32.6 |
| hellaswag | 70.7 | 78.1 | -7.4 |
| mmlu | 70.2 | 78.7 | -8.5 |
| mmlu_abstract_algebra | 48.0 | 66.0 | -18.0 |
| mmlu_anatomy | 57.8 | 77.8 | -20.0 |
| mmlu_astronomy | 79.6 | 92.8 | -13.2 |
| mmlu_business_ethics | 70.0 | 82.0 | -12.0 |
| mmlu_clinical_knowledge | 74.0 | 86.8 | -12.8 |
| mmlu_college_biology | 80.6 | 93.1 | -12.5 |
| mmlu_college_chemistry | 53.0 | 59.0 | -6.0 |
| mmlu_college_computer_science | 64.0 | 82.0 | -18.0 |
| mmlu_college_mathematics | 46.0 | 64.0 | -18.0 |
| mmlu_college_medicine | 70.5 | 81.5 | -11.0 |
| mmlu_college_physics | 50.0 | 64.7 | -14.7 |
| mmlu_computer_security | 78.0 | 83.0 | -5.0 |
| mmlu_conceptual_physics | 80.0 | 90.2 | -10.2 |
| mmlu_econometrics | 64.0 | 73.7 | -9.7 |
| mmlu_electrical_engineering | 68.3 | 82.1 | -13.8 |
| mmlu_elementary_mathematics | 66.4 | 80.7 | -14.3 |
| mmlu_formal_logic | 59.5 | 65.9 | -6.4 |
| mmlu_global_facts | 38.0 | 50.0 | -12.0 |
| mmlu_high_school_biology | 85.5 | 93.5 | -8.0 |
| mmlu_high_school_chemistry | 69.0 | 77.8 | -8.8 |
| mmlu_high_school_computer_science | 77.0 | 88.0 | -11.0 |
| mmlu_high_school_european_history | 78.2 | 87.3 | -9.1 |
| mmlu_high_school_geography | 89.4 | 92.4 | -3.0 |
| mmlu_high_school_government_and_politics | 90.7 | 96.9 | -6.2 |
| mmlu_high_school_macroeconomics | 74.6 | 85.9 | -11.3 |
| mmlu_high_school_mathematics | 42.2 | 53.3 | -11.1 |
| mmlu_high_school_microeconomics | 82.4 | 93.3 | -10.9 |
| mmlu_high_school_physics | 55.6 | 72.8 | -17.2 |
| mmlu_high_school_psychology | 89.0 | 93.2 | -4.2 |
| mmlu_high_school_statistics | 73.1 | 78.7 | -5.6 |
| mmlu_high_school_us_history | 82.8 | 90.2 | -7.4 |
| mmlu_high_school_world_history | 85.7 | 89.9 | -4.2 |
| mmlu_human_aging | 69.5 | 78.9 | -9.4 |
| mmlu_human_sexuality | 78.6 | 86.3 | -7.7 |
| mmlu_humanities | 64.4 | 70.5 | -6.1 |
| mmlu_international_law | 83.5 | 90.1 | -6.6 |
| mmlu_jurisprudence | 77.8 | 84.3 | -6.5 |
| mmlu_logical_fallacies | 76.1 | 84.7 | -8.6 |
| mmlu_machine_learning | 59.8 | 66.1 | -6.3 |
| mmlu_management | 80.6 | 86.4 | -5.8 |
| mmlu_marketing | 89.7 | 95.7 | -6.0 |
| mmlu_medical_genetics | 78.0 | 91.0 | -13.0 |
| mmlu_miscellaneous | 80.5 | 90.3 | -9.8 |
| mmlu_moral_disputes | 74.3 | 81.2 | -6.9 |
| mmlu_moral_scenarios | 48.6 | 53.3 | -4.7 |
| mmlu_nutrition | 75.2 | 86.3 | -11.1 |
| mmlu_other | 72.9 | 83.1 | -10.2 |
| mmlu_philosophy | 76.2 | 80.4 | -4.2 |
| mmlu_prehistory | 75.0 | 84.3 | -9.3 |
| mmlu_professional_accounting | 56.0 | 65.6 | -9.6 |
| mmlu_professional_law | 54.9 | 60.3 | -5.4 |
| mmlu_professional_medicine | 76.5 | 91.5 | -15.0 |
| mmlu_professional_psychology | 72.7 | 82.8 | -10.1 |
| mmlu_public_relations | 68.2 | 73.6 | -5.4 |
| mmlu_security_studies | 75.5 | 76.7 | -1.2 |
| mmlu_social_sciences | 79.9 | 87.0 | -7.1 |
| mmlu_sociology | 85.1 | 89.1 | -4.0 |
| mmlu_stem | 66.5 | 78.3 | -11.8 |
| mmlu_us_foreign_policy | 84.0 | 90.0 | -6.0 |
| mmlu_virology | 51.8 | 56.6 | -4.8 |
| mmlu_world_religions | 77.8 | 86.5 | -8.7 |
| truthfulqa_mc2 | 47.6 | 53.7 | -6.1 |
| winogrande | 66.6 | 73.0 | -6.4 |
## Performance
| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.1 GB | 16.7 GB | -9.6% |
| Parameters | 8,098,165,248 | — | — |
| Prefill throughput | 5,803 tok/s | 4,566 tok/s | +27% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |
KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
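Because only FFN weights are removed, the headline numbers let you back out roughly what share of the model's parameters sit in the FFN blocks. A quick sanity check (this derivation is ours, not from Dystrio):

```python
# Back-of-envelope check, assuming only FFN weights were removed:
# an 18% FFN cut that shrinks the whole model by ~9.6% implies the
# FFN blocks hold roughly 9.6 / 18 ≈ 53% of the weights.
ffn_cut = 0.18                   # fraction of FFN neurons removed (kf=0.82)
total_cut = 1 - 15.1 / 16.7      # overall size reduction from the table above
ffn_share = total_cut / ffn_cut
print(f"FFN share of weights ≈ {ffn_share:.0%}")  # → FFN share of weights ≈ 53%
```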
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads like any dense HF model — no custom kernels or runtime changes needed.
model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Experimental",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Experimental")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## All Sculpt Tiers
| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |
## Technical Details
- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
- Keep fraction: 0.82 (18% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
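The pruning step can be pictured as follows. This is an illustrative NumPy sketch using a simple norm-product importance proxy, not the actual Sculpt selection criterion; the function name, shapes, and scoring rule are our own assumptions for a gated FFN:

```python
import numpy as np

def prune_ffn(w_gate, w_up, w_down, keep_frac=0.82):
    """Drop the lowest-importance FFN neurons (illustrative sketch only).

    Shapes follow a gated FFN: w_gate and w_up are (d_ff, d_model),
    w_down is (d_model, d_ff). The importance score here — the product
    of each neuron's input and output weight norms — is a common proxy,
    not the actual Sculpt criterion.
    """
    importance = (np.linalg.norm(w_up, axis=1) *
                  np.linalg.norm(w_down, axis=0))
    n_keep = int(round(keep_frac * w_up.shape[0]))
    keep = np.sort(np.argsort(importance)[-n_keep:])  # top neurons, original order
    return w_gate[keep], w_up[keep], w_down[:, keep]

# Toy example: d_model=8, d_ff=100 → 82 neurons survive at keep_frac=0.82
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 8))
u = rng.normal(size=(100, 8))
d = rng.normal(size=(8, 100))
g2, u2, d2 = prune_ffn(g, u, d, keep_frac=0.82)
print(u2.shape, d2.shape)  # (82, 8) (8, 82)
```

The output stays a standard dense FFN, just narrower, which is why the result loads without custom kernels.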
## Compatibility
- HuggingFace Transformers
- vLLM
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
## Citation
```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```