wop (Wop)
A tiny (~16.6M-parameter) experimental model that predicts 4 tokens per forward pass instead of one. A Transformer trunk pools the prompt into a single vector, then 4 sequential "slot" heads emit a block of tokens left to right: a lightweight take on multi-token prediction.
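To make the block-decoding idea concrete, here is a minimal sketch of the outer generation loop, not the repo's actual inference code. It assumes each forward pass returns a block of 4 token ids and that the generated tokens are appended to the context before the next pass. The names `block_decode`, `predict_block`, and `toy_predict_block` are hypothetical, and the toy predictor stands in for the real trunk and slot heads.

```python
from typing import Callable, List, Sequence

BLOCK_SIZE = 4  # tokens emitted per forward pass, as described above


def block_decode(
    predict_block: Callable[[Sequence[int]], List[int]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Generate token ids one block at a time until EOS or the token budget.

    `predict_block` is a hypothetical stand-in for one forward pass of the
    model: it takes the current context and returns BLOCK_SIZE token ids.
    """
    context = list(prompt_ids)
    generated: List[int] = []
    while len(generated) < max_new_tokens:
        block = predict_block(context)  # one forward pass, one block
        for tok in block[:BLOCK_SIZE]:
            # Stop mid-block on EOS or once the budget is used up.
            if tok == eos_id or len(generated) >= max_new_tokens:
                return generated
            generated.append(tok)
            context.append(tok)  # assumption: outputs are fed back as context
    return generated


def toy_predict_block(context: Sequence[int]) -> List[int]:
    """Toy predictor (not the real model): counts upward from the last id."""
    last = context[-1]
    return [last + i for i in range(1, BLOCK_SIZE + 1)]


if __name__ == "__main__":
    # With EOS id 10, decoding stops inside the third block.
    print(block_decode(toy_predict_block, [0], max_new_tokens=100, eos_id=10))
```

With this loop, generating n tokens takes roughly n / 4 forward passes instead of n, which is the appeal of predicting a block at a time. The trade-off is that later tokens in a block are predicted with less context than a token-by-token decoder would have.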
Trained on GSM8K (GPT-2 tokenizer, 10 epochs). It's small and rough — answers are often wrong — but it's a fun little testbed for block decoding. Weights, config, training curves, and a self-contained inference snippet are all in the repo.
Also wired into the Cosmos T2-Accelerate chat demo, where it streams those 4-token blocks live. 🧪