unsloth/Qwen3.6-35B-A3B-GGUF Image-Text-to-Text • 35B • Updated about 5 hours ago • 816k • 542
Gemma 4 WebGPU 🚀 Running • Featured • 192 • Run Gemma 4 locally in-browser on WebGPU w/ Transformers.js
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 12 days ago • 47
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 16
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16, 2025 • 71
unsloth/gemma-4-26B-A4B-it-GGUF Image-Text-to-Text • 25B • Updated about 5 hours ago • 2.43M • 554
arcee-ai/Trinity-Large-Thinking Text Generation • 399B • Updated 11 days ago • 20.6k • 159
Article KV Caching Explained: Optimizing Transformer Inference Efficiency Jan 30, 2025 • 299