new architecture - a Tempo14 Collection
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper
• 2401.02994
• Published Jan 4, 2024 • 52
MambaByte: Token-free Selective State Space Model Paper
• 2401.13660
• Published Jan 24, 2024 • 59
Repeat After Me: Transformers are Better than State Space Models at Copying Paper
• 2402.01032
• Published Feb 1, 2024 • 24
BlackMamba: Mixture of Experts for State-Space Models Paper
• 2402.01771
• Published Feb 1, 2024 • 25
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Paper
• 2402.04248
• Published Feb 6, 2024 • 32
KAN: Kolmogorov-Arnold Networks Paper
• 2404.19756
• Published Apr 30, 2024 • 116
Zamba: A Compact 7B SSM Hybrid Model Paper
• 2405.16712
• Published May 26, 2024 • 25
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper
• 2405.21060
• Published May 31, 2024 • 68
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper
• 2406.02657
• Published Jun 4, 2024 • 41
Breaking the Attention Bottleneck Paper
• 2406.10906
• Published Jun 16, 2024 • 4
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper
• 2407.04620
• Published Jul 5, 2024 • 34
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Paper
• 2408.12570
• Published Aug 22, 2024 • 32
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond Paper
• 2410.02362
• Published Oct 3, 2024 • 18
Paper
• 2410.05258
• Published Oct 7, 2024 • 182
GPT or BERT: why not both? Paper
• 2410.24159
• Published Oct 31, 2024 • 14
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA Paper
• 2410.20672
• Published Oct 28, 2024 • 7
SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models Paper
• 2411.00233
• Published Oct 31, 2024 • 7
Hymba: A Hybrid-head Architecture for Small Language Models Paper
• 2411.13676
• Published Nov 20, 2024 • 48
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper
• 2412.06464
• Published Dec 9, 2024 • 17
Byte Latent Transformer: Patches Scale Better Than Tokens Paper
• 2412.09871
• Published Dec 13, 2024 • 108
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper
• 2503.14456
• Published Mar 18, 2025 • 154
Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks Paper
• 2508.21172
• Published Aug 28, 2025 • 2
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling Paper
• 2509.00605
• Published Aug 30, 2025 • 43
Less is More: Recursive Reasoning with Tiny Networks Paper
• 2510.04871
• Published Oct 6, 2025 • 513
Paper
• 2601.00417
• Published Jan 1, 2026 • 34
Nested Learning: The Illusion of Deep Learning Architectures Paper
• 2512.24695
• Published Dec 31, 2025 • 45
Recursive Language Models Paper
• 2512.24601
• Published Dec 31, 2025 • 94
Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs Paper
• 2603.07475
• Published Mar 8, 2026 • 3
Effective Distillation to Hybrid xLSTM Architectures Paper
• 2603.15590
• Published Mar 16, 2026 • 33
Residual Stream Duality in Modern Transformer Architectures Paper
• 2603.16039
• Published Mar 17, 2026 • 4
Paper
• 2604.14430
• Published 8 days ago • 3