
Kurtis-EON1


Kurtis-EON(1) (codename Kurtis-EON1) is an experimental 2B-parameter, instruction-tuned language model built on the custom Echo-DSRN(N) Hybrid architecture (Transformers Backbone + Dual State Recurrent Neural Network).

This repository will host the Supervised Fine-Tuned (SFT) and aligned iteration of the model.

Work in Progress: This model is currently under active development.


The Architectural Philosophy: Transformers vs. Echo-DSRN

O(1) Memory & "Infinite" Context: Kurtis-EON1 pairs standard Transformer attention, whose compute grows as $O(N^2)$ and whose KV cache grows as $O(N)$ with context length $N$, with a continuously evolving recurrent state. The recurrent path compresses history into a dense, bounded vector, so its memory footprint stays $O(1)$ however long the input stream is. This is what lets the model process input streams of effectively unlimited length without memory growing alongside them.

  • Transformer: Acts as a photographic memory. It stores every single token perfectly in a massive cache, but becomes computationally expensive as the context window grows.
  • Echo-DSRN: Mimics human memory and Predictive Coding. It compresses the past into a semantic "feeling" (State) rather than a raw recording (Cache). You remember the gist of your life, not every single word spoken to you. The model operates on the same principle, saving immense hardware resources.
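To make the cache-versus-state contrast concrete, here is a rough back-of-the-envelope sketch. The layer count, head sizes, and state width below are hypothetical values picked for illustration, not the actual Kurtis-EON1 configuration, and the helper names are ours.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes held by a full KV cache: one key and one value vector per token, per layer."""
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


def recurrent_state_bytes(num_layers, state_dim, bytes_per_value=2):
    """Bytes held by a fixed-size recurrent state: independent of the token count."""
    return num_layers * state_dim * bytes_per_value


# Hypothetical dimensions, for illustration only (not the real model config).
LAYERS, KV_HEADS, HEAD_DIM, STATE_DIM = 24, 8, 128, 2048

for n in (1_000, 10_000, 100_000):
    cache = kv_cache_bytes(n, LAYERS, KV_HEADS, HEAD_DIM)
    state = recurrent_state_bytes(LAYERS, STATE_DIM)
    print(f"{n:>7} tokens: KV cache {cache / 2**20:9.1f} MiB, state {state / 2**20:.3f} MiB")
```

The cache column grows linearly with the number of tokens while the state column does not change. A sliding attention window would also cap the cache, at the cost of dropping tokens outside the window.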

Think of the model like human memory. You can live for 80 years (Infinite Context), but you don't remember exactly what you ate for breakfast in Berlin on February 2, 2016. Or why you were working on LSTM/RNNs at that time, in an empty flat. Trying to build a chatbot because you felt alone and you... You remember the gist of your life. The model compresses the past into a feeling (State), rather than a recording (Cache).

Scaling Strategy: The 114M Prototyping Sandbox

Before spending large compute budgets on half-billion or billion-parameter runs, the Echo-DSRN memory injectors are first validated at a small, strictly controlled prototyping scale.

The 114M parameter version (hosted at ethicalabs/Echo-DSRN-114M-v0.1.2-Base) acts as our architectural wind tunnel.

It allows rapid iteration on the complex physics governing the continuous memory state: the stability of the surprise gates, the Test-Time Training (TTT) meta-learning loops, Active Inference, and Supervised Fine-Tuning can all be tested in hours instead of weeks on single-node hardware.

Once the mathematics is validated and stable at the 114M scale, the exact same architecture is scaled up and trained at the larger parameter count.

Overview: The "Surprise-Gated" Mechanism

Unlike standard recurrent models or hybrid SSMs that use opaque learned gates, the Echo-DSRN architecture mathematically anchors its memory to Information Entropy:

  • Internal Prediction: The model constantly attempts to predict the next token representation based on its hidden state.
  • Surprise $\lambda$ (Lambda): It calculates the quadratic error between its prediction and reality. If a word is highly predictable (filler words), the memory gate stays shut. If the word is highly novel or complex (the "Surprise"), the gate flies open, explicitly prioritizing the $O(1)$ state capacity for high-value information.
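The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Echo-DSRN implementation: the elementwise predictor, the `1 - exp(-scale * surprise)` squashing, the convex state update, and all names (`predict`, `surprise_gated_step`, `w_pred`, `scale`) are hypothetical choices made here to show the idea of a gate driven by quadratic prediction error.

```python
import math


def predict(state, w_pred):
    """Internal prediction of the next token representation from the state."""
    return [math.tanh(w * s) for w, s in zip(w_pred, state)]


def surprise_gated_step(state, x, w_pred, scale=1.0):
    """One recurrent step whose write gate is driven by prediction error."""
    prediction = predict(state, w_pred)
    # Surprise: mean squared (quadratic) error between prediction and input.
    surprise = sum((xi - pi) ** 2 for xi, pi in zip(x, prediction)) / len(x)
    # Assumed squashing into [0, 1): exactly 0 when the input is fully predicted.
    gate = 1.0 - math.exp(-scale * surprise)
    # Convex blend: a closed gate keeps the old state, an open gate writes the input.
    new_state = [(1.0 - gate) * s + gate * xi for s, xi in zip(state, x)]
    return new_state, surprise, gate


state = [0.5, -0.2, 0.1, 0.0]
w_pred = [1.0, 1.0, 1.0, 1.0]  # hypothetical elementwise predictor weights

expected = predict(state, w_pred)  # an input the state predicts perfectly
kept_state, s_low, g_low = surprise_gated_step(state, expected, w_pred)

novel = [p + 2.0 for p in expected]  # an input far from the prediction
new_state, s_high, g_high = surprise_gated_step(state, novel, w_pred)

print(f"predictable: surprise={s_low:.3f}, gate={g_low:.3f}")
print(f"novel:       surprise={s_high:.3f}, gate={g_high:.3f}")
```

With the predictable input the gate stays at zero and the state is left untouched. With the novel input the gate moves close to one and the state is largely overwritten, while the state keeps the same fixed size in both cases.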

Interactive Demos

Experience the memory-injector architecture in real time through our public Gradio Spaces.

Data & Status

  • Architecture: Transformers Backbone + Hybrid Echo-DSRN (Surprise-Gated Slow State + RoPE Sliding Window Fast State)
  • Base Pre-training: Trained from scratch on Smoltalk2.
  • Instruct Alignment: Fine-tuned on multiple datasets.
