Last updated: —

Baseline
- Parameters: 80M
- Vocab size: 256
- Latency: 0.59 ms / token
- Perplexity: 7.88
- Layers: 12
- Hidden size: 768
Experiments
Multimind M3
Tags: trainer, 200M, bf16, active
Byte-level transformer with a Hebbian adapter that carries memory across the sequence window. State is threaded across chunks so the trainer can drive TBPTT (truncated backpropagation through time) through it.
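A minimal sketch of how a fast-weight (outer-product) Hebbian adapter and a chunked TBPTT loop could fit together. The class name, the key/value projections, and the lr/decay hyperparameters are illustrative assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn

class HebbianAdapter(nn.Module):
    """Fast-weight memory updated with an outer-product (Hebbian) rule."""
    def __init__(self, dim: int, lr: float = 0.1, decay: float = 0.95):
        super().__init__()
        self.lr, self.decay = lr, decay
        self.key = nn.Linear(dim, dim, bias=False)
        self.value = nn.Linear(dim, dim, bias=False)

    def init_state(self, batch: int, dim: int, device=None):
        # Fast-weight matrix: one (dim x dim) associative memory per sequence.
        return torch.zeros(batch, dim, dim, device=device)

    def forward(self, x, state):
        # x: (batch, seq, dim); state: (batch, dim, dim)
        outs = []
        for t in range(x.size(1)):
            k = self.key(x[:, t])                                 # (batch, dim)
            v = self.value(x[:, t])                               # (batch, dim)
            read = torch.bmm(state, k.unsqueeze(-1)).squeeze(-1)  # memory readout
            outs.append(x[:, t] + read)                           # residual injection
            # Hebbian update: decay old associations, add outer product of v and k.
            state = self.decay * state + self.lr * torch.bmm(v.unsqueeze(-1), k.unsqueeze(1))
        return torch.stack(outs, dim=1), state


# TBPTT across chunks: carry the state forward but detach it at chunk
# boundaries so gradients only flow within the current window.
dim, batch = 768, 2
adapter = HebbianAdapter(dim)
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
state = adapter.init_state(batch, dim)
for chunk in torch.randn(4, batch, 128, dim):   # 4 chunks of 128 tokens (toy data)
    out, state = adapter(chunk, state)
    loss = out.pow(2).mean()                    # placeholder loss
    loss.backward()
    opt.step(); opt.zero_grad()
    state = state.detach()                      # truncate backprop at the chunk edge
```

Detaching the state at each chunk boundary keeps gradient memory bounded while the associative fast weights still carry information forward across the sequence window.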
In flight
- QAT (quantization-aware training) mode 2 fine-tuning of the 80M model
- Speculative decoding end-to-end (5M draft model trained); see the sketch after this list
- Mamba-2 SSD prototype
- Runtime shape refactor for multi-size coexistence
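A hedged sketch of the draft/verify step in speculative decoding with a small draft model and a larger target model. The function name, the draft length k=4, and the omission of the target's bonus token after a full accept are assumptions, not the project's pipeline.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ctx, k=4):
    """Propose k tokens with the draft model, then verify with the target.

    target/draft: callables mapping (1, T) int64 token ids -> (1, T, vocab) logits.
    ctx: (1, T) tokens generated so far. Returns ctx plus the accepted tokens.
    """
    proposal = ctx
    draft_dists = []
    # 1. Draft model proposes k tokens autoregressively (cheap forward passes).
    for _ in range(k):
        p = torch.softmax(draft(proposal)[:, -1], dim=-1)        # (1, vocab)
        tok = torch.multinomial(p, 1)                            # (1, 1)
        draft_dists.append(p)
        proposal = torch.cat([proposal, tok], dim=1)
    # 2. Target model scores all k proposed positions in one forward pass.
    q = torch.softmax(target(proposal)[:, -k - 1:-1], dim=-1)    # (1, k, vocab)
    accepted = ctx
    for i, p in enumerate(draft_dists):
        tok = proposal[:, ctx.size(1) + i : ctx.size(1) + i + 1] # (1, 1)
        ratio = q[:, i].gather(-1, tok) / p.gather(-1, tok)
        # 3. Accept the draft token with probability min(1, q/p).
        if torch.rand((), device=ctx.device) < ratio.clamp(max=1.0):
            accepted = torch.cat([accepted, tok], dim=1)
        else:
            # On rejection, resample from the residual max(0, q - p) and stop.
            residual = (q[:, i] - p).clamp(min=0) + 1e-9          # numerically safe
            residual = residual / residual.sum(dim=-1, keepdim=True)
            accepted = torch.cat([accepted, torch.multinomial(residual, 1)], dim=1)
            break
    return accepted
```

The verification pass costs one target forward per step regardless of how many draft tokens are accepted, which is where the speedup comes from when the draft model agrees with the target often enough.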