Last updated: —

Baseline
- Parameters: 80M
- Vocab size: 256
- Latency: 0.59 ms / token
- Perplexity: 7.88
- Layers: 12
- Hidden size: 768
Experiments
Multimind M3
Tags: trainer, 200M, bf16, active
Byte-level transformer with a Hebbian adapter that carries memory across the sequence window. State is threaded across chunks so the trainer can drive TBPTT (truncated backpropagation through time) through it.
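A minimal sketch of how a fast-weight (outer-product) Hebbian adapter and a chunked TBPTT loop could fit together. The class name, the key/value projections, and the lr/decay hyperparameters are illustrative assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn

class HebbianAdapter(nn.Module):
    """Fast-weight memory updated with an outer-product (Hebbian) rule."""
    def __init__(self, dim: int, lr: float = 0.1, decay: float = 0.95):
        super().__init__()
        self.lr, self.decay = lr, decay
        self.key = nn.Linear(dim, dim, bias=False)
        self.value = nn.Linear(dim, dim, bias=False)

    def init_state(self, batch: int, dim: int, device=None):
        # Fast-weight matrix: one (dim x dim) associative memory per sequence.
        return torch.zeros(batch, dim, dim, device=device)

    def forward(self, x, state):
        # x: (batch, seq, dim); state: (batch, dim, dim)
        outs = []
        for t in range(x.size(1)):
            k = self.key(x[:, t])                                 # (batch, dim)
            v = self.value(x[:, t])                               # (batch, dim)
            read = torch.bmm(state, k.unsqueeze(-1)).squeeze(-1)  # memory readout
            outs.append(x[:, t] + read)                           # residual injection
            # Hebbian update: decay old associations, add outer product of v and k.
            state = self.decay * state + self.lr * torch.bmm(v.unsqueeze(-1), k.unsqueeze(1))
        return torch.stack(outs, dim=1), state


# TBPTT across chunks: carry the state forward but detach it at chunk
# boundaries so gradients only flow within the current window.
dim, batch = 768, 2
adapter = HebbianAdapter(dim)
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
state = adapter.init_state(batch, dim)
for chunk in torch.randn(4, batch, 128, dim):   # 4 chunks of 128 tokens (toy data)
    out, state = adapter(chunk, state)
    loss = out.pow(2).mean()                    # placeholder loss
    loss.backward()
    opt.step(); opt.zero_grad()
    state = state.detach()                      # truncate backprop at the chunk edge
```

Detaching the state at each chunk boundary keeps gradient memory bounded while the associative fast weights still carry information forward across the sequence window.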
In flight
- QAT (quantization-aware training) mode 2 fine-tuning of the 80M model
- Speculative decoding end-to-end (5M draft model trained); see the sketch after this list
- Mamba-2 SSD prototype
- Runtime shape refactor for multi-size coexistence
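A hedged sketch of the draft/verify step in speculative decoding with a small draft model and a larger target model. The function name, the draft length k=4, and the omission of the target's bonus token after a full accept are assumptions, not the project's pipeline.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ctx, k=4):
    """Propose k tokens with the draft model, then verify with the target.

    target/draft: callables mapping (1, T) int64 token ids -> (1, T, vocab) logits.
    ctx: (1, T) tokens generated so far. Returns ctx plus the accepted tokens.
    """
    proposal = ctx
    draft_dists = []
    # 1. Draft model proposes k tokens autoregressively (cheap forward passes).
    for _ in range(k):
        p = torch.softmax(draft(proposal)[:, -1], dim=-1)        # (1, vocab)
        tok = torch.multinomial(p, 1)                            # (1, 1)
        draft_dists.append(p)
        proposal = torch.cat([proposal, tok], dim=1)
    # 2. Target model scores all k proposed positions in one forward pass.
    q = torch.softmax(target(proposal)[:, -k - 1:-1], dim=-1)    # (1, k, vocab)
    accepted = ctx
    for i, p in enumerate(draft_dists):
        tok = proposal[:, ctx.size(1) + i : ctx.size(1) + i + 1] # (1, 1)
        ratio = q[:, i].gather(-1, tok) / p.gather(-1, tok)
        # 3. Accept the draft token with probability min(1, q/p).
        if torch.rand((), device=ctx.device) < ratio.clamp(max=1.0):
            accepted = torch.cat([accepted, tok], dim=1)
        else:
            # On rejection, resample from the residual max(0, q - p) and stop.
            residual = (q[:, i] - p).clamp(min=0) + 1e-9          # numerically safe
            residual = residual / residual.sum(dim=-1, keepdim=True)
            accepted = torch.cat([accepted, torch.multinomial(residual, 1)], dim=1)
            break
    return accepted
```

The verification pass costs one target forward per step regardless of how many draft tokens are accepted, which is where the speedup comes from when the draft model agrees with the target often enough.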