Carpathian Open Source

Veritate

A hand-coded INT8 byte-level language model inference engine. No framework. No wrapper. Every kernel custom.

PyTorch trains the model. Veritate runs it. The two halves share nothing but a .bin weight file.

80M parameters · 256-entry vocabulary · 0.59 ms / token · 7.88 perplexity

What makes it unusual

Glass-box interpretability.

Full trace on every forward pass

Per-layer residual stream, FFN neuron activations, attention scores, logit lens, and direct logit attribution. A browser-based MRI dashboard reads these live.
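A logit-lens probe reduces to a single matvec: project the residual stream at any layer through the unembedding matrix and see what the model would predict if decoding stopped there. A minimal sketch in C — the function name, the row-major `w_unembed` layout, and the float types are illustrative assumptions, not Veritate's actual API:

```c
#include <stddef.h>

/* Logit-lens sketch (illustrative, not the engine's real code):
 * project a layer's residual vector through the unembedding matrix
 * and return the byte id the model would emit at that depth.
 * w_unembed is assumed [vocab][hidden], row-major. */
static int logit_lens_argmax(const float *resid, const float *w_unembed,
                             size_t hidden, size_t vocab) {
    int best = 0;
    float best_logit = -1e30f;
    for (size_t v = 0; v < vocab; v++) {
        float logit = 0.0f;
        for (size_t h = 0; h < hidden; h++)
            logit += resid[h] * w_unembed[v * hidden + h];
        if (logit > best_logit) { best_logit = logit; best = (int)v; }
    }
    return best;
}
```

Direct logit attribution is the same projection applied to one component's contribution (a single head or FFN output) instead of the full residual.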

No GPU required

One binary. No CUDA, no driver, no runtime. Autoregressive decode at batch=1 is the CPU's home turf.

Byte-level vocabulary

A 256-entry byte-level vocabulary: no tokenizer, no vocabulary mismatch, no subword artifacts. Trains on the raw bytes of any corpus.
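With a byte-level vocabulary, "tokenization" is the identity map, which is why no tokenizer ships with the engine. A sketch (function name is illustrative):

```c
#include <stddef.h>

/* Byte-level encoding sketch: every input byte is its own token id,
 * so the vocabulary is exactly 256 entries and no merge table or
 * tokenizer state is needed. */
static size_t encode_bytes(const unsigned char *text, size_t n, int *ids) {
    for (size_t i = 0; i < n; i++)
        ids[i] = (int)text[i];  /* token id == byte value, 0..255 */
    return n;                   /* one token per byte, no merges */
}
```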

Architecture

Transformer, INT8, custom.

  • Layers: 12
  • Hidden: 768
  • FFN: 3072
  • Heads: 12
  • Seq length: 256

Quantization

INT8 QAT

INT8 weights with per-channel quantization scales and an INT16 residual stream. Trained with quantization-aware training (QAT), so rounding error is baked in at training time.

Decode budget

0.59 ms → 0.03 ms

Five compounding optimizations planned to cut decode time by 95%. Several are already in flight.

  1. INT4 / QuaRot: weight quantization pass (in flight)
  2. Mixture of Depths: adaptive layer skipping (in flight)
  3. Speculative decoding: 5M draft model trained (in flight)
  4. Mamba-2 SSD: state-space backbone (in flight)
  5. BitNet b1.58: ternary weight encoding (in flight)
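The core of speculative decoding (item 3) is an accept loop: the 5M draft model proposes a run of tokens, the 80M model scores them in one batched pass, and the longest agreeing prefix is kept. A greedy-decoding sketch, assuming the per-position predictions have already been computed (the function and array layout are illustrative):

```c
#include <stddef.h>

/* Greedy speculative-decoding acceptance sketch: walk the draft
 * model's proposed tokens against the target model's predictions,
 * keep the target's token at every position, and stop after the
 * first disagreement. Returns how many tokens were emitted. */
static size_t accept_prefix(const int *draft_tok, const int *target_tok,
                            size_t k, int *out) {
    size_t accepted = 0;
    for (size_t i = 0; i < k; i++) {
        out[accepted++] = target_tok[i];          /* target always wins */
        if (draft_tok[i] != target_tok[i]) break; /* mismatch ends the run */
    }
    return accepted;
}
```

When the draft agrees often, the expensive model runs once per accepted run instead of once per token, which is where the decode-budget win comes from.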

The long-range moonshot

Frontier-class reasoning anywhere.

The target

A 1.5B-parameter model, BitNet ternary quantized (300 MB on disk), with Mamba-2 backbone, Mixture of Experts, Mixture of Recursions adaptive depth, and reasoning-trace distillation from a teacher model.
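The 300 MB disk figure is easy to sanity-check: a ternary weight carries log2(3) ≈ 1.585 bits of information, so 1.5B weights need roughly 297 MB at the information-theoretic floor (naive 2-bit packing would land nearer 375 MB). A back-of-envelope check:

```c
#include <math.h>

/* Back-of-envelope size check: n ternary parameters at log2(3)
 * bits each, reported in megabytes. This is the packing floor;
 * headers and scales add a little on top. */
static double bitnet_disk_mb(double n_params) {
    return n_params * log2(3.0) / 8.0 / 1e6;  /* ~297 MB for 1.5e9 */
}
```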

Estimated behavior

~70B-class quality on hard reasoning tasks, running on any machine with an ALU. No FPU required. No datacenter required. The pitch is that reasoning capability should not have a hardware floor.

Currently in flight

  • QAT mode 2 fine-tuning of the 80M model
  • Speculative decoding end-to-end integration
  • Mamba-2 SSD prototype
  • Runtime shape refactor for multi-size coexistence

Open source. Read the code.

Veritate is developed by Carpathian and published on GitHub. The engine, plugins, and training scripts are all there. Fork it, run it, break it, build on top of it.