Changelog

All notable changes to the Bit-Axon project will be documented in this file.

[Unreleased]

Added

  • CLI: evaluate command for WikiText-103 perplexity benchmarking
  • CLI: port-weights command for Qwen2.5-3B to Bit-Axon format conversion
  • CLI: pipeline module for end-to-end ML workflow (SFT → merge → quantize → evaluate → inference → ORPO)
  • CLI: prepare command for dataset format conversion (alpaca, messages, orpo)
  • Evaluation: custom tokenizer support in WikiTextDataset
  • Tests: public API smoke tests and CLI command tests

[0.1.0] - 2026-04-07

Added

Architecture & Model

  • BitAxonModel — 24-layer sandwich architecture with three block variants
  • BitAxonConfig — model configuration dataclass (3.2B params, 32K vocab, 65K max context)
  • Axon-SSM (Mamba-style State Space Model) with a fixed-size recurrent state, i.e. O(1) memory per generated token
  • Shared-Expert MoE (8 experts, top-2 routing, ~1.4B active params/token)
  • Sliding Window Attention (4K window)
  • RMSNorm layer and KV cache utilities

Training

  • LoRA and DoRA adapter layers with QLoRA support
  • SFT, Alpaca, and ORPO dataset classes
  • Thermal-aware cooling scheduler
  • Cosine LR scheduler with warmup
  • Adapter merging and safetensors export
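
For readers unfamiliar with the schedule named above, here is a minimal sketch of a cosine learning-rate schedule with linear warmup. The function name `cosine_lr_with_warmup` and its parameters are hypothetical and chosen for illustration; this is not Bit-Axon's actual scheduler API, and the project's implementation may differ in details such as the warmup shape or the minimum learning rate.

```python
import math


def cosine_lr_with_warmup(step, *, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Return the learning rate for a 0-indexed training step.

    Hypothetical helper for illustration only (not the Bit-Axon API).
    """
    # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps

    # Cosine decay: progress goes from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr once total_steps is reached
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `base_lr=1e-3`, `warmup_steps=10`, and `total_steps=100`, the rate climbs linearly over the first 10 steps, peaks at `1e-3` on step 10, and decays toward `min_lr` by step 100.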

Inference

  • Autoregressive text generation with streaming
  • Temperature, top-k, top-p sampling
  • Interactive chat mode
  • Model loading from local path and HuggingFace Hub
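
The three sampling controls listed above are commonly combined as sketched below: temperature rescales the logits, top-k keeps the k most likely tokens, and top-p (nucleus) keeps the smallest set of tokens whose cumulative probability reaches p. This is a pure-Python illustration under those standard definitions; the function name `sample_token` and its defaults are assumptions and do not reflect Bit-Axon's actual generation code.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index from a list of logits.

    Hypothetical helper for illustration only (not the Bit-Axon API).
    top_k=0 disables top-k filtering; top_p=1.0 disables nucleus filtering.
    """
    # Treat non-positive temperature as greedy decoding.
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate indices, most likely first.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Nucleus filter: keep tokens until cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when comparing sampling settings.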

Quantization

  • NF4 (4-bit NormalFloat) quantization

CLI

  • run, train, quantize, merge, benchmark, download commands

macOS App

  • SwiftUI native chat application with MLX-Swift backend

Infrastructure

  • GitHub Actions CI, PyPI publishing, pre-commit hooks