Skip to content

Research Papers¶

This section documents the theoretical foundations and key innovations behind Bit-Axon. Each page provides the mathematical formulation, design rationale, and implementation mapping for a core component of the system.

Papers¶

#	Paper	Status	Key Idea
1	Axon-SSM: Selective SSM for Apple Silicon		Mamba-style selective state space model with `@mx.compile` fused kernels, \(\mathcal{O}(1)\) memory per token
2	24-Layer Sandwich Architecture		Three-zone hybrid: SSM → SWA+MoE → SSM+MoE, dimension bridge \(d_{\text{src}}=2048\)
3	Thermal-Aware Training		`CoolingScheduler` + macOS `powermetrics`, pause at 85°C, halt at 95°C
4	TurboQuant KV Cache Compression		Compress KV cache for 64K contexts; ICLR 2026 reference

Scope¶

These papers focus on the mathematical foundations and algorithmic design of each component. For API usage and integration details, see the Architecture section.

Referenced Work¶

Mamba: Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752.
MLX: Apple Machine Learning Research. MLX: An array framework for machine learning on Apple silicon.
Qwen2.5: Qwen Team (2024). Qwen2.5 Technical Report.

See also¶

Architecture Overview — Implementation details of each component
Training Guide — Thermal-aware QLoRA training pipeline
Quantization Guide — NF4 quantization and merge workflows