
Config

bit_axon.config.BitAxonConfig dataclass

BitAxonConfig(vocab_size: int = 32000, hidden_dim: int = 2560, num_layers: int = 24, num_heads: int = 32, d_source_model: int = 2048, ssm_d_state: int = 16, ssm_d_conv: int = 4, ssm_expand: int = 3, swa_window_size: int = 4096, moe_num_experts: int = 8, moe_top_k: int = 2, moe_intermediate_dim: int = 4096, moe_shared_expert: bool = True, weight_tying: bool = True, max_seq_len: int = 65536, rms_norm_eps: float = 1e-06)

Bit-Axon 3B model configuration.

A hybrid SSM + MoE + quantized architecture for Apple Silicon. Target device: MacBook Air M4 (16 GB unified memory, of which roughly 8 GB is available for the model).
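As a rough sanity check on that memory target (illustrative arithmetic only; the real footprint depends on the quantization scheme and on KV/SSM cache sizes, which this page does not specify):

```python
# Weight memory for a nominal 3B-parameter model at a few
# candidate precisions, in GiB. Activations and caches are
# deliberately ignored here.
params = 3e9

for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits}-bit weights: {gib:.2f} GiB")
```

Even unquantized fp16 weights (~5.6 GiB) fit under the ~8 GB budget, which is why aggressive quantization mainly buys headroom for longer contexts rather than bare feasibility.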

Attributes

vocab_size: int = 32000
hidden_dim: int = 2560
num_layers: int = 24
num_heads: int = 32
d_source_model: int = 2048
ssm_d_state: int = 16
ssm_d_conv: int = 4
ssm_expand: int = 3
swa_window_size: int = 4096
moe_num_experts: int = 8
moe_top_k: int = 2
moe_intermediate_dim: int = 4096
moe_shared_expert: bool = True
weight_tying: bool = True
max_seq_len: int = 65536
rms_norm_eps: float = 1e-06
head_dim property
head_dim: int

SWA head dimension (hidden_dim / num_heads).

ssm_intermediate_dim property
ssm_intermediate_dim: int

SSM expanded dimension (hidden_dim * ssm_expand).
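Putting the pieces together, the class corresponds to roughly the following self-contained sketch. Fields and defaults are copied from the signature above, and the two properties implement the docstrings shown; the real bit_axon.config module may add validation or extra helpers beyond this.

```python
from dataclasses import dataclass


@dataclass
class BitAxonConfig:
    """Bit-Axon 3B model configuration (sketch of the signature above)."""

    vocab_size: int = 32000
    hidden_dim: int = 2560
    num_layers: int = 24
    num_heads: int = 32
    d_source_model: int = 2048
    ssm_d_state: int = 16
    ssm_d_conv: int = 4
    ssm_expand: int = 3
    swa_window_size: int = 4096
    moe_num_experts: int = 8
    moe_top_k: int = 2
    moe_intermediate_dim: int = 4096
    moe_shared_expert: bool = True
    weight_tying: bool = True
    max_seq_len: int = 65536
    rms_norm_eps: float = 1e-06

    @property
    def head_dim(self) -> int:
        """SWA head dimension (hidden_dim / num_heads)."""
        return self.hidden_dim // self.num_heads

    @property
    def ssm_intermediate_dim(self) -> int:
        """SSM expanded dimension (hidden_dim * ssm_expand)."""
        return self.hidden_dim * self.ssm_expand


cfg = BitAxonConfig()
print(cfg.head_dim, cfg.ssm_intermediate_dim)  # 80 7680
```

With the defaults, head_dim works out to 2560 / 32 = 80 and ssm_intermediate_dim to 2560 * 3 = 7680. Being a plain dataclass, the config can also be copied with overrides via the standard-library `dataclasses.replace(cfg, hidden_dim=512, num_layers=4)` for smaller debug models.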

Functions