BitAxonConfig(vocab_size: int = 32000, hidden_dim: int = 2560, num_layers: int = 24, num_heads: int = 32, d_source_model: int = 2048, ssm_d_state: int = 16, ssm_d_conv: int = 4, ssm_expand: int = 3, swa_window_size: int = 4096, moe_num_experts: int = 8, moe_top_k: int = 2, moe_intermediate_dim: int = 4096, moe_shared_expert: bool = True, weight_tying: bool = True, max_seq_len: int = 65536, rms_norm_eps: float = 1e-06)
Bit-Axon 3B model configuration.
Hybrid SSM + MoE + Quantized architecture for Apple Silicon. Target: MacBook Air M4 (16GB unified memory, ~8GB available for model).
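As a rough illustration of why `weight_tying` matters under this ~8GB budget, the back-of-envelope below uses the default `vocab_size` and `hidden_dim` from the signature and assumes 16-bit storage purely for the estimate; the actual model is quantized, so the real saving differs:

```python
# Back-of-envelope: with weight_tying=True the output head reuses the input
# embedding matrix, so those parameters are stored once instead of twice.
vocab_size, hidden_dim = 32000, 2560

tied_params = vocab_size * hidden_dim   # 81,920,000 params (~82M) stored once
saved_bytes = tied_params * 2           # bytes saved at 16-bit precision
saved_mib = saved_bytes / 2**20         # ~156 MiB off the ~8 GB budget
```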
Attributes
vocab_size class-attribute instance-attribute
vocab_size: int = 32000
hidden_dim class-attribute instance-attribute
hidden_dim: int = 2560
num_layers class-attribute instance-attribute
num_layers: int = 24
num_heads class-attribute instance-attribute
num_heads: int = 32
d_source_model class-attribute instance-attribute
d_source_model: int = 2048
ssm_d_state class-attribute instance-attribute
ssm_d_state: int = 16
ssm_d_conv class-attribute instance-attribute
ssm_d_conv: int = 4
ssm_expand class-attribute instance-attribute
ssm_expand: int = 3
swa_window_size class-attribute instance-attribute
swa_window_size: int = 4096
moe_num_experts class-attribute instance-attribute
moe_num_experts: int = 8
moe_top_k class-attribute instance-attribute
moe_top_k: int = 2
moe_intermediate_dim class-attribute instance-attribute
moe_intermediate_dim: int = 4096
moe_shared_expert class-attribute instance-attribute
moe_shared_expert: bool = True
weight_tying class-attribute instance-attribute
weight_tying: bool = True
max_seq_len class-attribute instance-attribute
max_seq_len: int = 65536
rms_norm_eps class-attribute instance-attribute
rms_norm_eps: float = 1e-06
head_dim property
head_dim: int
SWA head dimension (hidden_dim / num_heads).
ssm_intermediate_dim property
ssm_intermediate_dim: int
SSM expanded dimension (hidden_dim * ssm_expand).
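The configuration and its two derived properties can be sketched as a plain dataclass. This is a minimal reconstruction from the signature and property descriptions above, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class BitAxonConfig:
    """Bit-Axon 3B model configuration (defaults from the signature above)."""
    vocab_size: int = 32000
    hidden_dim: int = 2560
    num_layers: int = 24
    num_heads: int = 32
    d_source_model: int = 2048
    ssm_d_state: int = 16
    ssm_d_conv: int = 4
    ssm_expand: int = 3
    swa_window_size: int = 4096
    moe_num_experts: int = 8
    moe_top_k: int = 2
    moe_intermediate_dim: int = 4096
    moe_shared_expert: bool = True
    weight_tying: bool = True
    max_seq_len: int = 65536
    rms_norm_eps: float = 1e-06

    @property
    def head_dim(self) -> int:
        # SWA head dimension: hidden_dim / num_heads
        return self.hidden_dim // self.num_heads

    @property
    def ssm_intermediate_dim(self) -> int:
        # SSM expanded dimension: hidden_dim * ssm_expand
        return self.hidden_dim * self.ssm_expand
```

With the defaults, `BitAxonConfig().head_dim` is 80 and `BitAxonConfig().ssm_intermediate_dim` is 7680.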
Functions