Quantization
bit_axon.quantization
Functions
quantize_nf4
replace_linear_with_quantized
Recursively replace nn.Linear layers with nn.QuantizedLinear.
Skips layers whose input dimension is smaller than group_size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | Root module to traverse. | required |
| group_size | int | Quantization group size. | 64 |
| bits | int | Quantization bit width. | 4 |
Returns:
| Type | Description |
|---|---|
| Module | The modified module (mutated in place). |
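
The traversal and skip rule described above can be illustrated with a small framework-free sketch. The classes `Linear`, `QuantizedLinear`, and `Container`, and the function `replace_linear_sketch`, are hypothetical stand-ins invented for this example; they are not the classes or code used by bit_axon, and the actual implementation may apply further checks. The sketch only assumes what the description states: the walk is recursive, replacement happens in place, and layers whose input dimension is smaller than `group_size` are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Linear:
    """Hypothetical stand-in for nn.Linear; only the shape matters here."""
    in_features: int
    out_features: int


@dataclass
class QuantizedLinear:
    """Hypothetical stand-in for nn.QuantizedLinear."""
    in_features: int
    out_features: int
    group_size: int
    bits: int


@dataclass
class Container:
    """Hypothetical stand-in for a module that holds named child modules."""
    children: dict = field(default_factory=dict)


def replace_linear_sketch(module, group_size=64, bits=4):
    """Swap Linear children for QuantizedLinear in place, recursing into containers."""
    # Iterate over a snapshot so reassigning entries never disturbs the loop.
    for name, child in list(module.children.items()):
        if isinstance(child, Container):
            replace_linear_sketch(child, group_size, bits)
        elif isinstance(child, Linear) and child.in_features >= group_size:
            module.children[name] = QuantizedLinear(
                child.in_features, child.out_features, group_size, bits
            )
        # A Linear with in_features < group_size is left untouched.
    return module


model = Container({
    "proj": Linear(256, 256),                        # wide enough: replaced
    "tiny": Linear(32, 8),                           # 32 < 64: skipped
    "block": Container({"ffn": Linear(128, 512)}),   # nested: replaced
})
result = replace_linear_sketch(model)
```

Because the root is mutated in place, `result` and `model` refer to the same object, so the return value is a convenience for chaining rather than a copy.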