# CLI Reference
Bit-Axon ships a single `bit-axon` entry point powered by Typer. Every subcommand is typed with `typer.Argument` and `typer.Option` annotations, so `bit-axon --help` and `bit-axon <command> --help` always reflect the latest signatures.
## Inference
### `bit-axon run`
Generate text from a prompt (or from stdin).
```bash
bit-axon run "Explain entropy in one sentence"
bit-axon run --chat
echo "What is 2+2?" | bit-axon run
```
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--model` / `-m` | `str` | `skyoo2003/bit-axon` | Model identifier (Hugging Face repo or local path) |
| `--tokenizer` / `-t` | `str` | `None` | Override tokenizer (defaults to the model's own) |
| `--max-tokens` | `int` | `512` | Maximum tokens to generate |
| `--temperature` | `float` | `0.6` | Sampling temperature |
| `--top-k` | `int` | `50` | Top-k filtering |
| `--top-p` | `float` | `0.95` | Nucleus (top-p) filtering threshold |
| `--seed` | `int` | `None` | Random seed for reproducible output |
| `--chat` / `-c` | `bool` | `False` | Launch an interactive chat session |
| `--no-stream` | `bool` | `False` | Print the full response at once instead of streaming tokens |
| `--config-small` | `bool` | `False` | Use the small-model configuration |
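The sampling options compose: logits are scaled by `--temperature`, `--top-k` keeps only the K most likely tokens, and `--top-p` keeps the smallest set whose cumulative probability reaches the threshold. A pure-Python sketch of that filtering, using the table's defaults (illustrative only, not Bit-Axon's implementation; the typical apply order is assumed):

```python
import math

def filter_logits(logits, temperature=0.6, top_k=50, top_p=0.95):
    """Return the token ids still eligible for sampling after
    temperature scaling, top-k, and top-p (nucleus) filtering."""
    # Temperature: scale logits, then softmax (numerically stable).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k highest-probability tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, []
    for i in kept:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return nucleus
```

A sharply peaked distribution collapses to a single candidate, while a flat one keeps every token, which is why lowering `--temperature` makes output more deterministic.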
## Training
### `bit-axon train`
Fine-tune a model on a JSONL dataset using LoRA.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--model-weights` / `-w` | `str` | *required* | Base model weights to fine-tune |
| `--val-data` | `str` | `None` | Validation dataset path |
| `--tokenizer` / `-t` | `str` | `Qwen/Qwen2.5-3B` | Tokenizer identifier |
| `--lora-rank` | `int` | `8` | LoRA rank |
| `--lora-dropout` | `float` | `0.0` | LoRA dropout probability |
| `--lora-scale` | `float` | `20.0` | LoRA scaling factor |
| `--no-dora` | `bool` | `False` | Disable DoRA (use standard LoRA instead) |
| `--learning-rate` / `-lr` | `float` | `1e-4` | Peak learning rate |
| `--max-steps` | `int` | `10000` | Maximum training steps |
| `--batch-size` | `int` | `1` | Per-device batch size |
| `--grad-accum-steps` | `int` | `4` | Gradient accumulation steps |
| `--max-seq-len` | `int` | `2048` | Maximum sequence length |
| `--warmup-steps` | `int` | `100` | Linear warmup steps |
| `--max-grad-norm` | `float` | `1.0` | Gradient clipping norm |
| `--seed` | `int` | `42` | Random seed |
| `--no-thermal` | `bool` | `False` | Disable thermal management |
| `--temp-pause` | `float` | `85.0` | Temperature (°C) at which training pauses |
| `--temp-stop` | `float` | `95.0` | Temperature (°C) at which training stops |
| `--output-dir` / `-o` | `str` | `checkpoints` | Directory to save checkpoints |
| `--save-every` | `int` | `500` | Save a checkpoint every N steps |
| `--eval-every` | `int` | `500` | Run evaluation every N steps |
| `--resume` | `bool` | `False` | Resume from the latest checkpoint |
| `--config-small` | `bool` | `False` | Use the small-model configuration |
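The training data must be JSONL: one JSON object per line. The exact field schema is not documented in this table, so the record layout below is a hypothetical prompt/completion shape, not Bit-Axon's confirmed format:

```python
import json

# Hypothetical records; the real field names depend on Bit-Axon's data loader.
records = [
    {"prompt": "Explain entropy in one sentence.", "completion": "A measure of disorder."},
    {"prompt": "What is 2+2?", "completion": "4"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        # JSONL: one compact JSON object per line, newline-terminated.
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```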
**Training pipeline (10 steps)**

1. Load the base model weights and tokenizer.
2. Apply LoRA (or DoRA) adapters to the target modules.
3. Load and tokenize the training dataset.
4. Set up the optimizer with the configured learning rate and warmup schedule.
5. Configure gradient accumulation to simulate a larger effective batch size.
6. Optionally enable thermal monitoring to pause or stop training if the GPU exceeds the configured thresholds.
7. Run the training loop, evaluating and saving checkpoints at the configured intervals.
8. Log training metrics (loss, learning rate, throughput) at each step.
9. On interruption or completion, write a final checkpoint.
10. Exit and report the path to the best or latest checkpoint.
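Steps 4 and 5 above can be sketched numerically. This is a simplified linear-warmup model using the table's defaults (`--learning-rate 1e-4`, `--warmup-steps 100`), not Bit-Axon's exact scheduler, which may also decay after warmup:

```python
def lr_at(step, peak_lr=1e-4, warmup_steps=100):
    """Linear warmup to the peak learning rate, then hold (decay omitted)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

# Gradient accumulation: the optimizer steps once every N micro-batches,
# so the effective batch size is batch_size * grad_accum_steps.
effective_batch = 1 * 4  # defaults: --batch-size 1, --grad-accum-steps 4
```

With the defaults, the learning rate climbs from `1e-6` at step 0 to the peak `1e-4` by step 99, and each optimizer step sees an effective batch of 4 sequences.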
## Model Management
### `bit-axon quantize`
Quantize a model to lower precision (e.g. 4-bit integer).
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--output` / `-o` | `str` | `""` | Output directory for the quantized model |
| `--bits` / `-b` | `int` | `4` | Quantization bit-width |
| `--group-size` / `-g` | `int` | `64` | Group size for grouped quantization |
| `--config-small` | `bool` | `False` | Use the small-model configuration |
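Grouped quantization splits each weight tensor into runs of `--group-size` values and stores one scale per group, so a single outlier only degrades its own group. A pure-Python sketch of a symmetric scheme (the symmetric round-to-nearest formulation here is an assumption for illustration, not necessarily the scheme Bit-Axon implements):

```python
def quantize_grouped(weights, bits=4, group_size=64):
    """Quantize a flat list of floats with one scale per group of values."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit symmetric quantization
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        q.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return q, scales
```

A larger `--group-size` stores fewer scales (less overhead) but lets one large weight coarsen the quantization of more neighbors.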
### `bit-axon merge`
Merge a LoRA adapter back into the base model, optionally re-quantizing the result.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--adapter` / `-a` | `str` | *required* | Path to the LoRA adapter to merge |
| `--output` / `-o` | `str` | `""` | Output directory for the merged model |
| `--no-re-quantize` | `bool` | `False` | Skip re-quantization after merging |
| `--bits` / `-b` | `int` | `4` | Bit-width if re-quantizing |
| `--group-size` / `-g` | `int` | `64` | Group size if re-quantizing |
| `--lora-rank` / `-r` | `int` | `8` | LoRA rank of the adapter |
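Merging folds the low-rank update back into the dense weight, W' = W + (scale / rank) · BA, after which the adapter is no longer needed at inference. A toy sketch with nested lists; note that dividing `--lora-scale` by the rank is an assumption here (some implementations store the final multiplier directly), so check Bit-Axon's source for the exact convention:

```python
def merge_lora(W, A, B, scale=20.0, rank=8):
    """Return W + (scale / rank) * (B @ A) for small nested-list matrices.

    W is (out x in), B is (out x r), A is (r x in).
    """
    factor = scale / rank
    merged = [row[:] for row in W]  # copy so W is left untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            update = sum(B[i][k] * A[k][j] for k in range(len(A)))
            merged[i][j] += factor * update
    return merged
```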
### `bit-axon download`
Download a model (or dataset) from Hugging Face.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--local-dir` / `-d` | `str` | `None` | Local directory to save files into |
| `--include` | `list[str]` | `None` | Glob patterns of files to include |
### `bit-axon port-weights`
Port model weights to the Bit-Axon format.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--config-small` | `bool` | `False` | Use the small-model configuration |
## Evaluation
### `bit-axon benchmark`
Measure generation throughput across multiple sequence lengths.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--seq-lengths` / `-s` | `str` | `128,512,1024,2048` | Comma-separated sequence lengths to benchmark |
| `--batch-size` | `int` | `1` | Batch size for each benchmark run |
| `--warmup` / `-w` | `int` | `2` | Warmup iterations (excluded from timing) |
| `--iterations` / `-i` | `int` | `5` | Timed iterations per sequence length |
| `--config-small` | `bool` | `False` | Use the small-model configuration |
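Throughput benchmarks of this shape usually reduce to tokens per second: the warmup iterations are run untimed, the timed iterations are averaged, and throughput is batch_size × seq_len / mean_latency. A sketch of that bookkeeping (illustrative, not Bit-Axon's harness; `step_fn` stands in for one generation pass):

```python
import time

def benchmark(step_fn, seq_len, batch_size=1, warmup=2, iterations=5):
    """Time step_fn and return tokens/sec; warmup runs are excluded."""
    for _ in range(warmup):
        step_fn(seq_len, batch_size)            # untimed warmup
    start = time.perf_counter()
    for _ in range(iterations):
        step_fn(seq_len, batch_size)
    elapsed = time.perf_counter() - start
    mean_latency = max(elapsed / iterations, 1e-12)  # guard tiny workloads
    return batch_size * seq_len / mean_latency

# Dummy workload standing in for model generation:
tok_s = benchmark(lambda s, b: sum(range(s * b)), seq_len=128)
```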
### `bit-axon evaluate`
Run evaluation on a model and print aggregate metrics.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--max-tokens` | `int` | `100000` | Token budget for the full evaluation run |
| `--seq-length` | `int` | `2048` | Maximum sequence length |
| `--tokenizer` / `-t` | `str` | `None` | Override tokenizer |
| `--batch-size` | `int` | `4` | Evaluation batch size |
| `--config-small` | `bool` | `False` | Use the small-model configuration |
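A common aggregate metric for token-budget evaluations of this kind is perplexity: the exponential of the mean per-token negative log-likelihood. Whether `bit-axon evaluate` reports perplexity specifically is not stated here, but the computation itself is standard:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Uniform guessing over a 4-token vocabulary has perplexity exactly 4.
uniform_nll = -math.log(0.25)
```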
### `bit-axon pipeline`
Run the end-to-end training and alignment pipeline on a built-in dataset.
**Usage**

| Option | Type | Default | Description |
|---|---|---|---|
| `--output-dir` / `-o` | `str` | `pipeline_output` | Root output directory |
| `--max-steps` | `int` | `100` | Maximum SFT training steps |
| `--orpo-steps` | `int` | `50` | Maximum ORPO alignment steps |
| `--max-seq-len` | `int` | `32` | Maximum sequence length |
| `--lora-rank` | `int` | `8` | LoRA rank for both SFT and ORPO phases |
| `--batch-size` | `int` | `1` | Per-device batch size |
**Pipeline stages (7 stages)**

1. Download (or verify) the built-in training dataset.
2. Preprocess and tokenize the data for supervised fine-tuning.
3. Run SFT (supervised fine-tuning) with LoRA for the configured number of steps.
4. Generate preference pairs from the SFT checkpoint.
5. Run ORPO (odds-ratio preference optimization) on the preference pairs.
6. Merge the final adapter weights back into the base model.
7. Write the finished model and a summary report to the output directory.
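Stage 5's ORPO objective pushes the odds of the chosen completion above those of the rejected one: its preference term is built from the log odds ratio log[odds(p_chosen) / odds(p_rejected)], where odds(p) = p / (1 − p). A minimal sketch of just that term (the SFT cross-entropy part of the full loss is omitted; this illustrates the math, not Bit-Axon's implementation):

```python
import math

def odds(p):
    return p / (1.0 - p)

def orpo_preference_term(p_chosen, p_rejected):
    """-log(sigmoid(log odds ratio)) between chosen and rejected completions.

    The term shrinks toward 0 as the chosen completion becomes much more
    likely than the rejected one, and grows when the model prefers the
    rejected completion.
    """
    log_or = math.log(odds(p_chosen) / odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))
```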
## See also
- Training Guide — Full fine-tuning walkthrough with examples
- Inference Guide — Generation, chat mode, and streaming
- Quantization Guide — Weight quantization and adapter merging
- Benchmarking Guide — Performance measurement and interpretation
- API Reference — Python API for all modules