Tokenizer

bit_axon.tokenizer.QwenTokenizerWrapper

QwenTokenizerWrapper(path_or_name: str | Path)

Lightweight Qwen2.5 tokenizer wrapper using the tokenizers library.

Loads a tokenizer.json file (Qwen2.5 format) and provides:

  • encode/decode
  • Qwen2.5 chat template rendering (pure Python, no Jinja)
  • special token properties

Load tokenizer from local file path or HuggingFace Hub repo name.

  • If path is a local file that exists: HFTokenizer.from_file()
  • If path looks like a HuggingFace ID (contains '/'): download tokenizer.json via huggingface_hub.hf_hub_download, then load with HFTokenizer.from_file()
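The file-vs-Hub dispatch described above can be sketched as follows. This is a minimal illustration, not the library's implementation; `resolve_tokenizer_source` is a hypothetical helper introduced here for clarity.

```python
from pathlib import Path


def resolve_tokenizer_source(path_or_name: str) -> str:
    """Decide how a tokenizer argument should be loaded.

    Returns "file" for an existing local path (-> HFTokenizer.from_file),
    "hub" for a HuggingFace repo ID containing '/' (-> hf_hub_download
    of tokenizer.json, then HFTokenizer.from_file on the result).
    """
    if Path(path_or_name).is_file():
        return "file"
    if "/" in path_or_name:
        return "hub"
    raise FileNotFoundError(f"Cannot resolve tokenizer source: {path_or_name!r}")
```

Note that the local-file check runs first, so a relative path that happens to contain `/` is still loaded from disk when it exists.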

Attributes

pad_token_id property
pad_token_id: int

Return the pad token ID (`<|endoftext|>`, 151643 for Qwen2.5).

eos_token_id property
eos_token_id: int

Return the end-of-sequence token ID (`<|im_end|>`, 151645 for Qwen2.5).

vocab_size property
vocab_size: int

Return the vocabulary size including added tokens.

Functions

encode
encode(text: str) -> list[int]

Encode text to list of token IDs.

decode
decode(token_ids: list[int] | mx.array, skip_special_tokens: bool = True) -> str

Decode token IDs to text. Accepts list or mx.array.

apply_chat_template
apply_chat_template(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> list[int]

Apply Qwen2.5 chat template to messages.

Each message is rendered as `<|im_start|>{role}\n{content}<|im_end|>\n`. If add_generation_prompt=True, `<|im_start|>assistant\n` is appended.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `messages` | `list[dict[str, str]]` | `[{"role": "system"\|"user"\|"assistant", "content": "..."}]` | *required* |
| `add_generation_prompt` | `bool` | Whether to append the assistant prompt | `False` |

Returns: list of token IDs
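Since the template is pure string formatting (no Jinja), the rendering step can be sketched directly; token IDs would then come from `encode` on the result. A minimal sketch assuming the template shown above; `render_qwen_chat` is a hypothetical helper, not part of the library:

```python
def render_qwen_chat(messages: list[dict[str, str]],
                     add_generation_prompt: bool = False) -> str:
    # Each message becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For example, a single user message with `add_generation_prompt=True` renders the user turn followed by an open `<|im_start|>assistant\n` block, which is the point at which generation begins.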