Tokenizer¶
bit_axon.tokenizer.QwenTokenizerWrapper ¶
Lightweight Qwen2.5 tokenizer wrapper using the `tokenizers` library.

Loads a `tokenizer.json` file (Qwen2.5 format) and provides:

- encode/decode
- Qwen2.5 chat template rendering (pure Python, no Jinja)
- Special token properties
Load the tokenizer from a local file path or a HuggingFace Hub repo name.

- If `path` is an existing local file: load it with `HFTokenizer.from_file()`.
- If `path` looks like a HuggingFace repo ID (contains `/`): download `tokenizer.json` via `huggingface_hub.hf_hub_download`, then load it with `HFTokenizer.from_file()`.
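The dispatch between the two branches can be sketched as a small standalone helper. `resolve_source` is a hypothetical name used here for illustration only, not part of the wrapper's API:

```python
from pathlib import Path

def resolve_source(path: str) -> str:
    """Decide how a tokenizer path should be loaded (illustrative sketch).

    Returns "local" for an existing file on disk, "hub" for a string that
    looks like a HuggingFace repo ID (contains '/'), and raises otherwise.
    """
    if Path(path).is_file():
        return "local"  # -> HFTokenizer.from_file(path)
    if "/" in path:
        return "hub"    # -> hf_hub_download(path, "tokenizer.json"), then from_file()
    raise ValueError(f"Cannot resolve tokenizer source: {path!r}")
```

Note that the local-file check runs first, so a relative path that happens to contain `/` still loads from disk when the file exists.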
Attributes¶
eos_token_id property ¶
eos_token_id: int
Return the end-of-sequence token ID (`<|im_end|>`, 151645 for Qwen2.5).
Functions¶
decode ¶
Decode token IDs to text. Accepts a list or an `mx.array`.
apply_chat_template ¶
apply_chat_template(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> list[int]
Apply Qwen2.5 chat template to messages.
Template: `<|im_start|>{role}\n{content}<|im_end|>\n`. If `add_generation_prompt=True`, appends `<|im_start|>assistant\n`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict[str, str]]` | `[{"role": "system" \| "user" \| "assistant", "content": "..."}]` | required |
| `add_generation_prompt` | `bool` | Whether to append the assistant prompt | `False` |

Returns: a list of token IDs.
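Because the template is rendered in pure Python, its string form is easy to sketch. `render_chat` below is an illustrative helper (not the wrapper's API) showing the text that would then be encoded to token IDs:

```python
def render_chat(messages: list[dict[str, str]],
                add_generation_prompt: bool = False) -> str:
    """Render the Qwen2.5 chat template as a plain string.

    Each message becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    With add_generation_prompt=True, an open assistant turn is appended.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

text = render_chat(
    [{"role": "system", "content": "You are helpful."},
     {"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
# text ends with an open assistant turn, ready for generation
```

`apply_chat_template` itself returns token IDs rather than this string, but the rendered text is what gets tokenized.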