---
title: B2NL v6.2.1 - Byte-to-Natural Language Tokenizer 🚀
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: true
license: apache-2.0
models:
- ggunio/B2NL-IntelligentTokenizer-v6.2.1
---

# B2NL v6.2.1 - Byte-to-Natural Language Tokenizer 🚀

**Compress and reconstruct text with token boundaries**

⚠️ **IMPORTANT: Currently in AUTOREGRESSIVE MODE**
- Current: ~500 ms inference (model trained with teacher forcing)
- Coming soon (November 2025): non-autoregressive training (<50 ms)

## 🌟 What's New in v6.2.1

- **204 languages** supported (up from 6)
- Fixed **16:1 compression** ratio
- **Multi-Query Attention** (8x memory reduction)
- Model: [ggunio/B2NL-IntelligentTokenizer-v6.2.1](https://huggingface.co/ggunio/B2NL-IntelligentTokenizer-v6.2.1)
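
To make the two ratios above concrete, here is a minimal arithmetic sketch. It is not the model's API: the function names are hypothetical, and it assumes that the 16:1 ratio means 16 UTF-8 bytes per compressed token (with a final partial group still costing one token) and that the 8x figure corresponds to 8 query heads sharing a single key/value head.

```python
import math

# Assumption: "16:1" means 16 input bytes map to 1 compressed token.
BYTES_PER_TOKEN = 16


def compressed_token_count(text: str) -> int:
    """Estimate the compressed token count for `text` (hypothetical helper).

    Assumes UTF-8 input and that a trailing partial group of bytes
    still occupies one full token.
    """
    n_bytes = len(text.encode("utf-8"))
    return math.ceil(n_bytes / BYTES_PER_TOKEN)


def kv_cache_reduction(num_query_heads: int, num_kv_heads: int = 1) -> float:
    """Ratio of key/value cache size: standard multi-head vs. shared KV heads.

    Assumption: Multi-Query Attention keeps one KV head for all query heads,
    so 8 query heads would give the 8x figure quoted above.
    """
    return num_query_heads / num_kv_heads


if __name__ == "__main__":
    sample = "Hello, world! This is a byte-level tokenizer demo."
    print(len(sample.encode("utf-8")), "bytes ->",
          compressed_token_count(sample), "tokens")
    print("KV cache reduction with 8 query heads:", kv_cache_reduction(8))
```

Because the ratio is defined over bytes rather than characters, scripts whose characters take several UTF-8 bytes each use up the 16-byte budget with fewer characters per token.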

## Author

**Jinhyun Woo**
- GitHub: [Woojiggun/intelligent-tokenizer](https://github.com/Woojiggun/intelligent-tokenizer)
- Paper: [Zenodo](https://zenodo.org/records/17116281)