---
title: B2NL v6.2.1 - Byte-to-Natural Language Tokenizer 🚀
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: true
license: apache-2.0
models:
- ggunio/B2NL-IntelligentTokenizer-v6.2.1
---

# B2NL v6.2.1 - Byte-to-Natural Language Tokenizer 🚀

**Compress and reconstruct text with token boundaries**

⚠️ **IMPORTANT: Currently in AUTOREGRESSIVE MODE**
- Current: ~500 ms inference (model trained with teacher forcing)
- Coming soon (November 2025): non-autoregressive training, targeting <50 ms inference

## 🌟 What's New in v6.2.1

- Support for **204 languages** (up from 6)
- Fixed **16:1 compression** ratio
- **Multi-Query Attention** (8x memory reduction)
- Model: [ggunio/B2NL-IntelligentTokenizer-v6.2.1](https://huggingface.co/ggunio/B2NL-IntelligentTokenizer-v6.2.1)
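
To make the two ratios above concrete, here is a minimal back-of-the-envelope sketch. It is not the model's code. It assumes that "16:1" means 16 UTF-8 bytes per compressed token with a partial final chunk still costing one token, and that the "8x" figure for Multi-Query Attention comes from 8 query heads sharing a single key/value head. The function names `compressed_token_count` and `kv_cache_ratio` are hypothetical and exist only for this illustration.

```python
import math

# Assumption: "16:1" means 16 UTF-8 bytes are packed into one compressed token.
BYTES_PER_TOKEN = 16


def compressed_token_count(text: str, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Estimate the compressed token count for a string.

    Assumes every full or partial chunk of `bytes_per_token` bytes
    becomes exactly one token (hypothetical padding behaviour).
    """
    n_bytes = len(text.encode("utf-8"))
    return math.ceil(n_bytes / bytes_per_token)


def kv_cache_ratio(num_query_heads: int, num_kv_heads: int = 1) -> float:
    """Return how many times smaller the key/value tensors are compared
    with standard multi-head attention, where each query head has its own
    key/value head. Multi-Query Attention uses a single shared K/V head.
    """
    return num_query_heads / num_kv_heads


if __name__ == "__main__":
    sample = "Hello, world! This is a byte-level tokenizer demo."
    print(len(sample.encode("utf-8")), "bytes ->", compressed_token_count(sample), "tokens")
    # Assumption: 8 query heads, which would match the stated 8x reduction.
    print("K/V memory reduction:", kv_cache_ratio(8), "x")
```

Because the ratio is defined over bytes rather than characters, scripts that need several bytes per character (for example Korean or emoji) consume the byte budget faster than ASCII text of the same character length.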

## Author

**Jinhyun Woo**
- GitHub: [Woojiggun/intelligent-tokenizer](https://github.com/Woojiggun/intelligent-tokenizer)
- Paper: [Zenodo](https://zenodo.org/records/17116281)