ggunio commited on
Commit
d554a87
Β·
verified Β·
1 Parent(s): 2607a65

Update README for v6.1.1

Browse files
Files changed (1) hide show
  1. README.md +29 -8
README.md CHANGED
@@ -1,14 +1,35 @@
1
  ---
2
- title: Intelligent Tokenizer V6 Demo
3
- emoji: πŸš€
4
- colorFrom: purple
5
  colorTo: green
6
  sdk: gradio
7
- sdk_version: 5.45.0
8
  app_file: app.py
9
- pinned: false
10
- license: mit
11
- short_description: Pure learning-based tokenizer with 260 fixed vocab. Universa
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: B2NL Tokenizer v6.1.1 Demo
3
+ emoji: 🌍
4
+ colorFrom: blue
5
  colorTo: green
6
  sdk: gradio
7
+ sdk_version: 4.0.0
8
  app_file: app.py
9
+ pinned: true
10
+ models:
11
+ - ggunio/B2NL-v6.1.1
12
  ---
13
 
14
+ # B2NL v6.1.1: Byte-to-Natural-Language Tokenizer
15
+
16
+ ## πŸŽ‰ 97.71% Reconstruction Achieved!
17
+
18
+ This space demonstrates our breakthrough byte-level tokenizer that achieves **100% byte-exact reconstruction** for all tested languages without any vocabulary files.
19
+
20
+ ### Key Features
21
+ - **No Vocabulary**: Pure byte-level learning
22
+ - **97.71% Overall Accuracy**: Near-perfect reconstruction
23
+ - **6 Languages**: 100% byte-exact for each
24
+ - **301.7M Parameters**: Efficient size
25
+ - **Pure Learning**: No linguistic rules
26
+
27
+ ### Phase 1 Complete
28
+ We've successfully completed Phase 1 training with outstanding results. Phase 2 (compression) starting soon!
29
+
30
+ ### Links
31
+ - [Model](https://huggingface.co/ggunio/B2NL-v6.1.1)
32
+ - [GitHub](https://github.com/Woojiggun/intelligent-tokenizer)
33
+
34
+ ### Support Us
35
+ We need GPU resources to train on 204 languages. If you can help, please reach out!