Upload README.md
README.md
CHANGED
@@ -25,10 +25,16 @@ pipeline_tag: text-to-speech
 
 ### Releases
 
-| Model | Published | Training Data |
-| ----- | --------- | ------------- |
-|
-|
+| Model | Published | Training Data | Langs & Voices | SHA256 |
+| ----- | --------- | ------------- | -------------- | ------ |
+| [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
+| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
+
+| Training Costs | v0.19 | v1.0 | **Total** |
+| -------------- | ----- | ---- | ----- |
+| in A100 80GB GPU hours | 500 | 500 | **1000** |
+| average hourly rate | $0.80/h | $1.20/h | **$1/h** |
+| in USD | $400 | $600 | **$1000** |
 
 ### Usage
 
@@ -105,8 +111,6 @@ Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2
 
 ### Training Details
 
-**Compute:** About $1000 for 1000 hours of A100 80GB vRAM
-
 **Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
 - Public domain audio
 - Audio licensed under Apache, MIT, etc
@@ -116,6 +120,8 @@ Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2
 
 **Total Dataset Size:** A few hundred hours of audio
 
+**Total Training Cost:** About $1000 for 1000 hours of A100 80GB vRAM
+
 ### Creative Commons Attribution
 
 The following CC BY audio was part of the dataset used to train Kokoro v1.0.
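
The Training Costs table added in this change can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that takes the GPU hours and average hourly rates from the table and recomputes the per-version and total costs. The dictionary layout and the names `runs` and `cost_usd` are illustrative assumptions and are not part of the Kokoro repository.

```python
# GPU hours and average hourly rates (USD) copied from the Training Costs table.
# The structure and names here are hypothetical, chosen only for illustration.
runs = {
    "v0.19": {"gpu_hours": 500, "usd_per_hour": 0.80},
    "v1.0": {"gpu_hours": 500, "usd_per_hour": 1.20},
}


def cost_usd(run):
    """Cost of one training run: GPU hours times the average hourly rate."""
    return run["gpu_hours"] * run["usd_per_hour"]


total_hours = sum(run["gpu_hours"] for run in runs.values())
total_cost = sum(cost_usd(run) for run in runs.values())

# Hour-weighted average rate across both runs.
average_rate = total_cost / total_hours

for name, run in runs.items():
    print(f"{name}: ${cost_usd(run):.0f}")
print(f"total: {total_hours} h, ${total_cost:.0f}, ${average_rate:.2f}/h")
```

The average rate here is weighted by GPU hours. Because both versions used 500 hours, it coincides with the simple mean of $0.80/h and $1.20/h.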