HuggingFaceTB/SmolVLM-256M-Instruct Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 89.8k • 331
Running on Zero Featured 1.74k Dia 1.6B Generate realistic dialogue from a script, using Dia!
Running on Zero Featured 229 Spark TTS A text-to-speech model powered by SparkAudio and Mobvoi.
HuggingFaceTB/SmolVLM2-500M-Video-Instruct Image-Text-to-Text • 0.5B • Updated Apr 8, 2025 • 79.3k • 113
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 153k • 1.56k
Running Featured 353 Kokoro Text-to-Speech (WebGPU) High-quality speech synthesis powered by Kokoro TTS
mlx-community/SmolVLM2-500M-Video-Instruct-mlx Video-Text-to-Text • Updated Feb 20, 2025 • 1.34k • 18