Describe and highlight entities in images
Chat with a language model using text input
Use Tess-R1's capabilities to produce a Chain-of-Thought (CoT)
Transcribe and translate audio into text