"""HTML content for the TuRTLe leaderboard."""

HEADER_HTML = """
Welcome to the TuRTLe Model Leaderboard! TuRTLe is a unified evaluation framework designed to systematically assess Large Language Models (LLMs) on RTL (Register-Transfer Level) generation for hardware design. Evaluation criteria include syntax correctness, functional accuracy, synthesizability, and post-synthesis quality (PPA: Power, Performance, Area). TuRTLe integrates multiple benchmarks to highlight the strengths and weaknesses of available LLMs. Use the filters below to explore different RTL benchmarks, simulators, and models.
UPDATE (OCT 2025): Added Hermes-4-14B, Qwen3-8B, and Seed-OSS-36B to the leaderboard. Added an Other Models tab and moved some previously listed models to it
UPDATE (SEPT 2025): Added gpt-oss-20b and gpt-oss-120b to the leaderboard
UPDATE (JULY 2025): Our TuRTLe paper was accepted to MLCAD 2025 (September, Santa Cruz, CA). We have also added Verilator as a new simulator alongside Icarus Verilog
UPDATE (JUNE 2025): We have open-sourced our framework on GitHub and added 7 recent models, bringing the total to 40 base and instruct models and 5 RTL benchmarks!
The High-Performance Artificial Intelligence (HPAI) group is part of the Barcelona Supercomputing Center (BSC). This leaderboard is maintained by HPAI as part of our commitment to open science.
Feel free to contact us:
Email: hpai@bsc.es
These models were previously listed on the main leaderboard. They were evaluated with a potentially deprecated version of TuRTLe and will no longer be updated.