Welcome to the TuRTLe Model Leaderboard! TuRTLe is a unified evaluation framework designed to systematically assess Large Language Models (LLMs) in RTL (Register-Transfer Level) generation for hardware design. Evaluation criteria include syntax correctness, functional accuracy, synthesizability, and post-synthesis quality (PPA: Power, Performance, Area). TuRTLe integrates multiple benchmarks to highlight the strengths and weaknesses of available LLMs. Use the filters below to explore different RTL benchmarks, simulators, and models.

UPDATE (OCT 2025): Added Hermes-4-14B, Qwen3-8B, and Seed-OSS-36B to the leaderboard. Added an Other Models tab and moved models to it

UPDATE (SEPT 2025): Added gpt-oss-20b and gpt-oss-120b to the leaderboard

UPDATE (JULY 2025): Our TuRTLe paper was accepted to MLCAD 2025, held in September in Santa Cruz, CA. We also added Verilator as a new simulator alongside Icarus Verilog

UPDATE (JUNE 2025): We open-sourced our framework on GitHub and added 7 recent models, for a total of 40 base and instruct models and 5 RTL benchmarks

The High-Performance Artificial Intelligence (HPAI) group is part of the Barcelona Supercomputing Center (BSC). This leaderboard is maintained by HPAI as part of our commitment to open science.

Feel free to contact us:

Email: hpai@bsc.es

Code Generation LM Evaluation Harness
Williams, S. Icarus Verilog [Computer software]. https://github.com/steveicarus/iverilog
Snyder, W., Wasson, P., Galbi, D., et al. Verilator [Computer software]. https://github.com/verilator/verilator
RTL-Repo: A. Allam and M. Shalan, "RTL-Repo: A benchmark for evaluating LLMs on large-scale RTL design projects," in 2024 IEEE LLM Aided Design Workshop (LAD). IEEE, 2024, pp. 1–5.
VeriGen: S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, "VeriGen: A large language model for Verilog code generation," ACM Transactions on Design Automation of Electronic Systems, vol. 29, no. 3, pp. 1–31, 2024.
VerilogEval (I): M. Liu, N. Pinckney, B. Khailany, and H. Ren, "VerilogEval: Evaluating large language models for Verilog code generation," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–8.
VerilogEval (II): N. Pinckney, C. Batten, M. Liu, H. Ren, and B. Khailany, "Revisiting VerilogEval: A year of improvements in large-language models for hardware code generation," ACM Transactions on Design Automation of Electronic Systems, Feb. 2025. https://doi.org/10.1145/3718088
RTLLM: Y. Lu, S. Liu, Q. Zhang, and Z. Xie, "RTLLM: An open-source benchmark for design RTL generation with large language model," in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2024, pp. 722–727.