"""HTML content for the TuRTLe leaderboard."""

HEADER_HTML = """
<div align="center">
    <img src='/gradio_api/file=logo_new.png' alt='TuRTLe Logo' width='220'/>
</div>
"""

NAV_BUTTONS_HTML = """
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
<script defer src="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/js/all.min.js"></script>
<div style="text-align: center; margin-bottom: 0px; margin-top: 0px;">
    <a href="https://github.com/HPAI-BSC/TuRTLe" target="_blank" style="text-decoration: none; margin-right: 10px;">
        <button style="background: #333; color: white; padding: 10px 14px; border-radius: 8px; border: none; font-size: 16px; cursor: pointer;">
            GitHub Repo
        </button>
    </a>

    <a href="http://arxiv.org/abs/2504.01986" target="_blank" style="text-decoration: none; margin-right: 10px;">
        <button style="background: #b31b1b; color: white; padding: 10px 14px; border-radius: 8px; border: none; font-size: 16px; cursor: pointer;">
            arXiv MLCAD 2025
        </button>
    </a>

    <a href="mailto:hpai@bsc.es?subject=TuRTLe%20leaderboard%20new%20entry&body=Link%20to%20HuggingFace%20Model:" style="text-decoration: none;">
        <button style="background: #00674F; color: white; padding: 10px 14px; border-radius: 8px; border: none; font-size: 16px; cursor: pointer;">
            How to submit
        </button>
    </a>
    <p style="margin-top: 15px;">If you have any inquiries or wish to collaborate:
        <a href="mailto:hpai@bsc.es">hpai@bsc.es</a>
    </p>
</div>
"""

INTRO_HTML = """
<div style=" margin-top:-10px !important;">
    <p style="margin-bottom: 15px; text-align: start !important;">
        Welcome to the TuRTLe Model Leaderboard! TuRTLe is a
        <b>unified evaluation framework designed to systematically assess Large Language Models (LLMs) in RTL (Register-Transfer Level) generation</b>
        for hardware design.
        Evaluation criteria include <b>syntax correctness, functional accuracy, synthesizability, and post-synthesis quality</b>
        (PPA: Power, Performance, Area). TuRTLe integrates multiple benchmarks to highlight strengths and weaknesses of available LLMs.
        Use the filters below to explore different RTL benchmarks, simulators and models.
    </p>
    <p style="margin-top:10px; text-align:start !important;">
        <span style="font-variant:small-caps; font-weight:bold;">UPDATE (SEPT 2025):</span> Added <span>gpt-oss-20b</span> and <span>gpt-oss-120b</span> to the leaderboard
    </p>
    <p style="margin-top:-6px; text-align:start !important;">
        <span style="font-variant:small-caps; font-weight:bold;">UPDATE (JULY 2025):</span> Our TuRTLe paper was accepted to
        <a href="https://mlcad.org/symposium/2025/" target="_blank">MLCAD 2025</a> (September, Santa Cruz, CA), and we added Verilator as a new simulator alongside Icarus Verilog
    </p>
    <p style="margin-top: -6px; text-align: start !important;">
        <span style="font-variant: small-caps; font-weight: bold;">UPDATE (JUNE 2025):</span> We open-sourced our framework on GitHub and added 7 recent models, for a total of 40 base and instruct models and 5 RTL benchmarks
    </p>
</div>
"""

LC_FOOTNOTE_HTML = """
<div id="lc-footnote" style="font-size: 13px; opacity: 0.6; margin-top: -5px; z-index:999; text-align: left;">
    <span style="font-weight: 600; opacity: 1;">†</span>
    <em>Line Completion</em> excludes "reasoning" models since this task targets quick auto-completion<br/>
    Additionally, for the <em>Line Completion</em> and <em>Code Completion</em> benchmarks we use the <b>Base</b> model variant (if available), and for <em>Spec-to-RTL</em> we use the <b>Instruct</b> model variant
</div>
"""

ABOUT_US_HTML = """
<div style="max-width: 800px; margin: auto; padding: 20px; border: 1px solid #ccc; border-radius: 10px;">
    <div style="display: flex; justify-content: center; align-items: center; gap: 5%; margin-bottom: 20px;">
        <img src='/gradio_api/file=hpai_logo_grad.png' alt='HPAI Group Logo' style="width: 45%;"/>
        <img src='/gradio_api/file=bsc-logo.png' alt='BSC Logo' style="width: 25%;"/>
    </div>

    <p style="font-size: 16px; text-align: start;">
        The <b>High-Performance Artificial Intelligence (HPAI)</b> group is part of the
        <a href="https://bsc.es/" target="_blank">Barcelona Supercomputing Center (BSC)</a>.
        This leaderboard is maintained by HPAI as part of our commitment to <b>open science</b>.
    </p>

    <ul style="font-size: 16px; margin-bottom: 20px; margin-top: 20px;">
        <li><a href="https://hpai.bsc.es/" target="_blank">HPAI Website</a></li>
        <li><a href="https://github.com/HPAI-BSC/" target="_blank">HPAI GitHub Organization Page</a></li>
        <li><a href="https://huggingface.co/HPAI-BSC/" target="_blank">HPAI Hugging Face Organization Page</a></li>
    </ul>

    <p style="font-size: 16px; margin-top: 15px;">
        Feel free to contact us:
    </p>

    <p style="font-size: 16px;">Email: <a href="mailto:hpai@bsc.es"><b>hpai@bsc.es</b></a></p>
</div>
"""

REFERENCES_HTML = """
<div style="max-width: 800px; margin: auto; padding: 20px; border: 1px solid #ccc; border-radius: 10px;">
    <ul style="font-size: 16px; margin-bottom: 20px; margin-top: 20px;">
        <li><a href="https://github.com/bigcode-project/bigcode-evaluation-harness" target="_blank">Code Generation LM Evaluation Harness</a></li>
        <li>Williams, S. Icarus Verilog [Computer software]. <a href="https://github.com/steveicarus/iverilog" target="_blank">https://github.com/steveicarus/iverilog</a></li>
        <li>Snyder, W., Wasson, P., Galbi, D., et al. Verilator [Computer software]. <a href="https://github.com/verilator/verilator" target="_blank">https://github.com/verilator/verilator</a></li>
        <li>RTL-Repo: A. Allam and M. Shalan, "Rtl-repo: A benchmark for evaluating llms on large-scale rtl design projects," in 2024 IEEE LLM Aided Design Workshop (LAD). IEEE, 2024, pp. 1–5.</li>
        <li>VeriGen: S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, "Verigen: A large language model for verilog code generation," ACM Transactions on Design Automation of Electronic Systems, vol. 29, no. 3, pp. 1–31, 2024.</li>
        <li>VerilogEval (I): M. Liu, N. Pinckney, B. Khailany, and H. Ren, "Verilogeval: Evaluating large language models for verilog code generation," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–8.</li>
        <li>VerilogEval (II): N. Pinckney, C. Batten, M. Liu, H. Ren, and B. Khailany, "Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation," ACM Trans. Des. Autom. Electron. Syst., Feb. 2025. <a href="https://doi.org/10.1145/3718088" target="_blank">https://doi.org/10.1145/3718088</a></li>
        <li>RTLLM: Y. Lu, S. Liu, Q. Zhang, and Z. Xie, "Rtllm: An open-source benchmark for design rtl generation with large language model," in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2024, pp. 722–727.</li>
    </ul>
</div>
"""