drbh HF Staff committed on
Commit a6ab428 · verified · 1 Parent(s): 6094336

Upload folder using huggingface_hub

megablocks/megablocks_only.html CHANGED
@@ -3718,219 +3718,119 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
  <h1>No Kernels</h1>
  <p>First, we run the model without any custom kernels to get a reference point.</p>
  <h2>Forward</h2>
- <h2>Forward and Backward</h2>
- <p>Next, we'll attempt to run a forward and backward pass without any custom kernels. This will likely run out of memory since the default implementation is not optimized for memory usage.</p>
- <div class="cell cell-failed" id="cell-forward_and_backward_no_kernel">
  <div class="cell-header">
  <span class="collapse-indicators">
- <span onclick="toggleCode('forward_and_backward_no_kernel')" style="cursor: pointer;">▼ code</span>
- <span onclick="toggleOutput('forward_and_backward_no_kernel')" style="cursor: pointer;">▼ output</span>
- <span id="uv-indicator-forward_and_backward_no_kernel" onclick="toggleUvLogsFromHeader('forward_and_backward_no_kernel')" style="cursor: pointer;">▶ uv-logs</span>
  </span> |
- Cell: forward_and_backward_no_kernel | 99.38s | FAILED
- | <button class="run-btn" onclick="runCell('forward_and_backward_no_kernel')">▶ run</button>
- <button class="copy-btn" onclick="copyCell('forward_and_backward_no_kernel')">Copy</button>
- <a href="cells/forward_and_backward_no_kernel.py" target="_blank" class="raw-btn">Raw</a>
  </div>
- <div id="code-forward_and_backward_no_kernel" class="cell-code" data-lines="196">
  <div class="highlight-with-lines">
- <div class="line-numbers" id="lines-forward_and_backward_no_kernel">
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="1" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 1, true);">1</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="2" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 2, true);">2</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="3" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 3, true);">3</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="4" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 4, true);">4</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="5" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 5, true);">5</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="6" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 6, true);">6</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="7" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 7, true);">7</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="8" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 8, true);">8</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="9" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 9, true);">9</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="10" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 10, true);">10</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="11" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 11, true);">11</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="12" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 12, true);">12</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="13" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 13, true);">13</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="14" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 14, true);">14</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="15" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 15, true);">15</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="16" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 16, true);">16</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="17" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 17, true);">17</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="18" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 18, true);">18</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="19" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 19, true);">19</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="20" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 20, true);">20</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="21" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 21, true);">21</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="22" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 22, true);">22</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="23" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 23, true);">23</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="24" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 24, true);">24</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="25" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 25, true);">25</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="26" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 26, true);">26</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="27" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 27, true);">27</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="28" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 28, true);">28</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="29" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 29, true);">29</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="30" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 30, true);">30</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="31" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 31, true);">31</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="32" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 32, true);">32</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="33" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 33, true);">33</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="34" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 34, true);">34</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="35" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 35, true);">35</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="36" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 36, true);">36</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="37" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 37, true);">37</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="38" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 38, true);">38</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="39" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 39, true);">39</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="40" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 40, true);">40</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="41" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 41, true);">41</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="42" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 42, true);">42</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="43" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 43, true);">43</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="44" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 44, true);">44</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="45" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 45, true);">45</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="46" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 46, true);">46</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="47" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 47, true);">47</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="48" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 48, true);">48</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="49" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 49, true);">49</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="50" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 50, true);">50</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="51" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 51, true);">51</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="52" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 52, true);">52</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="53" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 53, true);">53</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="54" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 54, true);">54</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="55" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 55, true);">55</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="56" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 56, true);">56</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="57" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 57, true);">57</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="58" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 58, true);">58</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="59" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 59, true);">59</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="60" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 60, true);">60</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="61" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 61, true);">61</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="62" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 62, true);">62</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="63" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 63, true);">63</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="64" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 64, true);">64</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="65" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 65, true);">65</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="66" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 66, true);">66</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="67" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 67, true);">67</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="68" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 68, true);">68</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="69" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 69, true);">69</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="70" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 70, true);">70</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="71" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 71, true);">71</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="72" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 72, true);">72</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="73" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 73, true);">73</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="74" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 74, true);">74</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="75" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 75, true);">75</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="76" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 76, true);">76</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="77" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 77, true);">77</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="78" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 78, true);">78</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="79" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 79, true);">79</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="80" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 80, true);">80</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="81" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 81, true);">81</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="82" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 82, true);">82</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="83" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 83, true);">83</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="84" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 84, true);">84</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="85" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 85, true);">85</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="86" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 86, true);">86</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="87" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 87, true);">87</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="88" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 88, true);">88</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="89" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 89, true);">89</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="90" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 90, true);">90</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="91" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 91, true);">91</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="92" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 92, true);">92</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="93" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 93, true);">93</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="94" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 94, true);">94</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="95" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 95, true);">95</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="96" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 96, true);">96</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="97" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 97, true);">97</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="98" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 98, true);">98</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="99" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 99, true);">99</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="100" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 100, true);">100</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="101" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 101, true);">101</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="102" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 102, true);">102</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="103" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 103, true);">103</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="104" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 104, true);">104</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="105" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 105, true);">105</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="106" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 106, true);">106</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="107" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 107, true);">107</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="108" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 108, true);">108</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="109" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 109, true);">109</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="110" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 110, true);">110</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="111" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 111, true);">111</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="112" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 112, true);">112</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="113" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 113, true);">113</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="114" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 114, true);">114</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="115" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 115, true);">115</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="116" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 116, true);">116</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="117" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 117, true);">117</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="118" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 118, true);">118</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="119" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 119, true);">119</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="120" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 120, true);">120</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="121" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 121, true);">121</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="122" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 122, true);">122</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="123" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 123, true);">123</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="124" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 124, true);">124</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="125" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 125, true);">125</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="126" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 126, true);">126</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="127" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 127, true);">127</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="128" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 128, true);">128</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="129" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 129, true);">129</a>
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="130" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 130, true);">130</a>
3868
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="131" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 131, true);">131</a>
3869
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="132" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 132, true);">132</a>
3870
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="133" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 133, true);">133</a>
3871
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="134" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 134, true);">134</a>
3872
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="135" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 135, true);">135</a>
3873
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="136" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 136, true);">136</a>
3874
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="137" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 137, true);">137</a>
3875
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="138" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 138, true);">138</a>
3876
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="139" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 139, true);">139</a>
3877
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="140" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 140, true);">140</a>
3878
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="141" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 141, true);">141</a>
3879
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="142" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 142, true);">142</a>
3880
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="143" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 143, true);">143</a>
3881
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="144" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 144, true);">144</a>
3882
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="145" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 145, true);">145</a>
3883
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="146" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 146, true);">146</a>
3884
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="147" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 147, true);">147</a>
3885
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="148" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 148, true);">148</a>
3886
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="149" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 149, true);">149</a>
3887
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="150" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 150, true);">150</a>
3888
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="151" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 151, true);">151</a>
3889
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="152" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 152, true);">152</a>
3890
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="153" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 153, true);">153</a>
3891
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="154" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 154, true);">154</a>
3892
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="155" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 155, true);">155</a>
3893
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="156" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 156, true);">156</a>
3894
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="157" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 157, true);">157</a>
3895
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="158" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 158, true);">158</a>
3896
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="159" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 159, true);">159</a>
3897
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="160" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 160, true);">160</a>
3898
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="161" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 161, true);">161</a>
3899
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="162" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 162, true);">162</a>
3900
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="163" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 163, true);">163</a>
3901
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="164" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 164, true);">164</a>
3902
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="165" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 165, true);">165</a>
3903
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="166" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 166, true);">166</a>
3904
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="167" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 167, true);">167</a>
3905
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="168" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 168, true);">168</a>
3906
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="169" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 169, true);">169</a>
3907
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="170" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 170, true);">170</a>
3908
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="171" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 171, true);">171</a>
3909
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="172" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 172, true);">172</a>
3910
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="173" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 173, true);">173</a>
3911
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="174" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 174, true);">174</a>
3912
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="175" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 175, true);">175</a>
3913
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="176" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 176, true);">176</a>
3914
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="177" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 177, true);">177</a>
3915
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="178" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 178, true);">178</a>
3916
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="179" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 179, true);">179</a>
3917
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="180" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 180, true);">180</a>
3918
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="181" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 181, true);">181</a>
3919
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="182" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 182, true);">182</a>
3920
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="183" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 183, true);">183</a>
3921
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="184" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 184, true);">184</a>
3922
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="185" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 185, true);">185</a>
3923
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="186" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 186, true);">186</a>
3924
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="187" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 187, true);">187</a>
3925
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="188" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 188, true);">188</a>
3926
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="189" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 189, true);">189</a>
3927
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="190" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 190, true);">190</a>
3928
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="191" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 191, true);">191</a>
3929
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="192" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 192, true);">192</a>
3930
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="193" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 193, true);">193</a>
3931
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="194" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 194, true);">194</a>
3932
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="195" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 195, true);">195</a>
3933
- <a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="196" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 196, true);">196</a>
3934
  </div>
3935
  <div class="code-wrap">
3936
  <div class="highlight"><pre><span></span><span class="c1"># /// script</span>
@@ -3957,9 +3857,6 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
3957
  <span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
3958
  <span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssRMSNorm</span>
3959
 
3960
- <span class="c1"># remove liger kernel for testing </span>
3961
- <span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
3962
-
3963
  <span class="c1"># set logging level to INFO</span>
3964
  <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
3965
 
@@ -3998,6 +3895,8 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
3998
  <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
3999
  <span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
4000
 
 
 
4001
  <span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
4002
  <span class="n">model_id</span><span class="p">,</span>
4003
  <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;bfloat16&quot;</span><span class="p">,</span>
@@ -4018,14 +3917,9 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
4018
  <span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">&quot;low&quot;</span><span class="p">,</span>
4019
  <span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">&quot;cuda&quot;</span><span class="p">)</span>
4020
 
4021
- <span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">128</span> <span class="c1"># Reduced to help with memory usage</span>
4022
 
4023
- <span class="c1"># Reset peak memory stats before generation</span>
4024
- <span class="n">reset_peak_memory_stats</span><span class="p">()</span>
4025
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Pre-generation memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4026
-
4027
- <span class="c1"># forward and backward pass</span>
4028
- <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">autograd</span><span class="o">.</span><span class="n">set_grad_enabled</span><span class="p">(</span><span class="kc">True</span><span class="p">):</span>
4029
  <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4030
  <span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
4031
  <span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
@@ -4034,262 +3928,36 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
4034
  <span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
4035
  <span class="p">)</span>
4036
  <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4037
- <span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
4038
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds&quot;</span><span class="p">)</span>
4039
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Post-generation memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4040
-
4041
- <span class="c1"># Use gradient checkpointing to reduce memory usage</span>
4042
- <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s1">&#39;gradient_checkpointing_enable&#39;</span><span class="p">):</span>
4043
- <span class="n">model</span><span class="o">.</span><span class="n">gradient_checkpointing_enable</span><span class="p">()</span>
4044
- <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Enabled gradient checkpointing&quot;</span><span class="p">)</span>
4045
-
4046
- <span class="c1"># Reduce sequence length if needed for memory</span>
4047
- <span class="n">max_seq_len</span> <span class="o">=</span> <span class="mi">512</span> <span class="c1"># Limit sequence length for backward pass</span>
4048
- <span class="k">if</span> <span class="n">generated</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">&gt;</span> <span class="n">max_seq_len</span><span class="p">:</span>
4049
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Truncating sequence from </span><span class="si">{</span><span class="n">generated</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="si">}</span><span class="s2"> to </span><span class="si">{</span><span class="n">max_seq_len</span><span class="si">}</span><span class="s2"> tokens&quot;</span><span class="p">)</span>
4050
- <span class="n">full_sequence</span> <span class="o">=</span> <span class="n">generated</span><span class="p">[:,</span> <span class="o">-</span><span class="n">max_seq_len</span><span class="p">:]</span>
4051
- <span class="k">else</span><span class="p">:</span>
4052
- <span class="n">full_sequence</span> <span class="o">=</span> <span class="n">generated</span>
4053
-
4054
- <span class="c1"># Get model outputs for the full sequence</span>
4055
- <span class="n">model</span><span class="o">.</span><span class="n">train</span><span class="p">()</span> <span class="c1"># Enable dropout and other training behaviors</span>
4056
-
4057
- <span class="k">try</span><span class="p">:</span>
4058
- <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
4059
- <span class="n">input_ids</span><span class="o">=</span><span class="n">full_sequence</span><span class="p">,</span>
4060
- <span class="n">labels</span><span class="o">=</span><span class="n">full_sequence</span><span class="p">,</span> <span class="c1"># This will compute loss internally</span>
4061
- <span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span>
4062
- <span class="p">)</span>
4063
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Post-forward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4064
-
4065
- <span class="c1"># If model doesn&#39;t compute loss, compute it manually</span>
4066
- <span class="k">if</span> <span class="n">outputs</span><span class="o">.</span><span class="n">loss</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
4067
- <span class="n">shift_logits</span> <span class="o">=</span> <span class="n">outputs</span><span class="o">.</span><span class="n">logits</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span>
4068
- <span class="n">shift_labels</span> <span class="o">=</span> <span class="n">full_sequence</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="mi">1</span><span class="p">:]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span>
4069
-
4070
- <span class="c1"># Use CrossEntropyLoss with ignore_index for padding tokens</span>
4071
- <span class="n">loss_fct</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">(</span><span class="n">ignore_index</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span> <span class="k">if</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="o">-</span><span class="mi">100</span><span class="p">)</span>
4072
- <span class="n">loss</span> <span class="o">=</span> <span class="n">loss_fct</span><span class="p">(</span>
4073
- <span class="n">shift_logits</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shift_logits</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)),</span>
4074
- <span class="n">shift_labels</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
4075
- <span class="p">)</span>
4076
- <span class="k">else</span><span class="p">:</span>
4077
- <span class="n">loss</span> <span class="o">=</span> <span class="n">outputs</span><span class="o">.</span><span class="n">loss</span>
4078
-
4079
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Loss: </span><span class="si">{</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">()</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4080
-
4081
- <span class="c1"># Clear intermediate tensors to save memory</span>
4082
- <span class="k">del</span> <span class="n">outputs</span>
4083
- <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
4084
-
4085
- <span class="c1"># Perform backward pass with memory management</span>
4086
- <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Running backward pass...&quot;</span><span class="p">)</span>
4087
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Pre-backward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4088
-
4089
- <span class="n">loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
4090
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Post-backward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4091
-
4092
- <span class="k">except</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">OutOfMemoryError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
4093
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;OOM during forward/backward pass: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4094
- <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Try reducing max_tokens or max_seq_len&quot;</span><span class="p">)</span>
4095
- <span class="k">raise</span>
4096
-
4097
- <span class="c1"># Calculate gradient statistics and print sample gradients</span>
4098
- <span class="n">total_norm</span> <span class="o">=</span> <span class="mf">0.0</span>
4099
- <span class="n">param_count</span> <span class="o">=</span> <span class="mi">0</span>
4100
- <span class="n">grad_samples</span> <span class="o">=</span> <span class="p">{}</span>
4101
-
4102
- <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_parameters</span><span class="p">():</span>
4103
- <span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
4104
- <span class="n">param_count</span> <span class="o">+=</span> <span class="mi">1</span>
4105
- <span class="n">grad_norm</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">item</span><span class="p">()</span>
4106
- <span class="n">total_norm</span> <span class="o">+=</span> <span class="n">grad_norm</span> <span class="o">**</span> <span class="mi">2</span>
4107
-
4108
- <span class="c1"># Collect gradient statistics for key layers</span>
4109
- <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">key</span> <span class="ow">in</span> <span class="n">name</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;embed&#39;</span><span class="p">,</span> <span class="s1">&#39;lm_head&#39;</span><span class="p">,</span> <span class="s1">&#39;mlp.up&#39;</span><span class="p">,</span> <span class="s1">&#39;mlp.down&#39;</span><span class="p">,</span> <span class="s1">&#39;self_attn.q_proj&#39;</span><span class="p">,</span> <span class="s1">&#39;norm&#39;</span><span class="p">]):</span>
4110
- <span class="n">grad_samples</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
4111
- <span class="s1">&#39;norm&#39;</span><span class="p">:</span> <span class="n">grad_norm</span><span class="p">,</span>
4112
- <span class="s1">&#39;mean&#39;</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
4113
- <span class="s1">&#39;std&#39;</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">std</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
4114
- <span class="s1">&#39;max&#39;</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">max</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
4115
- <span class="s1">&#39;min&#39;</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">min</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
4116
- <span class="p">}</span>
4117
-
4118
- <span class="n">total_norm</span> <span class="o">=</span> <span class="n">total_norm</span> <span class="o">**</span> <span class="mf">0.5</span>
4119
-
4120
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="se">\n</span><span class="s2">Gradient norm: </span><span class="si">{</span><span class="n">total_norm</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4121
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Parameters with gradients: </span><span class="si">{</span><span class="n">param_count</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4122
-
4123
- <span class="c1"># Print sample gradients from important layers</span>
4124
- <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;</span><span class="se">\n</span><span class="s2">Sample gradient statistics:&quot;</span><span class="p">)</span>
4125
- <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">stats</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">grad_samples</span><span class="o">.</span><span class="n">items</span><span class="p">())[:</span><span class="mi">10</span><span class="p">]):</span>
4126
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot; </span><span class="si">{</span><span class="n">name</span><span class="p">[:</span><span class="mi">60</span><span class="p">]</span><span class="si">:</span><span class="s2">&lt;60</span><span class="si">}</span><span class="s2"> | norm: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">&#39;norm&#39;</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2"> | mean: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">&#39;mean&#39;</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2"> | std: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">&#39;std&#39;</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4127
-
4128
- <span class="c1"># Optional: zero gradients for next iteration</span>
4129
- <span class="n">model</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
4130
- <span class="n">model</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span> <span class="c1"># Switch back to eval mode</span>
4131
  </pre></div>
4132
 
4133
- <div class="code-line-highlight" id="line-highlight-forward_and_backward_no_kernel"></div>
4134
  </div>
4135
  </div>
4136
  </div>
4137
- <div id="output-forward_and_backward_no_kernel" class="cell-output">
4138
- <div class="cell-stdout">Pre-generation memory: {&#x27;allocated_gb&#x27;: 9.390148608, &#x27;peak_gb&#x27;: 9.390148608, &#x27;reserved_gb&#x27;: 17.177772032}
4139
- &lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
4140
- Knowledge cutoff: 2024-06
4141
- Current date: 2025-09-24
4142
-
4143
- Reasoning: low
4144
-
4145
- # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
4146
-
4147
- What is Tensor Parallelism?
4148
-
4149
- &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;Need to explain tensor parallelism.&lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;final&lt;|message|&gt;**Tensor Parallelism** is a technique used to distribute the computation of a single tensor (or a set of tensors) across multiple devices (GPUs, TPUs, or other accelerators) in order to speed up large‑scale deep‑learning training and inference. It is one of several parallelism strategies—alongside data parallelism, pipeline parallelism, and model parallelism—designed to scale models that exceed the memory or compute capacity of a single device.
4150
-
4151
- ---
4152
-
4153
- ## 1. Why Tensor Parallelism?
4154
-
4155
- - **Memory constraints**: Modern
4156
- Generation took 13.15 seconds
4157
- Post-generation memory: {&#x27;allocated_gb&#x27;: 9.398670336, &#x27;peak_gb&#x27;: 9.514059776, &#x27;reserved_gb&#x27;: 17.188257792}
4158
- Enabled gradient checkpointing
4159
- Post-forward memory: {&#x27;allocated_gb&#x27;: 9.487933952, &#x27;peak_gb&#x27;: 9.514059776, &#x27;reserved_gb&#x27;: 17.188257792}
4160
- Loss: 1.9761
4161
- Running backward pass...
4162
- Pre-backward memory: {&#x27;allocated_gb&#x27;: 9.405890048, &#x27;peak_gb&#x27;: 9.514059776, &#x27;reserved_gb&#x27;: 17.177772032}
4163
- OOM during forward/backward pass: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 2 has a total capacity of 22.30 GiB of which 118.69 MiB is free. Process 25557 has 22.18 GiB memory in use. Of the allocated memory 21.52 GiB is allocated by PyTorch, and 357.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
4164
- Try reducing max_tokens or max_seq_len
4165
- </div>
4166
- <div class="uv-install-logs" id="uv-logs-forward_and_backward_no_kernel">
4167
- <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4168
- <div class="uv-logs-content" style="display: none;">
4169
- Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
4170
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4171
  Updating https://github.com/huggingface/transformers.git (HEAD)
4172
- Updated https://github.com/huggingface/transformers.git (7258ea44bc0c0a425a468f66f8559d1de8c4126d)
4173
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4174
- Downloading networkx (1.9MiB)
4175
- Downloading jedi (1.5MiB)
4176
- Building transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4177
- Downloading kiwisolver (1.4MiB)
4178
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4179
- Downloading nvidia-nccl-cu12 (307.4MiB)
4180
- Downloading nvidia-cublas-cu12 (566.8MiB)
4181
- Downloading nvidia-cudnn-cu12 (674.0MiB)
4182
- Downloading nvidia-cufft-cu12 (184.2MiB)
4183
- Downloading nvidia-curand-cu12 (60.7MiB)
4184
- Downloading nvidia-cusparse-cu12 (274.9MiB)
4185
- Downloading hf-xet (3.0MiB)
4186
- Downloading triton (148.4MiB)
4187
- Downloading nvidia-cufile-cu12 (1.1MiB)
4188
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4189
- Downloading tokenizers (3.1MiB)
4190
- Downloading matplotlib (8.3MiB)
4191
- Downloading sympy (6.0MiB)
4192
- Downloading pillow (6.3MiB)
4193
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4194
- Downloading pygments (1.2MiB)
4195
- Downloading nvidia-cusolver-cu12 (255.1MiB)
4196
- Downloading numpy (15.9MiB)
4197
- Downloading torch (846.8MiB)
4198
- Downloading fonttools (4.7MiB)
4199
- Downloading nvidia-cufile-cu12
4200
- Downloading kiwisolver
4201
- Downloading pygments
4202
- Downloading tokenizers
4203
- Downloading hf-xet
4204
- Downloading networkx
4205
- Downloading fonttools
4206
- Downloading pillow
4207
- Downloading matplotlib
4208
- Downloading nvidia-cuda-cupti-cu12
4209
- Downloading numpy
4210
- Downloading sympy
4211
- Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4212
- Downloading nvidia-nvjitlink-cu12
4213
- Downloading jedi
4214
- Downloading nvidia-curand-cu12
4215
- Downloading nvidia-cuda-nvrtc-cu12
4216
- Downloading triton
4217
- Downloading nvidia-cufft-cu12
4218
- Downloading nvidia-cusolver-cu12
4219
- Downloading nvidia-cusparse-cu12
4220
- Downloading nvidia-cusparselt-cu12
4221
- Downloading nvidia-nccl-cu12
4222
- Downloading nvidia-cublas-cu12
4223
- Downloading nvidia-cudnn-cu12
4224
- Downloading torch
4225
- Installed 69 packages in 579ms
4226
- </div>
4227
  </div>
4228
- <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4229
- Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:15, 7.84s/it]
4230
- Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.40s/it]
4231
- Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.71s/it]
4232
-
4233
- Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4234
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.34s/it]
4235
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.25s/it]
4236
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.80s/it]
4237
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.93s/it]
4238
- `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
4239
- Traceback (most recent call last):
4240
- File &quot;/repo/moe_benchmarks/megablocks/.uvnote/cells/forward_and_backward_no_kernel.py&quot;, line 154, in &lt;module&gt;
4241
- loss.backward()
4242
- ~~~~~~~~~~~~~^^
4243
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/_tensor.py&quot;, line 647, in backward
4244
- torch.autograd.backward(
4245
- ~~~~~~~~~~~~~~~~~~~~~~~^
4246
- self, gradient, retain_graph, create_graph, inputs=inputs
4247
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4248
- )
4249
- ^
4250
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/__init__.py&quot;, line 354, in backward
4251
- _engine_run_backward(
4252
- ~~~~~~~~~~~~~~~~~~~~^
4253
- tensors,
4254
- ^^^^^^^^
4255
- ...&lt;5 lines&gt;...
4256
- accumulate_grad=True,
4257
- ^^^^^^^^^^^^^^^^^^^^^
4258
- )
4259
- ^
4260
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/graph.py&quot;, line 829, in _engine_run_backward
4261
- return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
4262
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4263
- t_outputs, *args, **kwargs
4264
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
4265
- ) # Calls into the C++ engine to run the backward pass
4266
- ^
4267
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/function.py&quot;, line 311, in apply
4268
- return user_fn(self, *args)
4269
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/utils/checkpoint.py&quot;, line 319, in backward
4270
- torch.autograd.backward(outputs_with_grad, args_with_grad)
4271
- ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4272
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/__init__.py&quot;, line 354, in backward
4273
- _engine_run_backward(
4274
- ~~~~~~~~~~~~~~~~~~~~^
4275
- tensors,
4276
- ^^^^^^^^
4277
- ...&lt;5 lines&gt;...
4278
- accumulate_grad=True,
4279
- ^^^^^^^^^^^^^^^^^^^^^
4280
- )
4281
- ^
4282
- File &quot;/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/graph.py&quot;, line 829, in _engine_run_backward
4283
- return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
4284
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4285
- t_outputs, *args, **kwargs
4286
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
4287
- ) # Calls into the C++ engine to run the backward pass
4288
- ^
4289
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 2 has a total capacity of 22.30 GiB of which 118.69 MiB is free. Process 25557 has 22.18 GiB memory in use. Of the allocated memory 21.52 GiB is allocated by PyTorch, and 357.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)</div>
4290
  </div>
4291
  </div>
4292
 
 
 
4293
  <h1>Kernels</h1>
4294
  <p>Next we can run with Megablocks kernels enabled.</p>
4295
  <h3>Forward</h3>
 
3718
  <h1>No Kernels</h1>
3719
  <p>First, we run the model without any custom kernels to get a reference point.</p>
3720
  <h2>Forward</h2>
3721
+ <div class="cell cell-failed" id="cell-no_kernels">
 
 
3722
  <div class="cell-header">
3723
  <span class="collapse-indicators">
3724
+ <span onclick="toggleCode('no_kernels')" style="cursor: pointer;">▼ code</span>
3725
+ <span onclick="toggleOutput('no_kernels')" style="cursor: pointer;">▼ output</span>
3726
+ <span id="uv-indicator-no_kernels" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
3727
  </span> |
3728
+ Cell: no_kernels | 19.21s | FAILED
3729
+ | <button class="run-btn" onclick="runCell('no_kernels')">▶ run</button>
3730
+ <button class="copy-btn" onclick="copyCell('no_kernels')">Copy</button>
3731
+ <a href="cells/no_kernels.py" target="_blank" class="raw-btn">Raw</a>
3732
  </div>
3733
+ <div id="code-no_kernels" class="cell-code" data-lines="98">
3734
  <div class="highlight-with-lines">
3735
+ <div class="line-numbers" id="lines-no_kernels">
3736
+ <a class="line-number" data-cell="no_kernels" data-line="1" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 1, true);">1</a>
3737
+ <a class="line-number" data-cell="no_kernels" data-line="2" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 2, true);">2</a>
3738
+ <a class="line-number" data-cell="no_kernels" data-line="3" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 3, true);">3</a>
3739
+ <a class="line-number" data-cell="no_kernels" data-line="4" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 4, true);">4</a>
3740
+ <a class="line-number" data-cell="no_kernels" data-line="5" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 5, true);">5</a>
3741
+ <a class="line-number" data-cell="no_kernels" data-line="6" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 6, true);">6</a>
3742
+ <a class="line-number" data-cell="no_kernels" data-line="7" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 7, true);">7</a>
3743
+ <a class="line-number" data-cell="no_kernels" data-line="8" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 8, true);">8</a>
3744
+ <a class="line-number" data-cell="no_kernels" data-line="9" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 9, true);">9</a>
3745
+ <a class="line-number" data-cell="no_kernels" data-line="10" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 10, true);">10</a>
3746
+ <a class="line-number" data-cell="no_kernels" data-line="11" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 11, true);">11</a>
3747
+ <a class="line-number" data-cell="no_kernels" data-line="12" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 12, true);">12</a>
3748
+ <a class="line-number" data-cell="no_kernels" data-line="13" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 13, true);">13</a>
3749
+ <a class="line-number" data-cell="no_kernels" data-line="14" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 14, true);">14</a>
3750
+ <a class="line-number" data-cell="no_kernels" data-line="15" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 15, true);">15</a>
3751
+ <a class="line-number" data-cell="no_kernels" data-line="16" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 16, true);">16</a>
3752
+ <a class="line-number" data-cell="no_kernels" data-line="17" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 17, true);">17</a>
3753
+ <a class="line-number" data-cell="no_kernels" data-line="18" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 18, true);">18</a>
3754
+ <a class="line-number" data-cell="no_kernels" data-line="19" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 19, true);">19</a>
3755
+ <a class="line-number" data-cell="no_kernels" data-line="20" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 20, true);">20</a>
3756
+ <a class="line-number" data-cell="no_kernels" data-line="21" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 21, true);">21</a>
3757
+ <a class="line-number" data-cell="no_kernels" data-line="22" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 22, true);">22</a>
3758
+ <a class="line-number" data-cell="no_kernels" data-line="23" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 23, true);">23</a>
3759
+ <a class="line-number" data-cell="no_kernels" data-line="24" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 24, true);">24</a>
3760
+ <a class="line-number" data-cell="no_kernels" data-line="25" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 25, true);">25</a>
3761
+ <a class="line-number" data-cell="no_kernels" data-line="26" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 26, true);">26</a>
3762
+ <a class="line-number" data-cell="no_kernels" data-line="27" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 27, true);">27</a>
3763
+ <a class="line-number" data-cell="no_kernels" data-line="28" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 28, true);">28</a>
3764
+ <a class="line-number" data-cell="no_kernels" data-line="29" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 29, true);">29</a>
3765
+ <a class="line-number" data-cell="no_kernels" data-line="30" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 30, true);">30</a>
3766
+ <a class="line-number" data-cell="no_kernels" data-line="31" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 31, true);">31</a>
3767
+ <a class="line-number" data-cell="no_kernels" data-line="32" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 32, true);">32</a>
3768
+ <a class="line-number" data-cell="no_kernels" data-line="33" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 33, true);">33</a>
3769
+ <a class="line-number" data-cell="no_kernels" data-line="34" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 34, true);">34</a>
3770
+ <a class="line-number" data-cell="no_kernels" data-line="35" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 35, true);">35</a>
3771
+ <a class="line-number" data-cell="no_kernels" data-line="36" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 36, true);">36</a>
3772
+ <a class="line-number" data-cell="no_kernels" data-line="37" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 37, true);">37</a>
3773
+ <a class="line-number" data-cell="no_kernels" data-line="38" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 38, true);">38</a>
3774
+ <a class="line-number" data-cell="no_kernels" data-line="39" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 39, true);">39</a>
3775
+ <a class="line-number" data-cell="no_kernels" data-line="40" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 40, true);">40</a>
3776
+ <a class="line-number" data-cell="no_kernels" data-line="41" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 41, true);">41</a>
3777
+ <a class="line-number" data-cell="no_kernels" data-line="42" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 42, true);">42</a>
3778
+ <a class="line-number" data-cell="no_kernels" data-line="43" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 43, true);">43</a>
3779
+ <a class="line-number" data-cell="no_kernels" data-line="44" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 44, true);">44</a>
3780
+ <a class="line-number" data-cell="no_kernels" data-line="45" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 45, true);">45</a>
3781
+ <a class="line-number" data-cell="no_kernels" data-line="46" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 46, true);">46</a>
3782
+ <a class="line-number" data-cell="no_kernels" data-line="47" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 47, true);">47</a>
3783
+ <a class="line-number" data-cell="no_kernels" data-line="48" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 48, true);">48</a>
3784
+ <a class="line-number" data-cell="no_kernels" data-line="49" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 49, true);">49</a>
3785
+ <a class="line-number" data-cell="no_kernels" data-line="50" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 50, true);">50</a>
3786
+ <a class="line-number" data-cell="no_kernels" data-line="51" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 51, true);">51</a>
3787
+ <a class="line-number" data-cell="no_kernels" data-line="52" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 52, true);">52</a>
3788
+ <a class="line-number" data-cell="no_kernels" data-line="53" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 53, true);">53</a>
3789
+ <a class="line-number" data-cell="no_kernels" data-line="54" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 54, true);">54</a>
3790
+ <a class="line-number" data-cell="no_kernels" data-line="55" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 55, true);">55</a>
3791
+ <a class="line-number" data-cell="no_kernels" data-line="56" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 56, true);">56</a>
3792
+ <a class="line-number" data-cell="no_kernels" data-line="57" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 57, true);">57</a>
3793
+ <a class="line-number" data-cell="no_kernels" data-line="58" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 58, true);">58</a>
3794
+ <a class="line-number" data-cell="no_kernels" data-line="59" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 59, true);">59</a>
3795
+ <a class="line-number" data-cell="no_kernels" data-line="60" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 60, true);">60</a>
3796
+ <a class="line-number" data-cell="no_kernels" data-line="61" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 61, true);">61</a>
3797
+ <a class="line-number" data-cell="no_kernels" data-line="62" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 62, true);">62</a>
3798
+ <a class="line-number" data-cell="no_kernels" data-line="63" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 63, true);">63</a>
3799
+ <a class="line-number" data-cell="no_kernels" data-line="64" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 64, true);">64</a>
3800
+ <a class="line-number" data-cell="no_kernels" data-line="65" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 65, true);">65</a>
3801
+ <a class="line-number" data-cell="no_kernels" data-line="66" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 66, true);">66</a>
3802
+ <a class="line-number" data-cell="no_kernels" data-line="67" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 67, true);">67</a>
3803
+ <a class="line-number" data-cell="no_kernels" data-line="68" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 68, true);">68</a>
3804
+ <a class="line-number" data-cell="no_kernels" data-line="69" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 69, true);">69</a>
3805
+ <a class="line-number" data-cell="no_kernels" data-line="70" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 70, true);">70</a>
3806
+ <a class="line-number" data-cell="no_kernels" data-line="71" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 71, true);">71</a>
3807
+ <a class="line-number" data-cell="no_kernels" data-line="72" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 72, true);">72</a>
3808
+ <a class="line-number" data-cell="no_kernels" data-line="73" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 73, true);">73</a>
3809
+ <a class="line-number" data-cell="no_kernels" data-line="74" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 74, true);">74</a>
3810
+ <a class="line-number" data-cell="no_kernels" data-line="75" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 75, true);">75</a>
3811
+ <a class="line-number" data-cell="no_kernels" data-line="76" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 76, true);">76</a>
3812
+ <a class="line-number" data-cell="no_kernels" data-line="77" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 77, true);">77</a>
3813
+ <a class="line-number" data-cell="no_kernels" data-line="78" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 78, true);">78</a>
3814
+ <a class="line-number" data-cell="no_kernels" data-line="79" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 79, true);">79</a>
3815
+ <a class="line-number" data-cell="no_kernels" data-line="80" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 80, true);">80</a>
3816
+ <a class="line-number" data-cell="no_kernels" data-line="81" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 81, true);">81</a>
3817
+ <a class="line-number" data-cell="no_kernels" data-line="82" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 82, true);">82</a>
3818
+ <a class="line-number" data-cell="no_kernels" data-line="83" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 83, true);">83</a>
3819
+ <a class="line-number" data-cell="no_kernels" data-line="84" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 84, true);">84</a>
3820
+ <a class="line-number" data-cell="no_kernels" data-line="85" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 85, true);">85</a>
3821
+ <a class="line-number" data-cell="no_kernels" data-line="86" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 86, true);">86</a>
3822
+ <a class="line-number" data-cell="no_kernels" data-line="87" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 87, true);">87</a>
3823
+ <a class="line-number" data-cell="no_kernels" data-line="88" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 88, true);">88</a>
3824
+ <a class="line-number" data-cell="no_kernels" data-line="89" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 89, true);">89</a>
3825
+ <a class="line-number" data-cell="no_kernels" data-line="90" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 90, true);">90</a>
3826
+ <a class="line-number" data-cell="no_kernels" data-line="91" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 91, true);">91</a>
3827
+ <a class="line-number" data-cell="no_kernels" data-line="92" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 92, true);">92</a>
3828
+ <a class="line-number" data-cell="no_kernels" data-line="93" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 93, true);">93</a>
3829
+ <a class="line-number" data-cell="no_kernels" data-line="94" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 94, true);">94</a>
3830
+ <a class="line-number" data-cell="no_kernels" data-line="95" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 95, true);">95</a>
3831
+ <a class="line-number" data-cell="no_kernels" data-line="96" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 96, true);">96</a>
3832
+ <a class="line-number" data-cell="no_kernels" data-line="97" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 97, true);">97</a>
3833
+ <a class="line-number" data-cell="no_kernels" data-line="98" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 98, true);">98</a>
3834
  </div>
3835
  <div class="code-wrap">
3836
  <div class="highlight"><pre><span></span><span class="c1"># /// script</span>
 
3857
  <span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
3858
  <span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssRMSNorm</span>
3859
 
 
 
 
3860
  <span class="c1"># set logging to INFO level</span>
3861
  <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
3862
 
 
3895
  <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
3896
  <span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
3897
 
3898
+
3899
+
3900
  <span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
3901
  <span class="n">model_id</span><span class="p">,</span>
3902
  <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;bfloat16&quot;</span><span class="p">,</span>
 
3917
  <span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">&quot;low&quot;</span><span class="p">,</span>
3918
  <span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">&quot;cuda&quot;</span><span class="p">)</span>
3919
 
3920
+ <span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">256</span>
3921
 
3922
+ <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
 
 
 
 
 
3923
  <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
3924
  <span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
3925
  <span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
 
3928
  <span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
3929
  <span class="p">)</span>
3930
  <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
3931
+
3932
+ <span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
3933
+ <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds&quot;</span><span class="p">)</span>
3934
  </pre></div>
3935
 
3936
+ <div class="code-line-highlight" id="line-highlight-no_kernels"></div>
3937
  </div>
3938
  </div>
3939
  </div>
3940
+ <div id="output-no_kernels" class="cell-output">
3941
+ <div class="cell-stderr">Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
3942
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
3943
  Updating https://github.com/huggingface/transformers.git (HEAD)
3944
+ Updated https://github.com/huggingface/transformers.git (e691f84412563b6abca098f3e044980725d8daa3)
3945
+ × No solution found when resolving script dependencies:
3946
+ ╰─▶ Because only transformers==4.57.0.dev0 is available and
3947
+ transformers==4.57.0.dev0 depends on huggingface-hub==1.0.0rc1,
3948
+ we can conclude that all versions of transformers depend on
3949
+ huggingface-hub==1.0.0rc1.
3950
+ And because kernels==0.10.0 depends on huggingface-hub&gt;=0.26.0,&lt;1.0,
3951
+ we can conclude that kernels==0.10.0 and all versions of transformers
3952
+ are incompatible.
3953
+ And because you require kernels==0.10.0 and transformers, we can
3954
+ conclude that your requirements are unsatisfiable.
3955
  </div>
3956
  </div>
3957
  </div>
3958
 
3959
+ <h2>Forward and Backward</h2>
3960
+ <p>Next, we'll attempt to run a forward and backward pass without any custom kernels. This will likely run out of memory since the default implementation is not optimized for memory usage.</p>
3961
  <h1>Kernels</h1>
3962
  <p>Next we can run with Megablocks kernels enabled.</p>
3963
  <h3>Forward</h3>
megablocks_yamoe/artifacts/binned_run/binned_results.json CHANGED
@@ -9,16 +9,16 @@
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
- "avg_ms": 36.21514258000616,
13
- "min_ms": 33.172280000030696,
14
- "max_ms": 38.75413800005845,
15
- "std_ms": 1.401058294284512,
16
- "p50_ms": 36.36444199997868,
17
- "p95_ms": 38.060839599990004,
18
- "p99_ms": 38.46422802999541,
19
  "num_iters": 50,
20
- "tokens_per_s": 2761.275888368544,
21
- "throughput_variance": 108.05444381816277
22
  },
23
  "output_sum": 3.97190523147583
24
  }
 
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
+ "avg_ms": 36.26809924006011,
13
+ "min_ms": 34.103908000361116,
14
+ "max_ms": 37.68557000057626,
15
+ "std_ms": 1.1598518125118418,
16
+ "p50_ms": 36.52223600056459,
17
+ "p95_ms": 37.6427445000445,
18
+ "p99_ms": 37.677440410316194,
19
  "num_iters": 50,
20
+ "tokens_per_s": 2757.2440269917565,
21
+ "throughput_variance": 89.13103199163609
22
  },
23
  "output_sum": 3.97190523147583
24
  }
megablocks_yamoe/artifacts/gptoss_run/gptoss_results.json CHANGED
@@ -9,16 +9,16 @@
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
- "avg_ms": 45.94982444000152,
13
- "min_ms": 40.76497799997014,
14
- "max_ms": 52.299967999942965,
15
- "std_ms": 3.623045351544196,
16
- "p50_ms": 45.46925300002158,
17
- "p95_ms": 51.35251775002985,
18
- "p99_ms": 52.12179027996967,
19
  "num_iters": 50,
20
- "tokens_per_s": 2176.286878540176,
21
- "throughput_variance": 169.79505096491204
22
  },
23
  "output_sum": 11.53223705291748
24
  }
 
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
+ "avg_ms": 46.913985819956,
13
+ "min_ms": 40.44806400088419,
14
+ "max_ms": 51.07520399997156,
15
+ "std_ms": 2.9921332618008196,
16
+ "p50_ms": 47.418902999652346,
17
+ "p95_ms": 50.800493049837314,
18
+ "p99_ms": 50.948625239852845,
19
  "num_iters": 50,
20
+ "tokens_per_s": 2131.560519794133,
21
+ "throughput_variance": 139.93911554997217
22
  },
23
  "output_sum": 11.53223705291748
24
  }
megablocks_yamoe/artifacts/gptoss_training_run/gptoss_training_results.json CHANGED
@@ -9,16 +9,16 @@
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
- "avg_ms": 46.09780513999567,
13
- "min_ms": 38.8389360000474,
14
- "max_ms": 49.40391599996019,
15
- "std_ms": 2.4686999934552376,
16
- "p50_ms": 47.23983950003685,
17
- "p95_ms": 48.725092950002136,
18
- "p99_ms": 49.16830440000467,
19
  "num_iters": 50,
20
- "tokens_per_s": 2169.300679203864,
21
- "throughput_variance": 122.29861537972276
22
  },
23
  "output_sum": 11.53223705291748
24
  }
 
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
+ "avg_ms": 46.289439859992854,
13
+ "min_ms": 39.97907499979192,
14
+ "max_ms": 50.58144600025116,
15
+ "std_ms": 2.9172154402078077,
16
+ "p50_ms": 46.64785849990949,
17
+ "p95_ms": 50.26727430031315,
18
+ "p99_ms": 50.5162941305025,
19
  "num_iters": 50,
20
+ "tokens_per_s": 2160.3199412751637,
21
+ "throughput_variance": 139.86427060112865
22
  },
23
  "output_sum": 11.53223705291748
24
  }
megablocks_yamoe/artifacts/yamoe_run/yamoe_results.json CHANGED
@@ -9,16 +9,16 @@
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
- "avg_ms": 4.247618279998733,
13
- "min_ms": 4.12893800000802,
14
- "max_ms": 4.265831999987313,
15
- "std_ms": 0.020712896658640616,
16
- "p50_ms": 4.251555999985612,
17
- "p95_ms": 4.263803499975438,
18
- "p99_ms": 4.2652827100027935,
19
  "num_iters": 50,
20
- "tokens_per_s": 23542.605151428495,
21
- "throughput_variance": 117.11531020813602
22
  },
23
  "output_sum": 3.97190523147583
24
  }
 
9
  "vary_inputs": true
10
  },
11
  "stats": {
12
+ "avg_ms": 4.248197240067384,
13
+ "min_ms": 4.136622000260104,
14
+ "max_ms": 4.280714999367774,
15
+ "std_ms": 0.02141682051311511,
16
+ "p50_ms": 4.253484999935608,
17
+ "p95_ms": 4.265540049709671,
18
+ "p99_ms": 4.273649199667489,
19
  "num_iters": 50,
20
+ "tokens_per_s": 23539.396677922097,
21
+ "throughput_variance": 120.66648678204231
22
  },
23
  "output_sum": 3.97190523147583
24
  }
megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc CHANGED
Binary files a/megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc and b/megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc differ
 
megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc CHANGED
Binary files a/megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc and b/megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc differ
 
megablocks_yamoe/megablocks_yamoe.html CHANGED
@@ -3715,84 +3715,17 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
3715
  </div>
3716
 
3717
  <div class="main-content">
3718
- <div class="cell" id="cell-nv">
3719
- <div class="cell-header">
3720
- <span class="collapse-indicators">
3721
- <span onclick="toggleCode('nv')" style="cursor: pointer;">▼ code</span>
3722
- <span onclick="toggleOutput('nv')" style="cursor: pointer;">▼ output</span>
3723
- <span id="uv-indicator-nv" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
3724
- </span> |
3725
- Cell: nv | 0.55s
3726
- | <button class="run-btn" onclick="runCell('nv')">▶ run</button>
3727
- <button class="copy-btn" onclick="copyCell('nv')">Copy</button>
3728
- <a href="cells/nv.py" target="_blank" class="raw-btn">Raw</a>
3729
- </div>
3730
- <div id="code-nv" class="cell-code" data-lines="3">
3731
- <div class="highlight-with-lines">
3732
- <div class="line-numbers" id="lines-nv">
3733
- <a class="line-number" data-cell="nv" data-line="1" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 1, true);">1</a>
3734
- <a class="line-number" data-cell="nv" data-line="2" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 2, true);">2</a>
3735
- <a class="line-number" data-cell="nv" data-line="3" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 3, true);">3</a>
3736
- </div>
3737
- <div class="code-wrap">
3738
- <div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">subprocess</span>
3739
-
3740
- <span class="nb">print</span><span class="p">(</span><span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">&quot;nvidia-smi&quot;</span><span class="p">],</span> <span class="n">capture_output</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
3741
- </pre></div>
3742
-
3743
- <div class="code-line-highlight" id="line-highlight-nv"></div>
3744
- </div>
3745
- </div>
3746
- </div>
3747
- <div id="output-nv" class="cell-output">
3748
- <div class="cell-stdout">Wed Sep 24 22:04:34 2025
3749
- +-----------------------------------------------------------------------------------------+
3750
- | NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
3751
- |-----------------------------------------+------------------------+----------------------+
3752
- | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
3753
- | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
3754
- | | | MIG M. |
3755
- |=========================================+========================+======================|
3756
- | 0 NVIDIA A10G On | 00000000:00:1B.0 Off | 0 |
3757
- | 0% 36C P0 45W / 300W | 0MiB / 23028MiB | 0% Default |
3758
- | | | N/A |
3759
- +-----------------------------------------+------------------------+----------------------+
3760
- | 1 NVIDIA A10G On | 00000000:00:1C.0 Off | 0 |
3761
- | 0% 37C P0 47W / 300W | 0MiB / 23028MiB | 0% Default |
3762
- | | | N/A |
3763
- +-----------------------------------------+------------------------+----------------------+
3764
- | 2 NVIDIA A10G On | 00000000:00:1D.0 Off | 0 |
3765
- | 0% 35C P0 47W / 300W | 0MiB / 23028MiB | 0% Default |
3766
- | | | N/A |
3767
- +-----------------------------------------+------------------------+----------------------+
3768
- | 3 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
3769
- | 0% 37C P0 44W / 300W | 0MiB / 23028MiB | 0% Default |
3770
- | | | N/A |
3771
- +-----------------------------------------+------------------------+----------------------+
3772
-
3773
- +-----------------------------------------------------------------------------------------+
3774
- | Processes: |
3775
- | GPU GI CI PID Type Process name GPU Memory |
3776
- | ID ID Usage |
3777
- |=========================================================================================|
3778
- | No running processes found |
3779
- +-----------------------------------------------------------------------------------------+
3780
-
3781
- </div>
3782
- </div>
3783
- </div>
3784
-
3785
- <h1>Comparison of Megablocks and Yamoe Kernels</h1>
3786
  <p>This note compares the performance of the Megablocks and Yamoe kernels on the GPT-OSS-20B model.</p>
3787
  <h2>Megablocks kernel</h2>
3788
- <div class="cell" id="cell-setup2">
3789
  <div class="cell-header">
3790
  <span class="collapse-indicators">
3791
  <span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
3792
  <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
3793
- <span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
3794
  </span> |
3795
- Cell: setup2 | 114.03s
3796
  | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
3797
  <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
3798
  <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
@@ -4039,561 +3972,25 @@ Cell: setup2 | 114.03s
4039
  </div>
4040
  </div>
4041
  <div id="output-setup2" class="cell-output">
4042
- <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
4043
- Knowledge cutoff: 2024-06
4044
- Current date: 2025-09-24
4045
-
4046
- Reasoning: low
4047
-
4048
- # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
4049
-
4050
- What is Tensor Parallelism?
4051
-
4052
- &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it&#x27;s used in large models like GPT-3, Megatron-LM, DeepSpeed. Provide references. Also mention that it&#x27;s also called model parallelism. Provide explanation of how it works in practice: e.g., for a linear layer, weight matrix W of shape (out_features, in_features). In tensor parallelism, split W along out_features dimension across GPUs. Each GPU computes partial output. Then gather outputs. Provide details on how to handle bias, etc. Provide mention of &quot;tensor model parallelism&quot; vs &quot;tensor parallelism&quot; synonyms. Provide mention of &quot;tensor parallelism&quot; in Megatron-LM: splitting weight matrices across GPUs. Provide mention of &quot;tensor parallelism&quot; in DeepSpeed: &quot;ZeRO-Offload&quot; etc. Provide mention
4053
- Generation took 31.36 seconds
4054
- </div>
4055
- <div class="uv-install-logs" id="uv-logs-setup2">
4056
- <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4057
- <div class="uv-logs-content" style="display: none;">
4058
- Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
4059
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4060
  Updating https://github.com/huggingface/transformers.git (HEAD)
4061
- Updated https://github.com/huggingface/transformers.git (7258ea44bc0c0a425a468f66f8559d1de8c4126d)
4062
- Downloading jedi (1.5MiB)
4063
- Downloading pygments (1.2MiB)
4064
- Building transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4065
- Downloading matplotlib (8.3MiB)
4066
- Downloading networkx (1.9MiB)
4067
- Downloading sympy (6.0MiB)
4068
- Downloading nvidia-cublas-cu12 (566.8MiB)
4069
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4070
- Downloading nvidia-nccl-cu12 (307.4MiB)
4071
- Downloading hf-xet (3.0MiB)
4072
- Downloading nvidia-cusparse-cu12 (274.9MiB)
4073
- Downloading fonttools (4.7MiB)
4074
- Downloading nvidia-cufile-cu12 (1.1MiB)
4075
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4076
- Downloading triton (148.4MiB)
4077
- Downloading nvidia-cusolver-cu12 (255.1MiB)
4078
- Downloading tokenizers (3.1MiB)
4079
- Downloading kiwisolver (1.4MiB)
4080
- Downloading nvidia-curand-cu12 (60.7MiB)
4081
- Downloading pillow (6.3MiB)
4082
- Downloading numpy (15.9MiB)
4083
- Downloading nvidia-cufft-cu12 (184.2MiB)
4084
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4085
- Downloading nvidia-cudnn-cu12 (674.0MiB)
4086
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4087
- Downloading torch (846.8MiB)
4088
- Downloading nvidia-cufile-cu12
4089
- Downloading kiwisolver
4090
- Downloading pygments
4091
- Downloading hf-xet
4092
- Downloading tokenizers
4093
- Downloading networkx
4094
- Downloading fonttools
4095
- Downloading pillow
4096
- Downloading matplotlib
4097
- Downloading nvidia-cuda-cupti-cu12
4098
- Downloading numpy
4099
- Downloading sympy
4100
- Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4101
- Downloading nvidia-nvjitlink-cu12
4102
- Downloading jedi
4103
- Downloading nvidia-curand-cu12
4104
- Downloading nvidia-cuda-nvrtc-cu12
4105
- Downloading triton
4106
- Downloading nvidia-cufft-cu12
4107
- Downloading nvidia-cusolver-cu12
4108
- Downloading nvidia-cusparselt-cu12
4109
- Downloading nvidia-cusparse-cu12
4110
- Downloading nvidia-nccl-cu12
4111
- Downloading nvidia-cublas-cu12
4112
- Downloading nvidia-cudnn-cu12
4113
- Downloading torch
4114
- Installed 69 packages in 509ms
4115
- </div>
4116
  </div>
4117
- <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4118
- Fetching 3 files: 33%|███▎ | 1/3 [00:06&lt;00:12, 6.49s/it]
4119
- Fetching 3 files: 67%|██████▋ | 2/3 [00:07&lt;00:03, 3.44s/it]
4120
- Fetching 3 files: 100%|██████████| 3/3 [00:07&lt;00:00, 2.60s/it]
4121
- You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4122
-
4123
- Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4124
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.35s/it]
4125
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.25s/it]
4126
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.80s/it]
4127
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.93s/it]
4128
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4129
-
4130
- Fetching 66 files: 0%| | 0/66 [00:00&lt;?, ?it/s]
4131
- Fetching 66 files: 2%|▏ | 1/66 [00:00&lt;00:10, 6.31it/s]
4132
- Fetching 66 files: 14%|█▎ | 9/66 [00:00&lt;00:02, 26.39it/s]
4133
- Fetching 66 files: 26%|██▌ | 17/66 [00:01&lt;00:03, 12.42it/s]
4134
- Fetching 66 files: 74%|███████▍ | 49/66 [00:01&lt;00:00, 45.00it/s]
4135
- Fetching 66 files: 91%|█████████ | 60/66 [00:01&lt;00:00, 45.67it/s]
4136
- Fetching 66 files: 100%|██████████| 66/66 [00:01&lt;00:00, 34.31it/s]
4137
- /tmp/uvnote-run-_uergc47/home/.cache/uv/environments-v2/setup2-adf2810b697d7b08/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4138
- No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4139
- warnings.warn(
4140
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4141
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4142
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4143
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4144
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4145
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4146
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4147
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4148
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4149
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4150
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4151
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4152
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4153
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4154
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4155
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4156
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4157
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4158
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4159
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4160
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4161
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4162
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4163
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4164
- /tmp/uvnote-run-_uergc47/home/.cache/uv/environments-v2/setup2-adf2810b697d7b08/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4165
- No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4166
- warnings.warn(
4167
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4168
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4169
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4170
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4171
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4172
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4173
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4174
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4175
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4176
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4177
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4178
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4179
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4180
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4181
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4182
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4183
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4184
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4185
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4186
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4187
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4188
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4189
- INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`</div>
4190
  </div>
4191
  </div>
4192
 
4193
  <h2>Yamoe Kernel</h2>
4194
- <div class="cell" id="cell-setup">
4195
- <div class="cell-header">
4196
- <span class="collapse-indicators">
4197
- <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
4198
- <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
4199
- <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
4200
- </span> |
4201
- Cell: setup | 109.23s
4202
- | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
4203
- <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
4204
- <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
4205
- </div>
4206
- <div id="code-setup" class="cell-code" data-lines="116">
4207
- <div class="highlight-with-lines">
4208
- <div class="line-numbers" id="lines-setup">
4209
- <a class="line-number" data-cell="setup" data-line="1" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 1, true);">1</a>
4210
- <a class="line-number" data-cell="setup" data-line="2" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 2, true);">2</a>
4211
- <a class="line-number" data-cell="setup" data-line="3" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 3, true);">3</a>
4212
- <a class="line-number" data-cell="setup" data-line="4" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 4, true);">4</a>
4213
- <a class="line-number" data-cell="setup" data-line="5" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 5, true);">5</a>
4214
- <a class="line-number" data-cell="setup" data-line="6" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 6, true);">6</a>
4215
- <a class="line-number" data-cell="setup" data-line="7" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 7, true);">7</a>
4216
- <a class="line-number" data-cell="setup" data-line="8" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 8, true);">8</a>
4217
- <a class="line-number" data-cell="setup" data-line="9" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 9, true);">9</a>
4218
- <a class="line-number" data-cell="setup" data-line="10" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 10, true);">10</a>
4219
- <a class="line-number" data-cell="setup" data-line="11" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 11, true);">11</a>
4220
- <a class="line-number" data-cell="setup" data-line="12" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 12, true);">12</a>
4221
- <a class="line-number" data-cell="setup" data-line="13" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 13, true);">13</a>
4222
- <a class="line-number" data-cell="setup" data-line="14" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 14, true);">14</a>
4223
- <a class="line-number" data-cell="setup" data-line="15" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 15, true);">15</a>
4224
- <a class="line-number" data-cell="setup" data-line="16" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 16, true);">16</a>
4225
- <a class="line-number" data-cell="setup" data-line="17" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 17, true);">17</a>
4226
- <a class="line-number" data-cell="setup" data-line="18" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 18, true);">18</a>
4227
- <a class="line-number" data-cell="setup" data-line="19" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 19, true);">19</a>
4228
- <a class="line-number" data-cell="setup" data-line="20" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 20, true);">20</a>
4229
- <a class="line-number" data-cell="setup" data-line="21" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 21, true);">21</a>
4230
- <a class="line-number" data-cell="setup" data-line="22" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 22, true);">22</a>
4231
- <a class="line-number" data-cell="setup" data-line="23" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 23, true);">23</a>
4232
- <a class="line-number" data-cell="setup" data-line="24" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 24, true);">24</a>
4233
- <a class="line-number" data-cell="setup" data-line="25" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 25, true);">25</a>
4234
- <a class="line-number" data-cell="setup" data-line="26" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 26, true);">26</a>
4235
- <a class="line-number" data-cell="setup" data-line="27" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 27, true);">27</a>
4236
- <a class="line-number" data-cell="setup" data-line="28" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 28, true);">28</a>
4237
- <a class="line-number" data-cell="setup" data-line="29" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 29, true);">29</a>
4238
- <a class="line-number" data-cell="setup" data-line="30" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 30, true);">30</a>
4239
- <a class="line-number" data-cell="setup" data-line="31" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 31, true);">31</a>
4240
- <a class="line-number" data-cell="setup" data-line="32" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 32, true);">32</a>
4241
- <a class="line-number" data-cell="setup" data-line="33" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 33, true);">33</a>
4242
- <a class="line-number" data-cell="setup" data-line="34" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 34, true);">34</a>
4243
- <a class="line-number" data-cell="setup" data-line="35" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 35, true);">35</a>
4244
- <a class="line-number" data-cell="setup" data-line="36" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 36, true);">36</a>
4245
- <a class="line-number" data-cell="setup" data-line="37" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 37, true);">37</a>
4246
- <a class="line-number" data-cell="setup" data-line="38" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 38, true);">38</a>
4247
- <a class="line-number" data-cell="setup" data-line="39" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 39, true);">39</a>
4248
- <a class="line-number" data-cell="setup" data-line="40" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 40, true);">40</a>
4249
- <a class="line-number" data-cell="setup" data-line="41" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 41, true);">41</a>
4250
- <a class="line-number" data-cell="setup" data-line="42" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 42, true);">42</a>
4251
- <a class="line-number" data-cell="setup" data-line="43" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 43, true);">43</a>
4252
- <a class="line-number" data-cell="setup" data-line="44" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 44, true);">44</a>
4253
- <a class="line-number" data-cell="setup" data-line="45" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 45, true);">45</a>
4254
- <a class="line-number" data-cell="setup" data-line="46" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 46, true);">46</a>
4255
- <a class="line-number" data-cell="setup" data-line="47" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 47, true);">47</a>
4256
- <a class="line-number" data-cell="setup" data-line="48" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 48, true);">48</a>
4257
- <a class="line-number" data-cell="setup" data-line="49" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 49, true);">49</a>
4258
- <a class="line-number" data-cell="setup" data-line="50" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 50, true);">50</a>
4259
- <a class="line-number" data-cell="setup" data-line="51" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 51, true);">51</a>
4260
- <a class="line-number" data-cell="setup" data-line="52" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 52, true);">52</a>
4261
- <a class="line-number" data-cell="setup" data-line="53" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 53, true);">53</a>
4262
- <a class="line-number" data-cell="setup" data-line="54" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 54, true);">54</a>
4263
- <a class="line-number" data-cell="setup" data-line="55" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 55, true);">55</a>
4264
- <a class="line-number" data-cell="setup" data-line="56" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 56, true);">56</a>
4265
- <a class="line-number" data-cell="setup" data-line="57" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 57, true);">57</a>
4266
- <a class="line-number" data-cell="setup" data-line="58" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 58, true);">58</a>
4267
- <a class="line-number" data-cell="setup" data-line="59" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 59, true);">59</a>
4268
- <a class="line-number" data-cell="setup" data-line="60" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 60, true);">60</a>
4269
- <a class="line-number" data-cell="setup" data-line="61" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 61, true);">61</a>
4270
- <a class="line-number" data-cell="setup" data-line="62" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 62, true);">62</a>
4271
- <a class="line-number" data-cell="setup" data-line="63" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 63, true);">63</a>
4272
- <a class="line-number" data-cell="setup" data-line="64" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 64, true);">64</a>
4273
- <a class="line-number" data-cell="setup" data-line="65" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 65, true);">65</a>
4274
- <a class="line-number" data-cell="setup" data-line="66" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 66, true);">66</a>
4275
- <a class="line-number" data-cell="setup" data-line="67" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 67, true);">67</a>
4276
- <a class="line-number" data-cell="setup" data-line="68" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 68, true);">68</a>
4277
- <a class="line-number" data-cell="setup" data-line="69" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 69, true);">69</a>
4278
- <a class="line-number" data-cell="setup" data-line="70" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 70, true);">70</a>
4279
- <a class="line-number" data-cell="setup" data-line="71" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 71, true);">71</a>
4280
- <a class="line-number" data-cell="setup" data-line="72" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 72, true);">72</a>
4281
- <a class="line-number" data-cell="setup" data-line="73" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 73, true);">73</a>
4282
- <a class="line-number" data-cell="setup" data-line="74" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 74, true);">74</a>
4283
- <a class="line-number" data-cell="setup" data-line="75" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 75, true);">75</a>
4284
- <a class="line-number" data-cell="setup" data-line="76" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 76, true);">76</a>
4285
- <a class="line-number" data-cell="setup" data-line="77" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 77, true);">77</a>
4286
- <a class="line-number" data-cell="setup" data-line="78" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 78, true);">78</a>
4287
- <a class="line-number" data-cell="setup" data-line="79" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 79, true);">79</a>
4288
- <a class="line-number" data-cell="setup" data-line="80" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 80, true);">80</a>
4289
- <a class="line-number" data-cell="setup" data-line="81" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 81, true);">81</a>
4290
- <a class="line-number" data-cell="setup" data-line="82" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 82, true);">82</a>
4291
- <a class="line-number" data-cell="setup" data-line="83" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 83, true);">83</a>
4292
- <a class="line-number" data-cell="setup" data-line="84" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 84, true);">84</a>
4293
- <a class="line-number" data-cell="setup" data-line="85" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 85, true);">85</a>
4294
- <a class="line-number" data-cell="setup" data-line="86" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 86, true);">86</a>
4295
- <a class="line-number" data-cell="setup" data-line="87" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 87, true);">87</a>
4296
- <a class="line-number" data-cell="setup" data-line="88" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 88, true);">88</a>
4297
- <a class="line-number" data-cell="setup" data-line="89" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 89, true);">89</a>
4298
- <a class="line-number" data-cell="setup" data-line="90" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 90, true);">90</a>
4299
- <a class="line-number" data-cell="setup" data-line="91" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 91, true);">91</a>
4300
- <a class="line-number" data-cell="setup" data-line="92" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 92, true);">92</a>
4301
- <a class="line-number" data-cell="setup" data-line="93" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 93, true);">93</a>
4302
- <a class="line-number" data-cell="setup" data-line="94" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 94, true);">94</a>
4303
- <a class="line-number" data-cell="setup" data-line="95" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 95, true);">95</a>
4304
- <a class="line-number" data-cell="setup" data-line="96" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 96, true);">96</a>
4305
- <a class="line-number" data-cell="setup" data-line="97" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 97, true);">97</a>
4306
- <a class="line-number" data-cell="setup" data-line="98" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 98, true);">98</a>
4307
- <a class="line-number" data-cell="setup" data-line="99" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 99, true);">99</a>
4308
- <a class="line-number" data-cell="setup" data-line="100" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 100, true);">100</a>
4309
- <a class="line-number" data-cell="setup" data-line="101" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 101, true);">101</a>
4310
- <a class="line-number" data-cell="setup" data-line="102" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 102, true);">102</a>
4311
- <a class="line-number" data-cell="setup" data-line="103" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 103, true);">103</a>
4312
- <a class="line-number" data-cell="setup" data-line="104" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 104, true);">104</a>
4313
- <a class="line-number" data-cell="setup" data-line="105" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 105, true);">105</a>
4314
- <a class="line-number" data-cell="setup" data-line="106" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 106, true);">106</a>
4315
- <a class="line-number" data-cell="setup" data-line="107" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 107, true);">107</a>
4316
- <a class="line-number" data-cell="setup" data-line="108" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 108, true);">108</a>
4317
- <a class="line-number" data-cell="setup" data-line="109" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 109, true);">109</a>
4318
- <a class="line-number" data-cell="setup" data-line="110" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 110, true);">110</a>
4319
- <a class="line-number" data-cell="setup" data-line="111" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 111, true);">111</a>
4320
- <a class="line-number" data-cell="setup" data-line="112" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 112, true);">112</a>
4321
- <a class="line-number" data-cell="setup" data-line="113" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 113, true);">113</a>
4322
- <a class="line-number" data-cell="setup" data-line="114" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 114, true);">114</a>
4323
- <a class="line-number" data-cell="setup" data-line="115" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 115, true);">115</a>
4324
- <a class="line-number" data-cell="setup" data-line="116" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 116, true);">116</a>
4325
- </div>
4326
- <div class="code-wrap">
4327
- <div class="highlight"><pre><span></span><span class="c1"># /// script</span>
4328
- <span class="c1"># requires-python = &quot;&gt;=3.12&quot;</span>
4329
- <span class="c1"># dependencies = [</span>
4330
- <span class="c1"># &quot;accelerate&gt;=1.10.1&quot;,</span>
4331
- <span class="c1"># &quot;torch&gt;=2.7.0&quot;,</span>
4332
- <span class="c1"># &quot;kernels==0.10.0&quot;,</span>
4333
- <span class="c1"># &quot;transformers@https://github.com/huggingface/transformers.git&quot;,</span>
4334
- <span class="c1"># &quot;ipdb&gt;=0.13.13&quot;,</span>
4335
- <span class="c1"># &quot;matplotlib&gt;=3.7.2&quot;,</span>
4336
- <span class="c1"># &quot;numpy&gt;=1.24.3&quot;,</span>
4337
- <span class="c1"># ]</span>
4338
- <span class="c1"># ///</span>
4339
-
4340
- <span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
4341
- <span class="kn">from</span><span class="w"> </span><span class="nn">transformers</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssForCausalLM</span><span class="p">,</span> <span class="n">PreTrainedTokenizerFast</span><span class="p">,</span> <span class="n">Mxfp4Config</span>
4342
- <span class="kn">import</span><span class="w"> </span><span class="nn">time</span>
4343
- <span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">nn</span>
4344
- <span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">Mode</span><span class="p">,</span> <span class="n">LayerRepository</span>
4345
- <span class="kn">import</span><span class="w"> </span><span class="nn">sys</span>
4346
- <span class="kn">import</span><span class="w"> </span><span class="nn">torch.profiler</span>
4347
- <span class="kn">import</span><span class="w"> </span><span class="nn">gc</span>
4348
- <span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
4349
-
4350
- <span class="c1"># Log at INFO level</span>
4351
- <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
4352
-
4353
- <span class="k">def</span><span class="w"> </span><span class="nf">reset_peak_memory_stats</span><span class="p">():</span>
4354
- <span class="w">    </span><span class="sd">&quot;&quot;&quot;Clear CUDA cache and reset memory allocation counters.&quot;&quot;&quot;</span>
4355
-     <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
4356
-     <span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
4357
-         <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">reset_peak_memory_stats</span><span class="p">()</span>
4358
-     <span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
4359
-
4360
- <span class="k">def</span><span class="w"> </span><span class="nf">get_memory_stats</span><span class="p">():</span>
4361
- <span class="w">    </span><span class="sd">&quot;&quot;&quot;Get current and peak CUDA memory usage.&quot;&quot;&quot;</span>
4362
-     <span class="k">if</span> <span class="ow">not</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
4363
-         <span class="k">return</span> <span class="p">{</span><span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">}</span>
4364
-     <span class="k">return</span> <span class="p">{</span>
4365
-         <span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4366
-         <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">max_memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4367
-         <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_reserved</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4368
-     <span class="p">}</span>
4369
-
4370
- <span class="k">def</span><span class="w"> </span><span class="nf">override_kernel_layer_name</span><span class="p">(</span><span class="n">cls_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
4371
- <span class="w">    </span><span class="sd">&quot;&quot;&quot;Helper to dynamically override the kernel_layer_name in a model class.&quot;&quot;&quot;</span>
4372
-     <span class="k">for</span> <span class="n">mod</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">modules</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
4373
-         <span class="k">if</span> <span class="n">mod</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
4374
-             <span class="k">continue</span>
4375
-         <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">cls_name</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
4376
-         <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
4377
-             <span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s2">&quot;kernel_layer_name&quot;</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
4378
-             <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Overrode </span><span class="si">{</span><span class="n">cls_name</span><span class="si">}</span><span class="s2">.kernel_layer_name to </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4379
-             <span class="k">return</span> <span class="kc">True</span>
4380
-     <span class="k">return</span> <span class="kc">False</span>
4381
-
4382
-
4383
- <span class="c1"># Init the model the normal way</span>
4384
- <span class="n">model_id</span> <span class="o">=</span> <span class="s2">&quot;openai/gpt-oss-20b&quot;</span>
4385
- <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
4386
- <span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
4387
-
4388
-
4389
- <span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">replace_kernel_forward_from_hub</span>
4390
-
4391
- <span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssMLP</span><span class="p">,</span> <span class="n">GptOssRMSNorm</span>
4392
-
4393
- <span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssMLP</span><span class="p">,</span> <span class="s2">&quot;Yamoe&quot;</span><span class="p">)</span>
4394
- <span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
4395
- <span class="n">custom_mapping</span> <span class="o">=</span> <span class="p">{</span>
4396
-     <span class="s2">&quot;Yamoe&quot;</span><span class="p">:</span> <span class="p">{</span>
4397
-         <span class="s2">&quot;cuda&quot;</span><span class="p">:</span> <span class="p">{</span>
4398
-             <span class="n">Mode</span><span class="o">.</span><span class="n">INFERENCE</span><span class="p">:</span> <span class="n">LayerRepository</span><span class="p">(</span>
4399
-                 <span class="n">repo_id</span><span class="o">=</span><span class="s2">&quot;drbh/yamoe&quot;</span><span class="p">,</span>
4400
-                 <span class="n">layer_name</span><span class="o">=</span><span class="s2">&quot;Yamoe&quot;</span><span class="p">,</span>
4401
-                 <span class="n">revision</span><span class="o">=</span><span class="s2">&quot;v0.3.0&quot;</span><span class="p">,</span>
4402
-             <span class="p">)</span>
4403
-         <span class="p">}</span>
4404
-     <span class="p">}</span>
4405
- <span class="p">}</span>
4406
- <span class="n">register_kernel_mapping</span><span class="p">(</span><span class="n">custom_mapping</span><span class="p">)</span>
4407
-
4408
-
4409
- <span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
4410
-     <span class="n">model_id</span><span class="p">,</span>
4411
-     <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;bfloat16&quot;</span><span class="p">,</span>
4412
-     <span class="n">device_map</span><span class="o">=</span><span class="s2">&quot;auto&quot;</span><span class="p">,</span>
4413
-     <span class="n">use_kernels</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4414
-     <span class="n">quantization_config</span><span class="o">=</span><span class="n">quantization_config</span><span class="p">,</span>
4415
- <span class="p">)</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
4416
-
4417
- <span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
4418
-     <span class="p">{</span><span class="s2">&quot;role&quot;</span><span class="p">:</span> <span class="s2">&quot;system&quot;</span><span class="p">,</span> <span class="s2">&quot;content&quot;</span><span class="p">:</span> <span class="s2">&quot;What is Tensor Parallelism?&quot;</span><span class="p">},</span>
4419
- <span class="p">]</span>
4420
-
4421
- <span class="n">inputs</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">apply_chat_template</span><span class="p">(</span>
4422
-     <span class="n">messages</span><span class="p">,</span>
4423
-     <span class="n">add_generation_prompt</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4424
-     <span class="n">return_tensors</span><span class="o">=</span><span class="s2">&quot;pt&quot;</span><span class="p">,</span>
4425
-     <span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4426
-     <span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">&quot;low&quot;</span><span class="p">,</span>
4427
- <span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">&quot;cuda&quot;</span><span class="p">)</span>
4428
-
4429
- <span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">256</span>
4430
-
4431
- <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
4432
-     <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4433
-     <span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
4434
-         <span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
4435
-         <span class="n">max_new_tokens</span><span class="o">=</span><span class="n">max_tokens</span><span class="p">,</span>
4436
-         <span class="n">do_sample</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
4437
-         <span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
4438
-     <span class="p">)</span>
4439
-     <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4440
-
4441
- <span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
4442
- <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds&quot;</span><span class="p">)</span>
4443
- </pre></div>
4444
-
4445
- <div class="code-line-highlight" id="line-highlight-setup"></div>
4446
- </div>
4447
- </div>
4448
- </div>
4449
- <div id="output-setup" class="cell-output">
4450
- <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
4451
- Knowledge cutoff: 2024-06
4452
- Current date: 2025-09-24
4453
-
4454
- Reasoning: low
4455
-
4456
- # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
4457
-
4458
- What is Tensor Parallelism?
4459
-
4460
- &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical use cases, differences from data parallelism, pipeline parallelism, model parallelism. Provide example: splitting a fully connected layer&#x27;s weight matrix across GPUs. Provide mention of frameworks: Megatron-LM, DeepSpeed, etc. Provide explanation of how forward/backward passes are computed. Provide mention of communication overhead, scaling, etc. Provide mention of &quot;tensor parallelism&quot; as part of &quot;model parallelism&quot; but specifically splitting tensors. Provide mention of &quot;tensor parallelism&quot; in context of transformer layers: splitting attention heads, feed-forward layers. Provide mention of &quot;tensor parallelism&quot; in context of &quot;DeepSpeed ZeRO Stage 3&quot; or &quot;Megatron-LM&#x27;s tensor parallelism&quot;. Provide mention of &quot;tensor parallelism&quot; as &quot;model parallelism across the weight matrices&quot; and &quot;tensor parallelism&quot; vs &quot;pipeline parallelism&quot;. Provide mention of &quot;tensor parallelism&quot; as &quot;splitting the weight matrix across GPUs, each GPU holds a slice of the matrix, and the input is broadcasted,
4461
- Generation took 26.26 seconds
4462
- </div>
4463
- <div class="uv-install-logs" id="uv-logs-setup">
4464
- <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4465
- <div class="uv-logs-content" style="display: none;">
4466
- Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
4467
- Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4468
- Updating https://github.com/huggingface/transformers.git (HEAD)
4469
- Updated https://github.com/huggingface/transformers.git (7258ea44bc0c0a425a468f66f8559d1de8c4126d)
4470
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4471
- Downloading pillow (6.3MiB)
4472
- Building transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4473
- Downloading nvidia-cublas-cu12 (566.8MiB)
4474
- Downloading nvidia-nccl-cu12 (307.4MiB)
4475
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4476
- Downloading numpy (15.9MiB)
4477
- Downloading hf-xet (3.0MiB)
4478
- Downloading nvidia-curand-cu12 (60.7MiB)
4479
- Downloading nvidia-cufft-cu12 (184.2MiB)
4480
- Downloading nvidia-cusolver-cu12 (255.1MiB)
4481
- Downloading nvidia-cudnn-cu12 (674.0MiB)
4482
- Downloading nvidia-cufile-cu12 (1.1MiB)
4483
- Downloading pygments (1.2MiB)
4484
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4485
- Downloading jedi (1.5MiB)
4486
- Downloading sympy (6.0MiB)
4487
- Downloading kiwisolver (1.4MiB)
4488
- Downloading matplotlib (8.3MiB)
4489
- Downloading nvidia-cusparse-cu12 (274.9MiB)
4490
- Downloading networkx (1.9MiB)
4491
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4492
- Downloading tokenizers (3.1MiB)
4493
- Downloading fonttools (4.7MiB)
4494
- Downloading triton (148.4MiB)
4495
- Downloading torch (846.8MiB)
4496
- Downloading nvidia-cufile-cu12
4497
- Downloading kiwisolver
4498
- Downloading pygments
4499
- Downloading hf-xet
4500
- Downloading tokenizers
4501
- Downloading networkx
4502
- Downloading fonttools
4503
- Downloading pillow
4504
- Downloading matplotlib
4505
- Downloading nvidia-cuda-cupti-cu12
4506
- Downloading numpy
4507
- Downloading sympy
4508
- Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
4509
- Downloading nvidia-nvjitlink-cu12
4510
- Downloading jedi
4511
- Downloading nvidia-curand-cu12
4512
- Downloading nvidia-cuda-nvrtc-cu12
4513
- Downloading triton
4514
- Downloading nvidia-cufft-cu12
4515
- Downloading nvidia-cusolver-cu12
4516
- Downloading nvidia-cusparselt-cu12
4517
- Downloading nvidia-cusparse-cu12
4518
- Downloading nvidia-nccl-cu12
4519
- Downloading nvidia-cublas-cu12
4520
- Downloading nvidia-cudnn-cu12
4521
- Downloading torch
4522
- Installed 69 packages in 464ms
4523
- </div>
4524
- </div>
4525
- <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4526
- Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:14, 7.38s/it]
4527
- Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.64s/it]
4528
- Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.80s/it]
4529
- You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4530
-
4531
- Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4532
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.34s/it]
4533
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.25s/it]
4534
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.80s/it]
4535
- Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.93s/it]
4536
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4537
-
4538
- Fetching 6 files: 0%| | 0/6 [00:00&lt;?, ?it/s]
4539
- Fetching 6 files: 17%|█▋ | 1/6 [00:00&lt;00:00, 5.44it/s]
4540
- Fetching 6 files: 50%|█████ | 3/6 [00:00&lt;00:00, 6.96it/s]
4541
- Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 13.54it/s]
4542
- /tmp/uvnote-run-jc1wbhvj/home/.cache/uv/environments-v2/setup-1400c3ff0fc01263/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4543
- No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4544
- warnings.warn(
4545
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4546
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4547
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4548
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4549
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4550
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4551
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4552
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4553
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4554
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4555
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4556
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4557
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4558
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4559
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4560
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4561
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4562
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4563
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4564
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4565
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4566
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4567
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4568
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4569
- /tmp/uvnote-run-jc1wbhvj/home/.cache/uv/environments-v2/setup-1400c3ff0fc01263/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4570
- No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4571
- warnings.warn(
4572
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4573
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4574
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4575
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4576
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4577
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4578
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4579
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4580
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4581
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4582
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4583
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4584
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4585
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4586
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4587
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4588
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4589
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4590
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4591
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4592
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4593
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4594
- INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`</div>
4595
- </div>
4596
- </div>
4597
  </div>
4598
 
4599
  </body>
 
3715
  </div>
3716
 
3717
  <div class="main-content">
3718
+ <h1>Comparison of Megablocks and Yamoe Kernels</h1>
3719
  <p>This note compares the performance of the Megablocks and Yamoe kernels on the GPT-OSS-20B model.</p>
3720
  <h2>Megablocks kernel</h2>
3721
+ <div class="cell cell-failed" id="cell-setup2">
3722
  <div class="cell-header">
3723
  <span class="collapse-indicators">
3724
  <span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
3725
  <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
3726
+ <span id="uv-indicator-setup2" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
3727
  </span> |
3728
+ Cell: setup2 | 18.93s | FAILED
3729
  | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
3730
  <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
3731
  <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
 
3972
  </div>
3973
  </div>
3974
  <div id="output-setup2" class="cell-output">
3975
+ <div class="cell-stderr">Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
3976
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
3977
  Updating https://github.com/huggingface/transformers.git (HEAD)
3978
+ Updated https://github.com/huggingface/transformers.git (e691f84412563b6abca098f3e044980725d8daa3)
3979
+ × No solution found when resolving script dependencies:
3980
+ ╰─▶ Because only transformers==4.57.0.dev0 is available and
3981
+ transformers==4.57.0.dev0 depends on huggingface-hub==1.0.0rc1,
3982
+ we can conclude that all versions of transformers depend on
3983
+ huggingface-hub==1.0.0rc1.
3984
+ And because kernels==0.10.0 depends on huggingface-hub&gt;=0.26.0,&lt;1.0,
3985
+ we can conclude that kernels==0.10.0 and all versions of transformers
3986
+ are incompatible.
3987
+ And because you require kernels==0.10.0 and transformers, we can
3988
+ conclude that your requirements are unsatisfiable.
3989
  </div>
3990
  </div>
3991
  </div>
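The `setup2` cell above fails at dependency resolution: the `transformers` build from git HEAD (`e691f844…`) requires `huggingface-hub==1.0.0rc1`, while `kernels==0.10.0` requires `huggingface-hub>=0.26.0,<1.0`. One possible workaround, sketched below, is to pin `transformers` to a specific commit in the cell's inline script metadata. The commit used here is the one that appears in the earlier, successful install log in this diff (`7258ea44…`); whether that commit resolves against `kernels==0.10.0` is an assumption and has not been checked. The cell body and the `installed_versions` helper are hypothetical, since the real contents of `cells/setup2.py` are not shown.

```python
# /// script
# dependencies = [
#     "kernels==0.10.0",
#     # Assumption: pin transformers to the commit seen in the earlier successful
#     # run instead of tracking HEAD, which now requires huggingface-hub==1.0.0rc1.
#     "transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d",
# ]
# ///
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print the versions that the resolver actually picked for this environment.
    packages = ["transformers", "kernels", "huggingface-hub"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```

Printing the resolved versions at the top of the cell makes it easier to see which `huggingface-hub` release was selected when a later run starts failing.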
3992
 
3993
  <h2>Yamoe Kernel</h2>
3994
  </div>
3995
 
3996
  </body>
megablocks_yamoe/torch_profile.html CHANGED
@@ -3720,7 +3720,7 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
3720
  <span onclick="toggleOutput('utils')" style="cursor: pointer;">▼ output</span>
3721
  <span id="uv-indicator-utils" onclick="toggleUvLogsFromHeader('utils')" style="cursor: pointer;">▶ uv-logs</span>
3722
  </span> |
3723
- Cell: utils | deps: torch, numpy | 35.29s
3724
  | <button class="run-btn" onclick="runCell('utils')">▶ run</button>
3725
  <button class="copy-btn" onclick="copyCell('utils')">Copy</button>
3726
  <a href="cells/utils.py" target="_blank" class="raw-btn">Raw</a>
@@ -3794,24 +3794,24 @@ Cell: utils | deps: torch, numpy | 35.29s
3794
  <div class="uv-install-logs" id="uv-logs-utils">
3795
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
3796
  <div class="uv-logs-content" style="display: none;">
3797
- Downloading networkx (1.9MiB)
3798
- Downloading setuptools (1.1MiB)
3799
- Downloading numpy (16.2MiB)
3800
- Downloading sympy (6.0MiB)
3801
- Downloading nvidia-curand-cu12 (60.7MiB)
3802
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
3803
  Downloading nvidia-cusparse-cu12 (274.9MiB)
3804
- Downloading nvidia-cusolver-cu12 (255.1MiB)
3805
- Downloading nvidia-cudnn-cu12 (674.0MiB)
3806
- Downloading nvidia-nccl-cu12 (307.4MiB)
3807
  Downloading nvidia-cufile-cu12 (1.1MiB)
3808
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
3809
- Downloading nvidia-cublas-cu12 (566.8MiB)
3810
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
3811
  Downloading nvidia-cufft-cu12 (184.2MiB)
3812
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
3813
- Downloading torch (846.9MiB)
 
3814
  Downloading triton (148.3MiB)
 
 
 
 
 
 
3815
  Downloading nvidia-cufile-cu12
3816
  Downloading setuptools
3817
  Downloading networkx
@@ -3824,13 +3824,13 @@ Downloading triton (148.3MiB)
3824
  Downloading triton
3825
  Downloading nvidia-cufft-cu12
3826
  Downloading nvidia-cusolver-cu12
3827
- Downloading nvidia-cusparse-cu12
3828
  Downloading nvidia-cusparselt-cu12
 
3829
  Downloading nvidia-nccl-cu12
3830
  Downloading nvidia-cublas-cu12
3831
  Downloading nvidia-cudnn-cu12
3832
  Downloading torch
3833
- Installed 26 packages in 455ms
3834
  </div>
3835
  </div>
3836
  </div>
@@ -3843,7 +3843,7 @@ Installed 26 packages in 455ms
3843
  <span onclick="toggleOutput('bench_utils')" style="cursor: pointer;">▼ output</span>
3844
  <span id="uv-indicator-bench_utils" onclick="toggleUvLogsFromHeader('bench_utils')" style="cursor: pointer;">▶ uv-logs</span>
3845
  </span> |
3846
- Cell: bench_utils | deps: torch, numpy | 34.44s
3847
  | <button class="run-btn" onclick="runCell('bench_utils')">▶ run</button>
3848
  <button class="copy-btn" onclick="copyCell('bench_utils')">Copy</button>
3849
  <a href="cells/bench_utils.py" target="_blank" class="raw-btn">Raw</a>
@@ -4331,24 +4331,24 @@ Cell: bench_utils | deps: torch, numpy | 34.44s
4331
  <div class="uv-install-logs" id="uv-logs-bench_utils">
4332
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4333
  <div class="uv-logs-content" style="display: none;">
4334
- Downloading setuptools (1.1MiB)
 
 
 
 
4335
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4336
- Downloading nvidia-cublas-cu12 (566.8MiB)
4337
- Downloading nvidia-cudnn-cu12 (674.0MiB)
4338
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4339
- Downloading nvidia-cufile-cu12 (1.1MiB)
4340
- Downloading sympy (6.0MiB)
4341
  Downloading nvidia-cusparse-cu12 (274.9MiB)
4342
  Downloading nvidia-cusparselt-cu12 (273.9MiB)
4343
- Downloading triton (148.3MiB)
4344
- Downloading nvidia-curand-cu12 (60.7MiB)
4345
- Downloading torch (846.9MiB)
4346
- Downloading networkx (1.9MiB)
4347
  Downloading nvidia-cusolver-cu12 (255.1MiB)
4348
  Downloading nvidia-cufft-cu12 (184.2MiB)
4349
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4350
- Downloading numpy (16.2MiB)
4351
- Downloading nvidia-nccl-cu12 (307.4MiB)
 
 
4352
  Downloading nvidia-cufile-cu12
4353
  Downloading setuptools
4354
  Downloading networkx
@@ -4367,7 +4367,7 @@ Downloading nvidia-nccl-cu12 (307.4MiB)
4367
  Downloading nvidia-cublas-cu12
4368
  Downloading nvidia-cudnn-cu12
4369
  Downloading torch
4370
- Installed 26 packages in 447ms
4371
  </div>
4372
  </div>
4373
  </div>
@@ -4381,7 +4381,7 @@ Installed 26 packages in 447ms
4381
  <span onclick="toggleOutput('config')" style="cursor: pointer;">▼ output</span>
4382
  <span id="uv-indicator-config" onclick="toggleUvLogsFromHeader('config')" style="cursor: pointer;">▶ uv-logs</span>
4383
  </span> |
4384
- Cell: config | deps: torch, numpy | 34.69s
4385
  | <button class="run-btn" onclick="runCell('config')">▶ run</button>
4386
  <button class="copy-btn" onclick="copyCell('config')">Copy</button>
4387
  <a href="cells/config.py" target="_blank" class="raw-btn">Raw</a>
@@ -4442,23 +4442,23 @@ Cell: config | deps: torch, numpy | 34.69s
4442
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4443
  <div class="uv-logs-content" style="display: none;">
4444
  Downloading numpy (16.2MiB)
4445
- Downloading nvidia-cufft-cu12 (184.2MiB)
4446
- Downloading nvidia-cusolver-cu12 (255.1MiB)
4447
- Downloading torch (846.9MiB)
4448
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4449
  Downloading setuptools (1.1MiB)
4450
  Downloading triton (148.3MiB)
4451
- Downloading nvidia-cusparse-cu12 (274.9MiB)
4452
- Downloading networkx (1.9MiB)
4453
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4454
- Downloading nvidia-nccl-cu12 (307.4MiB)
4455
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4456
- Downloading sympy (6.0MiB)
4457
  Downloading nvidia-cudnn-cu12 (674.0MiB)
4458
- Downloading nvidia-curand-cu12 (60.7MiB)
4459
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4460
  Downloading nvidia-cufile-cu12 (1.1MiB)
 
 
 
 
 
 
4461
  Downloading nvidia-cublas-cu12 (566.8MiB)
 
 
 
 
4462
  Downloading nvidia-cufile-cu12
4463
  Downloading setuptools
4464
  Downloading networkx
@@ -4471,13 +4471,13 @@ Downloading nvidia-cublas-cu12 (566.8MiB)
4471
  Downloading triton
4472
  Downloading nvidia-cufft-cu12
4473
  Downloading nvidia-cusolver-cu12
4474
- Downloading nvidia-cusparselt-cu12
4475
  Downloading nvidia-cusparse-cu12
 
4476
  Downloading nvidia-nccl-cu12
4477
  Downloading nvidia-cublas-cu12
4478
  Downloading nvidia-cudnn-cu12
4479
  Downloading torch
4480
- Installed 26 packages in 526ms
4481
  </div>
4482
  </div>
4483
  </div>
@@ -4490,7 +4490,7 @@ Installed 26 packages in 526ms
4490
  <span onclick="toggleOutput('save_data')" style="cursor: pointer;">▼ output</span>
4491
  <span id="uv-indicator-save_data" onclick="toggleUvLogsFromHeader('save_data')" style="cursor: pointer;">▶ uv-logs</span>
4492
  </span> |
4493
- Cell: save_data | deps: torch, numpy | 40.40s
4494
  | <button class="run-btn" onclick="runCell('save_data')">▶ run</button>
4495
  <button class="copy-btn" onclick="copyCell('save_data')">Copy</button>
4496
  <a href="cells/save_data.py" target="_blank" class="raw-btn">Raw</a>
@@ -4585,23 +4585,23 @@ Down sum: 206.729263
4585
  <div class="uv-install-logs" id="uv-logs-save_data">
4586
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4587
  <div class="uv-logs-content" style="display: none;">
4588
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4589
- Downloading nvidia-nccl-cu12 (307.4MiB)
4590
- Downloading nvidia-cusparse-cu12 (274.9MiB)
4591
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4592
  Downloading setuptools (1.1MiB)
 
4593
  Downloading nvidia-cudnn-cu12 (674.0MiB)
4594
- Downloading numpy (16.2MiB)
4595
- Downloading triton (148.3MiB)
4596
- Downloading networkx (1.9MiB)
4597
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4598
  Downloading nvidia-curand-cu12 (60.7MiB)
4599
- Downloading nvidia-cufile-cu12 (1.1MiB)
4600
- Downloading nvidia-cufft-cu12 (184.2MiB)
4601
- Downloading sympy (6.0MiB)
4602
  Downloading torch (846.9MiB)
 
 
 
 
 
4603
  Downloading nvidia-cublas-cu12 (566.8MiB)
4604
  Downloading nvidia-cusolver-cu12 (255.1MiB)
 
 
4605
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4606
  Downloading nvidia-cufile-cu12
4607
  Downloading setuptools
@@ -4618,20 +4618,20 @@ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4618
  Downloading nvidia-cusparselt-cu12
4619
  Downloading nvidia-cusparse-cu12
4620
  Downloading nvidia-nccl-cu12
4621
- Downloading nvidia-cublas-cu12
4622
  Downloading nvidia-cudnn-cu12
 
4623
  Downloading torch
4624
- Installed 26 packages in 563ms
4625
  </div>
4626
  </div>
4627
  <div class="cell-artifacts">
4628
  <h4>Artifacts:</h4>
4629
- <a href="artifacts/save_data/down_proj_bias.pt" class="artifact" target="_blank">down_proj_bias.pt</a>
4630
- <a href="artifacts/save_data/down_proj.pt" class="artifact" target="_blank">down_proj.pt</a>
4631
- <a href="artifacts/save_data/router_weight.pt" class="artifact" target="_blank">router_weight.pt</a>
4632
  <a href="artifacts/save_data/router_bias.pt" class="artifact" target="_blank">router_bias.pt</a>
 
 
4633
  <a href="artifacts/save_data/gate_up_proj_bias.pt" class="artifact" target="_blank">gate_up_proj_bias.pt</a>
4634
  <a href="artifacts/save_data/gate_up_proj.pt" class="artifact" target="_blank">gate_up_proj.pt</a>
 
4635
  </div>
4636
  </div>
4637
  </div>
@@ -4645,7 +4645,7 @@ Installed 26 packages in 563ms
4645
  <span onclick="toggleOutput('yamoe_run')" style="cursor: pointer;">▼ output</span>
4646
  <span id="uv-indicator-yamoe_run" onclick="toggleUvLogsFromHeader('yamoe_run')" style="cursor: pointer;">▶ uv-logs</span>
4647
  </span> |
4648
- Cell: yamoe_run | deps: torch, kernels, numpy | 38.77s
4649
  | <button class="run-btn" onclick="runCell('yamoe_run')">▶ run</button>
4650
  <button class="copy-btn" onclick="copyCell('yamoe_run')">Copy</button>
4651
  <a href="cells/yamoe_run.py" target="_blank" class="raw-btn">Raw</a>
@@ -4938,10 +4938,10 @@ Input Variation: +0.001 * iteration (deterministic)
4938
 
4939
  Warming up (10 iterations)...
4940
  Benchmarking (50 iterations)...
4941
- Progress: 20% complete (avg: 4.248 ms)
4942
- Progress: 40% complete (avg: 4.246 ms)
4943
- Progress: 60% complete (avg: 4.247 ms)
4944
- Progress: 80% complete (avg: 4.247 ms)
4945
 
4946
  Output tensors:
4947
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
@@ -4952,18 +4952,18 @@ Iterations: 50
4952
 
4953
  Latency Statistics:
4954
  Average: 4.248 ms
4955
- Min: 4.129 ms
4956
- Max: 4.266 ms
4957
  Std Dev: 0.021 ms
4958
 
4959
  Percentiles:
4960
- P50 (median): 4.252 ms
4961
- P95: 4.264 ms
4962
- P99: 4.265 ms
4963
 
4964
  Throughput:
4965
- Tokens/sec: 23542.6
4966
- Std Dev: 117.1
4967
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4968
 
4969
  Saved benchmark results to yamoe_results.json
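The latency, percentile, and throughput figures in the benchmark output above can be derived from a list of per-iteration timings. The following is a minimal sketch, not the benchmark's actual code (which lives in `cells/bench_utils.py` and is not shown in this diff). The function names, the linear-interpolation percentile, and the use of population standard deviation are assumptions. The throughput formula, tokens divided by average latency, is consistent with the reported numbers if each iteration processes 100 tokens, a count inferred from the output shape `(1, 100, 1152)`.

```python
import statistics


def percentile(sorted_values, p):
    """Percentile p (0-100) using linear interpolation between neighbouring ranks."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize_latencies(latencies_ms, tokens_per_iteration):
    """Summarize per-iteration latencies (milliseconds) and derive tokens/sec."""
    ordered = sorted(latencies_ms)
    average_ms = statistics.fmean(ordered)
    return {
        "average_ms": average_ms,
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        # Population standard deviation; the real benchmark may use the sample form.
        "std_dev_ms": statistics.pstdev(ordered),
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        # Throughput from the mean latency: tokens / seconds per iteration.
        "tokens_per_sec": tokens_per_iteration / (average_ms / 1000.0),
    }


if __name__ == "__main__":
    # Made-up timings for illustration only; these are not measured values.
    sample_ms = [4.0, 4.2, 4.1, 4.3, 4.4]
    for name, value in summarize_latencies(sample_ms, tokens_per_iteration=100).items():
        print(f"{name}: {value:.3f}")
```

As a sanity check against the logs, 100 tokens at the reported 36.215 ms average gives about 2761.3 tokens/sec, which matches the `binned_run` output.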
@@ -4973,25 +4973,25 @@ Output sum: 3.971905
4973
  <div class="uv-install-logs" id="uv-logs-yamoe_run">
4974
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4975
  <div class="uv-logs-content" style="display: none;">
4976
- Downloading networkx (1.9MiB)
4977
- Downloading sympy (6.0MiB)
 
 
4978
  Downloading setuptools (1.1MiB)
 
4979
  Downloading nvidia-nccl-cu12 (307.4MiB)
4980
- Downloading hf-xet (3.0MiB)
4981
- Downloading nvidia-cufile-cu12 (1.1MiB)
4982
- Downloading triton (148.3MiB)
4983
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4984
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4985
- Downloading nvidia-cusolver-cu12 (255.1MiB)
4986
  Downloading nvidia-cusparse-cu12 (274.9MiB)
4987
- Downloading nvidia-cudnn-cu12 (674.0MiB)
4988
  Downloading nvidia-cufft-cu12 (184.2MiB)
4989
  Downloading torch (846.9MiB)
 
 
 
 
4990
  Downloading nvidia-curand-cu12 (60.7MiB)
4991
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4992
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
4993
  Downloading nvidia-cublas-cu12 (566.8MiB)
4994
- Downloading numpy (16.2MiB)
4995
  Downloading nvidia-cufile-cu12
4996
  Downloading hf-xet
4997
  Downloading setuptools
@@ -5011,14 +5011,13 @@ Downloading numpy (16.2MiB)
5011
  Downloading nvidia-cublas-cu12
5012
  Downloading nvidia-cudnn-cu12
5013
  Downloading torch
5014
- Installed 37 packages in 449ms
5015
  </div>
5016
  </div>
5017
  <div class="cell-stderr">Fetching 6 files: 0%| | 0/6 [00:00&lt;?, ?it/s]
5018
- Fetching 6 files: 17%|█▋ | 1/6 [00:00&lt;00:00, 5.90it/s]
5019
- Fetching 6 files: 33%|███▎ | 2/6 [00:00&lt;00:00, 7.70it/s]
5020
- Fetching 6 files: 50%|█████ | 3/6 [00:00&lt;00:00, 4.70it/s]
5021
- Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 10.28it/s]</div>
5022
  <div class="cell-artifacts">
5023
  <h4>Artifacts:</h4>
5024
  <a href="artifacts/yamoe_run/yamoe_results.json" class="artifact" target="_blank">yamoe_results.json</a>
@@ -5035,7 +5034,7 @@ Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 10.2
5035
  <span onclick="toggleOutput('binned_run')" style="cursor: pointer;">▼ output</span>
5036
  <span id="uv-indicator-binned_run" onclick="toggleUvLogsFromHeader('binned_run')" style="cursor: pointer;">▶ uv-logs</span>
5037
  </span> |
5038
- Cell: binned_run | deps: torch, numpy | 38.76s
5039
  | <button class="run-btn" onclick="runCell('binned_run')">▶ run</button>
5040
  <button class="copy-btn" onclick="copyCell('binned_run')">Copy</button>
5041
  <a href="cells/binned_run.py" target="_blank" class="raw-btn">Raw</a>
@@ -5449,10 +5448,10 @@ Input Variation: +0.001 * iteration (deterministic)
5449
 
5450
  Warming up (10 iterations)...
5451
  Benchmarking (50 iterations)...
5452
- Progress: 20% complete (avg: 37.794 ms)
5453
- Progress: 40% complete (avg: 37.656 ms)
5454
- Progress: 60% complete (avg: 37.188 ms)
5455
- Progress: 80% complete (avg: 36.704 ms)
5456
 
5457
  Output tensors:
5458
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
@@ -5462,19 +5461,19 @@ Output tensors:
5462
  Iterations: 50
5463
 
5464
  Latency Statistics:
5465
- Average: 36.215 ms
5466
- Min: 33.172 ms
5467
- Max: 38.754 ms
5468
- Std Dev: 1.401 ms
5469
 
5470
  Percentiles:
5471
- P50 (median): 36.364 ms
5472
- P95: 38.061 ms
5473
- P99: 38.464 ms
5474
 
5475
  Throughput:
5476
- Tokens/sec: 2761.3
5477
- Std Dev: 108.1
5478
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
5479
 
5480
  Saved benchmark results to binned_results.json
@@ -5484,24 +5483,24 @@ Output sum: 3.971905
5484
  <div class="uv-install-logs" id="uv-logs-binned_run">
5485
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
5486
  <div class="uv-logs-content" style="display: none;">
5487
- Downloading networkx (1.9MiB)
5488
- Downloading numpy (16.2MiB)
5489
- Downloading setuptools (1.1MiB)
5490
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
5491
  Downloading nvidia-cufft-cu12 (184.2MiB)
5492
- Downloading nvidia-cudnn-cu12 (674.0MiB)
5493
- Downloading nvidia-cufile-cu12 (1.1MiB)
5494
- Downloading nvidia-curand-cu12 (60.7MiB)
5495
  Downloading nvidia-nccl-cu12 (307.4MiB)
5496
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
5497
- Downloading nvidia-cublas-cu12 (566.8MiB)
 
 
5498
  Downloading triton (148.3MiB)
 
5499
  Downloading nvidia-cusparse-cu12 (274.9MiB)
5500
- Downloading torch (846.9MiB)
 
 
5501
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
5502
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
5503
- Downloading sympy (6.0MiB)
5504
  Downloading nvidia-cusolver-cu12 (255.1MiB)
 
 
5505
  Downloading nvidia-cufile-cu12
5506
  Downloading setuptools
5507
  Downloading networkx
@@ -5514,13 +5513,13 @@ Downloading nvidia-cusolver-cu12 (255.1MiB)
5514
  Downloading triton
5515
  Downloading nvidia-cufft-cu12
5516
  Downloading nvidia-cusolver-cu12
5517
- Downloading nvidia-cusparse-cu12
5518
  Downloading nvidia-cusparselt-cu12
 
5519
  Downloading nvidia-nccl-cu12
5520
  Downloading nvidia-cublas-cu12
5521
  Downloading nvidia-cudnn-cu12
5522
  Downloading torch
5523
- Installed 26 packages in 455ms
5524
  </div>
5525
  </div>
5526
  <div class="cell-artifacts">
@@ -5539,7 +5538,7 @@ Installed 26 packages in 455ms
5539
  <span onclick="toggleOutput('gptoss_run')" style="cursor: pointer;">▼ output</span>
5540
  <span id="uv-indicator-gptoss_run" onclick="toggleUvLogsFromHeader('gptoss_run')" style="cursor: pointer;">▶ uv-logs</span>
5541
  </span> |
5542
- Cell: gptoss_run | deps: torch, numpy | 39.76s
5543
  | <button class="run-btn" onclick="runCell('gptoss_run')">▶ run</button>
5544
  <button class="copy-btn" onclick="copyCell('gptoss_run')">Copy</button>
5545
  <a href="cells/gptoss_run.py" target="_blank" class="raw-btn">Raw</a>
@@ -5857,10 +5856,10 @@ Input Variation: +0.001 * iteration (deterministic)
5857
 
5858
  Warming up (10 iterations)...
5859
  Benchmarking (50 iterations)...
5860
- Progress: 20% complete (avg: 51.012 ms)
5861
- Progress: 40% complete (avg: 49.954 ms)
5862
- Progress: 60% complete (avg: 48.390 ms)
5863
- Progress: 80% complete (avg: 46.993 ms)
5864
 
5865
  Output tensors:
5866
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
@@ -5870,19 +5869,19 @@ Output tensors:
5870
  Iterations: 50
5871
 
5872
  Latency Statistics:
5873
- Average: 45.950 ms
5874
- Min: 40.765 ms
5875
- Max: 52.300 ms
5876
- Std Dev: 3.623 ms
5877
 
5878
  Percentiles:
5879
- P50 (median): 45.469 ms
5880
- P95: 51.353 ms
5881
- P99: 52.122 ms
5882
 
5883
  Throughput:
5884
- Tokens/sec: 2176.3
5885
- Std Dev: 169.8
5886
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
5887
 
5888
  Saved benchmark results to gptoss_results.json
@@ -5892,24 +5891,24 @@ Output sum: 11.532237
5892
  <div class="uv-install-logs" id="uv-logs-gptoss_run">
5893
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
5894
  <div class="uv-logs-content" style="display: none;">
5895
- Downloading numpy (16.2MiB)
5896
- Downloading networkx (1.9MiB)
5897
- Downloading nvidia-cusparse-cu12 (274.9MiB)
5898
  Downloading setuptools (1.1MiB)
5899
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
5900
  Downloading nvidia-curand-cu12 (60.7MiB)
5901
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
5902
  Downloading nvidia-nccl-cu12 (307.4MiB)
5903
  Downloading nvidia-cusparselt-cu12 (273.9MiB)
5904
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
5905
- Downloading nvidia-cufile-cu12 (1.1MiB)
5906
- Downloading nvidia-cufft-cu12 (184.2MiB)
5907
  Downloading triton (148.3MiB)
5908
- Downloading nvidia-cusolver-cu12 (255.1MiB)
5909
- Downloading nvidia-cudnn-cu12 (674.0MiB)
5910
- Downloading torch (846.9MiB)
5911
- Downloading sympy (6.0MiB)
5912
- Downloading nvidia-cublas-cu12 (566.8MiB)
5913
  Downloading nvidia-cufile-cu12
5914
  Downloading setuptools
5915
  Downloading networkx
@@ -5922,13 +5921,13 @@ Downloading nvidia-cublas-cu12 (566.8MiB)
5922
  Downloading triton
5923
  Downloading nvidia-cufft-cu12
5924
  Downloading nvidia-cusolver-cu12
5925
- Downloading nvidia-cusparse-cu12
5926
  Downloading nvidia-cusparselt-cu12
 
5927
  Downloading nvidia-nccl-cu12
5928
  Downloading nvidia-cublas-cu12
5929
  Downloading nvidia-cudnn-cu12
5930
  Downloading torch
5931
- Installed 26 packages in 524ms
5932
  </div>
5933
  </div>
5934
  <div class="cell-artifacts">
@@ -5947,7 +5946,7 @@ Installed 26 packages in 524ms
5947
  <span onclick="toggleOutput('gptoss_training_run')" style="cursor: pointer;">▼ output</span>
5948
  <span id="uv-indicator-gptoss_training_run" onclick="toggleUvLogsFromHeader('gptoss_training_run')" style="cursor: pointer;">▶ uv-logs</span>
5949
  </span> |
5950
- Cell: gptoss_training_run | deps: torch, numpy | 40.42s
5951
  | <button class="run-btn" onclick="runCell('gptoss_training_run')">▶ run</button>
5952
  <button class="copy-btn" onclick="copyCell('gptoss_training_run')">Copy</button>
5953
  <a href="cells/gptoss_training_run.py" target="_blank" class="raw-btn">Raw</a>
@@ -6248,10 +6247,10 @@ Input Variation: +0.001 * iteration (deterministic)
6248
 
6249
  Warming up (10 iterations)...
6250
  Benchmarking (50 iterations)...
6251
- Progress: 20% complete (avg: 48.387 ms)
6252
- Progress: 40% complete (avg: 48.249 ms)
6253
- Progress: 60% complete (avg: 47.887 ms)
6254
- Progress: 80% complete (avg: 47.011 ms)
6255
 
6256
  Output tensors:
6257
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
@@ -6261,19 +6260,19 @@ Output tensors:
6261
  Iterations: 50
6262
 
6263
  Latency Statistics:
6264
- Average: 46.098 ms
6265
- Min: 38.839 ms
6266
- Max: 49.404 ms
6267
- Std Dev: 2.469 ms
6268
 
6269
  Percentiles:
6270
- P50 (median): 47.240 ms
6271
- P95: 48.725 ms
6272
- P99: 49.168 ms
6273
 
6274
  Throughput:
6275
- Tokens/sec: 2169.3
6276
- Std Dev: 122.3
6277
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
6278
 
6279
  Saved benchmark results to gptoss_training_results.json
@@ -6283,24 +6282,24 @@ Output sum: 11.532237
6283
  <div class="uv-install-logs" id="uv-logs-gptoss_training_run">
6284
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
6285
  <div class="uv-logs-content" style="display: none;">
6286
- Downloading nvidia-cufile-cu12 (1.1MiB)
6287
- Downloading setuptools (1.1MiB)
6288
- Downloading nvidia-nvjitlink-cu12 (37.4MiB)
6289
  Downloading sympy (6.0MiB)
6290
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
6291
- Downloading nvidia-cublas-cu12 (566.8MiB)
6292
  Downloading nvidia-nccl-cu12 (307.4MiB)
6293
- Downloading torch (846.9MiB)
6294
- Downloading nvidia-cudnn-cu12 (674.0MiB)
6295
- Downloading networkx (1.9MiB)
6296
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
6297
  Downloading nvidia-curand-cu12 (60.7MiB)
6298
  Downloading nvidia-cusolver-cu12 (255.1MiB)
6299
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
6300
- Downloading nvidia-cusparse-cu12 (274.9MiB)
6301
  Downloading numpy (16.2MiB)
6302
- Downloading nvidia-cufft-cu12 (184.2MiB)
6303
  Downloading triton (148.3MiB)
 
6304
  Downloading nvidia-cufile-cu12
6305
  Downloading setuptools
6306
  Downloading networkx
@@ -6313,13 +6312,13 @@ Downloading triton (148.3MiB)
6313
  Downloading triton
6314
  Downloading nvidia-cufft-cu12
6315
  Downloading nvidia-cusolver-cu12
6316
- Downloading nvidia-cusparselt-cu12
6317
  Downloading nvidia-cusparse-cu12
 
6318
  Downloading nvidia-nccl-cu12
6319
  Downloading nvidia-cublas-cu12
6320
  Downloading nvidia-cudnn-cu12
6321
  Downloading torch
6322
- Installed 26 packages in 451ms
6323
  </div>
6324
  </div>
6325
  <div class="cell-artifacts">
@@ -6338,7 +6337,7 @@ Installed 26 packages in 451ms
6338
  <span onclick="toggleOutput('megablocks_run')" style="cursor: pointer;">▼ output</span>
6339
  <span id="uv-indicator-megablocks_run" onclick="toggleUvLogsFromHeader('megablocks_run')" style="cursor: pointer;">▶ uv-logs</span>
6340
  </span> |
6341
- Cell: megablocks_run | deps: torch, numpy, kernels | 40.19s | FAILED
6342
  | <button class="run-btn" onclick="runCell('megablocks_run')">▶ run</button>
6343
  <button class="copy-btn" onclick="copyCell('megablocks_run')">Copy</button>
6344
  <a href="cells/megablocks_run.py" target="_blank" class="raw-btn">Raw</a>
@@ -6571,24 +6570,24 @@ Warming up (10 iterations)...
6571
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
6572
  <div class="uv-logs-content" style="display: none;">
6573
  Downloading numpy (16.2MiB)
6574
- Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
6575
- Downloading nvidia-cufile-cu12 (1.1MiB)
6576
- Downloading hf-xet (3.0MiB)
6577
- Downloading networkx (1.9MiB)
6578
- Downloading torch (846.9MiB)
6579
- Downloading nvidia-cusparse-cu12 (274.9MiB)
6580
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
6581
  Downloading nvidia-cudnn-cu12 (674.0MiB)
6582
- Downloading triton (148.3MiB)
6583
- Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
6584
  Downloading nvidia-curand-cu12 (60.7MiB)
6585
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
6586
- Downloading sympy (6.0MiB)
6587
- Downloading nvidia-cusolver-cu12 (255.1MiB)
6588
  Downloading nvidia-cublas-cu12 (566.8MiB)
6589
- Downloading nvidia-cufft-cu12 (184.2MiB)
6590
  Downloading nvidia-nccl-cu12 (307.4MiB)
6591
- Downloading setuptools (1.1MiB)
6592
  Downloading nvidia-cufile-cu12
6593
  Downloading hf-xet
6594
  Downloading setuptools
@@ -6602,25 +6601,25 @@ Downloading setuptools (1.1MiB)
6602
  Downloading triton
6603
  Downloading nvidia-cufft-cu12
6604
  Downloading nvidia-cusolver-cu12
6605
- Downloading nvidia-cusparse-cu12
6606
  Downloading nvidia-cusparselt-cu12
 
6607
  Downloading nvidia-nccl-cu12
6608
  Downloading nvidia-cublas-cu12
6609
  Downloading nvidia-cudnn-cu12
6610
  Downloading torch
6611
- Installed 37 packages in 449ms
6612
  </div>
6613
  </div>
6614
  <div class="cell-stderr">Fetching 66 files: 0%| | 0/66 [00:00&lt;?, ?it/s]
6615
- Fetching 66 files: 2%|▏ | 1/66 [00:00&lt;00:23, 2.74it/s]
6616
- Fetching 66 files: 14%|█▎ | 9/66 [00:00&lt;00:03, 17.38it/s]
6617
- Fetching 66 files: 26%|██▌ | 17/66 [00:01&lt;00:02, 17.85it/s]
6618
- Fetching 66 files: 55%|█████▍ | 36/66 [00:01&lt;00:00, 42.23it/s]
6619
- Fetching 66 files: 65%|██████▌ | 43/66 [00:01&lt;00:00, 38.03it/s]
6620
- Fetching 66 files: 74%|███████▍ | 49/66 [00:01&lt;00:00, 30.77it/s]
6621
- Fetching 66 files: 97%|█████████▋| 64/66 [00:01&lt;00:00, 48.18it/s]
6622
- Fetching 66 files: 100%|██████████| 66/66 [00:01&lt;00:00, 34.40it/s]
6623
- /tmp/tmptrubhjfl/cuda_utils.c:5:10: fatal error: Python.h: No such file or directory
6624
  5 | #include &lt;Python.h&gt;
6625
  | ^~~~~~~~~~
6626
  compilation terminated.
@@ -6637,87 +6636,87 @@ Traceback (most recent call last):
6637
  File &quot;/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/bench_utils.py&quot;, line 177, in &lt;lambda&gt;
6638
  call = lambda x: fn(x, *args[1:], **kwargs)
6639
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
6640
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1773, in _wrapped_call_impl
6641
  return self._call_impl(*args, **kwargs)
6642
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6643
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1784, in _call_impl
6644
  return forward_call(*args, **kwargs)
6645
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6646
  File &quot;/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/megablocks_run.py&quot;, line 81, in forward
6647
  output, dummy_routing_weights = self.model(hidden_states)
6648
  ^^^^^^^^^^^^^^^^^^^^^^^^^
6649
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1773, in _wrapped_call_impl
6650
  return self._call_impl(*args, **kwargs)
6651
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6652
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1784, in _call_impl
6653
  return forward_call(*args, **kwargs)
6654
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6655
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 896, in forward
6656
  output, expert_weights_out, *_ = moe_forward(
6657
  ^^^^^^^^^^^^
6658
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 730, in moe_forward
6659
  x, tokens_per_expert = forward_fn(**forward_args)
6660
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
6661
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 457, in forward_once
6662
  x = permute_and_compute(
6663
  ^^^^^^^^^^^^^^^^^^^^
6664
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 401, in permute_and_compute
6665
  x = ops.binned_gather(x, indices, bins, expert_capacity, top_k)
6666
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6667
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/autograd/function.py&quot;, line 576, in apply
6668
  return super().apply(*args, **kwargs) # type: ignore[misc]
6669
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6670
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/stk_autocast.py&quot;, line 30, in decorate_fwd
6671
  return fwd(*args, **kwargs)
6672
  ^^^^^^^^^^^^^^^^^^^^
6673
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/binned_gather.py&quot;, line 26, in forward
6674
  return kernels.binned_gather(x, indices, None, bins, bin_size, top_k)
6675
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6676
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/backend/kernels.py&quot;, line 419, in binned_gather
6677
  _binned_copy[(num_experts, expert_capacity)](
6678
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/jit.py&quot;, line 390, in &lt;lambda&gt;
6679
  return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
6680
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6681
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 239, in run
6682
  benchmark()
6683
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 228, in benchmark
6684
  timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
6685
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6686
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 228, in &lt;dictcomp&gt;
6687
  timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
6688
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6689
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 160, in _bench
6690
  return self.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8))
6691
  ^^^^^^^^^^^^^
6692
  File &quot;/usr/lib/python3.11/functools.py&quot;, line 1001, in __get__
6693
  val = self.func(instance)
6694
  ^^^^^^^^^^^^^^^^^^^
6695
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 121, in do_bench
6696
  return driver.active.get_benchmarker()
6697
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6698
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 30, in __getattr__
6699
  return getattr(self._initialize_obj(), name)
6700
  ^^^^^^^^^^^^^^^^^^^^^^
6701
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 26, in _initialize_obj
6702
  self._obj = self._init_fn()
6703
  ^^^^^^^^^^^^^^^
6704
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 12, in _create_driver
6705
  return active_drivers[0]()
6706
  ^^^^^^^^^^^^^^^^^^^
6707
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py&quot;, line 715, in __init__
6708
  self.utils = CudaUtils() # TODO: make static
6709
  ^^^^^^^^^^^
6710
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py&quot;, line 62, in __init__
6711
  mod = compile_module_from_src(
6712
  ^^^^^^^^^^^^^^^^^^^^^^^^
6713
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py&quot;, line 88, in compile_module_from_src
6714
  so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [])
6715
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6716
- File &quot;/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py&quot;, line 51, in _build
6717
  subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
6718
  File &quot;/usr/lib/python3.11/subprocess.py&quot;, line 413, in check_call
6719
  raise CalledProcessError(retcode, cmd)
6720
- subprocess.CalledProcessError: Command &#x27;[&#x27;/usr/bin/gcc&#x27;, &#x27;/tmp/tmptrubhjfl/cuda_utils.c&#x27;, &#x27;-O3&#x27;, &#x27;-shared&#x27;, &#x27;-fPIC&#x27;, &#x27;-Wno-psabi&#x27;, &#x27;-o&#x27;, &#x27;/tmp/tmptrubhjfl/cuda_utils.cpython-311-x86_64-linux-gnu.so&#x27;, &#x27;-lcuda&#x27;, &#x27;-L/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/lib&#x27;, &#x27;-L/usr/lib/x86_64-linux-gnu&#x27;, &#x27;-I/tmp/uvnote-run-68wjowzh/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/include&#x27;, &#x27;-I/tmp/tmptrubhjfl&#x27;, &#x27;-I/usr/include/python3.11&#x27;]&#x27; returned non-zero exit status 1.</div>
6721
  </div>
6722
  </div>
6723
 
 
3720
  <span onclick="toggleOutput('utils')" style="cursor: pointer;">▼ output</span>
3721
  <span id="uv-indicator-utils" onclick="toggleUvLogsFromHeader('utils')" style="cursor: pointer;">▶ uv-logs</span>
3722
  </span> |
3723
+ Cell: utils | deps: torch, numpy | 34.88s
3724
  | <button class="run-btn" onclick="runCell('utils')">▶ run</button>
3725
  <button class="copy-btn" onclick="copyCell('utils')">Copy</button>
3726
  <a href="cells/utils.py" target="_blank" class="raw-btn">Raw</a>
 
3794
  <div class="uv-install-logs" id="uv-logs-utils">
3795
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
3796
  <div class="uv-logs-content" style="display: none;">
3797
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
3798
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
3799
  Downloading nvidia-cusparse-cu12 (274.9MiB)
3800
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
3801
  Downloading nvidia-cufile-cu12 (1.1MiB)
3802
+ Downloading sympy (6.0MiB)
3803
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
 
3804
  Downloading nvidia-cufft-cu12 (184.2MiB)
3805
+ Downloading nvidia-cublas-cu12 (566.8MiB)
3806
+ Downloading nvidia-curand-cu12 (60.7MiB)
3807
+ Downloading networkx (1.9MiB)
3808
  Downloading triton (148.3MiB)
3809
+ Downloading nvidia-nccl-cu12 (307.4MiB)
3810
+ Downloading numpy (16.2MiB)
3811
+ Downloading torch (846.9MiB)
3812
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
3813
+ Downloading setuptools (1.1MiB)
3814
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
3815
  Downloading nvidia-cufile-cu12
3816
  Downloading setuptools
3817
  Downloading networkx
 
3824
  Downloading triton
3825
  Downloading nvidia-cufft-cu12
3826
  Downloading nvidia-cusolver-cu12
 
3827
  Downloading nvidia-cusparselt-cu12
3828
+ Downloading nvidia-cusparse-cu12
3829
  Downloading nvidia-nccl-cu12
3830
  Downloading nvidia-cublas-cu12
3831
  Downloading nvidia-cudnn-cu12
3832
  Downloading torch
3833
+ Installed 26 packages in 452ms
3834
  </div>
3835
  </div>
3836
  </div>
 
3843
  <span onclick="toggleOutput('bench_utils')" style="cursor: pointer;">▼ output</span>
3844
  <span id="uv-indicator-bench_utils" onclick="toggleUvLogsFromHeader('bench_utils')" style="cursor: pointer;">▶ uv-logs</span>
3845
  </span> |
3846
+ Cell: bench_utils | deps: torch, numpy | 34.66s
3847
  | <button class="run-btn" onclick="runCell('bench_utils')">▶ run</button>
3848
  <button class="copy-btn" onclick="copyCell('bench_utils')">Copy</button>
3849
  <a href="cells/bench_utils.py" target="_blank" class="raw-btn">Raw</a>
 
4331
  <div class="uv-install-logs" id="uv-logs-bench_utils">
4332
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4333
  <div class="uv-logs-content" style="display: none;">
4334
+ Downloading numpy (16.2MiB)
4335
+ Downloading sympy (6.0MiB)
4336
+ Downloading networkx (1.9MiB)
4337
+ Downloading nvidia-nccl-cu12 (307.4MiB)
4338
+ Downloading nvidia-curand-cu12 (60.7MiB)
4339
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4340
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4341
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
 
4342
  Downloading nvidia-cusparse-cu12 (274.9MiB)
4343
  Downloading nvidia-cusparselt-cu12 (273.9MiB)
4344
+ Downloading setuptools (1.1MiB)
4345
  Downloading nvidia-cusolver-cu12 (255.1MiB)
4346
  Downloading nvidia-cufft-cu12 (184.2MiB)
4347
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4348
+ Downloading nvidia-cublas-cu12 (566.8MiB)
4349
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4350
+ Downloading triton (148.3MiB)
4351
+ Downloading torch (846.9MiB)
4352
  Downloading nvidia-cufile-cu12
4353
  Downloading setuptools
4354
  Downloading networkx
 
4367
  Downloading nvidia-cublas-cu12
4368
  Downloading nvidia-cudnn-cu12
4369
  Downloading torch
4370
+ Installed 26 packages in 535ms
4371
  </div>
4372
  </div>
4373
  </div>
 
4381
  <span onclick="toggleOutput('config')" style="cursor: pointer;">▼ output</span>
4382
  <span id="uv-indicator-config" onclick="toggleUvLogsFromHeader('config')" style="cursor: pointer;">▶ uv-logs</span>
4383
  </span> |
4384
+ Cell: config | deps: torch, numpy | 35.36s
4385
  | <button class="run-btn" onclick="runCell('config')">▶ run</button>
4386
  <button class="copy-btn" onclick="copyCell('config')">Copy</button>
4387
  <a href="cells/config.py" target="_blank" class="raw-btn">Raw</a>
 
4442
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4443
  <div class="uv-logs-content" style="display: none;">
4444
  Downloading numpy (16.2MiB)
4445
+ Downloading sympy (6.0MiB)
4446
+ Downloading networkx (1.9MiB)
4447
  Downloading setuptools (1.1MiB)
4448
  Downloading triton (148.3MiB)
4449
  Downloading nvidia-cudnn-cu12 (674.0MiB)
4450
  Downloading nvidia-cufile-cu12 (1.1MiB)
4451
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4452
+ Downloading torch (846.9MiB)
4453
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4454
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
4455
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
4456
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4457
  Downloading nvidia-cublas-cu12 (566.8MiB)
4458
+ Downloading nvidia-nccl-cu12 (307.4MiB)
4459
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4460
+ Downloading nvidia-curand-cu12 (60.7MiB)
4461
+ Downloading nvidia-cufft-cu12 (184.2MiB)
4462
  Downloading nvidia-cufile-cu12
4463
  Downloading setuptools
4464
  Downloading networkx
 
4471
  Downloading triton
4472
  Downloading nvidia-cufft-cu12
4473
  Downloading nvidia-cusolver-cu12
 
4474
  Downloading nvidia-cusparse-cu12
4475
+ Downloading nvidia-cusparselt-cu12
4476
  Downloading nvidia-nccl-cu12
4477
  Downloading nvidia-cublas-cu12
4478
  Downloading nvidia-cudnn-cu12
4479
  Downloading torch
4480
+ Installed 26 packages in 452ms
4481
  </div>
4482
  </div>
4483
  </div>
 
4490
  <span onclick="toggleOutput('save_data')" style="cursor: pointer;">▼ output</span>
4491
  <span id="uv-indicator-save_data" onclick="toggleUvLogsFromHeader('save_data')" style="cursor: pointer;">▶ uv-logs</span>
4492
  </span> |
4493
+ Cell: save_data | deps: torch, numpy | 39.03s
4494
  | <button class="run-btn" onclick="runCell('save_data')">▶ run</button>
4495
  <button class="copy-btn" onclick="copyCell('save_data')">Copy</button>
4496
  <a href="cells/save_data.py" target="_blank" class="raw-btn">Raw</a>
 
4585
  <div class="uv-install-logs" id="uv-logs-save_data">
4586
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4587
  <div class="uv-logs-content" style="display: none;">
4588
+ Downloading nvidia-cufft-cu12 (184.2MiB)
4589
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4590
+ Downloading sympy (6.0MiB)
 
4591
  Downloading setuptools (1.1MiB)
4592
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4593
  Downloading nvidia-cudnn-cu12 (674.0MiB)
4594
  Downloading nvidia-curand-cu12 (60.7MiB)
4595
  Downloading torch (846.9MiB)
4596
+ Downloading networkx (1.9MiB)
4597
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
4598
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4599
+ Downloading numpy (16.2MiB)
4600
+ Downloading triton (148.3MiB)
4601
  Downloading nvidia-cublas-cu12 (566.8MiB)
4602
  Downloading nvidia-cusolver-cu12 (255.1MiB)
4603
+ Downloading nvidia-nccl-cu12 (307.4MiB)
4604
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4605
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4606
  Downloading nvidia-cufile-cu12
4607
  Downloading setuptools
 
4618
  Downloading nvidia-cusparselt-cu12
4619
  Downloading nvidia-cusparse-cu12
4620
  Downloading nvidia-nccl-cu12
 
4621
  Downloading nvidia-cudnn-cu12
4622
+ Downloading nvidia-cublas-cu12
4623
  Downloading torch
4624
+ Installed 26 packages in 447ms
4625
  </div>
4626
  </div>
4627
  <div class="cell-artifacts">
4628
  <h4>Artifacts:</h4>
4629
  <a href="artifacts/save_data/router_bias.pt" class="artifact" target="_blank">router_bias.pt</a>
4630
+ <a href="artifacts/save_data/router_weight.pt" class="artifact" target="_blank">router_weight.pt</a>
4631
+ <a href="artifacts/save_data/down_proj_bias.pt" class="artifact" target="_blank">down_proj_bias.pt</a>
4632
  <a href="artifacts/save_data/gate_up_proj_bias.pt" class="artifact" target="_blank">gate_up_proj_bias.pt</a>
4633
  <a href="artifacts/save_data/gate_up_proj.pt" class="artifact" target="_blank">gate_up_proj.pt</a>
4634
+ <a href="artifacts/save_data/down_proj.pt" class="artifact" target="_blank">down_proj.pt</a>
4635
  </div>
4636
  </div>
4637
  </div>
 
4645
  <span onclick="toggleOutput('yamoe_run')" style="cursor: pointer;">▼ output</span>
4646
  <span id="uv-indicator-yamoe_run" onclick="toggleUvLogsFromHeader('yamoe_run')" style="cursor: pointer;">▶ uv-logs</span>
4647
  </span> |
4648
+ Cell: yamoe_run | deps: torch, kernels, numpy | 39.06s
4649
  | <button class="run-btn" onclick="runCell('yamoe_run')">▶ run</button>
4650
  <button class="copy-btn" onclick="copyCell('yamoe_run')">Copy</button>
4651
  <a href="cells/yamoe_run.py" target="_blank" class="raw-btn">Raw</a>
 
4938
 
4939
  Warming up (10 iterations)...
4940
  Benchmarking (50 iterations)...
4941
+ Progress: 20% complete (avg: 4.247 ms)
4942
+ Progress: 40% complete (avg: 4.244 ms)
4943
+ Progress: 60% complete (avg: 4.246 ms)
4944
+ Progress: 80% complete (avg: 4.246 ms)
4945
 
4946
  Output tensors:
4947
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
 
4952
 
4953
  Latency Statistics:
4954
  Average: 4.248 ms
4955
+ Min: 4.137 ms
4956
+ Max: 4.281 ms
4957
  Std Dev: 0.021 ms
4958
 
4959
  Percentiles:
4960
+ P50 (median): 4.253 ms
4961
+ P95: 4.266 ms
4962
+ P99: 4.274 ms
4963
 
4964
  Throughput:
4965
+ Tokens/sec: 23539.4
4966
+ Std Dev: 120.7
4967
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4968
 
4969
  Saved benchmark results to yamoe_results.json
 
4973
  <div class="uv-install-logs" id="uv-logs-yamoe_run">
4974
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4975
  <div class="uv-logs-content" style="display: none;">
4976
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4977
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4978
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4979
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4980
  Downloading setuptools (1.1MiB)
4981
+ Downloading numpy (16.2MiB)
4982
  Downloading nvidia-nccl-cu12 (307.4MiB)
4983
+ Downloading networkx (1.9MiB)
4984
  Downloading nvidia-cusparse-cu12 (274.9MiB)
4985
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4986
  Downloading nvidia-cufft-cu12 (184.2MiB)
4987
  Downloading torch (846.9MiB)
4988
+ Downloading triton (148.3MiB)
4989
+ Downloading hf-xet (3.0MiB)
4990
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
4991
+ Downloading sympy (6.0MiB)
4992
  Downloading nvidia-curand-cu12 (60.7MiB)
4993
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
 
4994
  Downloading nvidia-cublas-cu12 (566.8MiB)
 
4995
  Downloading nvidia-cufile-cu12
4996
  Downloading hf-xet
4997
  Downloading setuptools
 
5011
  Downloading nvidia-cublas-cu12
5012
  Downloading nvidia-cudnn-cu12
5013
  Downloading torch
5014
+ Installed 37 packages in 553ms
5015
  </div>
5016
  </div>
5017
  <div class="cell-stderr">Fetching 6 files: 0%| | 0/6 [00:00&lt;?, ?it/s]
5018
+ Fetching 6 files: 17%|█▋ | 1/6 [00:00&lt;00:01, 2.76it/s]
5019
+ Fetching 6 files: 50%|█████ | 3/6 [00:00&lt;00:00, 3.03it/s]
5020
+ Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 6.01it/s]</div>
 
5021
  <div class="cell-artifacts">
5022
  <h4>Artifacts:</h4>
5023
  <a href="artifacts/yamoe_run/yamoe_results.json" class="artifact" target="_blank">yamoe_results.json</a>
 
5034
  <span onclick="toggleOutput('binned_run')" style="cursor: pointer;">▼ output</span>
5035
  <span id="uv-indicator-binned_run" onclick="toggleUvLogsFromHeader('binned_run')" style="cursor: pointer;">▶ uv-logs</span>
5036
  </span> |
5037
+ Cell: binned_run | deps: torch, numpy | 39.51s
5038
  | <button class="run-btn" onclick="runCell('binned_run')">▶ run</button>
5039
  <button class="copy-btn" onclick="copyCell('binned_run')">Copy</button>
5040
  <a href="cells/binned_run.py" target="_blank" class="raw-btn">Raw</a>
 
5448
 
5449
  Warming up (10 iterations)...
5450
  Benchmarking (50 iterations)...
5451
+ Progress: 20% complete (avg: 37.524 ms)
5452
+ Progress: 40% complete (avg: 37.442 ms)
5453
+ Progress: 60% complete (avg: 37.122 ms)
5454
+ Progress: 80% complete (avg: 36.627 ms)
5455
 
5456
  Output tensors:
5457
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
 
5461
  Iterations: 50
5462
 
5463
  Latency Statistics:
5464
+ Average: 36.268 ms
5465
+ Min: 34.104 ms
5466
+ Max: 37.686 ms
5467
+ Std Dev: 1.160 ms
5468
 
5469
  Percentiles:
5470
+ P50 (median): 36.522 ms
5471
+ P95: 37.643 ms
5472
+ P99: 37.677 ms
5473
 
5474
  Throughput:
5475
+ Tokens/sec: 2757.2
5476
+ Std Dev: 89.1
5477
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
5478
 
5479
  Saved benchmark results to binned_results.json
 
5483
  <div class="uv-install-logs" id="uv-logs-binned_run">
5484
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
5485
  <div class="uv-logs-content" style="display: none;">
5486
  Downloading nvidia-cufft-cu12 (184.2MiB)
5487
  Downloading nvidia-nccl-cu12 (307.4MiB)
5488
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
5489
+ Downloading nvidia-curand-cu12 (60.7MiB)
5490
+ Downloading nvidia-cufile-cu12 (1.1MiB)
5491
+ Downloading sympy (6.0MiB)
5492
  Downloading triton (148.3MiB)
5493
+ Downloading nvidia-cublas-cu12 (566.8MiB)
5494
  Downloading nvidia-cusparse-cu12 (274.9MiB)
5495
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
5496
+ Downloading networkx (1.9MiB)
5497
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
5498
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
5499
+ Downloading torch (846.9MiB)
5500
+ Downloading setuptools (1.1MiB)
5501
  Downloading nvidia-cusolver-cu12 (255.1MiB)
5502
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
5503
+ Downloading numpy (16.2MiB)
5504
  Downloading nvidia-cufile-cu12
5505
  Downloading setuptools
5506
  Downloading networkx
 
5513
  Downloading triton
5514
  Downloading nvidia-cufft-cu12
5515
  Downloading nvidia-cusolver-cu12
 
5516
  Downloading nvidia-cusparselt-cu12
5517
+ Downloading nvidia-cusparse-cu12
5518
  Downloading nvidia-nccl-cu12
5519
  Downloading nvidia-cublas-cu12
5520
  Downloading nvidia-cudnn-cu12
5521
  Downloading torch
5522
+ Installed 26 packages in 453ms
5523
  </div>
5524
  </div>
5525
  <div class="cell-artifacts">
 
  <span onclick="toggleOutput('gptoss_run')" style="cursor: pointer;">▼ output</span>
  <span id="uv-indicator-gptoss_run" onclick="toggleUvLogsFromHeader('gptoss_run')" style="cursor: pointer;">▶ uv-logs</span>
  </span> |
+ Cell: gptoss_run | deps: torch, numpy | 40.20s
  | <button class="run-btn" onclick="runCell('gptoss_run')">▶ run</button>
  <button class="copy-btn" onclick="copyCell('gptoss_run')">Copy</button>
  <a href="cells/gptoss_run.py" target="_blank" class="raw-btn">Raw</a>
 
  Warming up (10 iterations)...
  Benchmarking (50 iterations)...
+ Progress: 20% complete (avg: 50.493 ms)
+ Progress: 40% complete (avg: 49.981 ms)
+ Progress: 60% complete (avg: 49.061 ms)
+ Progress: 80% complete (avg: 47.981 ms)

  Output tensors:
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560

  Iterations: 50

  Latency Statistics:
+ Average: 46.914 ms
+ Min: 40.448 ms
+ Max: 51.075 ms
+ Std Dev: 2.992 ms

  Percentiles:
+ P50 (median): 47.419 ms
+ P95: 50.800 ms
+ P99: 50.949 ms

  Throughput:
+ Tokens/sec: 2131.6
+ Std Dev: 139.9
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Saved benchmark results to gptoss_results.json
 
  <div class="uv-install-logs" id="uv-logs-gptoss_run">
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
  <div class="uv-logs-content" style="display: none;">
  Downloading setuptools (1.1MiB)
+ Downloading nvidia-cublas-cu12 (566.8MiB)
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
+ Downloading nvidia-cufft-cu12 (184.2MiB)
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
+ Downloading sympy (6.0MiB)
  Downloading nvidia-curand-cu12 (60.7MiB)
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
+ Downloading networkx (1.9MiB)
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
+ Downloading nvidia-cufile-cu12 (1.1MiB)
  Downloading nvidia-nccl-cu12 (307.4MiB)
+ Downloading numpy (16.2MiB)
+ Downloading torch (846.9MiB)
  Downloading nvidia-cusparselt-cu12 (273.9MiB)
  Downloading triton (148.3MiB)
  Downloading nvidia-cufile-cu12
  Downloading setuptools
  Downloading networkx

  Downloading triton
  Downloading nvidia-cufft-cu12
  Downloading nvidia-cusolver-cu12
  Downloading nvidia-cusparselt-cu12
+ Downloading nvidia-cusparse-cu12
  Downloading nvidia-nccl-cu12
  Downloading nvidia-cublas-cu12
  Downloading nvidia-cudnn-cu12
  Downloading torch
+ Installed 26 packages in 452ms
  </div>
  </div>
  <div class="cell-artifacts">
 
  <span onclick="toggleOutput('gptoss_training_run')" style="cursor: pointer;">▼ output</span>
  <span id="uv-indicator-gptoss_training_run" onclick="toggleUvLogsFromHeader('gptoss_training_run')" style="cursor: pointer;">▶ uv-logs</span>
  </span> |
+ Cell: gptoss_training_run | deps: torch, numpy | 40.63s
  | <button class="run-btn" onclick="runCell('gptoss_training_run')">▶ run</button>
  <button class="copy-btn" onclick="copyCell('gptoss_training_run')">Copy</button>
  <a href="cells/gptoss_training_run.py" target="_blank" class="raw-btn">Raw</a>
 
  Warming up (10 iterations)...
  Benchmarking (50 iterations)...
+ Progress: 20% complete (avg: 49.824 ms)
+ Progress: 40% complete (avg: 49.309 ms)
+ Progress: 60% complete (avg: 48.365 ms)
+ Progress: 80% complete (avg: 47.278 ms)

  Output tensors:
  Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560

  Iterations: 50

  Latency Statistics:
+ Average: 46.289 ms
+ Min: 39.979 ms
+ Max: 50.581 ms
+ Std Dev: 2.917 ms

  Percentiles:
+ P50 (median): 46.648 ms
+ P95: 50.267 ms
+ P99: 50.516 ms

  Throughput:
+ Tokens/sec: 2160.3
+ Std Dev: 139.9
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Saved benchmark results to gptoss_training_results.json
 
  <div class="uv-install-logs" id="uv-logs-gptoss_training_run">
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
  <div class="uv-logs-content" style="display: none;">
+ Downloading nvidia-cufft-cu12 (184.2MiB)
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
+ Downloading networkx (1.9MiB)
  Downloading sympy (6.0MiB)
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
  Downloading nvidia-nccl-cu12 (307.4MiB)
+ Downloading setuptools (1.1MiB)
  Downloading nvidia-curand-cu12 (60.7MiB)
+ Downloading nvidia-cufile-cu12 (1.1MiB)
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
  Downloading nvidia-cusolver-cu12 (255.1MiB)
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
  Downloading numpy (16.2MiB)
+ Downloading torch (846.9MiB)
  Downloading triton (148.3MiB)
+ Downloading nvidia-cublas-cu12 (566.8MiB)
  Downloading nvidia-cufile-cu12
  Downloading setuptools
  Downloading networkx

  Downloading triton
  Downloading nvidia-cufft-cu12
  Downloading nvidia-cusolver-cu12
  Downloading nvidia-cusparse-cu12
+ Downloading nvidia-cusparselt-cu12
  Downloading nvidia-nccl-cu12
  Downloading nvidia-cublas-cu12
  Downloading nvidia-cudnn-cu12
  Downloading torch
+ Installed 26 packages in 570ms
  </div>
  </div>
  <div class="cell-artifacts">
 
  <span onclick="toggleOutput('megablocks_run')" style="cursor: pointer;">▼ output</span>
  <span id="uv-indicator-megablocks_run" onclick="toggleUvLogsFromHeader('megablocks_run')" style="cursor: pointer;">▶ uv-logs</span>
  </span> |
+ Cell: megablocks_run | deps: torch, numpy, kernels | 40.35s | FAILED
  | <button class="run-btn" onclick="runCell('megablocks_run')">▶ run</button>
  <button class="copy-btn" onclick="copyCell('megablocks_run')">Copy</button>
  <a href="cells/megablocks_run.py" target="_blank" class="raw-btn">Raw</a>
 
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
  <div class="uv-logs-content" style="display: none;">
  Downloading numpy (16.2MiB)
+ Downloading sympy (6.0MiB)
+ Downloading setuptools (1.1MiB)
  Downloading nvidia-cudnn-cu12 (674.0MiB)
  Downloading nvidia-curand-cu12 (60.7MiB)
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
+ Downloading hf-xet (3.0MiB)
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
  Downloading nvidia-cublas-cu12 (566.8MiB)
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
  Downloading nvidia-nccl-cu12 (307.4MiB)
+ Downloading nvidia-cufft-cu12 (184.2MiB)
+ Downloading triton (148.3MiB)
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
+ Downloading nvidia-cufile-cu12 (1.1MiB)
+ Downloading torch (846.9MiB)
+ Downloading networkx (1.9MiB)
  Downloading nvidia-cufile-cu12
  Downloading hf-xet
  Downloading setuptools

  Downloading triton
  Downloading nvidia-cufft-cu12
  Downloading nvidia-cusolver-cu12
  Downloading nvidia-cusparselt-cu12
+ Downloading nvidia-cusparse-cu12
  Downloading nvidia-nccl-cu12
  Downloading nvidia-cublas-cu12
  Downloading nvidia-cudnn-cu12
  Downloading torch
+ Installed 37 packages in 448ms
  </div>
  </div>
  <div class="cell-stderr">Fetching 66 files: 0%| | 0/66 [00:00&lt;?, ?it/s]
+ Fetching 66 files: 2%|▏ | 1/66 [00:00&lt;00:28, 2.31it/s]
+ Fetching 66 files: 14%|█▎ | 9/66 [00:00&lt;00:03, 18.19it/s]
+ Fetching 66 files: 26%|██▌ | 17/66 [00:01&lt;00:02, 16.61it/s]
+ Fetching 66 files: 52%|█████▏ | 34/66 [00:01&lt;00:00, 38.17it/s]
+ Fetching 66 files: 64%|██████▎ | 42/66 [00:01&lt;00:00, 36.62it/s]
+ Fetching 66 files: 73%|███████▎ | 48/66 [00:01&lt;00:00, 28.57it/s]
+ Fetching 66 files: 92%|█████████▏| 61/66 [00:01&lt;00:00, 39.67it/s]
+ Fetching 66 files: 100%|██████████| 66/66 [00:02&lt;00:00, 32.91it/s]
+ /tmp/tmp1397kafx/cuda_utils.c:5:10: fatal error: Python.h: No such file or directory
  5 | #include &lt;Python.h&gt;
  | ^~~~~~~~~~
  compilation terminated.
 
  File &quot;/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/bench_utils.py&quot;, line 177, in &lt;lambda&gt;
  call = lambda x: fn(x, *args[1:], **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1773, in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1784, in _call_impl
  return forward_call(*args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File &quot;/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/megablocks_run.py&quot;, line 81, in forward
  output, dummy_routing_weights = self.model(hidden_states)
  ^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1773, in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py&quot;, line 1784, in _call_impl
  return forward_call(*args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 896, in forward
  output, expert_weights_out, *_ = moe_forward(
  ^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 730, in moe_forward
  x, tokens_per_expert = forward_fn(**forward_args)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 457, in forward_once
  x = permute_and_compute(
  ^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py&quot;, line 401, in permute_and_compute
  x = ops.binned_gather(x, indices, bins, expert_capacity, top_k)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/autograd/function.py&quot;, line 576, in apply
  return super().apply(*args, **kwargs) # type: ignore[misc]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/stk_autocast.py&quot;, line 30, in decorate_fwd
  return fwd(*args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/binned_gather.py&quot;, line 26, in forward
  return kernels.binned_gather(x, indices, None, bins, bin_size, top_k)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/backend/kernels.py&quot;, line 419, in binned_gather
  _binned_copy[(num_experts, expert_capacity)](
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/jit.py&quot;, line 390, in &lt;lambda&gt;
  return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 239, in run
  benchmark()
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 228, in benchmark
  timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 228, in &lt;dictcomp&gt;
  timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 160, in _bench
  return self.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8))
  ^^^^^^^^^^^^^
  File &quot;/usr/lib/python3.11/functools.py&quot;, line 1001, in __get__
  val = self.func(instance)
  ^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py&quot;, line 121, in do_bench
  return driver.active.get_benchmarker()
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 30, in __getattr__
  return getattr(self._initialize_obj(), name)
  ^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 26, in _initialize_obj
  self._obj = self._init_fn()
  ^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py&quot;, line 12, in _create_driver
  return active_drivers[0]()
  ^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py&quot;, line 715, in __init__
  self.utils = CudaUtils() # TODO: make static
  ^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py&quot;, line 62, in __init__
  mod = compile_module_from_src(
  ^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py&quot;, line 88, in compile_module_from_src
  so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [])
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File &quot;/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py&quot;, line 51, in _build
  subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
  File &quot;/usr/lib/python3.11/subprocess.py&quot;, line 413, in check_call
  raise CalledProcessError(retcode, cmd)
+ subprocess.CalledProcessError: Command &#x27;[&#x27;/usr/bin/gcc&#x27;, &#x27;/tmp/tmp1397kafx/cuda_utils.c&#x27;, &#x27;-O3&#x27;, &#x27;-shared&#x27;, &#x27;-fPIC&#x27;, &#x27;-Wno-psabi&#x27;, &#x27;-o&#x27;, &#x27;/tmp/tmp1397kafx/cuda_utils.cpython-311-x86_64-linux-gnu.so&#x27;, &#x27;-lcuda&#x27;, &#x27;-L/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/lib&#x27;, &#x27;-L/usr/lib/x86_64-linux-gnu&#x27;, &#x27;-I/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/include&#x27;, &#x27;-I/tmp/tmp1397kafx&#x27;, &#x27;-I/usr/include/python3.11&#x27;]&#x27; returned non-zero exit status 1.</div>
  </div>
  </div>