Upload folder using huggingface_hub
- megablocks/megablocks_only.html +131 -463
- megablocks_yamoe/artifacts/binned_run/binned_results.json +9 -9
- megablocks_yamoe/artifacts/gptoss_run/gptoss_results.json +9 -9
- megablocks_yamoe/artifacts/gptoss_training_run/gptoss_training_results.json +9 -9
- megablocks_yamoe/artifacts/yamoe_run/yamoe_results.json +9 -9
- megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc +0 -0
- megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc +0 -0
- megablocks_yamoe/megablocks_yamoe.html +16 -619
- megablocks_yamoe/torch_profile.html +224 -225
megablocks/megablocks_only.html
CHANGED
|
@@ -3718,219 +3718,119 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
|
|
| 3718 |
<h1>No Kernels</h1>
|
| 3719 |
<p>First, we run the model without any custom kernels to get a reference point.</p>
|
| 3720 |
<h2>Forward</h2>
|
| 3721 |
-
<
|
| 3722 |
-
<p>Next, we'll attempt to run a forward and backward pass without any custom kernels. This will likely run out of memory since the default implementation is not optimized for memory usage.</p>
|
| 3723 |
-
<div class="cell cell-failed" id="cell-forward_and_backward_no_kernel">
|
| 3724 |
<div class="cell-header">
|
| 3725 |
<span class="collapse-indicators">
|
| 3726 |
-
<span onclick="toggleCode('
|
| 3727 |
-
<span onclick="toggleOutput('
|
| 3728 |
-
<span id="uv-indicator-
|
| 3729 |
</span> |
|
| 3730 |
-
Cell:
|
| 3731 |
-
| <button class="run-btn" onclick="runCell('
|
| 3732 |
-
<button class="copy-btn" onclick="copyCell('
|
| 3733 |
-
<a href="cells/
|
| 3734 |
</div>
|
| 3735 |
-
<div id="code-
|
| 3736 |
<div class="highlight-with-lines">
|
| 3737 |
-
<div class="line-numbers" id="lines-
|
| 3738 |
-
<a class="line-number" data-cell="
|
| 3739 |
-
<a class="line-number" data-cell="
|
| 3740 |
-
<a class="line-number" data-cell="
|
| 3741 |
-
<a class="line-number" data-cell="
|
| 3742 |
-
<a class="line-number" data-cell="
|
| 3743 |
-
<a class="line-number" data-cell="
|
| 3744 |
-
<a class="line-number" data-cell="
|
| 3745 |
-
<a class="line-number" data-cell="
|
| 3746 |
-
<a class="line-number" data-cell="
|
| 3747 |
-
<a class="line-number" data-cell="
|
| 3748 |
-
<a class="line-number" data-cell="
|
| 3749 |
-
<a class="line-number" data-cell="
|
| 3750 |
-
<a class="line-number" data-cell="
|
| 3751 |
-
<a class="line-number" data-cell="
|
| 3752 |
-
<a class="line-number" data-cell="
|
| 3753 |
-
<a class="line-number" data-cell="
|
| 3754 |
-
<a class="line-number" data-cell="
|
| 3755 |
-
<a class="line-number" data-cell="
|
| 3756 |
-
<a class="line-number" data-cell="
|
| 3757 |
-
<a class="line-number" data-cell="
|
| 3758 |
-
<a class="line-number" data-cell="
|
| 3759 |
-
<a class="line-number" data-cell="
|
| 3760 |
-
<a class="line-number" data-cell="
|
| 3761 |
-
<a class="line-number" data-cell="
|
| 3762 |
-
<a class="line-number" data-cell="
|
| 3763 |
-
<a class="line-number" data-cell="
|
| 3764 |
-
<a class="line-number" data-cell="
|
| 3765 |
-
<a class="line-number" data-cell="
|
| 3766 |
-
<a class="line-number" data-cell="
|
| 3767 |
-
<a class="line-number" data-cell="
|
| 3768 |
-
<a class="line-number" data-cell="
|
| 3769 |
-
<a class="line-number" data-cell="
|
| 3770 |
-
<a class="line-number" data-cell="
|
| 3771 |
-
<a class="line-number" data-cell="
|
| 3772 |
-
<a class="line-number" data-cell="
|
| 3773 |
-
<a class="line-number" data-cell="
|
| 3774 |
-
<a class="line-number" data-cell="
|
| 3775 |
-
<a class="line-number" data-cell="
|
| 3776 |
-
<a class="line-number" data-cell="
|
| 3777 |
-
<a class="line-number" data-cell="
|
| 3778 |
-
<a class="line-number" data-cell="
|
| 3779 |
-
<a class="line-number" data-cell="
|
| 3780 |
-
<a class="line-number" data-cell="
|
| 3781 |
-
<a class="line-number" data-cell="
|
| 3782 |
-
<a class="line-number" data-cell="
|
| 3783 |
-
<a class="line-number" data-cell="
|
| 3784 |
-
<a class="line-number" data-cell="
|
| 3785 |
-
<a class="line-number" data-cell="
|
| 3786 |
-
<a class="line-number" data-cell="
|
| 3787 |
-
<a class="line-number" data-cell="
|
| 3788 |
-
<a class="line-number" data-cell="
|
| 3789 |
-
<a class="line-number" data-cell="
|
| 3790 |
-
<a class="line-number" data-cell="
|
| 3791 |
-
<a class="line-number" data-cell="
|
| 3792 |
-
<a class="line-number" data-cell="
|
| 3793 |
-
<a class="line-number" data-cell="
|
| 3794 |
-
<a class="line-number" data-cell="
|
| 3795 |
-
<a class="line-number" data-cell="
|
| 3796 |
-
<a class="line-number" data-cell="
|
| 3797 |
-
<a class="line-number" data-cell="
|
| 3798 |
-
<a class="line-number" data-cell="
|
| 3799 |
-
<a class="line-number" data-cell="
|
| 3800 |
-
<a class="line-number" data-cell="
|
| 3801 |
-
<a class="line-number" data-cell="
|
| 3802 |
-
<a class="line-number" data-cell="
|
| 3803 |
-
<a class="line-number" data-cell="
|
| 3804 |
-
<a class="line-number" data-cell="
|
| 3805 |
-
<a class="line-number" data-cell="
|
| 3806 |
-
<a class="line-number" data-cell="
|
| 3807 |
-
<a class="line-number" data-cell="
|
| 3808 |
-
<a class="line-number" data-cell="
|
| 3809 |
-
<a class="line-number" data-cell="
|
| 3810 |
-
<a class="line-number" data-cell="
|
| 3811 |
-
<a class="line-number" data-cell="
|
| 3812 |
-
<a class="line-number" data-cell="
|
| 3813 |
-
<a class="line-number" data-cell="
|
| 3814 |
-
<a class="line-number" data-cell="
|
| 3815 |
-
<a class="line-number" data-cell="
|
| 3816 |
-
<a class="line-number" data-cell="
|
| 3817 |
-
<a class="line-number" data-cell="
|
| 3818 |
-
<a class="line-number" data-cell="
|
| 3819 |
-
<a class="line-number" data-cell="
|
| 3820 |
-
<a class="line-number" data-cell="
|
| 3821 |
-
<a class="line-number" data-cell="
|
| 3822 |
-
<a class="line-number" data-cell="
|
| 3823 |
-
<a class="line-number" data-cell="
|
| 3824 |
-
<a class="line-number" data-cell="
|
| 3825 |
-
<a class="line-number" data-cell="
|
| 3826 |
-
<a class="line-number" data-cell="
|
| 3827 |
-
<a class="line-number" data-cell="
|
| 3828 |
-
<a class="line-number" data-cell="
|
| 3829 |
-
<a class="line-number" data-cell="
|
| 3830 |
-
<a class="line-number" data-cell="
|
| 3831 |
-
<a class="line-number" data-cell="
|
| 3832 |
-
<a class="line-number" data-cell="
|
| 3833 |
-
<a class="line-number" data-cell="
|
| 3834 |
-
<a class="line-number" data-cell="
|
| 3835 |
-
<a class="line-number" data-cell="
|
| 3836 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="99" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 99, true);">99</a>
|
| 3837 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="100" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 100, true);">100</a>
|
| 3838 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="101" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 101, true);">101</a>
|
| 3839 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="102" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 102, true);">102</a>
|
| 3840 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="103" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 103, true);">103</a>
|
| 3841 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="104" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 104, true);">104</a>
|
| 3842 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="105" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 105, true);">105</a>
|
| 3843 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="106" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 106, true);">106</a>
|
| 3844 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="107" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 107, true);">107</a>
|
| 3845 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="108" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 108, true);">108</a>
|
| 3846 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="109" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 109, true);">109</a>
|
| 3847 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="110" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 110, true);">110</a>
|
| 3848 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="111" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 111, true);">111</a>
|
| 3849 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="112" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 112, true);">112</a>
|
| 3850 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="113" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 113, true);">113</a>
|
| 3851 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="114" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 114, true);">114</a>
|
| 3852 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="115" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 115, true);">115</a>
|
| 3853 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="116" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 116, true);">116</a>
|
| 3854 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="117" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 117, true);">117</a>
|
| 3855 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="118" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 118, true);">118</a>
|
| 3856 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="119" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 119, true);">119</a>
|
| 3857 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="120" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 120, true);">120</a>
|
| 3858 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="121" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 121, true);">121</a>
|
| 3859 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="122" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 122, true);">122</a>
|
| 3860 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="123" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 123, true);">123</a>
|
| 3861 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="124" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 124, true);">124</a>
|
| 3862 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="125" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 125, true);">125</a>
|
| 3863 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="126" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 126, true);">126</a>
|
| 3864 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="127" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 127, true);">127</a>
|
| 3865 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="128" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 128, true);">128</a>
|
| 3866 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="129" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 129, true);">129</a>
|
| 3867 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="130" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 130, true);">130</a>
|
| 3868 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="131" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 131, true);">131</a>
|
| 3869 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="132" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 132, true);">132</a>
|
| 3870 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="133" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 133, true);">133</a>
|
| 3871 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="134" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 134, true);">134</a>
|
| 3872 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="135" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 135, true);">135</a>
|
| 3873 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="136" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 136, true);">136</a>
|
| 3874 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="137" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 137, true);">137</a>
|
| 3875 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="138" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 138, true);">138</a>
|
| 3876 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="139" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 139, true);">139</a>
|
| 3877 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="140" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 140, true);">140</a>
|
| 3878 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="141" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 141, true);">141</a>
|
| 3879 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="142" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 142, true);">142</a>
|
| 3880 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="143" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 143, true);">143</a>
|
| 3881 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="144" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 144, true);">144</a>
|
| 3882 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="145" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 145, true);">145</a>
|
| 3883 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="146" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 146, true);">146</a>
|
| 3884 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="147" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 147, true);">147</a>
|
| 3885 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="148" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 148, true);">148</a>
|
| 3886 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="149" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 149, true);">149</a>
|
| 3887 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="150" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 150, true);">150</a>
|
| 3888 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="151" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 151, true);">151</a>
|
| 3889 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="152" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 152, true);">152</a>
|
| 3890 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="153" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 153, true);">153</a>
|
| 3891 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="154" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 154, true);">154</a>
|
| 3892 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="155" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 155, true);">155</a>
|
| 3893 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="156" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 156, true);">156</a>
|
| 3894 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="157" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 157, true);">157</a>
|
| 3895 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="158" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 158, true);">158</a>
|
| 3896 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="159" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 159, true);">159</a>
|
| 3897 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="160" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 160, true);">160</a>
|
| 3898 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="161" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 161, true);">161</a>
|
| 3899 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="162" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 162, true);">162</a>
|
| 3900 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="163" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 163, true);">163</a>
|
| 3901 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="164" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 164, true);">164</a>
|
| 3902 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="165" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 165, true);">165</a>
|
| 3903 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="166" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 166, true);">166</a>
|
| 3904 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="167" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 167, true);">167</a>
|
| 3905 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="168" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 168, true);">168</a>
|
| 3906 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="169" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 169, true);">169</a>
|
| 3907 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="170" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 170, true);">170</a>
|
| 3908 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="171" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 171, true);">171</a>
|
| 3909 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="172" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 172, true);">172</a>
|
| 3910 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="173" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 173, true);">173</a>
|
| 3911 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="174" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 174, true);">174</a>
|
| 3912 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="175" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 175, true);">175</a>
|
| 3913 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="176" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 176, true);">176</a>
|
| 3914 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="177" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 177, true);">177</a>
|
| 3915 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="178" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 178, true);">178</a>
|
| 3916 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="179" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 179, true);">179</a>
|
| 3917 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="180" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 180, true);">180</a>
|
| 3918 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="181" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 181, true);">181</a>
|
| 3919 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="182" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 182, true);">182</a>
|
| 3920 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="183" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 183, true);">183</a>
|
| 3921 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="184" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 184, true);">184</a>
|
| 3922 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="185" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 185, true);">185</a>
|
| 3923 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="186" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 186, true);">186</a>
|
| 3924 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="187" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 187, true);">187</a>
|
| 3925 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="188" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 188, true);">188</a>
|
| 3926 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="189" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 189, true);">189</a>
|
| 3927 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="190" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 190, true);">190</a>
|
| 3928 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="191" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 191, true);">191</a>
|
| 3929 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="192" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 192, true);">192</a>
|
| 3930 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="193" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 193, true);">193</a>
|
| 3931 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="194" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 194, true);">194</a>
|
| 3932 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="195" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 195, true);">195</a>
|
| 3933 |
-
<a class="line-number" data-cell="forward_and_backward_no_kernel" data-line="196" href="#cell-forward_and_backward_no_kernel" onclick="event.preventDefault(); selectCellLine('forward_and_backward_no_kernel', 196, true);">196</a>
|
| 3934 |
</div>
|
| 3935 |
<div class="code-wrap">
|
| 3936 |
<div class="highlight"><pre><span></span><span class="c1"># /// script</span>
|
|
@@ -3957,9 +3857,6 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
|
|
| 3957 |
<span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
|
| 3958 |
<span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssRMSNorm</span>
|
| 3959 |
|
| 3960 |
-
<span class="c1"># remove liger kernel for testing </span>
|
| 3961 |
-
<span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
|
| 3962 |
-
|
| 3963 |
<span class="c1"># set to debug logging</span>
|
| 3964 |
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
|
| 3965 |
|
|
@@ -3998,6 +3895,8 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
|
|
| 3998 |
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
|
| 3999 |
<span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
| 4000 |
|
|
|
|
|
|
|
| 4001 |
<span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
|
| 4002 |
<span class="n">model_id</span><span class="p">,</span>
|
| 4003 |
<span class="n">dtype</span><span class="o">=</span><span class="s2">"bfloat16"</span><span class="p">,</span>
|
|
@@ -4018,14 +3917,9 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
|
|
| 4018 |
<span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">"low"</span><span class="p">,</span>
|
| 4019 |
<span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">"cuda"</span><span class="p">)</span>
|
| 4020 |
|
| 4021 |
-
<span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">
|
| 4022 |
|
| 4023 |
-
<span class="
|
| 4024 |
-
<span class="n">reset_peak_memory_stats</span><span class="p">()</span>
|
| 4025 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Pre-generation memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4026 |
-
|
| 4027 |
-
<span class="c1"># forward and backward pass</span>
|
| 4028 |
-
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">autograd</span><span class="o">.</span><span class="n">set_grad_enabled</span><span class="p">(</span><span class="kc">True</span><span class="p">):</span>
|
| 4029 |
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 4030 |
<span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
|
| 4031 |
<span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
|
|
@@ -4034,262 +3928,36 @@ Cell: forward_and_backward_no_kernel | 99.38s | FAILED
|
|
| 4034 |
<span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
| 4035 |
<span class="p">)</span>
|
| 4036 |
<span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 4037 |
-
|
| 4038 |
-
|
| 4039 |
-
|
| 4040 |
-
|
| 4041 |
-
<span class="c1"># Use gradient checkpointing to reduce memory usage</span>
|
| 4042 |
-
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s1">'gradient_checkpointing_enable'</span><span class="p">):</span>
|
| 4043 |
-
<span class="n">model</span><span class="o">.</span><span class="n">gradient_checkpointing_enable</span><span class="p">()</span>
|
| 4044 |
-
<span class="nb">print</span><span class="p">(</span><span class="s2">"Enabled gradient checkpointing"</span><span class="p">)</span>
|
| 4045 |
-
|
| 4046 |
-
<span class="c1"># Reduce sequence length if needed for memory</span>
|
| 4047 |
-
<span class="n">max_seq_len</span> <span class="o">=</span> <span class="mi">512</span> <span class="c1"># Limit sequence length for backward pass</span>
|
| 4048 |
-
<span class="k">if</span> <span class="n">generated</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">></span> <span class="n">max_seq_len</span><span class="p">:</span>
|
| 4049 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Truncating sequence from </span><span class="si">{</span><span class="n">generated</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="si">}</span><span class="s2"> to </span><span class="si">{</span><span class="n">max_seq_len</span><span class="si">}</span><span class="s2"> tokens"</span><span class="p">)</span>
|
| 4050 |
-
<span class="n">full_sequence</span> <span class="o">=</span> <span class="n">generated</span><span class="p">[:,</span> <span class="o">-</span><span class="n">max_seq_len</span><span class="p">:]</span>
|
| 4051 |
-
<span class="k">else</span><span class="p">:</span>
|
| 4052 |
-
<span class="n">full_sequence</span> <span class="o">=</span> <span class="n">generated</span>
|
| 4053 |
-
|
| 4054 |
-
<span class="c1"># Get model outputs for the full sequence</span>
|
| 4055 |
-
<span class="n">model</span><span class="o">.</span><span class="n">train</span><span class="p">()</span> <span class="c1"># Enable dropout and other training behaviors</span>
|
| 4056 |
-
|
| 4057 |
-
<span class="k">try</span><span class="p">:</span>
|
| 4058 |
-
<span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
|
| 4059 |
-
<span class="n">input_ids</span><span class="o">=</span><span class="n">full_sequence</span><span class="p">,</span>
|
| 4060 |
-
<span class="n">labels</span><span class="o">=</span><span class="n">full_sequence</span><span class="p">,</span> <span class="c1"># This will compute loss internally</span>
|
| 4061 |
-
<span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span>
|
| 4062 |
-
<span class="p">)</span>
|
| 4063 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Post-forward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4064 |
-
|
| 4065 |
-
<span class="c1"># If model doesn't compute loss, compute it manually</span>
|
| 4066 |
-
<span class="k">if</span> <span class="n">outputs</span><span class="o">.</span><span class="n">loss</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
| 4067 |
-
<span class="n">shift_logits</span> <span class="o">=</span> <span class="n">outputs</span><span class="o">.</span><span class="n">logits</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span>
|
| 4068 |
-
<span class="n">shift_labels</span> <span class="o">=</span> <span class="n">full_sequence</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="mi">1</span><span class="p">:]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span>
|
| 4069 |
-
|
| 4070 |
-
<span class="c1"># Use CrossEntropyLoss with ignore_index for padding tokens</span>
|
| 4071 |
-
<span class="n">loss_fct</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">(</span><span class="n">ignore_index</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span> <span class="k">if</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">pad_token_id</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="o">-</span><span class="mi">100</span><span class="p">)</span>
|
| 4072 |
-
<span class="n">loss</span> <span class="o">=</span> <span class="n">loss_fct</span><span class="p">(</span>
|
| 4073 |
-
<span class="n">shift_logits</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">shift_logits</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)),</span>
|
| 4074 |
-
<span class="n">shift_labels</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
| 4075 |
-
<span class="p">)</span>
|
| 4076 |
-
<span class="k">else</span><span class="p">:</span>
|
| 4077 |
-
<span class="n">loss</span> <span class="o">=</span> <span class="n">outputs</span><span class="o">.</span><span class="n">loss</span>
|
| 4078 |
-
|
| 4079 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Loss: </span><span class="si">{</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">()</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4080 |
-
|
| 4081 |
-
<span class="c1"># Clear intermediate tensors to save memory</span>
|
| 4082 |
-
<span class="k">del</span> <span class="n">outputs</span>
|
| 4083 |
-
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
|
| 4084 |
-
|
| 4085 |
-
<span class="c1"># Perform backward pass with memory management</span>
|
| 4086 |
-
<span class="nb">print</span><span class="p">(</span><span class="s2">"Running backward pass..."</span><span class="p">)</span>
|
| 4087 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Pre-backward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4088 |
-
|
| 4089 |
-
<span class="n">loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
|
| 4090 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Post-backward memory: </span><span class="si">{</span><span class="n">get_memory_stats</span><span class="p">()</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4091 |
-
|
| 4092 |
-
<span class="k">except</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">OutOfMemoryError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
|
| 4093 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"OOM during forward/backward pass: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4094 |
-
<span class="nb">print</span><span class="p">(</span><span class="s2">"Try reducing max_tokens or max_seq_len"</span><span class="p">)</span>
|
| 4095 |
-
<span class="k">raise</span>
|
| 4096 |
-
|
| 4097 |
-
<span class="c1"># Calculate gradient statistics and print sample gradients</span>
|
| 4098 |
-
<span class="n">total_norm</span> <span class="o">=</span> <span class="mf">0.0</span>
|
| 4099 |
-
<span class="n">param_count</span> <span class="o">=</span> <span class="mi">0</span>
|
| 4100 |
-
<span class="n">grad_samples</span> <span class="o">=</span> <span class="p">{}</span>
|
| 4101 |
-
|
| 4102 |
-
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_parameters</span><span class="p">():</span>
|
| 4103 |
-
<span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
| 4104 |
-
<span class="n">param_count</span> <span class="o">+=</span> <span class="mi">1</span>
|
| 4105 |
-
<span class="n">grad_norm</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">item</span><span class="p">()</span>
|
| 4106 |
-
<span class="n">total_norm</span> <span class="o">+=</span> <span class="n">grad_norm</span> <span class="o">**</span> <span class="mi">2</span>
|
| 4107 |
-
|
| 4108 |
-
<span class="c1"># Collect gradient statistics for key layers</span>
|
| 4109 |
-
<span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">key</span> <span class="ow">in</span> <span class="n">name</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'embed'</span><span class="p">,</span> <span class="s1">'lm_head'</span><span class="p">,</span> <span class="s1">'mlp.up'</span><span class="p">,</span> <span class="s1">'mlp.down'</span><span class="p">,</span> <span class="s1">'self_attn.q_proj'</span><span class="p">,</span> <span class="s1">'norm'</span><span class="p">]):</span>
|
| 4110 |
-
<span class="n">grad_samples</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
|
| 4111 |
-
<span class="s1">'norm'</span><span class="p">:</span> <span class="n">grad_norm</span><span class="p">,</span>
|
| 4112 |
-
<span class="s1">'mean'</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
|
| 4113 |
-
<span class="s1">'std'</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">std</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
|
| 4114 |
-
<span class="s1">'max'</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">max</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
|
| 4115 |
-
<span class="s1">'min'</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">grad</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">min</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">(),</span>
|
| 4116 |
-
<span class="p">}</span>
|
| 4117 |
-
|
| 4118 |
-
<span class="n">total_norm</span> <span class="o">=</span> <span class="n">total_norm</span> <span class="o">**</span> <span class="mf">0.5</span>
|
| 4119 |
-
|
| 4120 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="se">\n</span><span class="s2">Gradient norm: </span><span class="si">{</span><span class="n">total_norm</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4121 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Parameters with gradients: </span><span class="si">{</span><span class="n">param_count</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4122 |
-
|
| 4123 |
-
<span class="c1"># Print sample gradients from important layers</span>
|
| 4124 |
-
<span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">Sample gradient statistics:"</span><span class="p">)</span>
|
| 4125 |
-
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">stats</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">grad_samples</span><span class="o">.</span><span class="n">items</span><span class="p">())[:</span><span class="mi">10</span><span class="p">]):</span>
|
| 4126 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">" </span><span class="si">{</span><span class="n">name</span><span class="p">[:</span><span class="mi">60</span><span class="p">]</span><span class="si">:</span><span class="s2"><60</span><span class="si">}</span><span class="s2"> | norm: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">'norm'</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2"> | mean: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">'mean'</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2"> | std: </span><span class="si">{</span><span class="n">stats</span><span class="p">[</span><span class="s1">'std'</span><span class="p">]</span><span class="si">:</span><span class="s2">.4e</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4127 |
-
|
| 4128 |
-
<span class="c1"># Optional: zero gradients for next iteration</span>
|
| 4129 |
-
<span class="n">model</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
|
| 4130 |
-
<span class="n">model</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span> <span class="c1"># Switch back to eval mode</span>
|
| 4131 |
</pre></div>
|
| 4132 |
|
| 4133 |
-
<div class="code-line-highlight" id="line-highlight-
|
| 4134 |
</div>
|
| 4135 |
</div>
|
| 4136 |
</div>
|
| 4137 |
-
<div id="output-
|
| 4138 |
-
<div class="cell-
|
| 4139 |
-
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
|
| 4140 |
-
Knowledge cutoff: 2024-06
|
| 4141 |
-
Current date: 2025-09-24
|
| 4142 |
-
|
| 4143 |
-
Reasoning: low
|
| 4144 |
-
|
| 4145 |
-
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions
|
| 4146 |
-
|
| 4147 |
-
What is Tensor Parallelism?
|
| 4148 |
-
|
| 4149 |
-
<|end|><|start|>assistant<|channel|>analysis<|message|>Need to explain tensor parallelism.<|end|><|start|>assistant<|channel|>final<|message|>**Tensor Parallelism** is a technique used to distribute the computation of a single tensor (or a set of tensors) across multiple devices (GPUs, TPUs, or other accelerators) in order to speed up large‑scale deep‑learning training and inference. It is one of several parallelism strategies—alongside data parallelism, pipeline parallelism, and model parallelism—designed to scale models that exceed the memory or compute capacity of a single device.
|
| 4150 |
-
|
| 4151 |
-
---
|
| 4152 |
-
|
| 4153 |
-
## 1. Why Tensor Parallelism?
|
| 4154 |
-
|
| 4155 |
-
- **Memory constraints**: Modern
|
| 4156 |
-
Generation took 13.15 seconds
|
| 4157 |
-
Post-generation memory: {'allocated_gb': 9.398670336, 'peak_gb': 9.514059776, 'reserved_gb': 17.188257792}
|
| 4158 |
-
Enabled gradient checkpointing
|
| 4159 |
-
Post-forward memory: {'allocated_gb': 9.487933952, 'peak_gb': 9.514059776, 'reserved_gb': 17.188257792}
|
| 4160 |
-
Loss: 1.9761
|
| 4161 |
-
Running backward pass...
|
| 4162 |
-
Pre-backward memory: {'allocated_gb': 9.405890048, 'peak_gb': 9.514059776, 'reserved_gb': 17.177772032}
|
| 4163 |
-
OOM during forward/backward pass: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 2 has a total capacity of 22.30 GiB of which 118.69 MiB is free. Process 25557 has 22.18 GiB memory in use. Of the allocated memory 21.52 GiB is allocated by PyTorch, and 357.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
|
| 4164 |
-
Try reducing max_tokens or max_seq_len
|
| 4165 |
-
</div>
|
| 4166 |
-
<div class="uv-install-logs" id="uv-logs-forward_and_backward_no_kernel">
|
| 4167 |
-
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4168 |
-
<div class="uv-logs-content" style="display: none;">
|
| 4169 |
-
Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
| 4170 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 4171 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4172 |
-
Updated https://github.com/huggingface/transformers.git (
|
| 4173 |
-
|
| 4174 |
-
|
| 4175 |
-
|
| 4176 |
-
|
| 4177 |
-
|
| 4178 |
-
|
| 4179 |
-
|
| 4180 |
-
|
| 4181 |
-
|
| 4182 |
-
|
| 4183 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4184 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4185 |
-
Downloading hf-xet (3.0MiB)
|
| 4186 |
-
Downloading triton (148.4MiB)
|
| 4187 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4188 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4189 |
-
Downloading tokenizers (3.1MiB)
|
| 4190 |
-
Downloading matplotlib (8.3MiB)
|
| 4191 |
-
Downloading sympy (6.0MiB)
|
| 4192 |
-
Downloading pillow (6.3MiB)
|
| 4193 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4194 |
-
Downloading pygments (1.2MiB)
|
| 4195 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4196 |
-
Downloading numpy (15.9MiB)
|
| 4197 |
-
Downloading torch (846.8MiB)
|
| 4198 |
-
Downloading fonttools (4.7MiB)
|
| 4199 |
-
Downloading nvidia-cufile-cu12
|
| 4200 |
-
Downloading kiwisolver
|
| 4201 |
-
Downloading pygments
|
| 4202 |
-
Downloading tokenizers
|
| 4203 |
-
Downloading hf-xet
|
| 4204 |
-
Downloading networkx
|
| 4205 |
-
Downloading fonttools
|
| 4206 |
-
Downloading pillow
|
| 4207 |
-
Downloading matplotlib
|
| 4208 |
-
Downloading nvidia-cuda-cupti-cu12
|
| 4209 |
-
Downloading numpy
|
| 4210 |
-
Downloading sympy
|
| 4211 |
-
Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
|
| 4212 |
-
Downloading nvidia-nvjitlink-cu12
|
| 4213 |
-
Downloading jedi
|
| 4214 |
-
Downloading nvidia-curand-cu12
|
| 4215 |
-
Downloading nvidia-cuda-nvrtc-cu12
|
| 4216 |
-
Downloading triton
|
| 4217 |
-
Downloading nvidia-cufft-cu12
|
| 4218 |
-
Downloading nvidia-cusolver-cu12
|
| 4219 |
-
Downloading nvidia-cusparse-cu12
|
| 4220 |
-
Downloading nvidia-cusparselt-cu12
|
| 4221 |
-
Downloading nvidia-nccl-cu12
|
| 4222 |
-
Downloading nvidia-cublas-cu12
|
| 4223 |
-
Downloading nvidia-cudnn-cu12
|
| 4224 |
-
Downloading torch
|
| 4225 |
-
Installed 69 packages in 579ms
|
| 4226 |
-
</div>
|
| 4227 |
</div>
|
| 4228 |
-
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4229 |
-
Fetching 3 files: 33%|███▎ | 1/3 [00:07<00:15, 7.84s/it]
|
| 4230 |
-
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.40s/it]
|
| 4231 |
-
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.71s/it]
|
| 4232 |
-
|
| 4233 |
-
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4234 |
-
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
|
| 4235 |
-
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
|
| 4236 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
|
| 4237 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
|
| 4238 |
-
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
|
| 4239 |
-
Traceback (most recent call last):
|
| 4240 |
-
File "/repo/moe_benchmarks/megablocks/.uvnote/cells/forward_and_backward_no_kernel.py", line 154, in <module>
|
| 4241 |
-
loss.backward()
|
| 4242 |
-
~~~~~~~~~~~~~^^
|
| 4243 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/_tensor.py", line 647, in backward
|
| 4244 |
-
torch.autograd.backward(
|
| 4245 |
-
~~~~~~~~~~~~~~~~~~~~~~~^
|
| 4246 |
-
self, gradient, retain_graph, create_graph, inputs=inputs
|
| 4247 |
-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4248 |
-
)
|
| 4249 |
-
^
|
| 4250 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/__init__.py", line 354, in backward
|
| 4251 |
-
_engine_run_backward(
|
| 4252 |
-
~~~~~~~~~~~~~~~~~~~~^
|
| 4253 |
-
tensors,
|
| 4254 |
-
^^^^^^^^
|
| 4255 |
-
...<5 lines>...
|
| 4256 |
-
accumulate_grad=True,
|
| 4257 |
-
^^^^^^^^^^^^^^^^^^^^^
|
| 4258 |
-
)
|
| 4259 |
-
^
|
| 4260 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/graph.py", line 829, in _engine_run_backward
|
| 4261 |
-
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
|
| 4262 |
-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4263 |
-
t_outputs, *args, **kwargs
|
| 4264 |
-
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4265 |
-
) # Calls into the C++ engine to run the backward pass
|
| 4266 |
-
^
|
| 4267 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/function.py", line 311, in apply
|
| 4268 |
-
return user_fn(self, *args)
|
| 4269 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/utils/checkpoint.py", line 319, in backward
|
| 4270 |
-
torch.autograd.backward(outputs_with_grad, args_with_grad)
|
| 4271 |
-
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4272 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/__init__.py", line 354, in backward
|
| 4273 |
-
_engine_run_backward(
|
| 4274 |
-
~~~~~~~~~~~~~~~~~~~~^
|
| 4275 |
-
tensors,
|
| 4276 |
-
^^^^^^^^
|
| 4277 |
-
...<5 lines>...
|
| 4278 |
-
accumulate_grad=True,
|
| 4279 |
-
^^^^^^^^^^^^^^^^^^^^^
|
| 4280 |
-
)
|
| 4281 |
-
^
|
| 4282 |
-
File "/tmp/uvnote-run-yr7p57do/home/.cache/uv/environments-v2/forward-and-backward-no-kernel-349948fac2e1b63b/lib/python3.13/site-packages/torch/autograd/graph.py", line 829, in _engine_run_backward
|
| 4283 |
-
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
|
| 4284 |
-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4285 |
-
t_outputs, *args, **kwargs
|
| 4286 |
-
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4287 |
-
) # Calls into the C++ engine to run the backward pass
|
| 4288 |
-
^
|
| 4289 |
-
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 2 has a total capacity of 22.30 GiB of which 118.69 MiB is free. Process 25557 has 22.18 GiB memory in use. Of the allocated memory 21.52 GiB is allocated by PyTorch, and 357.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)</div>
|
| 4290 |
</div>
|
| 4291 |
</div>
|
| 4292 |
|
|
|
|
|
|
|
| 4293 |
<h1>Kernels</h1>
|
| 4294 |
<p>Next, we run the model with the MegaBlocks kernels enabled.</p>
|
| 4295 |
<h3>Forward</h3>
|
|
|
|
| 3718 |
<h1>No Kernels</h1>
|
| 3719 |
<p>First, we run the model without any custom kernels to get a reference point.</p>
|
| 3720 |
<h2>Forward</h2>
|
| 3721 |
+
<div class="cell cell-failed" id="cell-no_kernels">
|
|
|
|
|
|
|
| 3722 |
<div class="cell-header">
|
| 3723 |
<span class="collapse-indicators">
|
| 3724 |
+
<span onclick="toggleCode('no_kernels')" style="cursor: pointer;">▼ code</span>
|
| 3725 |
+
<span onclick="toggleOutput('no_kernels')" style="cursor: pointer;">▼ output</span>
|
| 3726 |
+
<span id="uv-indicator-no_kernels" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
|
| 3727 |
</span> |
|
| 3728 |
+
Cell: no_kernels | 19.21s | FAILED
|
| 3729 |
+
| <button class="run-btn" onclick="runCell('no_kernels')">▶ run</button>
|
| 3730 |
+
<button class="copy-btn" onclick="copyCell('no_kernels')">Copy</button>
|
| 3731 |
+
<a href="cells/no_kernels.py" target="_blank" class="raw-btn">Raw</a>
|
| 3732 |
</div>
|
| 3733 |
+
<div id="code-no_kernels" class="cell-code" data-lines="98">
|
| 3734 |
<div class="highlight-with-lines">
|
| 3735 |
+
<div class="line-numbers" id="lines-no_kernels">
|
| 3736 |
+
<a class="line-number" data-cell="no_kernels" data-line="1" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 1, true);">1</a>
|
| 3737 |
+
<a class="line-number" data-cell="no_kernels" data-line="2" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 2, true);">2</a>
|
| 3738 |
+
<a class="line-number" data-cell="no_kernels" data-line="3" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 3, true);">3</a>
|
| 3739 |
+
<a class="line-number" data-cell="no_kernels" data-line="4" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 4, true);">4</a>
|
| 3740 |
+
<a class="line-number" data-cell="no_kernels" data-line="5" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 5, true);">5</a>
|
| 3741 |
+
<a class="line-number" data-cell="no_kernels" data-line="6" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 6, true);">6</a>
|
| 3742 |
+
<a class="line-number" data-cell="no_kernels" data-line="7" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 7, true);">7</a>
|
| 3743 |
+
<a class="line-number" data-cell="no_kernels" data-line="8" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 8, true);">8</a>
|
| 3744 |
+
<a class="line-number" data-cell="no_kernels" data-line="9" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 9, true);">9</a>
|
| 3745 |
+
<a class="line-number" data-cell="no_kernels" data-line="10" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 10, true);">10</a>
|
| 3746 |
+
<a class="line-number" data-cell="no_kernels" data-line="11" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 11, true);">11</a>
|
| 3747 |
+
<a class="line-number" data-cell="no_kernels" data-line="12" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 12, true);">12</a>
|
| 3748 |
+
<a class="line-number" data-cell="no_kernels" data-line="13" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 13, true);">13</a>
|
| 3749 |
+
<a class="line-number" data-cell="no_kernels" data-line="14" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 14, true);">14</a>
|
| 3750 |
+
<a class="line-number" data-cell="no_kernels" data-line="15" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 15, true);">15</a>
|
| 3751 |
+
<a class="line-number" data-cell="no_kernels" data-line="16" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 16, true);">16</a>
|
| 3752 |
+
<a class="line-number" data-cell="no_kernels" data-line="17" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 17, true);">17</a>
|
| 3753 |
+
<a class="line-number" data-cell="no_kernels" data-line="18" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 18, true);">18</a>
|
| 3754 |
+
<a class="line-number" data-cell="no_kernels" data-line="19" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 19, true);">19</a>
|
| 3755 |
+
<a class="line-number" data-cell="no_kernels" data-line="20" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 20, true);">20</a>
|
| 3756 |
+
<a class="line-number" data-cell="no_kernels" data-line="21" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 21, true);">21</a>
|
| 3757 |
+
<a class="line-number" data-cell="no_kernels" data-line="22" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 22, true);">22</a>
|
| 3758 |
+
<a class="line-number" data-cell="no_kernels" data-line="23" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 23, true);">23</a>
|
| 3759 |
+
<a class="line-number" data-cell="no_kernels" data-line="24" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 24, true);">24</a>
|
| 3760 |
+
<a class="line-number" data-cell="no_kernels" data-line="25" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 25, true);">25</a>
|
| 3761 |
+
<a class="line-number" data-cell="no_kernels" data-line="26" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 26, true);">26</a>
|
| 3762 |
+
<a class="line-number" data-cell="no_kernels" data-line="27" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 27, true);">27</a>
|
| 3763 |
+
<a class="line-number" data-cell="no_kernels" data-line="28" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 28, true);">28</a>
|
| 3764 |
+
<a class="line-number" data-cell="no_kernels" data-line="29" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 29, true);">29</a>
|
| 3765 |
+
<a class="line-number" data-cell="no_kernels" data-line="30" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 30, true);">30</a>
|
| 3766 |
+
<a class="line-number" data-cell="no_kernels" data-line="31" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 31, true);">31</a>
|
| 3767 |
+
<a class="line-number" data-cell="no_kernels" data-line="32" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 32, true);">32</a>
|
| 3768 |
+
<a class="line-number" data-cell="no_kernels" data-line="33" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 33, true);">33</a>
|
| 3769 |
+
<a class="line-number" data-cell="no_kernels" data-line="34" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 34, true);">34</a>
|
| 3770 |
+
<a class="line-number" data-cell="no_kernels" data-line="35" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 35, true);">35</a>
|
| 3771 |
+
<a class="line-number" data-cell="no_kernels" data-line="36" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 36, true);">36</a>
|
| 3772 |
+
<a class="line-number" data-cell="no_kernels" data-line="37" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 37, true);">37</a>
|
| 3773 |
+
<a class="line-number" data-cell="no_kernels" data-line="38" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 38, true);">38</a>
|
| 3774 |
+
<a class="line-number" data-cell="no_kernels" data-line="39" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 39, true);">39</a>
|
| 3775 |
+
<a class="line-number" data-cell="no_kernels" data-line="40" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 40, true);">40</a>
|
| 3776 |
+
<a class="line-number" data-cell="no_kernels" data-line="41" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 41, true);">41</a>
|
| 3777 |
+
<a class="line-number" data-cell="no_kernels" data-line="42" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 42, true);">42</a>
|
| 3778 |
+
<a class="line-number" data-cell="no_kernels" data-line="43" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 43, true);">43</a>
|
| 3779 |
+
<a class="line-number" data-cell="no_kernels" data-line="44" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 44, true);">44</a>
|
| 3780 |
+
<a class="line-number" data-cell="no_kernels" data-line="45" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 45, true);">45</a>
|
| 3781 |
+
<a class="line-number" data-cell="no_kernels" data-line="46" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 46, true);">46</a>
|
| 3782 |
+
<a class="line-number" data-cell="no_kernels" data-line="47" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 47, true);">47</a>
|
| 3783 |
+
<a class="line-number" data-cell="no_kernels" data-line="48" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 48, true);">48</a>
|
| 3784 |
+
<a class="line-number" data-cell="no_kernels" data-line="49" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 49, true);">49</a>
|
| 3785 |
+
<a class="line-number" data-cell="no_kernels" data-line="50" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 50, true);">50</a>
|
| 3786 |
+
<a class="line-number" data-cell="no_kernels" data-line="51" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 51, true);">51</a>
|
| 3787 |
+
<a class="line-number" data-cell="no_kernels" data-line="52" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 52, true);">52</a>
|
| 3788 |
+
<a class="line-number" data-cell="no_kernels" data-line="53" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 53, true);">53</a>
|
| 3789 |
+
<a class="line-number" data-cell="no_kernels" data-line="54" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 54, true);">54</a>
|
| 3790 |
+
<a class="line-number" data-cell="no_kernels" data-line="55" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 55, true);">55</a>
|
| 3791 |
+
<a class="line-number" data-cell="no_kernels" data-line="56" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 56, true);">56</a>
|
| 3792 |
+
<a class="line-number" data-cell="no_kernels" data-line="57" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 57, true);">57</a>
|
| 3793 |
+
<a class="line-number" data-cell="no_kernels" data-line="58" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 58, true);">58</a>
|
| 3794 |
+
<a class="line-number" data-cell="no_kernels" data-line="59" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 59, true);">59</a>
|
| 3795 |
+
<a class="line-number" data-cell="no_kernels" data-line="60" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 60, true);">60</a>
|
| 3796 |
+
<a class="line-number" data-cell="no_kernels" data-line="61" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 61, true);">61</a>
|
| 3797 |
+
<a class="line-number" data-cell="no_kernels" data-line="62" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 62, true);">62</a>
|
| 3798 |
+
<a class="line-number" data-cell="no_kernels" data-line="63" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 63, true);">63</a>
|
| 3799 |
+
<a class="line-number" data-cell="no_kernels" data-line="64" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 64, true);">64</a>
|
| 3800 |
+
<a class="line-number" data-cell="no_kernels" data-line="65" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 65, true);">65</a>
|
| 3801 |
+
<a class="line-number" data-cell="no_kernels" data-line="66" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 66, true);">66</a>
|
| 3802 |
+
<a class="line-number" data-cell="no_kernels" data-line="67" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 67, true);">67</a>
|
| 3803 |
+
<a class="line-number" data-cell="no_kernels" data-line="68" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 68, true);">68</a>
|
| 3804 |
+
<a class="line-number" data-cell="no_kernels" data-line="69" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 69, true);">69</a>
|
| 3805 |
+
<a class="line-number" data-cell="no_kernels" data-line="70" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 70, true);">70</a>
|
| 3806 |
+
<a class="line-number" data-cell="no_kernels" data-line="71" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 71, true);">71</a>
|
| 3807 |
+
<a class="line-number" data-cell="no_kernels" data-line="72" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 72, true);">72</a>
|
| 3808 |
+
<a class="line-number" data-cell="no_kernels" data-line="73" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 73, true);">73</a>
|
| 3809 |
+
<a class="line-number" data-cell="no_kernels" data-line="74" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 74, true);">74</a>
|
| 3810 |
+
<a class="line-number" data-cell="no_kernels" data-line="75" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 75, true);">75</a>
|
| 3811 |
+
<a class="line-number" data-cell="no_kernels" data-line="76" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 76, true);">76</a>
|
| 3812 |
+
<a class="line-number" data-cell="no_kernels" data-line="77" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 77, true);">77</a>
|
| 3813 |
+
<a class="line-number" data-cell="no_kernels" data-line="78" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 78, true);">78</a>
|
| 3814 |
+
<a class="line-number" data-cell="no_kernels" data-line="79" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 79, true);">79</a>
|
| 3815 |
+
<a class="line-number" data-cell="no_kernels" data-line="80" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 80, true);">80</a>
|
| 3816 |
+
<a class="line-number" data-cell="no_kernels" data-line="81" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 81, true);">81</a>
|
| 3817 |
+
<a class="line-number" data-cell="no_kernels" data-line="82" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 82, true);">82</a>
|
| 3818 |
+
<a class="line-number" data-cell="no_kernels" data-line="83" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 83, true);">83</a>
|
| 3819 |
+
<a class="line-number" data-cell="no_kernels" data-line="84" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 84, true);">84</a>
|
| 3820 |
+
<a class="line-number" data-cell="no_kernels" data-line="85" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 85, true);">85</a>
|
| 3821 |
+
<a class="line-number" data-cell="no_kernels" data-line="86" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 86, true);">86</a>
|
| 3822 |
+
<a class="line-number" data-cell="no_kernels" data-line="87" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 87, true);">87</a>
|
| 3823 |
+
<a class="line-number" data-cell="no_kernels" data-line="88" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 88, true);">88</a>
|
| 3824 |
+
<a class="line-number" data-cell="no_kernels" data-line="89" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 89, true);">89</a>
|
| 3825 |
+
<a class="line-number" data-cell="no_kernels" data-line="90" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 90, true);">90</a>
|
| 3826 |
+
<a class="line-number" data-cell="no_kernels" data-line="91" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 91, true);">91</a>
|
| 3827 |
+
<a class="line-number" data-cell="no_kernels" data-line="92" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 92, true);">92</a>
|
| 3828 |
+
<a class="line-number" data-cell="no_kernels" data-line="93" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 93, true);">93</a>
|
| 3829 |
+
<a class="line-number" data-cell="no_kernels" data-line="94" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 94, true);">94</a>
|
| 3830 |
+
<a class="line-number" data-cell="no_kernels" data-line="95" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 95, true);">95</a>
|
| 3831 |
+
<a class="line-number" data-cell="no_kernels" data-line="96" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 96, true);">96</a>
|
| 3832 |
+
<a class="line-number" data-cell="no_kernels" data-line="97" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 97, true);">97</a>
|
| 3833 |
+
<a class="line-number" data-cell="no_kernels" data-line="98" href="#cell-no_kernels" onclick="event.preventDefault(); selectCellLine('no_kernels', 98, true);">98</a>
|
| 3834 |
</div>
|
| 3835 |
<div class="code-wrap">
|
| 3836 |
<div class="highlight"><pre><span></span><span class="c1"># /// script</span>
|
|
|
|
| 3857 |
<span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
|
| 3858 |
<span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssRMSNorm</span>
|
| 3859 |
|
| 3860 |
<span class="c1"># set logging level to INFO</span>
|
| 3861 |
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
|
| 3862 |
|
|
|
|
| 3895 |
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
|
| 3896 |
<span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
| 3897 |
|
| 3898 |
+
|
| 3899 |
+
|
| 3900 |
<span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
|
| 3901 |
<span class="n">model_id</span><span class="p">,</span>
|
| 3902 |
<span class="n">dtype</span><span class="o">=</span><span class="s2">"bfloat16"</span><span class="p">,</span>
|
|
|
|
| 3917 |
<span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">"low"</span><span class="p">,</span>
|
| 3918 |
<span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">"cuda"</span><span class="p">)</span>
|
| 3919 |
|
| 3920 |
+
<span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">256</span>
|
| 3921 |
|
| 3922 |
+
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
|
| 3923 |
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 3924 |
<span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
|
| 3925 |
<span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
|
|
|
|
| 3928 |
<span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
| 3929 |
<span class="p">)</span>
|
| 3930 |
<span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 3931 |
+
|
| 3932 |
+
<span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
|
| 3933 |
+
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds"</span><span class="p">)</span>
|
| 3934 |
</pre></div>
|
| 3935 |
|
| 3936 |
+
<div class="code-line-highlight" id="line-highlight-no_kernels"></div>
|
| 3937 |
</div>
|
| 3938 |
</div>
|
| 3939 |
</div>
|
| 3940 |
+
<div id="output-no_kernels" class="cell-output">
|
| 3941 |
+
<div class="cell-stderr">Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
| 3942 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 3943 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 3944 |
+
Updated https://github.com/huggingface/transformers.git (e691f84412563b6abca098f3e044980725d8daa3)
|
| 3945 |
+
× No solution found when resolving script dependencies:
|
| 3946 |
+
╰─▶ Because only transformers==4.57.0.dev0 is available and
|
| 3947 |
+
transformers==4.57.0.dev0 depends on huggingface-hub==1.0.0rc1,
|
| 3948 |
+
we can conclude that all versions of transformers depend on
|
| 3949 |
+
huggingface-hub==1.0.0rc1.
|
| 3950 |
+
And because kernels==0.10.0 depends on huggingface-hub>=0.26.0,<1.0,
|
| 3951 |
+
we can conclude that kernels==0.10.0 and all versions of transformers
|
| 3952 |
+
are incompatible.
|
| 3953 |
+
And because you require kernels==0.10.0 and transformers, we can
|
| 3954 |
+
conclude that your requirements are unsatisfiable.
|
| 3955 |
</div>
|
| 3956 |
</div>
|
| 3957 |
</div>
|
| 3958 |
|
| 3959 |
+
<h2>Forward and Backward</h2>
|
| 3960 |
+
<p>Next, we'll attempt to run a forward and backward pass without any custom kernels. This will likely run out of memory since the default implementation is not optimized for memory usage.</p>
|
| 3961 |
<h1>Kernels</h1>
|
| 3962 |
<p>Next we can run with Megablocks kernels enabled.</p>
|
| 3963 |
<h3>Forward</h3>
|
megablocks_yamoe/artifacts/binned_run/binned_results.json
CHANGED
|
@@ -9,16 +9,16 @@
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
-
"avg_ms": 36.
|
| 13 |
-
"min_ms":
|
| 14 |
-
"max_ms":
|
| 15 |
-
"std_ms": 1.
|
| 16 |
-
"p50_ms": 36.
|
| 17 |
-
"p95_ms":
|
| 18 |
-
"p99_ms":
|
| 19 |
"num_iters": 50,
|
| 20 |
-
"tokens_per_s":
|
| 21 |
-
"throughput_variance":
|
| 22 |
},
|
| 23 |
"output_sum": 3.97190523147583
|
| 24 |
}
|
|
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
+
"avg_ms": 36.26809924006011,
|
| 13 |
+
"min_ms": 34.103908000361116,
|
| 14 |
+
"max_ms": 37.68557000057626,
|
| 15 |
+
"std_ms": 1.1598518125118418,
|
| 16 |
+
"p50_ms": 36.52223600056459,
|
| 17 |
+
"p95_ms": 37.6427445000445,
|
| 18 |
+
"p99_ms": 37.677440410316194,
|
| 19 |
"num_iters": 50,
|
| 20 |
+
"tokens_per_s": 2757.2440269917565,
|
| 21 |
+
"throughput_variance": 89.13103199163609
|
| 22 |
},
|
| 23 |
"output_sum": 3.97190523147583
|
| 24 |
}
|
megablocks_yamoe/artifacts/gptoss_run/gptoss_results.json
CHANGED
|
@@ -9,16 +9,16 @@
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
-
"avg_ms":
|
| 13 |
-
"min_ms": 40.
|
| 14 |
-
"max_ms":
|
| 15 |
-
"std_ms":
|
| 16 |
-
"p50_ms":
|
| 17 |
-
"p95_ms":
|
| 18 |
-
"p99_ms":
|
| 19 |
"num_iters": 50,
|
| 20 |
-
"tokens_per_s":
|
| 21 |
-
"throughput_variance":
|
| 22 |
},
|
| 23 |
"output_sum": 11.53223705291748
|
| 24 |
}
|
|
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
+
"avg_ms": 46.913985819956,
|
| 13 |
+
"min_ms": 40.44806400088419,
|
| 14 |
+
"max_ms": 51.07520399997156,
|
| 15 |
+
"std_ms": 2.9921332618008196,
|
| 16 |
+
"p50_ms": 47.418902999652346,
|
| 17 |
+
"p95_ms": 50.800493049837314,
|
| 18 |
+
"p99_ms": 50.948625239852845,
|
| 19 |
"num_iters": 50,
|
| 20 |
+
"tokens_per_s": 2131.560519794133,
|
| 21 |
+
"throughput_variance": 139.93911554997217
|
| 22 |
},
|
| 23 |
"output_sum": 11.53223705291748
|
| 24 |
}
|
megablocks_yamoe/artifacts/gptoss_training_run/gptoss_training_results.json
CHANGED
|
@@ -9,16 +9,16 @@
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
-
"avg_ms": 46.
|
| 13 |
-
"min_ms":
|
| 14 |
-
"max_ms":
|
| 15 |
-
"std_ms": 2.
|
| 16 |
-
"p50_ms":
|
| 17 |
-
"p95_ms":
|
| 18 |
-
"p99_ms":
|
| 19 |
"num_iters": 50,
|
| 20 |
-
"tokens_per_s":
|
| 21 |
-
"throughput_variance":
|
| 22 |
},
|
| 23 |
"output_sum": 11.53223705291748
|
| 24 |
}
|
|
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
+
"avg_ms": 46.289439859992854,
|
| 13 |
+
"min_ms": 39.97907499979192,
|
| 14 |
+
"max_ms": 50.58144600025116,
|
| 15 |
+
"std_ms": 2.9172154402078077,
|
| 16 |
+
"p50_ms": 46.64785849990949,
|
| 17 |
+
"p95_ms": 50.26727430031315,
|
| 18 |
+
"p99_ms": 50.5162941305025,
|
| 19 |
"num_iters": 50,
|
| 20 |
+
"tokens_per_s": 2160.3199412751637,
|
| 21 |
+
"throughput_variance": 139.86427060112865
|
| 22 |
},
|
| 23 |
"output_sum": 11.53223705291748
|
| 24 |
}
|
megablocks_yamoe/artifacts/yamoe_run/yamoe_results.json
CHANGED
|
@@ -9,16 +9,16 @@
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
-
"avg_ms": 4.
|
| 13 |
-
"min_ms": 4.
|
| 14 |
-
"max_ms": 4.
|
| 15 |
-
"std_ms": 0.
|
| 16 |
-
"p50_ms": 4.
|
| 17 |
-
"p95_ms": 4.
|
| 18 |
-
"p99_ms": 4.
|
| 19 |
"num_iters": 50,
|
| 20 |
-
"tokens_per_s":
|
| 21 |
-
"throughput_variance":
|
| 22 |
},
|
| 23 |
"output_sum": 3.97190523147583
|
| 24 |
}
|
|
|
|
| 9 |
"vary_inputs": true
|
| 10 |
},
|
| 11 |
"stats": {
|
| 12 |
+
"avg_ms": 4.248197240067384,
|
| 13 |
+
"min_ms": 4.136622000260104,
|
| 14 |
+
"max_ms": 4.280714999367774,
|
| 15 |
+
"std_ms": 0.02141682051311511,
|
| 16 |
+
"p50_ms": 4.253484999935608,
|
| 17 |
+
"p95_ms": 4.265540049709671,
|
| 18 |
+
"p99_ms": 4.273649199667489,
|
| 19 |
"num_iters": 50,
|
| 20 |
+
"tokens_per_s": 23539.396677922097,
|
| 21 |
+
"throughput_variance": 120.66648678204231
|
| 22 |
},
|
| 23 |
"output_sum": 3.97190523147583
|
| 24 |
}
|
megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc
CHANGED
|
Binary files a/megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc and b/megablocks_yamoe/cells/__pycache__/bench_utils.cpython-311.pyc differ
|
|
|
megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc
CHANGED
|
Binary files a/megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc and b/megablocks_yamoe/cells/__pycache__/config.cpython-311.pyc differ
|
|
|
megablocks_yamoe/megablocks_yamoe.html
CHANGED
|
@@ -3715,84 +3715,17 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
|
|
| 3715 |
</div>
|
| 3716 |
|
| 3717 |
<div class="main-content">
|
| 3718 |
-
<
|
| 3719 |
-
<div class="cell-header">
|
| 3720 |
-
<span class="collapse-indicators">
|
| 3721 |
-
<span onclick="toggleCode('nv')" style="cursor: pointer;">▼ code</span>
|
| 3722 |
-
<span onclick="toggleOutput('nv')" style="cursor: pointer;">▼ output</span>
|
| 3723 |
-
<span id="uv-indicator-nv" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
|
| 3724 |
-
</span> |
|
| 3725 |
-
Cell: nv | 0.55s
|
| 3726 |
-
| <button class="run-btn" onclick="runCell('nv')">▶ run</button>
|
| 3727 |
-
<button class="copy-btn" onclick="copyCell('nv')">Copy</button>
|
| 3728 |
-
<a href="cells/nv.py" target="_blank" class="raw-btn">Raw</a>
|
| 3729 |
-
</div>
|
| 3730 |
-
<div id="code-nv" class="cell-code" data-lines="3">
|
| 3731 |
-
<div class="highlight-with-lines">
|
| 3732 |
-
<div class="line-numbers" id="lines-nv">
|
| 3733 |
-
<a class="line-number" data-cell="nv" data-line="1" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 1, true);">1</a>
|
| 3734 |
-
<a class="line-number" data-cell="nv" data-line="2" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 2, true);">2</a>
|
| 3735 |
-
<a class="line-number" data-cell="nv" data-line="3" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 3, true);">3</a>
|
| 3736 |
-
</div>
|
| 3737 |
-
<div class="code-wrap">
|
| 3738 |
-
<div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">subprocess</span>
|
| 3739 |
-
|
| 3740 |
-
<span class="nb">print</span><span class="p">(</span><span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">"nvidia-smi"</span><span class="p">],</span> <span class="n">capture_output</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
|
| 3741 |
-
</pre></div>
|
| 3742 |
-
|
| 3743 |
-
<div class="code-line-highlight" id="line-highlight-nv"></div>
|
| 3744 |
-
</div>
|
| 3745 |
-
</div>
|
| 3746 |
-
</div>
|
| 3747 |
-
<div id="output-nv" class="cell-output">
|
| 3748 |
-
<div class="cell-stdout">Wed Sep 24 22:04:34 2025
|
| 3749 |
-
+-----------------------------------------------------------------------------------------+
|
| 3750 |
-
| NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
|
| 3751 |
-
|-----------------------------------------+------------------------+----------------------+
|
| 3752 |
-
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|
| 3753 |
-
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|
| 3754 |
-
| | | MIG M. |
|
| 3755 |
-
|=========================================+========================+======================|
|
| 3756 |
-
| 0 NVIDIA A10G On | 00000000:00:1B.0 Off | 0 |
|
| 3757 |
-
| 0% 36C P0 45W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3758 |
-
| | | N/A |
|
| 3759 |
-
+-----------------------------------------+------------------------+----------------------+
|
| 3760 |
-
| 1 NVIDIA A10G On | 00000000:00:1C.0 Off | 0 |
|
| 3761 |
-
| 0% 37C P0 47W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3762 |
-
| | | N/A |
|
| 3763 |
-
+-----------------------------------------+------------------------+----------------------+
|
| 3764 |
-
| 2 NVIDIA A10G On | 00000000:00:1D.0 Off | 0 |
|
| 3765 |
-
| 0% 35C P0 47W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3766 |
-
| | | N/A |
|
| 3767 |
-
+-----------------------------------------+------------------------+----------------------+
|
| 3768 |
-
| 3 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
|
| 3769 |
-
| 0% 37C P0 44W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3770 |
-
| | | N/A |
|
| 3771 |
-
+-----------------------------------------+------------------------+----------------------+
|
| 3772 |
-
|
| 3773 |
-
+-----------------------------------------------------------------------------------------+
|
| 3774 |
-
| Processes: |
|
| 3775 |
-
| GPU GI CI PID Type Process name GPU Memory |
|
| 3776 |
-
| ID ID Usage |
|
| 3777 |
-
|=========================================================================================|
|
| 3778 |
-
| No running processes found |
|
| 3779 |
-
+-----------------------------------------------------------------------------------------+
|
| 3780 |
-
|
| 3781 |
-
</div>
|
| 3782 |
-
</div>
|
| 3783 |
-
</div>
|
| 3784 |
-
|
| 3785 |
-
<h1>Comparison of Megablocks and Yamoe Kernels</h1>
|
| 3786 |
<p>This note compares the performance of the Megablocks and Yamoe kernels on the GPT-OSS-20B model.</p>
|
| 3787 |
<h2>Megablocks kernel</h2>
|
| 3788 |
-
<div class="cell" id="cell-setup2">
|
| 3789 |
<div class="cell-header">
|
| 3790 |
<span class="collapse-indicators">
|
| 3791 |
<span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
|
| 3792 |
<span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
|
| 3793 |
-
<span id="uv-indicator-setup2"
|
| 3794 |
</span> |
|
| 3795 |
-
Cell: setup2 |
|
| 3796 |
| <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
|
| 3797 |
<button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
|
| 3798 |
<a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4039,561 +3972,25 @@ Cell: setup2 | 114.03s
|
|
| 4039 |
</div>
|
| 4040 |
</div>
|
| 4041 |
<div id="output-setup2" class="cell-output">
|
| 4042 |
-
<div class="cell-
|
| 4043 |
-
Knowledge cutoff: 2024-06
|
| 4044 |
-
Current date: 2025-09-24
|
| 4045 |
-
|
| 4046 |
-
Reasoning: low
|
| 4047 |
-
|
| 4048 |
-
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions
|
| 4049 |
-
|
| 4050 |
-
What is Tensor Parallelism?
|
| 4051 |
-
|
| 4052 |
-
<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it's used in large models like GPT-3, Megatron-LM, DeepSpeed. Provide references. Also mention that it's also called model parallelism. Provide explanation of how it works in practice: e.g., for a linear layer, weight matrix W of shape (out_features, in_features). In tensor parallelism, split W along out_features dimension across GPUs. Each GPU computes partial output. Then gather outputs. Provide details on how to handle bias, etc. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in Megatron-LM: splitting weight matrices across GPUs. Provide mention of "tensor parallelism" in DeepSpeed: "ZeRO-Offload" etc. Provide mention
|
| 4053 |
-
Generation took 31.36 seconds
|
| 4054 |
-
</div>
|
| 4055 |
-
<div class="uv-install-logs" id="uv-logs-setup2">
|
| 4056 |
-
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4057 |
-
<div class="uv-logs-content" style="display: none;">
|
| 4058 |
-
Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
| 4059 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 4060 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4061 |
-
Updated https://github.com/huggingface/transformers.git (
|
| 4062 |
-
|
| 4063 |
-
|
| 4064 |
-
|
| 4065 |
-
|
| 4066 |
-
|
| 4067 |
-
|
| 4068 |
-
|
| 4069 |
-
|
| 4070 |
-
|
| 4071 |
-
|
| 4072 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4073 |
-
Downloading fonttools (4.7MiB)
|
| 4074 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4075 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4076 |
-
Downloading triton (148.4MiB)
|
| 4077 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4078 |
-
Downloading tokenizers (3.1MiB)
|
| 4079 |
-
Downloading kiwisolver (1.4MiB)
|
| 4080 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4081 |
-
Downloading pillow (6.3MiB)
|
| 4082 |
-
Downloading numpy (15.9MiB)
|
| 4083 |
-
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4084 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4085 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4086 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4087 |
-
Downloading torch (846.8MiB)
|
| 4088 |
-
Downloading nvidia-cufile-cu12
|
| 4089 |
-
Downloading kiwisolver
|
| 4090 |
-
Downloading pygments
|
| 4091 |
-
Downloading hf-xet
|
| 4092 |
-
Downloading tokenizers
|
| 4093 |
-
Downloading networkx
|
| 4094 |
-
Downloading fonttools
|
| 4095 |
-
Downloading pillow
|
| 4096 |
-
Downloading matplotlib
|
| 4097 |
-
Downloading nvidia-cuda-cupti-cu12
|
| 4098 |
-
Downloading numpy
|
| 4099 |
-
Downloading sympy
|
| 4100 |
-
Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
|
| 4101 |
-
Downloading nvidia-nvjitlink-cu12
|
| 4102 |
-
Downloading jedi
|
| 4103 |
-
Downloading nvidia-curand-cu12
|
| 4104 |
-
Downloading nvidia-cuda-nvrtc-cu12
|
| 4105 |
-
Downloading triton
|
| 4106 |
-
Downloading nvidia-cufft-cu12
|
| 4107 |
-
Downloading nvidia-cusolver-cu12
|
| 4108 |
-
Downloading nvidia-cusparselt-cu12
|
| 4109 |
-
Downloading nvidia-cusparse-cu12
|
| 4110 |
-
Downloading nvidia-nccl-cu12
|
| 4111 |
-
Downloading nvidia-cublas-cu12
|
| 4112 |
-
Downloading nvidia-cudnn-cu12
|
| 4113 |
-
Downloading torch
|
| 4114 |
-
Installed 69 packages in 509ms
|
| 4115 |
-
</div>
|
| 4116 |
</div>
|
| 4117 |
-
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4118 |
-
Fetching 3 files: 33%|███▎ | 1/3 [00:06<00:12, 6.49s/it]
|
| 4119 |
-
Fetching 3 files: 67%|██████▋ | 2/3 [00:07<00:03, 3.44s/it]
|
| 4120 |
-
Fetching 3 files: 100%|██████████| 3/3 [00:07<00:00, 2.60s/it]
|
| 4121 |
-
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
|
| 4122 |
-
|
| 4123 |
-
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4124 |
-
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.35s/it]
|
| 4125 |
-
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
|
| 4126 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
|
| 4127 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
|
| 4128 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4129 |
-
|
| 4130 |
-
Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
|
| 4131 |
-
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:10, 6.31it/s]
|
| 4132 |
-
Fetching 66 files: 14%|█▎ | 9/66 [00:00<00:02, 26.39it/s]
|
| 4133 |
-
Fetching 66 files: 26%|██▌ | 17/66 [00:01<00:03, 12.42it/s]
|
| 4134 |
-
Fetching 66 files: 74%|███████▍ | 49/66 [00:01<00:00, 45.00it/s]
|
| 4135 |
-
Fetching 66 files: 91%|█████████ | 60/66 [00:01<00:00, 45.67it/s]
|
| 4136 |
-
Fetching 66 files: 100%|██████████| 66/66 [00:01<00:00, 34.31it/s]
|
| 4137 |
-
/tmp/uvnote-run-_uergc47/home/.cache/uv/environments-v2/setup2-adf2810b697d7b08/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4138 |
-
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4139 |
-
warnings.warn(
|
| 4140 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4141 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4142 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4143 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4144 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4145 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4146 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4147 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4148 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4149 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4150 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4151 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4152 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4153 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4154 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4155 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4156 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4157 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4158 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4159 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4160 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4161 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4162 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4163 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4164 |
-
/tmp/uvnote-run-_uergc47/home/.cache/uv/environments-v2/setup2-adf2810b697d7b08/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4165 |
-
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4166 |
-
warnings.warn(
|
| 4167 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4168 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4169 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4170 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4171 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4172 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4173 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4174 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4175 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4176 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4177 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4178 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4179 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4180 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4181 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4182 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4183 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4184 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4185 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4186 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4187 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4188 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4189 |
-
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`</div>
|
| 4190 |
</div>
|
| 4191 |
</div>
|
| 4192 |
|
| 4193 |
<h2>Yamoe Kernel</h2>
|
| 4194 |
-
<div class="cell" id="cell-setup">
|
| 4195 |
-
<div class="cell-header">
|
| 4196 |
-
<span class="collapse-indicators">
|
| 4197 |
-
<span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
|
| 4198 |
-
<span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
|
| 4199 |
-
<span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4200 |
-
</span> |
|
| 4201 |
-
Cell: setup | 109.23s
|
| 4202 |
-
| <button class="run-btn" onclick="runCell('setup')">▶ run</button>
|
| 4203 |
-
<button class="copy-btn" onclick="copyCell('setup')">Copy</button>
|
| 4204 |
-
<a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
|
| 4205 |
-
</div>
|
| 4206 |
-
<div id="code-setup" class="cell-code" data-lines="116">
|
| 4207 |
-
<div class="highlight-with-lines">
|
| 4208 |
-
<div class="line-numbers" id="lines-setup">
|
| 4209 |
-
<a class="line-number" data-cell="setup" data-line="1" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 1, true);">1</a>
|
| 4210 |
-
<a class="line-number" data-cell="setup" data-line="2" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 2, true);">2</a>
|
| 4211 |
-
<a class="line-number" data-cell="setup" data-line="3" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 3, true);">3</a>
|
| 4212 |
-
<a class="line-number" data-cell="setup" data-line="4" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 4, true);">4</a>
|
| 4213 |
-
<a class="line-number" data-cell="setup" data-line="5" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 5, true);">5</a>
|
| 4214 |
-
<a class="line-number" data-cell="setup" data-line="6" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 6, true);">6</a>
|
| 4215 |
-
<a class="line-number" data-cell="setup" data-line="7" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 7, true);">7</a>
|
| 4216 |
-
<a class="line-number" data-cell="setup" data-line="8" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 8, true);">8</a>
|
| 4217 |
-
<a class="line-number" data-cell="setup" data-line="9" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 9, true);">9</a>
|
| 4218 |
-
<a class="line-number" data-cell="setup" data-line="10" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 10, true);">10</a>
|
| 4219 |
-
<a class="line-number" data-cell="setup" data-line="11" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 11, true);">11</a>
|
| 4220 |
-
<a class="line-number" data-cell="setup" data-line="12" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 12, true);">12</a>
|
| 4221 |
-
<a class="line-number" data-cell="setup" data-line="13" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 13, true);">13</a>
|
| 4222 |
-
<a class="line-number" data-cell="setup" data-line="14" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 14, true);">14</a>
|
| 4223 |
-
<a class="line-number" data-cell="setup" data-line="15" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 15, true);">15</a>
|
| 4224 |
-
<a class="line-number" data-cell="setup" data-line="16" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 16, true);">16</a>
|
| 4225 |
-
<a class="line-number" data-cell="setup" data-line="17" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 17, true);">17</a>
|
| 4226 |
-
<a class="line-number" data-cell="setup" data-line="18" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 18, true);">18</a>
|
| 4227 |
-
<a class="line-number" data-cell="setup" data-line="19" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 19, true);">19</a>
|
| 4228 |
-
<a class="line-number" data-cell="setup" data-line="20" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 20, true);">20</a>
|
| 4229 |
-
<a class="line-number" data-cell="setup" data-line="21" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 21, true);">21</a>
|
| 4230 |
-
<a class="line-number" data-cell="setup" data-line="22" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 22, true);">22</a>
|
| 4231 |
-
<a class="line-number" data-cell="setup" data-line="23" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 23, true);">23</a>
|
| 4232 |
-
<a class="line-number" data-cell="setup" data-line="24" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 24, true);">24</a>
|
| 4233 |
-
<a class="line-number" data-cell="setup" data-line="25" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 25, true);">25</a>
|
| 4234 |
-
<a class="line-number" data-cell="setup" data-line="26" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 26, true);">26</a>
|
| 4235 |
-
<a class="line-number" data-cell="setup" data-line="27" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 27, true);">27</a>
|
| 4236 |
-
<a class="line-number" data-cell="setup" data-line="28" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 28, true);">28</a>
|
| 4237 |
-
<a class="line-number" data-cell="setup" data-line="29" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 29, true);">29</a>
|
| 4238 |
-
<a class="line-number" data-cell="setup" data-line="30" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 30, true);">30</a>
|
| 4239 |
-
<a class="line-number" data-cell="setup" data-line="31" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 31, true);">31</a>
|
| 4240 |
-
<a class="line-number" data-cell="setup" data-line="32" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 32, true);">32</a>
|
| 4241 |
-
<a class="line-number" data-cell="setup" data-line="33" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 33, true);">33</a>
|
| 4242 |
-
<a class="line-number" data-cell="setup" data-line="34" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 34, true);">34</a>
|
| 4243 |
-
<a class="line-number" data-cell="setup" data-line="35" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 35, true);">35</a>
|
| 4244 |
-
<a class="line-number" data-cell="setup" data-line="36" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 36, true);">36</a>
|
| 4245 |
-
<a class="line-number" data-cell="setup" data-line="37" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 37, true);">37</a>
|
| 4246 |
-
<a class="line-number" data-cell="setup" data-line="38" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 38, true);">38</a>
|
| 4247 |
-
<a class="line-number" data-cell="setup" data-line="39" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 39, true);">39</a>
|
| 4248 |
-
<a class="line-number" data-cell="setup" data-line="40" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 40, true);">40</a>
|
| 4249 |
-
<a class="line-number" data-cell="setup" data-line="41" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 41, true);">41</a>
|
| 4250 |
-
<a class="line-number" data-cell="setup" data-line="42" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 42, true);">42</a>
|
| 4251 |
-
<a class="line-number" data-cell="setup" data-line="43" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 43, true);">43</a>
|
| 4252 |
-
<a class="line-number" data-cell="setup" data-line="44" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 44, true);">44</a>
|
| 4253 |
-
<a class="line-number" data-cell="setup" data-line="45" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 45, true);">45</a>
|
| 4254 |
-
<a class="line-number" data-cell="setup" data-line="46" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 46, true);">46</a>
|
| 4255 |
-
<a class="line-number" data-cell="setup" data-line="47" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 47, true);">47</a>
|
| 4256 |
-
<a class="line-number" data-cell="setup" data-line="48" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 48, true);">48</a>
|
| 4257 |
-
<a class="line-number" data-cell="setup" data-line="49" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 49, true);">49</a>
|
| 4258 |
-
<a class="line-number" data-cell="setup" data-line="50" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 50, true);">50</a>
|
| 4259 |
-
<a class="line-number" data-cell="setup" data-line="51" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 51, true);">51</a>
|
| 4260 |
-
<a class="line-number" data-cell="setup" data-line="52" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 52, true);">52</a>
|
| 4261 |
-
<a class="line-number" data-cell="setup" data-line="53" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 53, true);">53</a>
|
| 4262 |
-
<a class="line-number" data-cell="setup" data-line="54" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 54, true);">54</a>
|
| 4263 |
-
<a class="line-number" data-cell="setup" data-line="55" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 55, true);">55</a>
|
| 4264 |
-
<a class="line-number" data-cell="setup" data-line="56" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 56, true);">56</a>
|
| 4265 |
-
<a class="line-number" data-cell="setup" data-line="57" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 57, true);">57</a>
|
| 4266 |
-
<a class="line-number" data-cell="setup" data-line="58" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 58, true);">58</a>
|
| 4267 |
-
<a class="line-number" data-cell="setup" data-line="59" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 59, true);">59</a>
|
| 4268 |
-
<a class="line-number" data-cell="setup" data-line="60" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 60, true);">60</a>
|
| 4269 |
-
<a class="line-number" data-cell="setup" data-line="61" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 61, true);">61</a>
|
| 4270 |
-
<a class="line-number" data-cell="setup" data-line="62" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 62, true);">62</a>
|
| 4271 |
-
<a class="line-number" data-cell="setup" data-line="63" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 63, true);">63</a>
|
| 4272 |
-
<a class="line-number" data-cell="setup" data-line="64" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 64, true);">64</a>
|
| 4273 |
-
<a class="line-number" data-cell="setup" data-line="65" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 65, true);">65</a>
|
| 4274 |
-
<a class="line-number" data-cell="setup" data-line="66" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 66, true);">66</a>
|
| 4275 |
-
<a class="line-number" data-cell="setup" data-line="67" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 67, true);">67</a>
|
| 4276 |
-
<a class="line-number" data-cell="setup" data-line="68" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 68, true);">68</a>
|
| 4277 |
-
<a class="line-number" data-cell="setup" data-line="69" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 69, true);">69</a>
|
| 4278 |
-
<a class="line-number" data-cell="setup" data-line="70" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 70, true);">70</a>
|
| 4279 |
-
<a class="line-number" data-cell="setup" data-line="71" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 71, true);">71</a>
|
| 4280 |
-
<a class="line-number" data-cell="setup" data-line="72" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 72, true);">72</a>
|
| 4281 |
-
<a class="line-number" data-cell="setup" data-line="73" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 73, true);">73</a>
|
| 4282 |
-
<a class="line-number" data-cell="setup" data-line="74" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 74, true);">74</a>
|
| 4283 |
-
<a class="line-number" data-cell="setup" data-line="75" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 75, true);">75</a>
|
| 4284 |
-
<a class="line-number" data-cell="setup" data-line="76" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 76, true);">76</a>
|
| 4285 |
-
<a class="line-number" data-cell="setup" data-line="77" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 77, true);">77</a>
|
| 4286 |
-
<a class="line-number" data-cell="setup" data-line="78" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 78, true);">78</a>
|
| 4287 |
-
<a class="line-number" data-cell="setup" data-line="79" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 79, true);">79</a>
|
| 4288 |
-
<a class="line-number" data-cell="setup" data-line="80" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 80, true);">80</a>
|
| 4289 |
-
<a class="line-number" data-cell="setup" data-line="81" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 81, true);">81</a>
|
| 4290 |
-
<a class="line-number" data-cell="setup" data-line="82" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 82, true);">82</a>
|
| 4291 |
-
<a class="line-number" data-cell="setup" data-line="83" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 83, true);">83</a>
|
| 4292 |
-
<a class="line-number" data-cell="setup" data-line="84" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 84, true);">84</a>
|
| 4293 |
-
<a class="line-number" data-cell="setup" data-line="85" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 85, true);">85</a>
|
| 4294 |
-
<a class="line-number" data-cell="setup" data-line="86" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 86, true);">86</a>
|
| 4295 |
-
<a class="line-number" data-cell="setup" data-line="87" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 87, true);">87</a>
|
| 4296 |
-
<a class="line-number" data-cell="setup" data-line="88" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 88, true);">88</a>
|
| 4297 |
-
<a class="line-number" data-cell="setup" data-line="89" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 89, true);">89</a>
|
| 4298 |
-
<a class="line-number" data-cell="setup" data-line="90" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 90, true);">90</a>
|
| 4299 |
-
<a class="line-number" data-cell="setup" data-line="91" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 91, true);">91</a>
|
| 4300 |
-
<a class="line-number" data-cell="setup" data-line="92" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 92, true);">92</a>
|
| 4301 |
-
<a class="line-number" data-cell="setup" data-line="93" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 93, true);">93</a>
|
| 4302 |
-
<a class="line-number" data-cell="setup" data-line="94" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 94, true);">94</a>
|
| 4303 |
-
<a class="line-number" data-cell="setup" data-line="95" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 95, true);">95</a>
|
| 4304 |
-
<a class="line-number" data-cell="setup" data-line="96" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 96, true);">96</a>
|
| 4305 |
-
<a class="line-number" data-cell="setup" data-line="97" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 97, true);">97</a>
|
| 4306 |
-
<a class="line-number" data-cell="setup" data-line="98" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 98, true);">98</a>
|
| 4307 |
-
<a class="line-number" data-cell="setup" data-line="99" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 99, true);">99</a>
|
| 4308 |
-
<a class="line-number" data-cell="setup" data-line="100" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 100, true);">100</a>
|
| 4309 |
-
<a class="line-number" data-cell="setup" data-line="101" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 101, true);">101</a>
|
| 4310 |
-
<a class="line-number" data-cell="setup" data-line="102" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 102, true);">102</a>
|
| 4311 |
-
<a class="line-number" data-cell="setup" data-line="103" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 103, true);">103</a>
|
| 4312 |
-
<a class="line-number" data-cell="setup" data-line="104" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 104, true);">104</a>
|
| 4313 |
-
<a class="line-number" data-cell="setup" data-line="105" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 105, true);">105</a>
|
| 4314 |
-
<a class="line-number" data-cell="setup" data-line="106" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 106, true);">106</a>
|
| 4315 |
-
<a class="line-number" data-cell="setup" data-line="107" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 107, true);">107</a>
|
| 4316 |
-
<a class="line-number" data-cell="setup" data-line="108" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 108, true);">108</a>
|
| 4317 |
-
<a class="line-number" data-cell="setup" data-line="109" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 109, true);">109</a>
|
| 4318 |
-
<a class="line-number" data-cell="setup" data-line="110" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 110, true);">110</a>
|
| 4319 |
-
<a class="line-number" data-cell="setup" data-line="111" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 111, true);">111</a>
|
| 4320 |
-
<a class="line-number" data-cell="setup" data-line="112" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 112, true);">112</a>
|
| 4321 |
-
<a class="line-number" data-cell="setup" data-line="113" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 113, true);">113</a>
|
| 4322 |
-
<a class="line-number" data-cell="setup" data-line="114" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 114, true);">114</a>
|
| 4323 |
-
<a class="line-number" data-cell="setup" data-line="115" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 115, true);">115</a>
|
| 4324 |
-
<a class="line-number" data-cell="setup" data-line="116" href="#cell-setup" onclick="event.preventDefault(); selectCellLine('setup', 116, true);">116</a>
|
| 4325 |
-
</div>
|
| 4326 |
-
<div class="code-wrap">
|
| 4327 |
-
<div class="highlight"><pre><span></span><span class="c1"># /// script</span>
|
| 4328 |
-
<span class="c1"># requires-python = ">=3.12"</span>
|
| 4329 |
-
<span class="c1"># dependencies = [</span>
|
| 4330 |
-
<span class="c1"># "accelerate>=1.10.1",</span>
|
| 4331 |
-
<span class="c1"># "torch>=2.7.0",</span>
|
| 4332 |
-
<span class="c1"># "kernels==0.10.0",</span>
|
| 4333 |
-
<span class="c1"># "transformers@https://github.com/huggingface/transformers.git",</span>
|
| 4334 |
-
<span class="c1"># "ipdb>=0.13.13",</span>
|
| 4335 |
-
<span class="c1"># "matplotlib>=3.7.2",</span>
|
| 4336 |
-
<span class="c1"># "numpy>=1.24.3",</span>
|
| 4337 |
-
<span class="c1"># ]</span>
|
| 4338 |
-
<span class="c1"># ///</span>
|
| 4339 |
-
|
| 4340 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
|
| 4341 |
-
<span class="kn">from</span><span class="w"> </span><span class="nn">transformers</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssForCausalLM</span><span class="p">,</span> <span class="n">PreTrainedTokenizerFast</span><span class="p">,</span> <span class="n">Mxfp4Config</span>
|
| 4342 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">time</span>
|
| 4343 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">nn</span>
|
| 4344 |
-
<span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">Mode</span><span class="p">,</span> <span class="n">LayerRepository</span>
|
| 4345 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">sys</span>
|
| 4346 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">torch.profiler</span>
|
| 4347 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">gc</span>
|
| 4348 |
-
<span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
|
| 4349 |
-
|
| 4350 |
-
<span class="c1"># log at INFO level</span>
|
| 4351 |
-
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
|
| 4352 |
-
|
| 4353 |
-
<span class="k">def</span><span class="w"> </span><span class="nf">reset_peak_memory_stats</span><span class="p">():</span>
|
| 4354 |
-
<span class="w"> </span><span class="sd">"""Clear CUDA cache and reset memory allocation counters."""</span>
|
| 4355 |
-
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
|
| 4356 |
-
<span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
|
| 4357 |
-
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">reset_peak_memory_stats</span><span class="p">()</span>
|
| 4358 |
-
<span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
|
| 4359 |
-
|
| 4360 |
-
<span class="k">def</span><span class="w"> </span><span class="nf">get_memory_stats</span><span class="p">():</span>
|
| 4361 |
-
<span class="w"> </span><span class="sd">"""Get current and peak CUDA memory usage."""</span>
|
| 4362 |
-
<span class="k">if</span> <span class="ow">not</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
|
| 4363 |
-
<span class="k">return</span> <span class="p">{</span><span class="s2">"allocated_gb"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"peak_gb"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"reserved_gb"</span><span class="p">:</span> <span class="mi">0</span><span class="p">}</span>
|
| 4364 |
-
<span class="k">return</span> <span class="p">{</span>
|
| 4365 |
-
<span class="s2">"allocated_gb"</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
|
| 4366 |
-
<span class="s2">"peak_gb"</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">max_memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
|
| 4367 |
-
<span class="s2">"reserved_gb"</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_reserved</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
|
| 4368 |
-
<span class="p">}</span>
|
| 4369 |
-
|
| 4370 |
-
<span class="k">def</span><span class="w"> </span><span class="nf">override_kernel_layer_name</span><span class="p">(</span><span class="n">cls_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
|
| 4371 |
-
<span class="w"> </span><span class="sd">"""Helper to dynamically override the kernel_layer_name in a model class."""</span>
|
| 4372 |
-
<span class="k">for</span> <span class="n">mod</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">modules</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
|
| 4373 |
-
<span class="k">if</span> <span class="n">mod</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
| 4374 |
-
<span class="k">continue</span>
|
| 4375 |
-
<span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">cls_name</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
|
| 4376 |
-
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
|
| 4377 |
-
<span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s2">"kernel_layer_name"</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
|
| 4378 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Overrode </span><span class="si">{</span><span class="n">cls_name</span><span class="si">}</span><span class="s2">.kernel_layer_name to </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
| 4379 |
-
<span class="k">return</span> <span class="kc">True</span>
|
| 4380 |
-
<span class="k">return</span> <span class="kc">False</span>
|
| 4381 |
-
|
| 4382 |
-
|
| 4383 |
-
<span class="c1"># Init the model the normal way</span>
|
| 4384 |
-
<span class="n">model_id</span> <span class="o">=</span> <span class="s2">"openai/gpt-oss-20b"</span>
|
| 4385 |
-
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
|
| 4386 |
-
<span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
| 4387 |
-
|
| 4388 |
-
|
| 4389 |
-
<span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">replace_kernel_forward_from_hub</span><span class="p">,</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">LayerRepository</span><span class="p">,</span> <span class="n">Mode</span>
|
| 4390 |
-
|
| 4391 |
-
<span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssMLP</span><span class="p">,</span> <span class="n">GptOssRMSNorm</span>
|
| 4392 |
-
|
| 4393 |
-
<span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssMLP</span><span class="p">,</span> <span class="s2">"Yamoe"</span><span class="p">)</span>
|
| 4394 |
-
<span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
|
| 4395 |
-
<span class="n">custom_mapping</span> <span class="o">=</span> <span class="p">{</span>
|
| 4396 |
-
<span class="s2">"Yamoe"</span><span class="p">:</span> <span class="p">{</span>
|
| 4397 |
-
<span class="s2">"cuda"</span><span class="p">:</span> <span class="p">{</span>
|
| 4398 |
-
<span class="n">Mode</span><span class="o">.</span><span class="n">INFERENCE</span><span class="p">:</span> <span class="n">LayerRepository</span><span class="p">(</span>
|
| 4399 |
-
<span class="n">repo_id</span><span class="o">=</span><span class="s2">"drbh/yamoe"</span><span class="p">,</span>
|
| 4400 |
-
<span class="n">layer_name</span><span class="o">=</span><span class="s2">"Yamoe"</span><span class="p">,</span>
|
| 4401 |
-
<span class="n">revision</span><span class="o">=</span><span class="s2">"v0.3.0"</span><span class="p">,</span>
|
| 4402 |
-
<span class="p">)</span>
|
| 4403 |
-
<span class="p">}</span>
|
| 4404 |
-
<span class="p">}</span>
|
| 4405 |
-
<span class="p">}</span>
|
| 4406 |
-
<span class="n">register_kernel_mapping</span><span class="p">(</span><span class="n">custom_mapping</span><span class="p">)</span>
|
| 4407 |
-
|
| 4408 |
-
|
| 4409 |
-
<span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
|
| 4410 |
-
<span class="n">model_id</span><span class="p">,</span>
|
| 4411 |
-
<span class="n">dtype</span><span class="o">=</span><span class="s2">"bfloat16"</span><span class="p">,</span>
|
| 4412 |
-
<span class="n">device_map</span><span class="o">=</span><span class="s2">"auto"</span><span class="p">,</span>
|
| 4413 |
-
<span class="n">use_kernels</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
| 4414 |
-
<span class="n">quantization_config</span><span class="o">=</span><span class="n">quantization_config</span><span class="p">,</span>
|
| 4415 |
-
<span class="p">)</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
|
| 4416 |
-
|
| 4417 |
-
<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
|
| 4418 |
-
<span class="p">{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"system"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"What is Tensor Parallelism?"</span><span class="p">},</span>
|
| 4419 |
-
<span class="p">]</span>
|
| 4420 |
-
|
| 4421 |
-
<span class="n">inputs</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">apply_chat_template</span><span class="p">(</span>
|
| 4422 |
-
<span class="n">messages</span><span class="p">,</span>
|
| 4423 |
-
<span class="n">add_generation_prompt</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
| 4424 |
-
<span class="n">return_tensors</span><span class="o">=</span><span class="s2">"pt"</span><span class="p">,</span>
|
| 4425 |
-
<span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
| 4426 |
-
<span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">"low"</span><span class="p">,</span>
|
| 4427 |
-
<span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">"cuda"</span><span class="p">)</span>
|
| 4428 |
-
|
| 4429 |
-
<span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">256</span>
|
| 4430 |
-
|
| 4431 |
-
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
|
| 4432 |
-
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 4433 |
-
<span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
|
| 4434 |
-
<span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
|
| 4435 |
-
<span class="n">max_new_tokens</span><span class="o">=</span><span class="n">max_tokens</span><span class="p">,</span>
|
| 4436 |
-
<span class="n">do_sample</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
|
| 4437 |
-
<span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
| 4438 |
-
<span class="p">)</span>
|
| 4439 |
-
<span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
|
| 4440 |
-
|
| 4441 |
-
<span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
|
| 4442 |
-
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds"</span><span class="p">)</span>
|
| 4443 |
-
</pre></div>
|
| 4444 |
-
|
| 4445 |
-
<div class="code-line-highlight" id="line-highlight-setup"></div>
|
| 4446 |
-
</div>
|
| 4447 |
-
</div>
|
| 4448 |
-
</div>
|
| 4449 |
-
<div id="output-setup" class="cell-output">
|
| 4450 |
-
<div class="cell-stdout"><|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
|
| 4451 |
-
Knowledge cutoff: 2024-06
|
| 4452 |
-
Current date: 2025-09-24
|
| 4453 |
-
|
| 4454 |
-
Reasoning: low
|
| 4455 |
-
|
| 4456 |
-
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions
|
| 4457 |
-
|
| 4458 |
-
What is Tensor Parallelism?
|
| 4459 |
-
|
| 4460 |
-
<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical use cases, differences from data parallelism, pipeline parallelism, model parallelism. Provide example: splitting a fully connected layer's weight matrix across GPUs. Provide mention of frameworks: Megatron-LM, DeepSpeed, etc. Provide explanation of how forward/backward passes are computed. Provide mention of communication overhead, scaling, etc. Provide mention of "tensor parallelism" as part of "model parallelism" but specifically splitting tensors. Provide mention of "tensor parallelism" in context of transformer layers: splitting attention heads, feed-forward layers. Provide mention of "tensor parallelism" in context of "DeepSpeed ZeRO Stage 3" or "Megatron-LM's tensor parallelism". Provide mention of "tensor parallelism" as "model parallelism across the weight matrices" and "tensor parallelism" vs "pipeline parallelism". Provide mention of "tensor parallelism" as "splitting the weight matrix across GPUs, each GPU holds a slice of the matrix, and the input is broadcasted,
|
| 4461 |
-
Generation took 26.26 seconds
|
| 4462 |
-
</div>
|
| 4463 |
-
<div class="uv-install-logs" id="uv-logs-setup">
|
| 4464 |
-
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4465 |
-
<div class="uv-logs-content" style="display: none;">
|
| 4466 |
-
Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
| 4467 |
-
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 4468 |
-
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4469 |
-
Updated https://github.com/huggingface/transformers.git (7258ea44bc0c0a425a468f66f8559d1de8c4126d)
|
| 4470 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4471 |
-
Downloading pillow (6.3MiB)
|
| 4472 |
-
Building transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
|
| 4473 |
-
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4474 |
-
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4475 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4476 |
-
Downloading numpy (15.9MiB)
|
| 4477 |
-
Downloading hf-xet (3.0MiB)
|
| 4478 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4479 |
-
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4480 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4481 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4482 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4483 |
-
Downloading pygments (1.2MiB)
|
| 4484 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4485 |
-
Downloading jedi (1.5MiB)
|
| 4486 |
-
Downloading sympy (6.0MiB)
|
| 4487 |
-
Downloading kiwisolver (1.4MiB)
|
| 4488 |
-
Downloading matplotlib (8.3MiB)
|
| 4489 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4490 |
-
Downloading networkx (1.9MiB)
|
| 4491 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4492 |
-
Downloading tokenizers (3.1MiB)
|
| 4493 |
-
Downloading fonttools (4.7MiB)
|
| 4494 |
-
Downloading triton (148.4MiB)
|
| 4495 |
-
Downloading torch (846.8MiB)
|
| 4496 |
-
Downloading nvidia-cufile-cu12
|
| 4497 |
-
Downloading kiwisolver
|
| 4498 |
-
Downloading pygments
|
| 4499 |
-
Downloading hf-xet
|
| 4500 |
-
Downloading tokenizers
|
| 4501 |
-
Downloading networkx
|
| 4502 |
-
Downloading fonttools
|
| 4503 |
-
Downloading pillow
|
| 4504 |
-
Downloading matplotlib
|
| 4505 |
-
Downloading nvidia-cuda-cupti-cu12
|
| 4506 |
-
Downloading numpy
|
| 4507 |
-
Downloading sympy
|
| 4508 |
-
Built transformers @ git+https://github.com/huggingface/transformers.git@7258ea44bc0c0a425a468f66f8559d1de8c4126d
|
| 4509 |
-
Downloading nvidia-nvjitlink-cu12
|
| 4510 |
-
Downloading jedi
|
| 4511 |
-
Downloading nvidia-curand-cu12
|
| 4512 |
-
Downloading nvidia-cuda-nvrtc-cu12
|
| 4513 |
-
Downloading triton
|
| 4514 |
-
Downloading nvidia-cufft-cu12
|
| 4515 |
-
Downloading nvidia-cusolver-cu12
|
| 4516 |
-
Downloading nvidia-cusparselt-cu12
|
| 4517 |
-
Downloading nvidia-cusparse-cu12
|
| 4518 |
-
Downloading nvidia-nccl-cu12
|
| 4519 |
-
Downloading nvidia-cublas-cu12
|
| 4520 |
-
Downloading nvidia-cudnn-cu12
|
| 4521 |
-
Downloading torch
|
| 4522 |
-
Installed 69 packages in 464ms
|
| 4523 |
-
</div>
|
| 4524 |
-
</div>
|
| 4525 |
-
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4526 |
-
Fetching 3 files: 33%|███▎ | 1/3 [00:07<00:14, 7.38s/it]
|
| 4527 |
-
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.64s/it]
|
| 4528 |
-
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.80s/it]
|
| 4529 |
-
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
|
| 4530 |
-
|
| 4531 |
-
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4532 |
-
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
|
| 4533 |
-
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
|
| 4534 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
|
| 4535 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
|
| 4536 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4537 |
-
|
| 4538 |
-
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
|
| 4539 |
-
Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:00, 5.44it/s]
|
| 4540 |
-
Fetching 6 files: 50%|█████ | 3/6 [00:00<00:00, 6.96it/s]
|
| 4541 |
-
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 13.54it/s]
|
| 4542 |
-
/tmp/uvnote-run-jc1wbhvj/home/.cache/uv/environments-v2/setup-1400c3ff0fc01263/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4543 |
-
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4544 |
-
warnings.warn(
|
| 4545 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4546 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4547 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4548 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4549 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4550 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4551 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4552 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4553 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4554 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4555 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4556 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4557 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4558 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4559 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4560 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4561 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4562 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4563 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4564 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4565 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4566 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4567 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4568 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4569 |
-
/tmp/uvnote-run-jc1wbhvj/home/.cache/uv/environments-v2/setup-1400c3ff0fc01263/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4570 |
-
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4571 |
-
warnings.warn(
|
| 4572 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4573 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4574 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4575 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4576 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4577 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4578 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4579 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4580 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4581 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4582 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4583 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4584 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4585 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4586 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4587 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4588 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4589 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4590 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4591 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4592 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4593 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4594 |
-
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`</div>
|
| 4595 |
-
</div>
|
| 4596 |
-
</div>
|
| 4597 |
</div>
|
| 4598 |
|
| 4599 |
</body>
|
|
|
|
| 3715 |
</div>
|
| 3716 |
|
| 3717 |
<div class="main-content">
|
| 3718 |
+
<h1>Comparison of Megablocks and Yamoe Kernels</h1>
|
| 3719 |
<p>This note compares the performance of the Megablocks and Yamoe kernels on the GPT-OSS-20B model.</p>
|
| 3720 |
<h2>Megablocks Kernel</h2>
|
| 3721 |
+
<div class="cell cell-failed" id="cell-setup2">
|
| 3722 |
<div class="cell-header">
|
| 3723 |
<span class="collapse-indicators">
|
| 3724 |
<span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
|
| 3725 |
<span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
|
| 3726 |
+
<span id="uv-indicator-setup2" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
|
| 3727 |
</span> |
|
| 3728 |
+
Cell: setup2 | 18.93s | FAILED
|
| 3729 |
| <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
|
| 3730 |
<button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
|
| 3731 |
<a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 3972 |
</div>
|
| 3973 |
</div>
|
| 3974 |
<div id="output-setup2" class="cell-output">
|
| 3975 |
+
<div class="cell-stderr">Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
| 3976 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 3977 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 3978 |
+
Updated https://github.com/huggingface/transformers.git (e691f84412563b6abca098f3e044980725d8daa3)
|
| 3979 |
+
× No solution found when resolving script dependencies:
|
| 3980 |
+
╰─▶ Because only transformers==4.57.0.dev0 is available and
|
| 3981 |
+
transformers==4.57.0.dev0 depends on huggingface-hub==1.0.0rc1,
|
| 3982 |
+
we can conclude that all versions of transformers depend on
|
| 3983 |
+
huggingface-hub==1.0.0rc1.
|
| 3984 |
+
And because kernels==0.10.0 depends on huggingface-hub>=0.26.0,<1.0,
|
| 3985 |
+
we can conclude that kernels==0.10.0 and all versions of transformers
|
| 3986 |
+
are incompatible.
|
| 3987 |
+
And because you require kernels==0.10.0 and transformers, we can
|
| 3988 |
+
conclude that your requirements are unsatisfiable.
|
| 3989 |
</div>
|
| 3990 |
</div>
|
| 3991 |
</div>
|
| 3992 |
|
| 3993 |
<h2>Yamoe Kernel</h2>
|
| 3994 |
</div>
|
| 3995 |
|
| 3996 |
</body>
|
megablocks_yamoe/torch_profile.html
CHANGED
|
@@ -3720,7 +3720,7 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
|
|
| 3720 |
<span onclick="toggleOutput('utils')" style="cursor: pointer;">▼ output</span>
|
| 3721 |
<span id="uv-indicator-utils" onclick="toggleUvLogsFromHeader('utils')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3722 |
</span> |
|
| 3723 |
-
Cell: utils | deps: torch, numpy |
|
| 3724 |
| <button class="run-btn" onclick="runCell('utils')">▶ run</button>
|
| 3725 |
<button class="copy-btn" onclick="copyCell('utils')">Copy</button>
|
| 3726 |
<a href="cells/utils.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -3794,24 +3794,24 @@ Cell: utils | deps: torch, numpy | 35.29s
|
|
| 3794 |
<div class="uv-install-logs" id="uv-logs-utils">
|
| 3795 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 3796 |
<div class="uv-logs-content" style="display: none;">
|
| 3797 |
-
Downloading
|
| 3798 |
-
Downloading
|
| 3799 |
-
Downloading numpy (16.2MiB)
|
| 3800 |
-
Downloading sympy (6.0MiB)
|
| 3801 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 3802 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 3803 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 3804 |
-
Downloading nvidia-
|
| 3805 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 3806 |
-
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 3807 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 3808 |
-
Downloading
|
| 3809 |
-
Downloading nvidia-
|
| 3810 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 3811 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 3812 |
-
Downloading nvidia-
|
| 3813 |
-
Downloading
|
|
|
|
| 3814 |
Downloading triton (148.3MiB)
|
| 3815 |
Downloading nvidia-cufile-cu12
|
| 3816 |
Downloading setuptools
|
| 3817 |
Downloading networkx
|
|
@@ -3824,13 +3824,13 @@ Downloading triton (148.3MiB)
|
|
| 3824 |
Downloading triton
|
| 3825 |
Downloading nvidia-cufft-cu12
|
| 3826 |
Downloading nvidia-cusolver-cu12
|
| 3827 |
-
Downloading nvidia-cusparse-cu12
|
| 3828 |
Downloading nvidia-cusparselt-cu12
|
|
|
|
| 3829 |
Downloading nvidia-nccl-cu12
|
| 3830 |
Downloading nvidia-cublas-cu12
|
| 3831 |
Downloading nvidia-cudnn-cu12
|
| 3832 |
Downloading torch
|
| 3833 |
-
Installed 26 packages in
|
| 3834 |
</div>
|
| 3835 |
</div>
|
| 3836 |
</div>
|
|
@@ -3843,7 +3843,7 @@ Installed 26 packages in 455ms
|
|
| 3843 |
<span onclick="toggleOutput('bench_utils')" style="cursor: pointer;">▼ output</span>
|
| 3844 |
<span id="uv-indicator-bench_utils" onclick="toggleUvLogsFromHeader('bench_utils')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3845 |
</span> |
|
| 3846 |
-
Cell: bench_utils | deps: torch, numpy | 34.
|
| 3847 |
| <button class="run-btn" onclick="runCell('bench_utils')">▶ run</button>
|
| 3848 |
<button class="copy-btn" onclick="copyCell('bench_utils')">Copy</button>
|
| 3849 |
<a href="cells/bench_utils.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4331,24 +4331,24 @@ Cell: bench_utils | deps: torch, numpy | 34.44s
|
|
| 4331 |
<div class="uv-install-logs" id="uv-logs-bench_utils">
|
| 4332 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4333 |
<div class="uv-logs-content" style="display: none;">
|
| 4334 |
-
Downloading
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4335 |
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4336 |
-
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4337 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4338 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4339 |
-
Downloading nvidia-
|
| 4340 |
-
Downloading sympy (6.0MiB)
|
| 4341 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4342 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4343 |
-
Downloading
|
| 4344 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4345 |
-
Downloading torch (846.9MiB)
|
| 4346 |
-
Downloading networkx (1.9MiB)
|
| 4347 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4348 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4349 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4350 |
-
Downloading
|
| 4351 |
-
Downloading nvidia-
|
|
|
|
|
|
|
| 4352 |
Downloading nvidia-cufile-cu12
|
| 4353 |
Downloading setuptools
|
| 4354 |
Downloading networkx
|
|
@@ -4367,7 +4367,7 @@ Downloading nvidia-nccl-cu12 (307.4MiB)
|
|
| 4367 |
Downloading nvidia-cublas-cu12
|
| 4368 |
Downloading nvidia-cudnn-cu12
|
| 4369 |
Downloading torch
|
| 4370 |
-
Installed 26 packages in
|
| 4371 |
</div>
|
| 4372 |
</div>
|
| 4373 |
</div>
|
|
@@ -4381,7 +4381,7 @@ Installed 26 packages in 447ms
|
|
| 4381 |
<span onclick="toggleOutput('config')" style="cursor: pointer;">▼ output</span>
|
| 4382 |
<span id="uv-indicator-config" onclick="toggleUvLogsFromHeader('config')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4383 |
</span> |
|
| 4384 |
-
Cell: config | deps: torch, numpy |
|
| 4385 |
| <button class="run-btn" onclick="runCell('config')">▶ run</button>
|
| 4386 |
<button class="copy-btn" onclick="copyCell('config')">Copy</button>
|
| 4387 |
<a href="cells/config.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4442,23 +4442,23 @@ Cell: config | deps: torch, numpy | 34.69s
|
|
| 4442 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4443 |
<div class="uv-logs-content" style="display: none;">
|
| 4444 |
Downloading numpy (16.2MiB)
|
| 4445 |
-
Downloading
|
| 4446 |
-
Downloading
|
| 4447 |
-
Downloading torch (846.9MiB)
|
| 4448 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4449 |
Downloading setuptools (1.1MiB)
|
| 4450 |
Downloading triton (148.3MiB)
|
| 4451 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4452 |
-
Downloading networkx (1.9MiB)
|
| 4453 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4454 |
-
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4455 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4456 |
-
Downloading sympy (6.0MiB)
|
| 4457 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4458 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4459 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4460 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4461 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4462 |
Downloading nvidia-cufile-cu12
|
| 4463 |
Downloading setuptools
|
| 4464 |
Downloading networkx
|
|
@@ -4471,13 +4471,13 @@ Downloading nvidia-cublas-cu12 (566.8MiB)
|
|
| 4471 |
Downloading triton
|
| 4472 |
Downloading nvidia-cufft-cu12
|
| 4473 |
Downloading nvidia-cusolver-cu12
|
| 4474 |
-
Downloading nvidia-cusparselt-cu12
|
| 4475 |
Downloading nvidia-cusparse-cu12
|
|
|
|
| 4476 |
Downloading nvidia-nccl-cu12
|
| 4477 |
Downloading nvidia-cublas-cu12
|
| 4478 |
Downloading nvidia-cudnn-cu12
|
| 4479 |
Downloading torch
|
| 4480 |
-
Installed 26 packages in
|
| 4481 |
</div>
|
| 4482 |
</div>
|
| 4483 |
</div>
|
|
@@ -4490,7 +4490,7 @@ Installed 26 packages in 526ms
|
|
| 4490 |
<span onclick="toggleOutput('save_data')" style="cursor: pointer;">▼ output</span>
|
| 4491 |
<span id="uv-indicator-save_data" onclick="toggleUvLogsFromHeader('save_data')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4492 |
</span> |
|
| 4493 |
-
Cell: save_data | deps: torch, numpy |
|
| 4494 |
| <button class="run-btn" onclick="runCell('save_data')">▶ run</button>
|
| 4495 |
<button class="copy-btn" onclick="copyCell('save_data')">Copy</button>
|
| 4496 |
<a href="cells/save_data.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4585,23 +4585,23 @@ Down sum: 206.729263
|
|
| 4585 |
<div class="uv-install-logs" id="uv-logs-save_data">
|
| 4586 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4587 |
<div class="uv-logs-content" style="display: none;">
|
| 4588 |
-
Downloading nvidia-
|
| 4589 |
-
Downloading nvidia-
|
| 4590 |
-
Downloading
|
| 4591 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4592 |
Downloading setuptools (1.1MiB)
|
|
|
|
| 4593 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4594 |
-
Downloading numpy (16.2MiB)
|
| 4595 |
-
Downloading triton (148.3MiB)
|
| 4596 |
-
Downloading networkx (1.9MiB)
|
| 4597 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4598 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4599 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4600 |
-
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4601 |
-
Downloading sympy (6.0MiB)
|
| 4602 |
Downloading torch (846.9MiB)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4603 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4604 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
|
|
|
|
|
|
| 4605 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4606 |
Downloading nvidia-cufile-cu12
|
| 4607 |
Downloading setuptools
|
|
@@ -4618,20 +4618,20 @@ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
|
| 4618 |
Downloading nvidia-cusparselt-cu12
|
| 4619 |
Downloading nvidia-cusparse-cu12
|
| 4620 |
Downloading nvidia-nccl-cu12
|
| 4621 |
-
Downloading nvidia-cublas-cu12
|
| 4622 |
Downloading nvidia-cudnn-cu12
|
|
|
|
| 4623 |
Downloading torch
|
| 4624 |
-
Installed 26 packages in
|
| 4625 |
</div>
|
| 4626 |
</div>
|
| 4627 |
<div class="cell-artifacts">
|
| 4628 |
<h4>Artifacts:</h4>
|
| 4629 |
-
<a href="artifacts/save_data/down_proj_bias.pt" class="artifact" target="_blank">down_proj_bias.pt</a>
|
| 4630 |
-
<a href="artifacts/save_data/down_proj.pt" class="artifact" target="_blank">down_proj.pt</a>
|
| 4631 |
-
<a href="artifacts/save_data/router_weight.pt" class="artifact" target="_blank">router_weight.pt</a>
|
| 4632 |
<a href="artifacts/save_data/router_bias.pt" class="artifact" target="_blank">router_bias.pt</a>
|
|
|
|
|
|
|
| 4633 |
<a href="artifacts/save_data/gate_up_proj_bias.pt" class="artifact" target="_blank">gate_up_proj_bias.pt</a>
|
| 4634 |
<a href="artifacts/save_data/gate_up_proj.pt" class="artifact" target="_blank">gate_up_proj.pt</a>
|
|
|
|
| 4635 |
</div>
|
| 4636 |
</div>
|
| 4637 |
</div>
|
|
@@ -4645,7 +4645,7 @@ Installed 26 packages in 563ms
|
|
| 4645 |
<span onclick="toggleOutput('yamoe_run')" style="cursor: pointer;">▼ output</span>
|
| 4646 |
<span id="uv-indicator-yamoe_run" onclick="toggleUvLogsFromHeader('yamoe_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4647 |
</span> |
|
| 4648 |
-
Cell: yamoe_run | deps: torch, kernels, numpy |
|
| 4649 |
| <button class="run-btn" onclick="runCell('yamoe_run')">▶ run</button>
|
| 4650 |
<button class="copy-btn" onclick="copyCell('yamoe_run')">Copy</button>
|
| 4651 |
<a href="cells/yamoe_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4938,10 +4938,10 @@ Input Variation: +0.001 * iteration (deterministic)
|
|
| 4938 |
|
| 4939 |
Warming up (10 iterations)...
|
| 4940 |
Benchmarking (50 iterations)...
|
| 4941 |
-
Progress: 20% complete (avg: 4.
|
| 4942 |
-
Progress: 40% complete (avg: 4.
|
| 4943 |
-
Progress: 60% complete (avg: 4.
|
| 4944 |
-
Progress: 80% complete (avg: 4.
|
| 4945 |
|
| 4946 |
Output tensors:
|
| 4947 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
|
|
@@ -4952,18 +4952,18 @@ Iterations: 50
|
|
| 4952 |
|
| 4953 |
Latency Statistics:
|
| 4954 |
Average: 4.248 ms
|
| 4955 |
-
Min: 4.
|
| 4956 |
-
Max: 4.
|
| 4957 |
Std Dev: 0.021 ms
|
| 4958 |
|
| 4959 |
Percentiles:
|
| 4960 |
-
P50 (median): 4.
|
| 4961 |
-
P95: 4.
|
| 4962 |
-
P99: 4.
|
| 4963 |
|
| 4964 |
Throughput:
|
| 4965 |
-
Tokens/sec:
|
| 4966 |
-
Std Dev:
|
| 4967 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 4968 |
|
| 4969 |
Saved benchmark results to yamoe_results.json
|
|
@@ -4973,25 +4973,25 @@ Output sum: 3.971905
|
|
| 4973 |
<div class="uv-install-logs" id="uv-logs-yamoe_run">
|
| 4974 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4975 |
<div class="uv-logs-content" style="display: none;">
|
| 4976 |
-
Downloading
|
| 4977 |
-
Downloading
|
|
|
|
|
|
|
| 4978 |
Downloading setuptools (1.1MiB)
|
|
|
|
| 4979 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4980 |
-
Downloading
|
| 4981 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4982 |
-
Downloading triton (148.3MiB)
|
| 4983 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4984 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4985 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4986 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4987 |
-
Downloading nvidia-
|
| 4988 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4989 |
Downloading torch (846.9MiB)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4990 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4991 |
-
Downloading nvidia-
|
| 4992 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4993 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4994 |
-
Downloading numpy (16.2MiB)
|
| 4995 |
Downloading nvidia-cufile-cu12
|
| 4996 |
Downloading hf-xet
|
| 4997 |
Downloading setuptools
|
|
@@ -5011,14 +5011,13 @@ Downloading numpy (16.2MiB)
|
|
| 5011 |
Downloading nvidia-cublas-cu12
|
| 5012 |
Downloading nvidia-cudnn-cu12
|
| 5013 |
Downloading torch
|
| 5014 |
-
Installed 37 packages in
|
| 5015 |
</div>
|
| 5016 |
</div>
|
| 5017 |
<div class="cell-stderr">Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
|
| 5018 |
-
Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:
|
| 5019 |
-
Fetching 6 files:
|
| 5020 |
-
Fetching 6 files:
|
| 5021 |
-
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 10.28it/s]</div>
|
| 5022 |
<div class="cell-artifacts">
|
| 5023 |
<h4>Artifacts:</h4>
|
| 5024 |
<a href="artifacts/yamoe_run/yamoe_results.json" class="artifact" target="_blank">yamoe_results.json</a>
|
|
@@ -5035,7 +5034,7 @@ Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 10.2
|
|
| 5035 |
<span onclick="toggleOutput('binned_run')" style="cursor: pointer;">▼ output</span>
|
| 5036 |
<span id="uv-indicator-binned_run" onclick="toggleUvLogsFromHeader('binned_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5037 |
</span> |
|
| 5038 |
-
Cell: binned_run | deps: torch, numpy |
|
| 5039 |
| <button class="run-btn" onclick="runCell('binned_run')">▶ run</button>
|
| 5040 |
<button class="copy-btn" onclick="copyCell('binned_run')">Copy</button>
|
| 5041 |
<a href="cells/binned_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -5449,10 +5448,10 @@ Input Variation: +0.001 * iteration (deterministic)
|
|
| 5449 |
|
| 5450 |
Warming up (10 iterations)...
|
| 5451 |
Benchmarking (50 iterations)...
|
| 5452 |
-
Progress: 20% complete (avg: 37.
|
| 5453 |
-
Progress: 40% complete (avg: 37.
|
| 5454 |
-
Progress: 60% complete (avg: 37.
|
| 5455 |
-
Progress: 80% complete (avg: 36.
|
| 5456 |
|
| 5457 |
Output tensors:
|
| 5458 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
|
|
@@ -5462,19 +5461,19 @@ Output tensors:
|
|
| 5462 |
Iterations: 50
|
| 5463 |
|
| 5464 |
Latency Statistics:
|
| 5465 |
-
Average: 36.
|
| 5466 |
-
Min:
|
| 5467 |
-
Max:
|
| 5468 |
-
Std Dev: 1.
|
| 5469 |
|
| 5470 |
Percentiles:
|
| 5471 |
-
P50 (median): 36.
|
| 5472 |
-
P95:
|
| 5473 |
-
P99:
|
| 5474 |
|
| 5475 |
Throughput:
|
| 5476 |
-
Tokens/sec:
|
| 5477 |
-
Std Dev:
|
| 5478 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 5479 |
|
| 5480 |
Saved benchmark results to binned_results.json
|
|
@@ -5484,24 +5483,24 @@ Output sum: 3.971905
|
|
| 5484 |
<div class="uv-install-logs" id="uv-logs-binned_run">
|
| 5485 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 5486 |
<div class="uv-logs-content" style="display: none;">
|
| 5487 |
-
Downloading networkx (1.9MiB)
|
| 5488 |
-
Downloading numpy (16.2MiB)
|
| 5489 |
-
Downloading setuptools (1.1MiB)
|
| 5490 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 5491 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 5492 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 5493 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 5494 |
-
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 5495 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 5496 |
-
Downloading nvidia-
|
| 5497 |
-
Downloading nvidia-
|
|
|
|
|
|
|
| 5498 |
Downloading triton (148.3MiB)
|
|
|
|
| 5499 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 5500 |
-
Downloading
|
|
|
|
|
|
|
| 5501 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 5502 |
-
Downloading
|
| 5503 |
-
Downloading
|
| 5504 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
|
|
|
|
|
|
| 5505 |
Downloading nvidia-cufile-cu12
|
| 5506 |
Downloading setuptools
|
| 5507 |
Downloading networkx
|
|
@@ -5514,13 +5513,13 @@ Downloading nvidia-cusolver-cu12 (255.1MiB)
|
|
| 5514 |
Downloading triton
|
| 5515 |
Downloading nvidia-cufft-cu12
|
| 5516 |
Downloading nvidia-cusolver-cu12
|
| 5517 |
-
Downloading nvidia-cusparse-cu12
|
| 5518 |
Downloading nvidia-cusparselt-cu12
|
|
|
|
| 5519 |
Downloading nvidia-nccl-cu12
|
| 5520 |
Downloading nvidia-cublas-cu12
|
| 5521 |
Downloading nvidia-cudnn-cu12
|
| 5522 |
Downloading torch
|
| 5523 |
-
Installed 26 packages in
|
| 5524 |
</div>
|
| 5525 |
</div>
|
| 5526 |
<div class="cell-artifacts">
|
|
@@ -5539,7 +5538,7 @@ Installed 26 packages in 455ms
|
|
| 5539 |
<span onclick="toggleOutput('gptoss_run')" style="cursor: pointer;">▼ output</span>
|
| 5540 |
<span id="uv-indicator-gptoss_run" onclick="toggleUvLogsFromHeader('gptoss_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5541 |
</span> |
|
| 5542 |
-
Cell: gptoss_run | deps: torch, numpy |
|
| 5543 |
| <button class="run-btn" onclick="runCell('gptoss_run')">▶ run</button>
|
| 5544 |
<button class="copy-btn" onclick="copyCell('gptoss_run')">Copy</button>
|
| 5545 |
<a href="cells/gptoss_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -5857,10 +5856,10 @@ Input Variation: +0.001 * iteration (deterministic)
|
|
| 5857 |
|
| 5858 |
Warming up (10 iterations)...
|
| 5859 |
Benchmarking (50 iterations)...
|
| 5860 |
-
Progress: 20% complete (avg:
|
| 5861 |
-
Progress: 40% complete (avg: 49.
|
| 5862 |
-
Progress: 60% complete (avg:
|
| 5863 |
-
Progress: 80% complete (avg:
|
| 5864 |
|
| 5865 |
Output tensors:
|
| 5866 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
|
|
@@ -5870,19 +5869,19 @@ Output tensors:
|
|
| 5870 |
Iterations: 50
|
| 5871 |
|
| 5872 |
Latency Statistics:
|
| 5873 |
-
Average:
|
| 5874 |
-
Min: 40.
|
| 5875 |
-
Max:
|
| 5876 |
-
Std Dev:
|
| 5877 |
|
| 5878 |
Percentiles:
|
| 5879 |
-
P50 (median):
|
| 5880 |
-
P95:
|
| 5881 |
-
P99:
|
| 5882 |
|
| 5883 |
Throughput:
|
| 5884 |
-
Tokens/sec:
|
| 5885 |
-
Std Dev:
|
| 5886 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 5887 |
|
| 5888 |
Saved benchmark results to gptoss_results.json
|
|
@@ -5892,24 +5891,24 @@ Output sum: 11.532237
|
|
| 5892 |
<div class="uv-install-logs" id="uv-logs-gptoss_run">
|
| 5893 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 5894 |
<div class="uv-logs-content" style="display: none;">
|
| 5895 |
-
Downloading numpy (16.2MiB)
|
| 5896 |
-
Downloading networkx (1.9MiB)
|
| 5897 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 5898 |
Downloading setuptools (1.1MiB)
|
|
|
|
|
|
|
| 5899 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5900 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 5901 |
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
|
|
|
|
|
|
|
|
|
| 5902 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
|
|
|
|
|
|
| 5903 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 5904 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 5905 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 5906 |
-
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 5907 |
Downloading triton (148.3MiB)
|
| 5908 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 5909 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 5910 |
-
Downloading torch (846.9MiB)
|
| 5911 |
-
Downloading sympy (6.0MiB)
|
| 5912 |
-
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 5913 |
Downloading nvidia-cufile-cu12
|
| 5914 |
Downloading setuptools
|
| 5915 |
Downloading networkx
|
|
@@ -5922,13 +5921,13 @@ Downloading nvidia-cublas-cu12 (566.8MiB)
|
|
| 5922 |
Downloading triton
|
| 5923 |
Downloading nvidia-cufft-cu12
|
| 5924 |
Downloading nvidia-cusolver-cu12
|
| 5925 |
-
Downloading nvidia-cusparse-cu12
|
| 5926 |
Downloading nvidia-cusparselt-cu12
|
|
|
|
| 5927 |
Downloading nvidia-nccl-cu12
|
| 5928 |
Downloading nvidia-cublas-cu12
|
| 5929 |
Downloading nvidia-cudnn-cu12
|
| 5930 |
Downloading torch
|
| 5931 |
-
Installed 26 packages in
|
| 5932 |
</div>
|
| 5933 |
</div>
|
| 5934 |
<div class="cell-artifacts">
|
|
@@ -5947,7 +5946,7 @@ Installed 26 packages in 524ms
|
|
| 5947 |
<span onclick="toggleOutput('gptoss_training_run')" style="cursor: pointer;">▼ output</span>
|
| 5948 |
<span id="uv-indicator-gptoss_training_run" onclick="toggleUvLogsFromHeader('gptoss_training_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5949 |
</span> |
|
| 5950 |
-
Cell: gptoss_training_run | deps: torch, numpy | 40.
|
| 5951 |
| <button class="run-btn" onclick="runCell('gptoss_training_run')">▶ run</button>
|
| 5952 |
<button class="copy-btn" onclick="copyCell('gptoss_training_run')">Copy</button>
|
| 5953 |
<a href="cells/gptoss_training_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -6248,10 +6247,10 @@ Input Variation: +0.001 * iteration (deterministic)
|
|
| 6248 |
|
| 6249 |
Warming up (10 iterations)...
|
| 6250 |
Benchmarking (50 iterations)...
|
| 6251 |
-
Progress: 20% complete (avg:
|
| 6252 |
-
Progress: 40% complete (avg:
|
| 6253 |
-
Progress: 60% complete (avg:
|
| 6254 |
-
Progress: 80% complete (avg: 47.
|
| 6255 |
|
| 6256 |
Output tensors:
|
| 6257 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
|
|
@@ -6261,19 +6260,19 @@ Output tensors:
|
|
| 6261 |
Iterations: 50
|
| 6262 |
|
| 6263 |
Latency Statistics:
|
| 6264 |
-
Average: 46.
|
| 6265 |
-
Min:
|
| 6266 |
-
Max:
|
| 6267 |
-
Std Dev: 2.
|
| 6268 |
|
| 6269 |
Percentiles:
|
| 6270 |
-
P50 (median):
|
| 6271 |
-
P95:
|
| 6272 |
-
P99:
|
| 6273 |
|
| 6274 |
Throughput:
|
| 6275 |
-
Tokens/sec:
|
| 6276 |
-
Std Dev:
|
| 6277 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 6278 |
|
| 6279 |
Saved benchmark results to gptoss_training_results.json
|
|
@@ -6283,24 +6282,24 @@ Output sum: 11.532237
|
|
| 6283 |
<div class="uv-install-logs" id="uv-logs-gptoss_training_run">
|
| 6284 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 6285 |
<div class="uv-logs-content" style="display: none;">
|
| 6286 |
-
Downloading nvidia-
|
| 6287 |
-
Downloading
|
| 6288 |
-
Downloading
|
| 6289 |
Downloading sympy (6.0MiB)
|
| 6290 |
-
Downloading nvidia-
|
| 6291 |
-
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 6292 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 6293 |
-
Downloading
|
| 6294 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 6295 |
-
Downloading networkx (1.9MiB)
|
| 6296 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 6297 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
|
|
|
|
|
|
|
|
|
| 6298 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 6299 |
-
Downloading nvidia-cuda-
|
| 6300 |
-
Downloading nvidia-
|
| 6301 |
Downloading numpy (16.2MiB)
|
| 6302 |
-
Downloading
|
| 6303 |
Downloading triton (148.3MiB)
|
|
|
|
| 6304 |
Downloading nvidia-cufile-cu12
|
| 6305 |
Downloading setuptools
|
| 6306 |
Downloading networkx
|
|
@@ -6313,13 +6312,13 @@ Downloading triton (148.3MiB)
|
|
| 6313 |
Downloading triton
|
| 6314 |
Downloading nvidia-cufft-cu12
|
| 6315 |
Downloading nvidia-cusolver-cu12
|
| 6316 |
-
Downloading nvidia-cusparselt-cu12
|
| 6317 |
Downloading nvidia-cusparse-cu12
|
|
|
|
| 6318 |
Downloading nvidia-nccl-cu12
|
| 6319 |
Downloading nvidia-cublas-cu12
|
| 6320 |
Downloading nvidia-cudnn-cu12
|
| 6321 |
Downloading torch
|
| 6322 |
-
Installed 26 packages in
|
| 6323 |
</div>
|
| 6324 |
</div>
|
| 6325 |
<div class="cell-artifacts">
|
|
@@ -6338,7 +6337,7 @@ Installed 26 packages in 451ms
|
|
| 6338 |
<span onclick="toggleOutput('megablocks_run')" style="cursor: pointer;">▼ output</span>
|
| 6339 |
<span id="uv-indicator-megablocks_run" onclick="toggleUvLogsFromHeader('megablocks_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 6340 |
</span> |
|
| 6341 |
-
Cell: megablocks_run | deps: torch, numpy, kernels | 40.
|
| 6342 |
| <button class="run-btn" onclick="runCell('megablocks_run')">▶ run</button>
|
| 6343 |
<button class="copy-btn" onclick="copyCell('megablocks_run')">Copy</button>
|
| 6344 |
<a href="cells/megablocks_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -6571,24 +6570,24 @@ Warming up (10 iterations)...
|
|
| 6571 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 6572 |
<div class="uv-logs-content" style="display: none;">
|
| 6573 |
Downloading numpy (16.2MiB)
|
| 6574 |
-
Downloading
|
| 6575 |
-
Downloading
|
| 6576 |
-
Downloading hf-xet (3.0MiB)
|
| 6577 |
-
Downloading networkx (1.9MiB)
|
| 6578 |
-
Downloading torch (846.9MiB)
|
| 6579 |
-
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 6580 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 6581 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 6582 |
-
Downloading triton (148.3MiB)
|
| 6583 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 6584 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
|
|
|
|
|
|
| 6585 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 6586 |
-
Downloading
|
| 6587 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 6588 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 6589 |
-
Downloading nvidia-
|
| 6590 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 6591 |
-
Downloading
|
| 6592 |
Downloading nvidia-cufile-cu12
|
| 6593 |
Downloading hf-xet
|
| 6594 |
Downloading setuptools
|
|
@@ -6602,25 +6601,25 @@ Downloading setuptools (1.1MiB)
|
|
| 6602 |
Downloading triton
|
| 6603 |
Downloading nvidia-cufft-cu12
|
| 6604 |
Downloading nvidia-cusolver-cu12
|
| 6605 |
-
Downloading nvidia-cusparse-cu12
|
| 6606 |
Downloading nvidia-cusparselt-cu12
|
|
|
|
| 6607 |
Downloading nvidia-nccl-cu12
|
| 6608 |
Downloading nvidia-cublas-cu12
|
| 6609 |
Downloading nvidia-cudnn-cu12
|
| 6610 |
Downloading torch
|
| 6611 |
-
Installed 37 packages in
|
| 6612 |
</div>
|
| 6613 |
</div>
|
| 6614 |
<div class="cell-stderr">Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
|
| 6615 |
-
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:
|
| 6616 |
-
Fetching 66 files: 14%|█▎ | 9/66 [00:00<00:03,
|
| 6617 |
-
Fetching 66 files: 26%|██▌ | 17/66 [00:01<00:02,
|
| 6618 |
-
Fetching 66 files:
|
| 6619 |
-
Fetching 66 files:
|
| 6620 |
-
Fetching 66 files:
|
| 6621 |
-
Fetching 66 files:
|
| 6622 |
-
Fetching 66 files: 100%|██████████| 66/66 [00:
|
| 6623 |
-
/tmp/
|
| 6624 |
5 | #include <Python.h>
|
| 6625 |
| ^~~~~~~~~~
|
| 6626 |
compilation terminated.
|
|
@@ -6637,87 +6636,87 @@ Traceback (most recent call last):
|
|
| 6637 |
File "/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/bench_utils.py", line 177, in <lambda>
|
| 6638 |
call = lambda x: fn(x, *args[1:], **kwargs)
|
| 6639 |
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6640 |
-
File "/tmp/uvnote-run-
|
| 6641 |
return self._call_impl(*args, **kwargs)
|
| 6642 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6643 |
-
File "/tmp/uvnote-run-
|
| 6644 |
return forward_call(*args, **kwargs)
|
| 6645 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6646 |
File "/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/megablocks_run.py", line 81, in forward
|
| 6647 |
output, dummy_routing_weights = self.model(hidden_states)
|
| 6648 |
^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6649 |
-
File "/tmp/uvnote-run-
|
| 6650 |
return self._call_impl(*args, **kwargs)
|
| 6651 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6652 |
-
File "/tmp/uvnote-run-
|
| 6653 |
return forward_call(*args, **kwargs)
|
| 6654 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6655 |
-
File "/tmp/uvnote-run-
|
| 6656 |
output, expert_weights_out, *_ = moe_forward(
|
| 6657 |
^^^^^^^^^^^^
|
| 6658 |
-
File "/tmp/uvnote-run-
|
| 6659 |
x, tokens_per_expert = forward_fn(**forward_args)
|
| 6660 |
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6661 |
-
File "/tmp/uvnote-run-
|
| 6662 |
x = permute_and_compute(
|
| 6663 |
^^^^^^^^^^^^^^^^^^^^
|
| 6664 |
-
File "/tmp/uvnote-run-
|
| 6665 |
x = ops.binned_gather(x, indices, bins, expert_capacity, top_k)
|
| 6666 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6667 |
-
File "/tmp/uvnote-run-
|
| 6668 |
return super().apply(*args, **kwargs) # type: ignore[misc]
|
| 6669 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6670 |
-
File "/tmp/uvnote-run-
|
| 6671 |
return fwd(*args, **kwargs)
|
| 6672 |
^^^^^^^^^^^^^^^^^^^^
|
| 6673 |
-
File "/tmp/uvnote-run-
|
| 6674 |
return kernels.binned_gather(x, indices, None, bins, bin_size, top_k)
|
| 6675 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6676 |
-
File "/tmp/uvnote-run-
|
| 6677 |
_binned_copy[(num_experts, expert_capacity)](
|
| 6678 |
-
File "/tmp/uvnote-run-
|
| 6679 |
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
|
| 6680 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6681 |
-
File "/tmp/uvnote-run-
|
| 6682 |
benchmark()
|
| 6683 |
-
File "/tmp/uvnote-run-
|
| 6684 |
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
|
| 6685 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6686 |
-
File "/tmp/uvnote-run-
|
| 6687 |
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
|
| 6688 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6689 |
-
File "/tmp/uvnote-run-
|
| 6690 |
return self.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8))
|
| 6691 |
^^^^^^^^^^^^^
|
| 6692 |
File "/usr/lib/python3.11/functools.py", line 1001, in __get__
|
| 6693 |
val = self.func(instance)
|
| 6694 |
^^^^^^^^^^^^^^^^^^^
|
| 6695 |
-
File "/tmp/uvnote-run-
|
| 6696 |
return driver.active.get_benchmarker()
|
| 6697 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6698 |
-
File "/tmp/uvnote-run-
|
| 6699 |
return getattr(self._initialize_obj(), name)
|
| 6700 |
^^^^^^^^^^^^^^^^^^^^^^
|
| 6701 |
-
File "/tmp/uvnote-run-
|
| 6702 |
self._obj = self._init_fn()
|
| 6703 |
^^^^^^^^^^^^^^^
|
| 6704 |
-
File "/tmp/uvnote-run-
|
| 6705 |
return active_drivers[0]()
|
| 6706 |
^^^^^^^^^^^^^^^^^^^
|
| 6707 |
-
File "/tmp/uvnote-run-
|
| 6708 |
self.utils = CudaUtils() # TODO: make static
|
| 6709 |
^^^^^^^^^^^
|
| 6710 |
-
File "/tmp/uvnote-run-
|
| 6711 |
mod = compile_module_from_src(
|
| 6712 |
^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6713 |
-
File "/tmp/uvnote-run-
|
| 6714 |
so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [])
|
| 6715 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6716 |
-
File "/tmp/uvnote-run-
|
| 6717 |
subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
|
| 6718 |
File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
|
| 6719 |
raise CalledProcessError(retcode, cmd)
|
| 6720 |
-
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/
|
| 6721 |
</div>
|
| 6722 |
</div>
|
| 6723 |
|
|
|
|
| 3720 |
<span onclick="toggleOutput('utils')" style="cursor: pointer;">▼ output</span>
|
| 3721 |
<span id="uv-indicator-utils" onclick="toggleUvLogsFromHeader('utils')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3722 |
</span> |
|
| 3723 |
+
Cell: utils | deps: torch, numpy | 34.88s
|
| 3724 |
| <button class="run-btn" onclick="runCell('utils')">▶ run</button>
|
| 3725 |
<button class="copy-btn" onclick="copyCell('utils')">Copy</button>
|
| 3726 |
<a href="cells/utils.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 3794 |
<div class="uv-install-logs" id="uv-logs-utils">
|
| 3795 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 3796 |
<div class="uv-logs-content" style="display: none;">
|
| 3797 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 3798 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 3799 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 3800 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 3801 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 3802 |
+
Downloading sympy (6.0MiB)
|
| 3803 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
|
|
|
| 3804 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 3805 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 3806 |
+
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 3807 |
+
Downloading networkx (1.9MiB)
|
| 3808 |
Downloading triton (148.3MiB)
|
| 3809 |
+
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 3810 |
+
Downloading numpy (16.2MiB)
|
| 3811 |
+
Downloading torch (846.9MiB)
|
| 3812 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 3813 |
+
Downloading setuptools (1.1MiB)
|
| 3814 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 3815 |
Downloading nvidia-cufile-cu12
|
| 3816 |
Downloading setuptools
|
| 3817 |
Downloading networkx
|
|
|
|
| 3824 |
Downloading triton
|
| 3825 |
Downloading nvidia-cufft-cu12
|
| 3826 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 3827 |
Downloading nvidia-cusparselt-cu12
|
| 3828 |
+
Downloading nvidia-cusparse-cu12
|
| 3829 |
Downloading nvidia-nccl-cu12
|
| 3830 |
Downloading nvidia-cublas-cu12
|
| 3831 |
Downloading nvidia-cudnn-cu12
|
| 3832 |
Downloading torch
|
| 3833 |
+
Installed 26 packages in 452ms
|
| 3834 |
</div>
|
| 3835 |
</div>
|
| 3836 |
</div>
|
|
|
|
| 3843 |
<span onclick="toggleOutput('bench_utils')" style="cursor: pointer;">▼ output</span>
|
| 3844 |
<span id="uv-indicator-bench_utils" onclick="toggleUvLogsFromHeader('bench_utils')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3845 |
</span> |
|
| 3846 |
+
Cell: bench_utils | deps: torch, numpy | 34.66s
|
| 3847 |
| <button class="run-btn" onclick="runCell('bench_utils')">▶ run</button>
|
| 3848 |
<button class="copy-btn" onclick="copyCell('bench_utils')">Copy</button>
|
| 3849 |
<a href="cells/bench_utils.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 4331 |
<div class="uv-install-logs" id="uv-logs-bench_utils">
|
| 4332 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4333 |
<div class="uv-logs-content" style="display: none;">
|
| 4334 |
+
Downloading numpy (16.2MiB)
|
| 4335 |
+
Downloading sympy (6.0MiB)
|
| 4336 |
+
Downloading networkx (1.9MiB)
|
| 4337 |
+
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4338 |
+
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4339 |
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4340 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4341 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
|
|
|
| 4342 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4343 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4344 |
+
Downloading setuptools (1.1MiB)
|
| 4345 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4346 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4347 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4348 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4349 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4350 |
+
Downloading triton (148.3MiB)
|
| 4351 |
+
Downloading torch (846.9MiB)
|
| 4352 |
Downloading nvidia-cufile-cu12
|
| 4353 |
Downloading setuptools
|
| 4354 |
Downloading networkx
|
|
|
|
| 4367 |
Downloading nvidia-cublas-cu12
|
| 4368 |
Downloading nvidia-cudnn-cu12
|
| 4369 |
Downloading torch
|
| 4370 |
+
Installed 26 packages in 535ms
|
| 4371 |
</div>
|
| 4372 |
</div>
|
| 4373 |
</div>
|
|
|
|
| 4381 |
<span onclick="toggleOutput('config')" style="cursor: pointer;">▼ output</span>
|
| 4382 |
<span id="uv-indicator-config" onclick="toggleUvLogsFromHeader('config')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4383 |
</span> |
|
| 4384 |
+
Cell: config | deps: torch, numpy | 35.36s
|
| 4385 |
| <button class="run-btn" onclick="runCell('config')">▶ run</button>
|
| 4386 |
<button class="copy-btn" onclick="copyCell('config')">Copy</button>
|
| 4387 |
<a href="cells/config.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 4442 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4443 |
<div class="uv-logs-content" style="display: none;">
|
| 4444 |
Downloading numpy (16.2MiB)
|
| 4445 |
+
Downloading sympy (6.0MiB)
|
| 4446 |
+
Downloading networkx (1.9MiB)
|
| 4447 |
Downloading setuptools (1.1MiB)
|
| 4448 |
Downloading triton (148.3MiB)
|
| 4449 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4450 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4451 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4452 |
+
Downloading torch (846.9MiB)
|
| 4453 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4454 |
+
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4455 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4456 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4457 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4458 |
+
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4459 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4460 |
+
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4461 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4462 |
Downloading nvidia-cufile-cu12
|
| 4463 |
Downloading setuptools
|
| 4464 |
Downloading networkx
|
|
|
|
| 4471 |
Downloading triton
|
| 4472 |
Downloading nvidia-cufft-cu12
|
| 4473 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 4474 |
Downloading nvidia-cusparse-cu12
|
| 4475 |
+
Downloading nvidia-cusparselt-cu12
|
| 4476 |
Downloading nvidia-nccl-cu12
|
| 4477 |
Downloading nvidia-cublas-cu12
|
| 4478 |
Downloading nvidia-cudnn-cu12
|
| 4479 |
Downloading torch
|
| 4480 |
+
Installed 26 packages in 452ms
|
| 4481 |
</div>
|
| 4482 |
</div>
|
| 4483 |
</div>
|
|
|
|
| 4490 |
<span onclick="toggleOutput('save_data')" style="cursor: pointer;">▼ output</span>
|
| 4491 |
<span id="uv-indicator-save_data" onclick="toggleUvLogsFromHeader('save_data')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4492 |
</span> |
|
| 4493 |
+
Cell: save_data | deps: torch, numpy | 39.03s
|
| 4494 |
| <button class="run-btn" onclick="runCell('save_data')">▶ run</button>
|
| 4495 |
<button class="copy-btn" onclick="copyCell('save_data')">Copy</button>
|
| 4496 |
<a href="cells/save_data.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 4585 |
<div class="uv-install-logs" id="uv-logs-save_data">
|
| 4586 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4587 |
<div class="uv-logs-content" style="display: none;">
|
| 4588 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4589 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4590 |
+
Downloading sympy (6.0MiB)
|
|
|
|
| 4591 |
Downloading setuptools (1.1MiB)
|
| 4592 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4593 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4594 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4595 |
Downloading torch (846.9MiB)
|
| 4596 |
+
Downloading networkx (1.9MiB)
|
| 4597 |
+
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4598 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4599 |
+
Downloading numpy (16.2MiB)
|
| 4600 |
+
Downloading triton (148.3MiB)
|
| 4601 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4602 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4603 |
+
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4604 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4605 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4606 |
Downloading nvidia-cufile-cu12
|
| 4607 |
Downloading setuptools
|
|
|
|
| 4618 |
Downloading nvidia-cusparselt-cu12
|
| 4619 |
Downloading nvidia-cusparse-cu12
|
| 4620 |
Downloading nvidia-nccl-cu12
|
|
|
|
| 4621 |
Downloading nvidia-cudnn-cu12
|
| 4622 |
+
Downloading nvidia-cublas-cu12
|
| 4623 |
Downloading torch
|
| 4624 |
+
Installed 26 packages in 447ms
|
| 4625 |
</div>
|
| 4626 |
</div>
|
| 4627 |
<div class="cell-artifacts">
|
| 4628 |
<h4>Artifacts:</h4>
|
| 4629 |
<a href="artifacts/save_data/router_bias.pt" class="artifact" target="_blank">router_bias.pt</a>
|
| 4630 |
+
<a href="artifacts/save_data/router_weight.pt" class="artifact" target="_blank">router_weight.pt</a>
|
| 4631 |
+
<a href="artifacts/save_data/down_proj_bias.pt" class="artifact" target="_blank">down_proj_bias.pt</a>
|
| 4632 |
<a href="artifacts/save_data/gate_up_proj_bias.pt" class="artifact" target="_blank">gate_up_proj_bias.pt</a>
|
| 4633 |
<a href="artifacts/save_data/gate_up_proj.pt" class="artifact" target="_blank">gate_up_proj.pt</a>
|
| 4634 |
+
<a href="artifacts/save_data/down_proj.pt" class="artifact" target="_blank">down_proj.pt</a>
|
| 4635 |
</div>
|
| 4636 |
</div>
|
| 4637 |
</div>
|
|
|
|
| 4645 |
<span onclick="toggleOutput('yamoe_run')" style="cursor: pointer;">▼ output</span>
|
| 4646 |
<span id="uv-indicator-yamoe_run" onclick="toggleUvLogsFromHeader('yamoe_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4647 |
</span> |
|
| 4648 |
+
Cell: yamoe_run | deps: torch, kernels, numpy | 39.06s
|
| 4649 |
| <button class="run-btn" onclick="runCell('yamoe_run')">▶ run</button>
|
| 4650 |
<button class="copy-btn" onclick="copyCell('yamoe_run')">Copy</button>
|
| 4651 |
<a href="cells/yamoe_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 4938 |
|
| 4939 |
Warming up (10 iterations)...
|
| 4940 |
Benchmarking (50 iterations)...
|
| 4941 |
+
Progress: 20% complete (avg: 4.247 ms)
|
| 4942 |
+
Progress: 40% complete (avg: 4.244 ms)
|
| 4943 |
+
Progress: 60% complete (avg: 4.246 ms)
|
| 4944 |
+
Progress: 80% complete (avg: 4.246 ms)
|
| 4945 |
|
| 4946 |
Output tensors:
|
| 4947 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
|
|
|
|
| 4952 |
|
| 4953 |
Latency Statistics:
|
| 4954 |
Average: 4.248 ms
|
| 4955 |
+
Min: 4.137 ms
|
| 4956 |
+
Max: 4.281 ms
|
| 4957 |
Std Dev: 0.021 ms
|
| 4958 |
|
| 4959 |
Percentiles:
|
| 4960 |
+
P50 (median): 4.253 ms
|
| 4961 |
+
P95: 4.266 ms
|
| 4962 |
+
P99: 4.274 ms
|
| 4963 |
|
| 4964 |
Throughput:
|
| 4965 |
+
Tokens/sec: 23539.4
|
| 4966 |
+
Std Dev: 120.7
|
| 4967 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 4968 |
|
| 4969 |
Saved benchmark results to yamoe_results.json
|
|
|
|
| 4973 |
<div class="uv-install-logs" id="uv-logs-yamoe_run">
|
| 4974 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 4975 |
<div class="uv-logs-content" style="display: none;">
|
| 4976 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4977 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4978 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4979 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4980 |
Downloading setuptools (1.1MiB)
|
| 4981 |
+
Downloading numpy (16.2MiB)
|
| 4982 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4983 |
+
Downloading networkx (1.9MiB)
|
| 4984 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4985 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4986 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4987 |
Downloading torch (846.9MiB)
|
| 4988 |
+
Downloading triton (148.3MiB)
|
| 4989 |
+
Downloading hf-xet (3.0MiB)
|
| 4990 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4991 |
+
Downloading sympy (6.0MiB)
|
| 4992 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4993 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
|
|
|
| 4994 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
|
|
|
| 4995 |
Downloading nvidia-cufile-cu12
|
| 4996 |
Downloading hf-xet
|
| 4997 |
Downloading setuptools
|
|
|
|
| 5011 |
Downloading nvidia-cublas-cu12
|
| 5012 |
Downloading nvidia-cudnn-cu12
|
| 5013 |
Downloading torch
|
| 5014 |
+
Installed 37 packages in 553ms
|
| 5015 |
</div>
|
| 5016 |
</div>
|
| 5017 |
<div class="cell-stderr">Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
|
| 5018 |
+
Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:01, 2.76it/s]
|
| 5019 |
+
Fetching 6 files: 50%|█████ | 3/6 [00:00<00:00, 3.03it/s]
|
| 5020 |
+
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 6.01it/s]</div>
|
|
|
|
| 5021 |
<div class="cell-artifacts">
|
| 5022 |
<h4>Artifacts:</h4>
|
| 5023 |
<a href="artifacts/yamoe_run/yamoe_results.json" class="artifact" target="_blank">yamoe_results.json</a>
|
|
|
|
| 5034 |
<span onclick="toggleOutput('binned_run')" style="cursor: pointer;">▼ output</span>
|
| 5035 |
<span id="uv-indicator-binned_run" onclick="toggleUvLogsFromHeader('binned_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5036 |
</span> |
|
| 5037 |
+
Cell: binned_run | deps: torch, numpy | 39.51s
|
| 5038 |
| <button class="run-btn" onclick="runCell('binned_run')">▶ run</button>
|
| 5039 |
<button class="copy-btn" onclick="copyCell('binned_run')">Copy</button>
|
| 5040 |
<a href="cells/binned_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 5448 |
|
| 5449 |
Warming up (10 iterations)...
|
| 5450 |
Benchmarking (50 iterations)...
|
| 5451 |
+
Progress: 20% complete (avg: 37.524 ms)
|
| 5452 |
+
Progress: 40% complete (avg: 37.442 ms)
|
| 5453 |
+
Progress: 60% complete (avg: 37.122 ms)
|
| 5454 |
+
Progress: 80% complete (avg: 36.627 ms)
|
| 5455 |
|
| 5456 |
Output tensors:
|
| 5457 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.049506, 0.054984], mean=0.000034, std=0.006508, norm=2.208791
|
|
|
|
| 5461 |
Iterations: 50
|
| 5462 |
|
| 5463 |
Latency Statistics:
|
| 5464 |
+
Average: 36.268 ms
|
| 5465 |
+
Min: 34.104 ms
|
| 5466 |
+
Max: 37.686 ms
|
| 5467 |
+
Std Dev: 1.160 ms
|
| 5468 |
|
| 5469 |
Percentiles:
|
| 5470 |
+
P50 (median): 36.522 ms
|
| 5471 |
+
P95: 37.643 ms
|
| 5472 |
+
P99: 37.677 ms
|
| 5473 |
|
| 5474 |
Throughput:
|
| 5475 |
+
Tokens/sec: 2757.2
|
| 5476 |
+
Std Dev: 89.1
|
| 5477 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 5478 |
|
| 5479 |
Saved benchmark results to binned_results.json
|
|
|
|
| 5483 |
<div class="uv-install-logs" id="uv-logs-binned_run">
|
| 5484 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 5485 |
<div class="uv-logs-content" style="display: none;">
|
| 5486 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 5487 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 5488 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 5489 |
+
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 5490 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 5491 |
+
Downloading sympy (6.0MiB)
|
| 5492 |
Downloading triton (148.3MiB)
|
| 5493 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 5494 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 5495 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 5496 |
+
Downloading networkx (1.9MiB)
|
| 5497 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 5498 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 5499 |
+
Downloading torch (846.9MiB)
|
| 5500 |
+
Downloading setuptools (1.1MiB)
|
| 5501 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 5502 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 5503 |
+
Downloading numpy (16.2MiB)
|
| 5504 |
Downloading nvidia-cufile-cu12
|
| 5505 |
Downloading setuptools
|
| 5506 |
Downloading networkx
|
|
|
|
| 5513 |
Downloading triton
|
| 5514 |
Downloading nvidia-cufft-cu12
|
| 5515 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 5516 |
Downloading nvidia-cusparselt-cu12
|
| 5517 |
+
Downloading nvidia-cusparse-cu12
|
| 5518 |
Downloading nvidia-nccl-cu12
|
| 5519 |
Downloading nvidia-cublas-cu12
|
| 5520 |
Downloading nvidia-cudnn-cu12
|
| 5521 |
Downloading torch
|
| 5522 |
+
Installed 26 packages in 453ms
|
| 5523 |
</div>
|
| 5524 |
</div>
|
| 5525 |
<div class="cell-artifacts">
|
|
|
|
| 5538 |
<span onclick="toggleOutput('gptoss_run')" style="cursor: pointer;">▼ output</span>
|
| 5539 |
<span id="uv-indicator-gptoss_run" onclick="toggleUvLogsFromHeader('gptoss_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5540 |
</span> |
|
| 5541 |
+
Cell: gptoss_run | deps: torch, numpy | 40.20s
|
| 5542 |
| <button class="run-btn" onclick="runCell('gptoss_run')">▶ run</button>
|
| 5543 |
<button class="copy-btn" onclick="copyCell('gptoss_run')">Copy</button>
|
| 5544 |
<a href="cells/gptoss_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 5856 |
|
| 5857 |
Warming up (10 iterations)...
|
| 5858 |
Benchmarking (50 iterations)...
|
| 5859 |
+
Progress: 20% complete (avg: 50.493 ms)
|
| 5860 |
+
Progress: 40% complete (avg: 49.981 ms)
|
| 5861 |
+
Progress: 60% complete (avg: 49.061 ms)
|
| 5862 |
+
Progress: 80% complete (avg: 47.981 ms)
|
| 5863 |
|
| 5864 |
Output tensors:
|
| 5865 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
|
|
|
|
| 5869 |
Iterations: 50
|
| 5870 |
|
| 5871 |
Latency Statistics:
|
| 5872 |
+
Average: 46.914 ms
|
| 5873 |
+
Min: 40.448 ms
|
| 5874 |
+
Max: 51.075 ms
|
| 5875 |
+
Std Dev: 2.992 ms
|
| 5876 |
|
| 5877 |
Percentiles:
|
| 5878 |
+
P50 (median): 47.419 ms
|
| 5879 |
+
P95: 50.800 ms
|
| 5880 |
+
P99: 50.949 ms
|
| 5881 |
|
| 5882 |
Throughput:
|
| 5883 |
+
Tokens/sec: 2131.6
|
| 5884 |
+
Std Dev: 139.9
|
| 5885 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 5886 |
|
| 5887 |
Saved benchmark results to gptoss_results.json
|
|
|
|
| 5891 |
<div class="uv-install-logs" id="uv-logs-gptoss_run">
|
| 5892 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 5893 |
<div class="uv-logs-content" style="display: none;">
|
| 5894 |
Downloading setuptools (1.1MiB)
|
| 5895 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 5896 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 5897 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 5898 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 5899 |
+
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 5900 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 5901 |
+
Downloading sympy (6.0MiB)
|
| 5902 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 5903 |
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 5904 |
+
Downloading networkx (1.9MiB)
|
| 5905 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 5906 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 5907 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 5908 |
+
Downloading numpy (16.2MiB)
|
| 5909 |
+
Downloading torch (846.9MiB)
|
| 5910 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 5911 |
Downloading triton (148.3MiB)
|
| 5912 |
Downloading nvidia-cufile-cu12
|
| 5913 |
Downloading setuptools
|
| 5914 |
Downloading networkx
|
|
|
|
| 5921 |
Downloading triton
|
| 5922 |
Downloading nvidia-cufft-cu12
|
| 5923 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 5924 |
Downloading nvidia-cusparselt-cu12
|
| 5925 |
+
Downloading nvidia-cusparse-cu12
|
| 5926 |
Downloading nvidia-nccl-cu12
|
| 5927 |
Downloading nvidia-cublas-cu12
|
| 5928 |
Downloading nvidia-cudnn-cu12
|
| 5929 |
Downloading torch
|
| 5930 |
+
Installed 26 packages in 452ms
|
| 5931 |
</div>
|
| 5932 |
</div>
|
| 5933 |
<div class="cell-artifacts">
|
|
|
|
| 5946 |
<span onclick="toggleOutput('gptoss_training_run')" style="cursor: pointer;">▼ output</span>
|
| 5947 |
<span id="uv-indicator-gptoss_training_run" onclick="toggleUvLogsFromHeader('gptoss_training_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 5948 |
</span> |
|
| 5949 |
+
Cell: gptoss_training_run | deps: torch, numpy | 40.63s
|
| 5950 |
| <button class="run-btn" onclick="runCell('gptoss_training_run')">▶ run</button>
|
| 5951 |
<button class="copy-btn" onclick="copyCell('gptoss_training_run')">Copy</button>
|
| 5952 |
<a href="cells/gptoss_training_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 6247 |
|
| 6248 |
Warming up (10 iterations)...
|
| 6249 |
Benchmarking (50 iterations)...
|
| 6250 |
+
Progress: 20% complete (avg: 49.824 ms)
|
| 6251 |
+
Progress: 40% complete (avg: 49.309 ms)
|
| 6252 |
+
Progress: 60% complete (avg: 48.365 ms)
|
| 6253 |
+
Progress: 80% complete (avg: 47.278 ms)
|
| 6254 |
|
| 6255 |
Output tensors:
|
| 6256 |
Primary: shape=(1, 100, 1152), dtype=torch.float32, device=cuda:0, range=[-0.064982, 0.061193], mean=0.000100, std=0.013510, norm=4.585560
|
|
|
|
| 6260 |
Iterations: 50
|
| 6261 |
|
| 6262 |
Latency Statistics:
|
| 6263 |
+
Average: 46.289 ms
|
| 6264 |
+
Min: 39.979 ms
|
| 6265 |
+
Max: 50.581 ms
|
| 6266 |
+
Std Dev: 2.917 ms
|
| 6267 |
|
| 6268 |
Percentiles:
|
| 6269 |
+
P50 (median): 46.648 ms
|
| 6270 |
+
P95: 50.267 ms
|
| 6271 |
+
P99: 50.516 ms
|
| 6272 |
|
| 6273 |
Throughput:
|
| 6274 |
+
Tokens/sec: 2160.3
|
| 6275 |
+
Std Dev: 139.9
|
| 6276 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 6277 |
|
| 6278 |
Saved benchmark results to gptoss_training_results.json
|
|
|
|
| 6282 |
<div class="uv-install-logs" id="uv-logs-gptoss_training_run">
|
| 6283 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 6284 |
<div class="uv-logs-content" style="display: none;">
|
| 6285 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 6286 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 6287 |
+
Downloading networkx (1.9MiB)
|
| 6288 |
Downloading sympy (6.0MiB)
|
| 6289 |
+
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
|
|
|
| 6290 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 6291 |
+
Downloading setuptools (1.1MiB)
|
| 6292 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 6293 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 6294 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 6295 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 6296 |
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 6297 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 6298 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 6299 |
Downloading numpy (16.2MiB)
|
| 6300 |
+
Downloading torch (846.9MiB)
|
| 6301 |
Downloading triton (148.3MiB)
|
| 6302 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 6303 |
Downloading nvidia-cufile-cu12
|
| 6304 |
Downloading setuptools
|
| 6305 |
Downloading networkx
|
|
|
|
| 6312 |
Downloading triton
|
| 6313 |
Downloading nvidia-cufft-cu12
|
| 6314 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 6315 |
Downloading nvidia-cusparse-cu12
|
| 6316 |
+
Downloading nvidia-cusparselt-cu12
|
| 6317 |
Downloading nvidia-nccl-cu12
|
| 6318 |
Downloading nvidia-cublas-cu12
|
| 6319 |
Downloading nvidia-cudnn-cu12
|
| 6320 |
Downloading torch
|
| 6321 |
+
Installed 26 packages in 570ms
|
| 6322 |
</div>
|
| 6323 |
</div>
|
| 6324 |
<div class="cell-artifacts">
|
|
|
|
| 6337 |
<span onclick="toggleOutput('megablocks_run')" style="cursor: pointer;">▼ output</span>
|
| 6338 |
<span id="uv-indicator-megablocks_run" onclick="toggleUvLogsFromHeader('megablocks_run')" style="cursor: pointer;">▶ uv-logs</span>
|
| 6339 |
</span> |
|
| 6340 |
+
Cell: megablocks_run | deps: torch, numpy, kernels | 40.35s | FAILED
|
| 6341 |
| <button class="run-btn" onclick="runCell('megablocks_run')">▶ run</button>
|
| 6342 |
<button class="copy-btn" onclick="copyCell('megablocks_run')">Copy</button>
|
| 6343 |
<a href="cells/megablocks_run.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 6570 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
| 6571 |
<div class="uv-logs-content" style="display: none;">
|
| 6572 |
Downloading numpy (16.2MiB)
|
| 6573 |
+
Downloading sympy (6.0MiB)
|
| 6574 |
+
Downloading setuptools (1.1MiB)
|
| 6575 |
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 6576 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 6577 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 6578 |
+
Downloading hf-xet (3.0MiB)
|
| 6579 |
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 6580 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
|
|
|
| 6581 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 6582 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 6583 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 6584 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 6585 |
+
Downloading triton (148.3MiB)
|
| 6586 |
+
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 6587 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 6588 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 6589 |
+
Downloading torch (846.9MiB)
|
| 6590 |
+
Downloading networkx (1.9MiB)
|
| 6591 |
Downloading nvidia-cufile-cu12
|
| 6592 |
Downloading hf-xet
|
| 6593 |
Downloading setuptools
|
|
|
|
| 6601 |
Downloading triton
|
| 6602 |
Downloading nvidia-cufft-cu12
|
| 6603 |
Downloading nvidia-cusolver-cu12
|
|
|
|
| 6604 |
Downloading nvidia-cusparselt-cu12
|
| 6605 |
+
Downloading nvidia-cusparse-cu12
|
| 6606 |
Downloading nvidia-nccl-cu12
|
| 6607 |
Downloading nvidia-cublas-cu12
|
| 6608 |
Downloading nvidia-cudnn-cu12
|
| 6609 |
Downloading torch
|
| 6610 |
+
Installed 37 packages in 448ms
|
| 6611 |
</div>
|
| 6612 |
</div>
|
| 6613 |
<div class="cell-stderr">Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
|
| 6614 |
+
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:28, 2.31it/s]
|
| 6615 |
+
Fetching 66 files: 14%|█▎ | 9/66 [00:00<00:03, 18.19it/s]
|
| 6616 |
+
Fetching 66 files: 26%|██▌ | 17/66 [00:01<00:02, 16.61it/s]
|
| 6617 |
+
Fetching 66 files: 52%|█████▏ | 34/66 [00:01<00:00, 38.17it/s]
|
| 6618 |
+
Fetching 66 files: 64%|██████▎ | 42/66 [00:01<00:00, 36.62it/s]
|
| 6619 |
+
Fetching 66 files: 73%|███████▎ | 48/66 [00:01<00:00, 28.57it/s]
|
| 6620 |
+
Fetching 66 files: 92%|█████████▏| 61/66 [00:01<00:00, 39.67it/s]
|
| 6621 |
+
Fetching 66 files: 100%|██████████| 66/66 [00:02<00:00, 32.91it/s]
|
| 6622 |
+
/tmp/tmp1397kafx/cuda_utils.c:5:10: fatal error: Python.h: No such file or directory
|
| 6623 |
5 | #include <Python.h>
|
| 6624 |
| ^~~~~~~~~~
|
| 6625 |
compilation terminated.
|
|
|
|
| 6636 |
File "/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/bench_utils.py", line 177, in <lambda>
|
| 6637 |
call = lambda x: fn(x, *args[1:], **kwargs)
|
| 6638 |
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6639 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
|
| 6640 |
return self._call_impl(*args, **kwargs)
|
| 6641 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6642 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
|
| 6643 |
return forward_call(*args, **kwargs)
|
| 6644 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6645 |
File "/repo/moe_benchmarks/megablocks_yamoe/.uvnote/cells/megablocks_run.py", line 81, in forward
|
| 6646 |
output, dummy_routing_weights = self.model(hidden_states)
|
| 6647 |
^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6648 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
|
| 6649 |
return self._call_impl(*args, **kwargs)
|
| 6650 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6651 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
|
| 6652 |
return forward_call(*args, **kwargs)
|
| 6653 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6654 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py", line 896, in forward
|
| 6655 |
output, expert_weights_out, *_ = moe_forward(
|
| 6656 |
^^^^^^^^^^^^
|
| 6657 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py", line 730, in moe_forward
|
| 6658 |
x, tokens_per_expert = forward_fn(**forward_args)
|
| 6659 |
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6660 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py", line 457, in forward_once
|
| 6661 |
x = permute_and_compute(
|
| 6662 |
^^^^^^^^^^^^^^^^^^^^
|
| 6663 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/layers.py", line 401, in permute_and_compute
|
| 6664 |
x = ops.binned_gather(x, indices, bins, expert_capacity, top_k)
|
| 6665 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6666 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/torch/autograd/function.py", line 576, in apply
|
| 6667 |
return super().apply(*args, **kwargs) # type: ignore[misc]
|
| 6668 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6669 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/stk_autocast.py", line 30, in decorate_fwd
|
| 6670 |
return fwd(*args, **kwargs)
|
| 6671 |
^^^^^^^^^^^^^^^^^^^^
|
| 6672 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/ops/binned_gather.py", line 26, in forward
|
| 6673 |
return kernels.binned_gather(x, indices, None, bins, bin_size, top_k)
|
| 6674 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6675 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/huggingface/hub/models--kernels-community--megablocks/snapshots/e0fb1437de3f8d7079c4da13be8cb64dc0cfcdd5/build/torch28-cxx11-cu128-x86_64-linux/megablocks/backend/kernels.py", line 419, in binned_gather
|
| 6676 |
_binned_copy[(num_experts, expert_capacity)](
|
| 6677 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/jit.py", line 390, in <lambda>
|
| 6678 |
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
|
| 6679 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6680 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 239, in run
|
| 6681 |
benchmark()
|
| 6682 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 228, in benchmark
|
| 6683 |
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
|
| 6684 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6685 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 228, in <dictcomp>
|
| 6686 |
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
|
| 6687 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6688 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 160, in _bench
|
| 6689 |
return self.do_bench(kernel_call, quantiles=(0.5, 0.2, 0.8))
|
| 6690 |
^^^^^^^^^^^^^
|
| 6691 |
File "/usr/lib/python3.11/functools.py", line 1001, in __get__
|
| 6692 |
val = self.func(instance)
|
| 6693 |
^^^^^^^^^^^^^^^^^^^
|
| 6694 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 121, in do_bench
|
| 6695 |
return driver.active.get_benchmarker()
|
| 6696 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6697 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py", line 30, in __getattr__
|
| 6698 |
return getattr(self._initialize_obj(), name)
|
| 6699 |
^^^^^^^^^^^^^^^^^^^^^^
|
| 6700 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py", line 26, in _initialize_obj
|
| 6701 |
self._obj = self._init_fn()
|
| 6702 |
^^^^^^^^^^^^^^^
|
| 6703 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/driver.py", line 12, in _create_driver
|
| 6704 |
return active_drivers[0]()
|
| 6705 |
^^^^^^^^^^^^^^^^^^^
|
| 6706 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 715, in __init__
|
| 6707 |
self.utils = CudaUtils() # TODO: make static
|
| 6708 |
^^^^^^^^^^^
|
| 6709 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 62, in __init__
|
| 6710 |
mod = compile_module_from_src(
|
| 6711 |
^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6712 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py", line 88, in compile_module_from_src
|
| 6713 |
so = _build(name, src_path, tmpdir, library_dirs or [], include_dirs or [], libraries or [])
|
| 6714 |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 6715 |
+
File "/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/runtime/build.py", line 51, in _build
|
| 6716 |
subprocess.check_call(cc_cmd, stdout=subprocess.DEVNULL)
|
| 6717 |
File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
|
| 6718 |
raise CalledProcessError(retcode, cmd)
|
| 6719 |
+
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp1397kafx/cuda_utils.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmp1397kafx/cuda_utils.cpython-311-x86_64-linux-gnu.so', '-lcuda', '-L/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/lib', '-L/usr/lib/x86_64-linux-gnu', '-I/tmp/uvnote-run-g9v2jr6r/home/.cache/uv/environments-v2/megablocks-run-8802ebf6d3566120/lib/python3.11/site-packages/triton/backends/nvidia/include', '-I/tmp/tmp1397kafx', '-I/usr/include/python3.11']' returned non-zero exit status 1.</div>
|
| 6720 |
</div>
|
| 6721 |
</div>
|
| 6722 |
|