drbh HF Staff committed on
Commit aaf1485 · verified · 1 Parent(s): 2e0bc99

Upload folder using huggingface_hub

Browse files
Files changed (2)
  1. index.html +520 -79
  2. note_test_override.html +520 -79
index.html CHANGED
@@ -3711,14 +3711,14 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
3711
  </div>
3712
 
3713
  <div class="main-content">
3714
- <div class="cell cell-failed" id="cell-setup">
3715
  <div class="cell-header">
3716
  <span class="collapse-indicators">
3717
  <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
3718
  <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
3719
  <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
3720
  </span> |
3721
- Cell: setup | 99.80s | FAILED
3722
  | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
3723
  <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
3724
  <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
@@ -3967,6 +3967,37 @@ Cell: setup | 99.80s | FAILED
3967
  </div>
3968
  </div>
3969
  <div id="output-setup" class="cell-output">
3970
  <div class="uv-install-logs" id="uv-logs-setup">
3971
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
3972
  <div class="uv-logs-content" style="display: none;">
@@ -3974,32 +4005,435 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
3974
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
3975
  Updating https://github.com/huggingface/transformers.git (HEAD)
3976
  Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
3977
- Downloading nvidia-cublas-cu12 (566.8MiB)
3978
  Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
3979
- Downloading nvidia-cufile-cu12 (1.1MiB)
3980
- Downloading jedi (1.5MiB)
3981
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
3982
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
3983
- Downloading nvidia-cusparse-cu12 (274.9MiB)
3984
  Downloading sympy (6.0MiB)
3985
- Downloading nvidia-cusolver-cu12 (255.1MiB)
3986
- Downloading hf-xet (3.0MiB)
3987
- Downloading nvidia-cufft-cu12 (184.2MiB)
3988
- Downloading nvidia-cudnn-cu12 (674.0MiB)
3989
- Downloading networkx (1.9MiB)
3990
- Downloading numpy (15.9MiB)
3991
- Downloading torch (846.8MiB)
3992
  Downloading pillow (6.3MiB)
3993
  Downloading nvidia-nccl-cu12 (307.4MiB)
3994
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
3995
  Downloading pygments (1.2MiB)
3996
  Downloading nvidia-curand-cu12 (60.7MiB)
3997
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
3998
- Downloading tokenizers (3.1MiB)
3999
  Downloading triton (148.4MiB)
4000
  Downloading matplotlib (8.3MiB)
4001
- Downloading fonttools (4.7MiB)
4002
  Downloading kiwisolver (1.4MiB)
4003
  Downloading nvidia-cufile-cu12
4004
  Downloading kiwisolver
4005
  Downloading pygments
@@ -4026,77 +4460,84 @@ Downloading kiwisolver (1.4MiB)
4026
  Downloading nvidia-cublas-cu12
4027
  Downloading nvidia-cudnn-cu12
4028
  Downloading torch
4029
- Installed 69 packages in 465ms
4030
  </div>
4031
  </div>
4032
  <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4033
- Fetching 3 files: 33%|███▎ | 1/3 [00:15&lt;00:31, 15.83s/it]
4034
- Fetching 3 files: 67%|██████▋ | 2/3 [00:18&lt;00:08, 8.05s/it]
4035
- Fetching 3 files: 100%|██████████| 3/3 [00:18&lt;00:00, 6.14s/it]
4036
  You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4037
- INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
4038
 
4039
  Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4040
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:07&lt;00:15, 7.50s/it]
4041
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:14&lt;00:07, 7.33s/it]
4042
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:15&lt;00:07, 7.51s/it]
4043
- Traceback (most recent call last):
4044
- File &quot;/tmp/uvnote_5cbrsnjg/.uvnote/cells/setup.py&quot;, line 83, in &lt;module&gt;
4045
- model = GptOssForCausalLM.from_pretrained(
4046
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4047
- model_id,
4048
- ^^^^^^^^^
4049
- ...&lt;3 lines&gt;...
4050
- quantization_config=quantization_config,
4051
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4052
- ).eval()
4053
- ^
4054
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 285, in _wrapper
4055
- return func(*args, **kwargs)
4056
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 5035, in from_pretrained
4057
- ) = cls._load_pretrained_model(
4058
- ~~~~~~~~~~~~~~~~~~~~~~~~~~^
4059
- model,
4060
- ^^^^^^
4061
- ...&lt;13 lines&gt;...
4062
- weights_only=weights_only,
4063
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
4064
- )
4065
- ^
4066
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 5488, in _load_pretrained_model
4067
- _error_msgs, disk_offload_index, cpu_offload_index = load_shard_file(args)
4068
- ~~~~~~~~~~~~~~~^^^^^^
4069
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 932, in load_shard_file
4070
- disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
4071
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4072
- model_to_load,
4073
- ^^^^^^^^^^^^^^
4074
- ...&lt;13 lines&gt;...
4075
- device_mesh=device_mesh,
4076
- ^^^^^^^^^^^^^^^^^^^^^^^^
4077
- )
4078
- ^
4079
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/torch/utils/_contextlib.py&quot;, line 120, in decorate_context
4080
- return func(*args, **kwargs)
4081
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 840, in _load_state_dict_into_meta_model
4082
- hf_quantizer.create_quantized_param(
4083
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4084
- model, param, param_name, param_device, state_dict, unexpected_keys
4085
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4086
- )
4087
- ^
4088
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/quantizers/quantizer_mxfp4.py&quot;, line 249, in create_quantized_param
4089
- dequantize(module, param_name, param_value, target_device, dq_param_name, **shard_kwargs)
4090
- ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4091
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/integrations/mxfp4.py&quot;, line 329, in dequantize
4092
- dequantized = convert_moe_packed_tensors(getattr(module, blocks_attr), getattr(module, scales_attr))
4093
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/integrations/mxfp4.py&quot;, line 117, in convert_moe_packed_tensors
4094
- idx_hi = (blk &gt;&gt; 4).to(torch.long)
4095
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.69 GiB is free. Process 43404 has 20.61 GiB memory in use. Of the allocated memory 17.37 GiB is allocated by PyTorch, and 2.96 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)</div>
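The removed run above fails with `torch.OutOfMemoryError` while dequantizing the MXFP4 weights to bf16. A minimal mitigation sketch, following the allocator hint in the error message itself (the variable must be set before `torch` initializes CUDA; how much it helps depends on how fragmented the reserved-but-unallocated memory is):

```python
import os

# Enable expandable segments in the CUDA caching allocator, as suggested by
# the OOM message, to reduce fragmentation of reserved-but-unallocated memory.
# This must be set before `import torch` (or at least before any CUDA call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# `import torch` should happen only after this point for the setting to apply.
```

The stderr warning above also notes that setting `use_kernels=False` keeps the quantized model instead of dequantizing to bf16, which avoids the extra allocation entirely.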
4096
  </div>
4097
  </div>
4098
-
4099
- <h1>Reference kernel</h1>
4100
  </div>
4101
 
4102
  </body>
 
3711
  </div>
3712
 
3713
  <div class="main-content">
3714
+ <div class="cell" id="cell-setup">
3715
  <div class="cell-header">
3716
  <span class="collapse-indicators">
3717
  <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
3718
  <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
3719
  <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
3720
  </span> |
3721
+ Cell: setup | 132.82s
3722
  | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
3723
  <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
3724
  <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
 
3967
  </div>
3968
  </div>
3969
  <div id="output-setup" class="cell-output">
3970
+ <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
3971
+ Knowledge cutoff: 2024-06
3972
+ Current date: 2025-09-23
3973
+
3974
+ Reasoning: low
3975
+
3976
+ # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
3977
+
3978
+ What is Tensor Parallelism?
3979
+
3980
+ &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It&#x27;s used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather. Similarly for attention QKV projections. Provide example: GPT-3 uses tensor parallelism. Also mention frameworks: Megatron-LM, DeepSpeed, etc. Provide pros/cons. Provide typical implementation: using torch.distributed.all_reduce, gather, etc. Provide code snippet. Also mention that it&#x27;s different from data parallelism. Provide explanation of how it works in practice. Provide mention of &quot;tensor model parallelism&quot; vs &quot;tensor parallelism&quot; synonyms. Provide mention of &quot;tensor parallelism&quot; in context of &quot;DeepSpeed ZeRO Stage 3&quot; or &quot;Megatron-LM&quot;. Provide mention of &quot;tensor parallelism&quot; as part of &quot;model parallelism&quot; to reduce memory usage. Provide mention of &quot;tensor parallelism&quot; as &quot;splitting weight matrices across GPUs&quot; and &quot;communication overhead&quot;.
3981
+
3982
+ Also mention that it&#x27;s used for large transformer models like GPT-3, LLaMA, etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;DeepSpeed&#x27;s ZeRO-Offload&quot; or &quot;ZeRO-3&quot;.
3983
+
3984
+ Also mention that &quot;tensor parallelism&quot; can be combined with &quot;pipeline parallelism&quot; and &quot;data parallelism&quot; to achieve full scaling.
3985
+
3986
+ Also mention that &quot;tensor parallelism&quot; can be implemented by splitting the weight matrix along the output dimension, performing local matrix multiplication, then all-reduce to sum partial outputs.
3987
+
3988
+ Also mention that &quot;tensor parallelism&quot; can be used for linear layers, self-attention, feed-forward networks, etc.
3989
+
3990
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;embedding tables&quot; by sharding them across devices.
3991
+
3992
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;attention heads&quot; by splitting across heads.
3993
+
3994
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;parameter sharding&quot;.
3995
+
3996
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;model parallelism&quot; to reduce memory usage.
3997
+
3998
+
3999
+ Generation took 51.90 seconds
4000
+ </div>
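The generated answer above describes splitting a linear layer's weight along the output dimension (concatenating shard outputs, an all-gather) or along the input dimension (summing partial outputs, an all-reduce). A minimal single-process sketch of that arithmetic, with the two "devices" simulated as tensor slices; a real setup would place the shards on separate GPUs via `torch.distributed`:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)   # batch of activations
W = torch.randn(6, 8)   # full weight: out_features=6, in_features=8

# Column parallelism: split output features (rows of W) across 2 shards;
# each shard produces a slice of the output, then slices are concatenated.
W0, W1 = W.chunk(2, dim=0)
y_col = torch.cat([x @ W0.T, x @ W1.T], dim=-1)  # all-gather analog

# Row parallelism: split input features (columns of W); each shard sees only
# its slice of the activations, and partial outputs are summed.
Wa, Wb = W.chunk(2, dim=1)
xa, xb = x.chunk(2, dim=1)
y_row = xa @ Wa.T + xb @ Wb.T                    # all-reduce analog

# Both sharded computations reproduce the unsharded result.
assert torch.allclose(y_col, x @ W.T, atol=1e-5)
assert torch.allclose(y_row, x @ W.T, atol=1e-5)
```

In transformer blocks this pairing is typically applied as column-parallel QKV/up projections followed by row-parallel output/down projections, so only one all-reduce is needed per sublayer.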
4001
  <div class="uv-install-logs" id="uv-logs-setup">
4002
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4003
  <div class="uv-logs-content" style="display: none;">
 
4005
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4006
  Updating https://github.com/huggingface/transformers.git (HEAD)
4007
  Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
 
4008
  Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
4009
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
4010
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4011
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4012
  Downloading sympy (6.0MiB)
4013
+ Downloading jedi (1.5MiB)
4014
+ Downloading fonttools (4.7MiB)
4015
  Downloading pillow (6.3MiB)
4016
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
4017
+ Downloading nvidia-curand-cu12 (60.7MiB)
4018
+ Downloading numpy (15.9MiB)
4019
+ Downloading nvidia-cufft-cu12 (184.2MiB)
4020
+ Downloading matplotlib (8.3MiB)
4021
+ Downloading pygments (1.2MiB)
4022
+ Downloading nvidia-cublas-cu12 (566.8MiB)
4023
+ Downloading kiwisolver (1.4MiB)
4024
  Downloading nvidia-nccl-cu12 (307.4MiB)
4025
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4026
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4027
+ Downloading networkx (1.9MiB)
4028
+ Downloading hf-xet (3.0MiB)
4029
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
4030
+ Downloading torch (846.8MiB)
4031
+ Downloading tokenizers (3.1MiB)
4032
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4033
+ Downloading triton (148.4MiB)
4034
+ Downloading nvidia-cufile-cu12
4035
+ Downloading kiwisolver
4036
+ Downloading pygments
4037
+ Downloading hf-xet
4038
+ Downloading tokenizers
4039
+ Downloading networkx
4040
+ Downloading fonttools
4041
+ Downloading pillow
4042
+ Downloading matplotlib
4043
+ Downloading nvidia-cuda-cupti-cu12
4044
+ Downloading numpy
4045
+ Downloading sympy
4046
+ Downloading nvidia-nvjitlink-cu12
4047
+ Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
4048
+ Downloading jedi
4049
+ Downloading nvidia-curand-cu12
4050
+ Downloading nvidia-cuda-nvrtc-cu12
4051
+ Downloading triton
4052
+ Downloading nvidia-cufft-cu12
4053
+ Downloading nvidia-cusolver-cu12
4054
+ Downloading nvidia-cusparselt-cu12
4055
+ Downloading nvidia-cusparse-cu12
4056
+ Downloading nvidia-nccl-cu12
4057
+ Downloading nvidia-cublas-cu12
4058
+ Downloading nvidia-cudnn-cu12
4059
+ Downloading torch
4060
+ Installed 69 packages in 539ms
4061
+ </div>
4062
+ </div>
4063
+ <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4064
+ Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:15, 7.55s/it]
4065
+ Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.72s/it]
4066
+ Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.87s/it]
4067
+ You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4068
+
4069
+ Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4070
+ Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.34s/it]
4071
+ Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.28s/it]
4072
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.82s/it]
4073
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.95s/it]
4074
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4075
+
4076
+ Fetching 6 files: 0%| | 0/6 [00:00&lt;?, ?it/s]
4077
+ Fetching 6 files: 17%|█▋ | 1/6 [00:00&lt;00:00, 5.35it/s]
4078
+ Fetching 6 files: 50%|█████ | 3/6 [00:00&lt;00:00, 6.55it/s]
4079
+ Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 12.81it/s]
4080
+ /tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4081
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4082
+ warnings.warn(
4083
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4084
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4085
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4086
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4087
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4088
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4089
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4090
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4091
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4092
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4093
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4094
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4095
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4096
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4097
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4098
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4099
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4100
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4101
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4102
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4103
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4104
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4105
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4106
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4107
+ /tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4108
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4109
+ warnings.warn(
4110
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4111
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4112
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4113
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4114
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4115
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4116
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4117
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4118
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4119
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4120
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4121
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4122
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4123
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4124
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4125
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4126
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4127
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4128
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4129
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4130
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4131
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4132
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`</div>
4133
+ </div>
4134
+ </div>
4135
+
4136
+ <h1>Reference kernel</h1>
4137
+ <div class="cell" id="cell-setup2">
4138
+ <div class="cell-header">
4139
+ <span class="collapse-indicators">
4140
+ <span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
4141
+ <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
4142
+ <span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
4143
+ </span> |
4144
+ Cell: setup2 | 140.15s
4145
+ | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
4146
+ <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
4147
+ <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
4148
+ </div>
4149
+ <div id="code-setup2" class="cell-code" data-lines="115">
4150
+ <div class="highlight-with-lines">
4151
+ <div class="line-numbers" id="lines-setup2">
4152
+ <a class="line-number" data-cell="setup2" data-line="1" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 1, true);">1</a>
4153
+ <a class="line-number" data-cell="setup2" data-line="2" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 2, true);">2</a>
4154
+ <a class="line-number" data-cell="setup2" data-line="3" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 3, true);">3</a>
4155
+ <a class="line-number" data-cell="setup2" data-line="4" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 4, true);">4</a>
4156
+ <a class="line-number" data-cell="setup2" data-line="5" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 5, true);">5</a>
4157
+ <a class="line-number" data-cell="setup2" data-line="6" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 6, true);">6</a>
4158
+ <a class="line-number" data-cell="setup2" data-line="7" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 7, true);">7</a>
4159
+ <a class="line-number" data-cell="setup2" data-line="8" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 8, true);">8</a>
4160
+ <a class="line-number" data-cell="setup2" data-line="9" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 9, true);">9</a>
4161
+ <a class="line-number" data-cell="setup2" data-line="10" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 10, true);">10</a>
4162
+ <a class="line-number" data-cell="setup2" data-line="11" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 11, true);">11</a>
4163
+ <a class="line-number" data-cell="setup2" data-line="12" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 12, true);">12</a>
4164
+ <a class="line-number" data-cell="setup2" data-line="13" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 13, true);">13</a>
4165
+ <a class="line-number" data-cell="setup2" data-line="14" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 14, true);">14</a>
4166
+ <a class="line-number" data-cell="setup2" data-line="15" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 15, true);">15</a>
4167
+ <a class="line-number" data-cell="setup2" data-line="16" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 16, true);">16</a>
4168
+ <a class="line-number" data-cell="setup2" data-line="17" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 17, true);">17</a>
4169
+ <a class="line-number" data-cell="setup2" data-line="18" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 18, true);">18</a>
4170
+ <a class="line-number" data-cell="setup2" data-line="19" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 19, true);">19</a>
4171
+ <a class="line-number" data-cell="setup2" data-line="20" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 20, true);">20</a>
4172
+ <a class="line-number" data-cell="setup2" data-line="21" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 21, true);">21</a>
4173
+ <a class="line-number" data-cell="setup2" data-line="22" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 22, true);">22</a>
4174
+ <a class="line-number" data-cell="setup2" data-line="23" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 23, true);">23</a>
4175
+ <a class="line-number" data-cell="setup2" data-line="24" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 24, true);">24</a>
4176
+ <a class="line-number" data-cell="setup2" data-line="25" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 25, true);">25</a>
4177
+ <a class="line-number" data-cell="setup2" data-line="26" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 26, true);">26</a>
4178
+ <a class="line-number" data-cell="setup2" data-line="27" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 27, true);">27</a>
4179
+ <a class="line-number" data-cell="setup2" data-line="28" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 28, true);">28</a>
4180
+ <a class="line-number" data-cell="setup2" data-line="29" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 29, true);">29</a>
4181
+ <a class="line-number" data-cell="setup2" data-line="30" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 30, true);">30</a>
4182
+ <a class="line-number" data-cell="setup2" data-line="31" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 31, true);">31</a>
4183
+ <a class="line-number" data-cell="setup2" data-line="32" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 32, true);">32</a>
4184
+ <a class="line-number" data-cell="setup2" data-line="33" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 33, true);">33</a>
4185
+ <a class="line-number" data-cell="setup2" data-line="34" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 34, true);">34</a>
4186
+ <a class="line-number" data-cell="setup2" data-line="35" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 35, true);">35</a>
4187
+ <a class="line-number" data-cell="setup2" data-line="36" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 36, true);">36</a>
4188
+ <a class="line-number" data-cell="setup2" data-line="37" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 37, true);">37</a>
4189
+ <a class="line-number" data-cell="setup2" data-line="38" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 38, true);">38</a>
4190
+ <a class="line-number" data-cell="setup2" data-line="39" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 39, true);">39</a>
4191
+ <a class="line-number" data-cell="setup2" data-line="40" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 40, true);">40</a>
4192
+ <a class="line-number" data-cell="setup2" data-line="41" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 41, true);">41</a>
4193
+ <a class="line-number" data-cell="setup2" data-line="42" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 42, true);">42</a>
4194
+ <a class="line-number" data-cell="setup2" data-line="43" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 43, true);">43</a>
4195
+ <a class="line-number" data-cell="setup2" data-line="44" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 44, true);">44</a>
4196
+ <a class="line-number" data-cell="setup2" data-line="45" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 45, true);">45</a>
4197
+ <a class="line-number" data-cell="setup2" data-line="46" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 46, true);">46</a>
4198
+ <a class="line-number" data-cell="setup2" data-line="47" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 47, true);">47</a>
4199
+ <a class="line-number" data-cell="setup2" data-line="48" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 48, true);">48</a>
4200
+ <a class="line-number" data-cell="setup2" data-line="49" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 49, true);">49</a>
4201
+ <a class="line-number" data-cell="setup2" data-line="50" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 50, true);">50</a>
4202
+ <a class="line-number" data-cell="setup2" data-line="51" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 51, true);">51</a>
4203
+ <a class="line-number" data-cell="setup2" data-line="52" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 52, true);">52</a>
4204
+ <a class="line-number" data-cell="setup2" data-line="53" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 53, true);">53</a>
4205
+ <a class="line-number" data-cell="setup2" data-line="54" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 54, true);">54</a>
4206
+ <a class="line-number" data-cell="setup2" data-line="55" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 55, true);">55</a>
4207
+ <a class="line-number" data-cell="setup2" data-line="56" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 56, true);">56</a>
4208
+ <a class="line-number" data-cell="setup2" data-line="57" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 57, true);">57</a>
4209
+ <a class="line-number" data-cell="setup2" data-line="58" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 58, true);">58</a>
4210
+ <a class="line-number" data-cell="setup2" data-line="59" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 59, true);">59</a>
4211
+ <a class="line-number" data-cell="setup2" data-line="60" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 60, true);">60</a>
4212
+ <a class="line-number" data-cell="setup2" data-line="61" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 61, true);">61</a>
4213
+ <a class="line-number" data-cell="setup2" data-line="62" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 62, true);">62</a>
4214
+ <a class="line-number" data-cell="setup2" data-line="63" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 63, true);">63</a>
4215
+ <a class="line-number" data-cell="setup2" data-line="64" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 64, true);">64</a>
4216
+ <a class="line-number" data-cell="setup2" data-line="65" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 65, true);">65</a>
4217
+ <a class="line-number" data-cell="setup2" data-line="66" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 66, true);">66</a>
4218
+ <a class="line-number" data-cell="setup2" data-line="67" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 67, true);">67</a>
4219
+ <a class="line-number" data-cell="setup2" data-line="68" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 68, true);">68</a>
4220
+ <a class="line-number" data-cell="setup2" data-line="69" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 69, true);">69</a>
4221
+ <a class="line-number" data-cell="setup2" data-line="70" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 70, true);">70</a>
4222
+ <a class="line-number" data-cell="setup2" data-line="71" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 71, true);">71</a>
4223
+ <a class="line-number" data-cell="setup2" data-line="72" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 72, true);">72</a>
4224
+ <a class="line-number" data-cell="setup2" data-line="73" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 73, true);">73</a>
4225
+ <a class="line-number" data-cell="setup2" data-line="74" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 74, true);">74</a>
4226
+ <a class="line-number" data-cell="setup2" data-line="75" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 75, true);">75</a>
4227
+ <a class="line-number" data-cell="setup2" data-line="76" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 76, true);">76</a>
4228
+ <a class="line-number" data-cell="setup2" data-line="77" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 77, true);">77</a>
4229
+ <a class="line-number" data-cell="setup2" data-line="78" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 78, true);">78</a>
4230
+ <a class="line-number" data-cell="setup2" data-line="79" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 79, true);">79</a>
4231
+ <a class="line-number" data-cell="setup2" data-line="80" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 80, true);">80</a>
4232
+ <a class="line-number" data-cell="setup2" data-line="81" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 81, true);">81</a>
4233
+ <a class="line-number" data-cell="setup2" data-line="82" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 82, true);">82</a>
4234
+ <a class="line-number" data-cell="setup2" data-line="83" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 83, true);">83</a>
4235
+ <a class="line-number" data-cell="setup2" data-line="84" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 84, true);">84</a>
4236
+ <a class="line-number" data-cell="setup2" data-line="85" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 85, true);">85</a>
4237
+ <a class="line-number" data-cell="setup2" data-line="86" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 86, true);">86</a>
4238
+ <a class="line-number" data-cell="setup2" data-line="87" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 87, true);">87</a>
4239
+ <a class="line-number" data-cell="setup2" data-line="88" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 88, true);">88</a>
4240
+ <a class="line-number" data-cell="setup2" data-line="89" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 89, true);">89</a>
4241
+ <a class="line-number" data-cell="setup2" data-line="90" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 90, true);">90</a>
4242
+ <a class="line-number" data-cell="setup2" data-line="91" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 91, true);">91</a>
4243
+ <a class="line-number" data-cell="setup2" data-line="92" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 92, true);">92</a>
4244
+ <a class="line-number" data-cell="setup2" data-line="93" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 93, true);">93</a>
4245
+ <a class="line-number" data-cell="setup2" data-line="94" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 94, true);">94</a>
4246
+ <a class="line-number" data-cell="setup2" data-line="95" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 95, true);">95</a>
4247
+ <a class="line-number" data-cell="setup2" data-line="96" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 96, true);">96</a>
4248
+ <a class="line-number" data-cell="setup2" data-line="97" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 97, true);">97</a>
4249
+ <a class="line-number" data-cell="setup2" data-line="98" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 98, true);">98</a>
4250
+ <a class="line-number" data-cell="setup2" data-line="99" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 99, true);">99</a>
4251
+ <a class="line-number" data-cell="setup2" data-line="100" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 100, true);">100</a>
4252
+ <a class="line-number" data-cell="setup2" data-line="101" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 101, true);">101</a>
4253
+ <a class="line-number" data-cell="setup2" data-line="102" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 102, true);">102</a>
4254
+ <a class="line-number" data-cell="setup2" data-line="103" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 103, true);">103</a>
4255
+ <a class="line-number" data-cell="setup2" data-line="104" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 104, true);">104</a>
4256
+ <a class="line-number" data-cell="setup2" data-line="105" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 105, true);">105</a>
4257
+ <a class="line-number" data-cell="setup2" data-line="106" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 106, true);">106</a>
4258
+ <a class="line-number" data-cell="setup2" data-line="107" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 107, true);">107</a>
4259
+ <a class="line-number" data-cell="setup2" data-line="108" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 108, true);">108</a>
4260
+ <a class="line-number" data-cell="setup2" data-line="109" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 109, true);">109</a>
4261
+ <a class="line-number" data-cell="setup2" data-line="110" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 110, true);">110</a>
4262
+ <a class="line-number" data-cell="setup2" data-line="111" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 111, true);">111</a>
4263
+ <a class="line-number" data-cell="setup2" data-line="112" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 112, true);">112</a>
4264
+ <a class="line-number" data-cell="setup2" data-line="113" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 113, true);">113</a>
4265
+ <a class="line-number" data-cell="setup2" data-line="114" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 114, true);">114</a>
4266
+ <a class="line-number" data-cell="setup2" data-line="115" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 115, true);">115</a>
4267
+ </div>
4268
+ <div class="code-wrap">
4269
+ <div class="highlight"><pre><span></span><span class="c1"># /// script</span>
4270
+ <span class="c1"># requires-python = &quot;&gt;=3.12&quot;</span>
4271
+ <span class="c1"># dependencies = [</span>
4272
+ <span class="c1"># &quot;accelerate&gt;=1.10.1&quot;,</span>
4273
+ <span class="c1"># &quot;torch&gt;=2.7.0&quot;,</span>
4274
+ <span class="c1"># &quot;kernels==0.10.0&quot;,</span>
4275
+ <span class="c1"># &quot;transformers@https://github.com/huggingface/transformers.git&quot;,</span>
4276
+ <span class="c1"># &quot;ipdb&gt;=0.13.13&quot;,</span>
4277
+ <span class="c1"># &quot;matplotlib&gt;=3.7.2&quot;,</span>
4278
+ <span class="c1"># &quot;numpy&gt;=1.24.3&quot;,</span>
4279
+ <span class="c1"># ]</span>
4280
+ <span class="c1"># ///</span>
4281
+
4282
+ <span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
4283
+ <span class="kn">from</span><span class="w"> </span><span class="nn">transformers</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssForCausalLM</span><span class="p">,</span> <span class="n">PreTrainedTokenizerFast</span><span class="p">,</span> <span class="n">Mxfp4Config</span>
4284
+ <span class="kn">import</span><span class="w"> </span><span class="nn">time</span>
4285
+ <span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">nn</span>
4286
+ <span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">Mode</span><span class="p">,</span> <span class="n">LayerRepository</span>
4287
+ <span class="kn">import</span><span class="w"> </span><span class="nn">sys</span>
4288
+ <span class="kn">import</span><span class="w"> </span><span class="nn">torch.profiler</span>
4289
+ <span class="kn">import</span><span class="w"> </span><span class="nn">gc</span>
4290
+ <span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>
4291
+
4292
+ <span class="c1"># configure logging at INFO level</span>
4293
+ <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
4294
+
4295
+ <span class="k">def</span><span class="w"> </span><span class="nf">reset_peak_memory_stats</span><span class="p">():</span>
4296
+ <span class="w"> </span><span class="sd">&quot;&quot;&quot;Clear CUDA cache and reset memory allocation counters.&quot;&quot;&quot;</span>
4297
+ <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
4298
+ <span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
4299
+ <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">reset_peak_memory_stats</span><span class="p">()</span>
4300
+ <span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
4301
+
4302
+ <span class="k">def</span><span class="w"> </span><span class="nf">get_memory_stats</span><span class="p">():</span>
4303
+ <span class="w"> </span><span class="sd">&quot;&quot;&quot;Get current and peak CUDA memory usage.&quot;&quot;&quot;</span>
4304
+ <span class="k">if</span> <span class="ow">not</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
4305
+ <span class="k">return</span> <span class="p">{</span><span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">}</span>
4306
+ <span class="k">return</span> <span class="p">{</span>
4307
+ <span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4308
+ <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">max_memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4309
+ <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_reserved</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
4310
+ <span class="p">}</span>
4311
+
4312
+ <span class="k">def</span><span class="w"> </span><span class="nf">override_kernel_layer_name</span><span class="p">(</span><span class="n">cls_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
4313
+ <span class="w"> </span><span class="sd">&quot;&quot;&quot;Helper to dynamically override the kernel_layer_name in a model class.&quot;&quot;&quot;</span>
4314
+ <span class="k">for</span> <span class="n">mod</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">modules</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
4315
+ <span class="k">if</span> <span class="n">mod</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
4316
+ <span class="k">continue</span>
4317
+ <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">cls_name</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
4318
+ <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
4319
+ <span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s2">&quot;kernel_layer_name&quot;</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
4320
+ <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Overrode </span><span class="si">{</span><span class="n">cls_name</span><span class="si">}</span><span class="s2">.kernel_layer_name to </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
4321
+ <span class="k">return</span> <span class="kc">True</span>
4322
+ <span class="k">return</span> <span class="kc">False</span>
4323
+
4324
+
4325
+ <span class="c1"># Init the model the normal way</span>
4326
+ <span class="n">model_id</span> <span class="o">=</span> <span class="s2">&quot;openai/gpt-oss-20b&quot;</span>
4327
+ <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
4328
+ <span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
4329
+
4330
+
4331
+ <span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">replace_kernel_forward_from_hub</span><span class="p">,</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">LayerRepository</span><span class="p">,</span> <span class="n">Mode</span>
4332
+
4333
+ <span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssMLP</span><span class="p">,</span> <span class="n">GptOssRMSNorm</span>
4334
+
4335
+ <span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span> <span class="c1"># direct, type-safe</span>
4336
+ <span class="n">custom_mapping</span> <span class="o">=</span> <span class="p">{</span>
4337
+ <span class="s2">&quot;Yamoe&quot;</span><span class="p">:</span> <span class="p">{</span>
4338
+ <span class="s2">&quot;cuda&quot;</span><span class="p">:</span> <span class="p">{</span>
4339
+ <span class="n">Mode</span><span class="o">.</span><span class="n">INFERENCE</span><span class="p">:</span> <span class="n">LayerRepository</span><span class="p">(</span>
4340
+ <span class="n">repo_id</span><span class="o">=</span><span class="s2">&quot;drbh/yamoe&quot;</span><span class="p">,</span>
4341
+ <span class="n">layer_name</span><span class="o">=</span><span class="s2">&quot;Yamoe&quot;</span><span class="p">,</span>
4342
+ <span class="n">revision</span><span class="o">=</span><span class="s2">&quot;v0.3.0&quot;</span><span class="p">,</span>
4343
+ <span class="p">)</span>
4344
+ <span class="p">}</span>
4345
+ <span class="p">}</span>
4346
+ <span class="p">}</span>
4347
+ <span class="n">register_kernel_mapping</span><span class="p">(</span><span class="n">custom_mapping</span><span class="p">)</span>
4348
+
4349
+
4350
+ <span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
4351
+ <span class="n">model_id</span><span class="p">,</span>
4352
+ <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;bfloat16&quot;</span><span class="p">,</span>
4353
+ <span class="n">device_map</span><span class="o">=</span><span class="s2">&quot;auto&quot;</span><span class="p">,</span>
4354
+ <span class="n">use_kernels</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4355
+ <span class="n">quantization_config</span><span class="o">=</span><span class="n">quantization_config</span><span class="p">,</span>
4356
+ <span class="p">)</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
4357
+
4358
+ <span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
4359
+ <span class="p">{</span><span class="s2">&quot;role&quot;</span><span class="p">:</span> <span class="s2">&quot;system&quot;</span><span class="p">,</span> <span class="s2">&quot;content&quot;</span><span class="p">:</span> <span class="s2">&quot;What is Tensor Parallelism?&quot;</span><span class="p">},</span>
4360
+ <span class="p">]</span>
4361
+
4362
+ <span class="n">inputs</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">apply_chat_template</span><span class="p">(</span>
4363
+ <span class="n">messages</span><span class="p">,</span>
4364
+ <span class="n">add_generation_prompt</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4365
+ <span class="n">return_tensors</span><span class="o">=</span><span class="s2">&quot;pt&quot;</span><span class="p">,</span>
4366
+ <span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
4367
+ <span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">&quot;low&quot;</span><span class="p">,</span>
4368
+ <span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">&quot;cuda&quot;</span><span class="p">)</span>
4369
+
4370
+ <span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">512</span>
4371
+
4372
+ <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
4373
+ <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4374
+ <span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
4375
+ <span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
4376
+ <span class="n">max_new_tokens</span><span class="o">=</span><span class="n">max_tokens</span><span class="p">,</span>
4377
+ <span class="n">do_sample</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
4378
+ <span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
4379
+ <span class="p">)</span>
4380
+ <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
4381
+
4382
+ <span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
4383
+ <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds&quot;</span><span class="p">)</span>
4384
+ </pre></div>
4385
+
4386
+ <div class="code-line-highlight" id="line-highlight-setup2"></div>
4387
+ </div>
4388
+ </div>
4389
+ </div>
4390
+ <div id="output-setup2" class="cell-output">
4391
+ <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
4392
+ Knowledge cutoff: 2024-06
4393
+ Current date: 2025-09-23
4394
+
4395
+ Reasoning: low
4396
+
4397
+ # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
4398
+
4399
+ What is Tensor Parallelism?
4400
+
4401
+ &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it&#x27;s used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of &quot;tensor model parallelism&quot; vs &quot;tensor parallelism&quot; synonyms. Provide mention of &quot;tensor parallelism&quot; in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in the &quot;DeepSpeed ZeRO-Offload&quot; or &quot;ZeRO-3&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&quot; and &quot;Megatron-LM&quot; and &quot;DeepSpeed&#x27;s ZeRO&quot; and &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. 
Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the
4402
+ Generation took 57.93 seconds
4403
+ </div>
4404
+ <div class="uv-install-logs" id="uv-logs-setup2">
4405
+ <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4406
+ <div class="uv-logs-content" style="display: none;">
4407
+ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
4408
+ Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4409
+ Updating https://github.com/huggingface/transformers.git (HEAD)
4410
+ Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
4411
+ Downloading numpy (15.9MiB)
4412
  Downloading pygments (1.2MiB)
4413
+ Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
4414
+ Downloading pillow (6.3MiB)
4415
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4416
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
4417
+ Downloading networkx (1.9MiB)
4418
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4419
+ Downloading tokenizers (3.1MiB)
4420
+ Downloading hf-xet (3.0MiB)
4421
+ Downloading nvidia-cublas-cu12 (566.8MiB)
4422
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
4423
+ Downloading nvidia-cufft-cu12 (184.2MiB)
4424
+ Downloading sympy (6.0MiB)
4425
  Downloading nvidia-curand-cu12 (60.7MiB)
4426
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
4427
+ Downloading jedi (1.5MiB)
4428
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4429
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4430
+ Downloading nvidia-nccl-cu12 (307.4MiB)
4431
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4432
+ Downloading torch (846.8MiB)
4433
  Downloading triton (148.4MiB)
4434
  Downloading matplotlib (8.3MiB)
 
4435
  Downloading kiwisolver (1.4MiB)
4436
+ Downloading fonttools (4.7MiB)
4437
  Downloading nvidia-cufile-cu12
4438
  Downloading kiwisolver
4439
  Downloading pygments
 
4460
  Downloading nvidia-cublas-cu12
4461
  Downloading nvidia-cudnn-cu12
4462
  Downloading torch
4463
+ Installed 69 packages in 460ms
4464
  </div>
4465
  </div>
4466
  <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4467
+ Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:14, 7.31s/it]
4468
+ Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.67s/it]
4469
+ Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.81s/it]
4470
  You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
 
4471
 
4472
  Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4473
+ Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.35s/it]
4474
+ Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.25s/it]
4475
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.80s/it]
4476
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.93s/it]
4477
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4478
+
4479
+ Fetching 66 files: 0%| | 0/66 [00:00&lt;?, ?it/s]
4480
+ Fetching 66 files: 2%|▏ | 1/66 [00:00&lt;00:17, 3.80it/s]
4481
+ Fetching 66 files: 9%|▉ | 6/66 [00:00&lt;00:03, 19.77it/s]
4482
+ Fetching 66 files: 14%|█▎ | 9/66 [00:00&lt;00:02, 19.05it/s]
4483
+ Fetching 66 files: 26%|██▌ | 17/66 [00:01&lt;00:04, 10.13it/s]
4484
+ Fetching 66 files: 86%|████████▋ | 57/66 [00:01&lt;00:00, 47.70it/s]
4485
+ Fetching 66 files: 100%|██████████| 66/66 [00:01&lt;00:00, 36.16it/s]
4486
+ /tmp/uvnote-run-d2g9g4zl/home/.cache/uv/environments-v2/setup2-ea0d7cee95bc10c1/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4487
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4488
+ warnings.warn(
4489
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4490
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4491
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4492
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4493
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4494
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4495
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4496
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4497
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4498
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4499
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4500
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4501
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4502
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4503
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4504
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4505
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4506
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4507
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4508
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4509
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4510
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4511
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4512
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4513
+ /tmp/uvnote-run-d2g9g4zl/home/.cache/uv/environments-v2/setup2-ea0d7cee95bc10c1/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4514
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4515
+ warnings.warn(
4516
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4517
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4518
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4519
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4520
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4521
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4522
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4523
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4524
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4525
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4526
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4527
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4528
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4529
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4530
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4531
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4532
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4533
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4534
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4535
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4536
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4537
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
4538
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`</div>
4539
  </div>
4540
  </div>
 
 
4541
  </div>
4542
 
4543
  </body>
note_test_override.html CHANGED
@@ -3711,14 +3711,14 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
3711
  </div>
3712
 
3713
  <div class="main-content">
3714
- <div class="cell cell-failed" id="cell-setup">
3715
  <div class="cell-header">
3716
  <span class="collapse-indicators">
3717
  <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
3718
  <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
3719
  <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
3720
  </span> |
3721
- Cell: setup | 99.80s | FAILED
3722
  | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
3723
  <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
3724
  <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
@@ -3967,6 +3967,37 @@ Cell: setup | 99.80s | FAILED
3967
  </div>
3968
  </div>
3969
  <div id="output-setup" class="cell-output">
3970
  <div class="uv-install-logs" id="uv-logs-setup">
3971
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
3972
  <div class="uv-logs-content" style="display: none;">
@@ -3974,32 +4005,435 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
3974
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
3975
  Updating https://github.com/huggingface/transformers.git (HEAD)
3976
  Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
3977
- Downloading nvidia-cublas-cu12 (566.8MiB)
3978
  Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
3979
- Downloading nvidia-cufile-cu12 (1.1MiB)
3980
- Downloading jedi (1.5MiB)
3981
- Downloading nvidia-cusparselt-cu12 (273.9MiB)
3982
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
3983
- Downloading nvidia-cusparse-cu12 (274.9MiB)
3984
  Downloading sympy (6.0MiB)
3985
- Downloading nvidia-cusolver-cu12 (255.1MiB)
3986
- Downloading hf-xet (3.0MiB)
3987
- Downloading nvidia-cufft-cu12 (184.2MiB)
3988
- Downloading nvidia-cudnn-cu12 (674.0MiB)
3989
- Downloading networkx (1.9MiB)
3990
- Downloading numpy (15.9MiB)
3991
- Downloading torch (846.8MiB)
3992
  Downloading pillow (6.3MiB)
3993
  Downloading nvidia-nccl-cu12 (307.4MiB)
 
3994
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
3995
  Downloading pygments (1.2MiB)
3996
  Downloading nvidia-curand-cu12 (60.7MiB)
3997
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
3998
- Downloading tokenizers (3.1MiB)
3999
  Downloading triton (148.4MiB)
4000
  Downloading matplotlib (8.3MiB)
4001
- Downloading fonttools (4.7MiB)
4002
  Downloading kiwisolver (1.4MiB)
 
4003
  Downloading nvidia-cufile-cu12
4004
  Downloading kiwisolver
4005
  Downloading pygments
@@ -4026,77 +4460,84 @@ Downloading kiwisolver (1.4MiB)
4026
  Downloading nvidia-cublas-cu12
4027
  Downloading nvidia-cudnn-cu12
4028
  Downloading torch
4029
- Installed 69 packages in 465ms
4030
  </div>
4031
  </div>
4032
  <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4033
- Fetching 3 files: 33%|███▎ | 1/3 [00:15&lt;00:31, 15.83s/it]
4034
- Fetching 3 files: 67%|██████▋ | 2/3 [00:18&lt;00:08, 8.05s/it]
4035
- Fetching 3 files: 100%|██████████| 3/3 [00:18&lt;00:00, 6.14s/it]
4036
  You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4037
- INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
4038
 
4039
  Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4040
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:07&lt;00:15, 7.50s/it]
4041
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:14&lt;00:07, 7.33s/it]
4042
- Loading checkpoint shards: 67%|██████▋ | 2/3 [00:15&lt;00:07, 7.51s/it]
4043
- Traceback (most recent call last):
4044
- File &quot;/tmp/uvnote_5cbrsnjg/.uvnote/cells/setup.py&quot;, line 83, in &lt;module&gt;
4045
- model = GptOssForCausalLM.from_pretrained(
4046
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4047
- model_id,
4048
- ^^^^^^^^^
4049
- ...&lt;3 lines&gt;...
4050
- quantization_config=quantization_config,
4051
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4052
- ).eval()
4053
- ^
4054
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 285, in _wrapper
4055
- return func(*args, **kwargs)
4056
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 5035, in from_pretrained
4057
- ) = cls._load_pretrained_model(
4058
- ~~~~~~~~~~~~~~~~~~~~~~~~~~^
4059
- model,
4060
- ^^^^^^
4061
- ...&lt;13 lines&gt;...
4062
- weights_only=weights_only,
4063
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
4064
- )
4065
- ^
4066
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 5488, in _load_pretrained_model
4067
- _error_msgs, disk_offload_index, cpu_offload_index = load_shard_file(args)
4068
- ~~~~~~~~~~~~~~~^^^^^^
4069
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 932, in load_shard_file
4070
- disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
4071
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4072
- model_to_load,
4073
- ^^^^^^^^^^^^^^
4074
- ...&lt;13 lines&gt;...
4075
- device_mesh=device_mesh,
4076
- ^^^^^^^^^^^^^^^^^^^^^^^^
4077
- )
4078
- ^
4079
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/torch/utils/_contextlib.py&quot;, line 120, in decorate_context
4080
- return func(*args, **kwargs)
4081
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/modeling_utils.py&quot;, line 840, in _load_state_dict_into_meta_model
4082
- hf_quantizer.create_quantized_param(
4083
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
4084
- model, param, param_name, param_device, state_dict, unexpected_keys
4085
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4086
- )
4087
- ^
4088
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/quantizers/quantizer_mxfp4.py&quot;, line 249, in create_quantized_param
4089
- dequantize(module, param_name, param_value, target_device, dq_param_name, **shard_kwargs)
4090
- ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4091
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/integrations/mxfp4.py&quot;, line 329, in dequantize
4092
- dequantized = convert_moe_packed_tensors(getattr(module, blocks_attr), getattr(module, scales_attr))
4093
- File &quot;/tmp/uvnote-run-vr4catz8/home/.cache/uv/environments-v2/setup-4117b8f0d0f9a3df/lib/python3.13/site-packages/transformers/integrations/mxfp4.py&quot;, line 117, in convert_moe_packed_tensors
4094
- idx_hi = (blk &gt;&gt; 4).to(torch.long)
4095
- torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB. GPU 0 has a total capacity of 22.30 GiB of which 1.69 GiB is free. Process 43404 has 20.61 GiB memory in use. Of the allocated memory 17.37 GiB is allocated by PyTorch, and 2.96 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)</div>
4096
  </div>
4097
  </div>
4098
-
4099
- <h1>Reference kernel</h1>
4100
  </div>
4101
 
4102
  </body>
 
3711
  </div>
3712
 
3713
  <div class="main-content">
3714
+ <div class="cell" id="cell-setup">
3715
  <div class="cell-header">
3716
  <span class="collapse-indicators">
3717
  <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
3718
  <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
3719
  <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
3720
  </span> |
3721
+ Cell: setup | 132.82s
3722
  | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
3723
  <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
3724
  <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
 
3967
  </div>
3968
  </div>
3969
  <div id="output-setup" class="cell-output">
3970
+ <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
3971
+ Knowledge cutoff: 2024-06
3972
+ Current date: 2025-09-23
3973
+
3974
+ Reasoning: low
3975
+
3976
+ # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
3977
+
3978
+ What is Tensor Parallelism?
3979
+
3980
+ &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It&#x27;s used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather. Similarly for attention QKV projections. Provide example: GPT-3 uses tensor parallelism. Also mention frameworks: Megatron-LM, DeepSpeed, etc. Provide pros/cons. Provide typical implementation: using torch.distributed.all_reduce, gather, etc. Provide code snippet. Also mention that it&#x27;s different from data parallelism. Provide explanation of how it works in practice. Provide mention of &quot;tensor model parallelism&quot; vs &quot;tensor parallelism&quot; synonyms. Provide mention of &quot;tensor parallelism&quot; in context of &quot;DeepSpeed ZeRO Stage 3&quot; or &quot;Megatron-LM&quot;. Provide mention of &quot;tensor parallelism&quot; as part of &quot;model parallelism&quot; to reduce memory usage. Provide mention of &quot;tensor parallelism&quot; as &quot;splitting weight matrices across GPUs&quot; and &quot;communication overhead&quot;.
3981
+
3982
+ Also mention that it&#x27;s used for large transformer models like GPT-3, LLaMA, etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;DeepSpeed&#x27;s ZeRO-Offload&quot; or &quot;ZeRO-3&quot;.
3983
+
3984
+ Also mention that &quot;tensor parallelism&quot; can be combined with &quot;pipeline parallelism&quot; and &quot;data parallelism&quot; to achieve full scaling.
3985
+
3986
+ Also mention that &quot;tensor parallelism&quot; can be implemented by splitting the weight matrix along the output dimension, performing local matrix multiplication, then all-reduce to sum partial outputs.
3987
+
3988
+ Also mention that &quot;tensor parallelism&quot; can be used for linear layers, self-attention, feed-forward networks, etc.
3989
+
3990
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;embedding tables&quot; by sharding them across devices.
3991
+
3992
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;attention heads&quot; by splitting across heads.
3993
+
3994
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;parameter sharding&quot;.
3995
+
3996
+ Also mention that &quot;tensor parallelism&quot; can be used for &quot;model parallelism&quot; to reduce memory usage.
3997
+
3998
+
3999
+ Generation took 51.90 seconds
4000
+ </div>
4001
  <div class="uv-install-logs" id="uv-logs-setup">
4002
  <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
4003
  <div class="uv-logs-content" style="display: none;">
 
4005
  Downloading cpython-3.13.7-linux-x86_64-gnu (download)
4006
  Updating https://github.com/huggingface/transformers.git (HEAD)
4007
  Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
 
4008
  Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
4009
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
4010
  Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
4011
+ Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
4012
  Downloading sympy (6.0MiB)
4013
+ Downloading jedi (1.5MiB)
4014
+ Downloading fonttools (4.7MiB)
4015
  Downloading pillow (6.3MiB)
4016
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
4017
+ Downloading nvidia-curand-cu12 (60.7MiB)
4018
+ Downloading numpy (15.9MiB)
4019
+ Downloading nvidia-cufft-cu12 (184.2MiB)
4020
+ Downloading matplotlib (8.3MiB)
4021
+ Downloading pygments (1.2MiB)
4022
+ Downloading nvidia-cublas-cu12 (566.8MiB)
4023
+ Downloading kiwisolver (1.4MiB)
4024
  Downloading nvidia-nccl-cu12 (307.4MiB)
4025
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
4026
  Downloading nvidia-nvjitlink-cu12 (37.4MiB)
4027
+ Downloading networkx (1.9MiB)
4028
+ Downloading hf-xet (3.0MiB)
4029
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
4030
+ Downloading torch (846.8MiB)
4031
+ Downloading tokenizers (3.1MiB)
4032
+ Downloading nvidia-cufile-cu12 (1.1MiB)
4033
+ Downloading triton (148.4MiB)
4034
+ Downloading nvidia-cufile-cu12
4035
+ Downloading kiwisolver
4036
+ Downloading pygments
4037
+ Downloading hf-xet
4038
+ Downloading tokenizers
4039
+ Downloading networkx
4040
+ Downloading fonttools
4041
+ Downloading pillow
4042
+ Downloading matplotlib
4043
+ Downloading nvidia-cuda-cupti-cu12
4044
+ Downloading numpy
4045
+ Downloading sympy
4046
+ Downloading nvidia-nvjitlink-cu12
4047
+ Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
4048
+ Downloading jedi
4049
+ Downloading nvidia-curand-cu12
4050
+ Downloading nvidia-cuda-nvrtc-cu12
4051
+ Downloading triton
4052
+ Downloading nvidia-cufft-cu12
4053
+ Downloading nvidia-cusolver-cu12
4054
+ Downloading nvidia-cusparselt-cu12
4055
+ Downloading nvidia-cusparse-cu12
4056
+ Downloading nvidia-nccl-cu12
4057
+ Downloading nvidia-cublas-cu12
4058
+ Downloading nvidia-cudnn-cu12
4059
+ Downloading torch
4060
+ Installed 69 packages in 539ms
4061
+ </div>
4062
+ </div>
4063
+ <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
4064
+ Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:15, 7.55s/it]
4065
+ Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.72s/it]
4066
+ Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.87s/it]
4067
+ You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
4068
+
4069
+ Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
4070
+ Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.34s/it]
4071
+ Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.28s/it]
4072
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.82s/it]
4073
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.95s/it]
4074
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4075
+
4076
+ Fetching 6 files: 0%| | 0/6 [00:00&lt;?, ?it/s]
4077
+ Fetching 6 files: 17%|█▋ | 1/6 [00:00&lt;00:00, 5.35it/s]
4078
+ Fetching 6 files: 50%|█████ | 3/6 [00:00&lt;00:00, 6.55it/s]
4079
+ Fetching 6 files: 100%|██████████| 6/6 [00:00&lt;00:00, 12.81it/s]
4080
+ /tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4081
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4082
+ warnings.warn(
4083
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4084
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4085
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4086
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4087
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4088
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4089
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4090
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4091
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4092
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4093
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4094
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4095
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4096
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4097
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4098
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4099
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4100
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4101
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4102
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4103
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4104
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4105
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4106
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4107
+ /tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
4108
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
4109
+ warnings.warn(
4110
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4111
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4112
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4113
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4114
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4115
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4116
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4117
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4118
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4119
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4120
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4121
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4122
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4123
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4124
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4125
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4126
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4127
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4128
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4129
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4130
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4131
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
4132
+ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`</div>
4133
+ </div>
4134
+ </div>
4135
+
4136
+ <h1>Reference kernel</h1>
4137
+ <div class="cell" id="cell-setup2">
4138
+ <div class="cell-header">
4139
+ <span class="collapse-indicators">
4140
+ <span onclick="toggleCode('setup2')" style="cursor: pointer;">▼ code</span>
4141
+ <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
4142
+ <span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
4143
+ </span> |
4144
+ Cell: setup2 | 140.15s
4145
+ | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
4146
+ <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
4147
+ <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
4148
+ </div>
4149
+ <div id="code-setup2" class="cell-code" data-lines="115">
4150
+ <div class="highlight-with-lines">
4151
+ <div class="line-numbers" id="lines-setup2">
4152
<a class="line-number" data-cell="setup2" data-line="1" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 1, true);">1</a>
<a class="line-number" data-cell="setup2" data-line="2" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 2, true);">2</a>
<a class="line-number" data-cell="setup2" data-line="3" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 3, true);">3</a>
<a class="line-number" data-cell="setup2" data-line="4" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 4, true);">4</a>
<a class="line-number" data-cell="setup2" data-line="5" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 5, true);">5</a>
<a class="line-number" data-cell="setup2" data-line="6" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 6, true);">6</a>
<a class="line-number" data-cell="setup2" data-line="7" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 7, true);">7</a>
<a class="line-number" data-cell="setup2" data-line="8" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 8, true);">8</a>
<a class="line-number" data-cell="setup2" data-line="9" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 9, true);">9</a>
<a class="line-number" data-cell="setup2" data-line="10" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 10, true);">10</a>
<a class="line-number" data-cell="setup2" data-line="11" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 11, true);">11</a>
<a class="line-number" data-cell="setup2" data-line="12" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 12, true);">12</a>
<a class="line-number" data-cell="setup2" data-line="13" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 13, true);">13</a>
<a class="line-number" data-cell="setup2" data-line="14" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 14, true);">14</a>
<a class="line-number" data-cell="setup2" data-line="15" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 15, true);">15</a>
<a class="line-number" data-cell="setup2" data-line="16" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 16, true);">16</a>
<a class="line-number" data-cell="setup2" data-line="17" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 17, true);">17</a>
<a class="line-number" data-cell="setup2" data-line="18" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 18, true);">18</a>
<a class="line-number" data-cell="setup2" data-line="19" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 19, true);">19</a>
<a class="line-number" data-cell="setup2" data-line="20" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 20, true);">20</a>
<a class="line-number" data-cell="setup2" data-line="21" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 21, true);">21</a>
<a class="line-number" data-cell="setup2" data-line="22" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 22, true);">22</a>
<a class="line-number" data-cell="setup2" data-line="23" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 23, true);">23</a>
<a class="line-number" data-cell="setup2" data-line="24" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 24, true);">24</a>
<a class="line-number" data-cell="setup2" data-line="25" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 25, true);">25</a>
<a class="line-number" data-cell="setup2" data-line="26" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 26, true);">26</a>
<a class="line-number" data-cell="setup2" data-line="27" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 27, true);">27</a>
<a class="line-number" data-cell="setup2" data-line="28" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 28, true);">28</a>
<a class="line-number" data-cell="setup2" data-line="29" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 29, true);">29</a>
<a class="line-number" data-cell="setup2" data-line="30" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 30, true);">30</a>
<a class="line-number" data-cell="setup2" data-line="31" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 31, true);">31</a>
<a class="line-number" data-cell="setup2" data-line="32" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 32, true);">32</a>
<a class="line-number" data-cell="setup2" data-line="33" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 33, true);">33</a>
<a class="line-number" data-cell="setup2" data-line="34" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 34, true);">34</a>
<a class="line-number" data-cell="setup2" data-line="35" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 35, true);">35</a>
<a class="line-number" data-cell="setup2" data-line="36" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 36, true);">36</a>
<a class="line-number" data-cell="setup2" data-line="37" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 37, true);">37</a>
<a class="line-number" data-cell="setup2" data-line="38" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 38, true);">38</a>
<a class="line-number" data-cell="setup2" data-line="39" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 39, true);">39</a>
<a class="line-number" data-cell="setup2" data-line="40" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 40, true);">40</a>
<a class="line-number" data-cell="setup2" data-line="41" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 41, true);">41</a>
<a class="line-number" data-cell="setup2" data-line="42" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 42, true);">42</a>
<a class="line-number" data-cell="setup2" data-line="43" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 43, true);">43</a>
<a class="line-number" data-cell="setup2" data-line="44" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 44, true);">44</a>
<a class="line-number" data-cell="setup2" data-line="45" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 45, true);">45</a>
<a class="line-number" data-cell="setup2" data-line="46" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 46, true);">46</a>
<a class="line-number" data-cell="setup2" data-line="47" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 47, true);">47</a>
<a class="line-number" data-cell="setup2" data-line="48" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 48, true);">48</a>
<a class="line-number" data-cell="setup2" data-line="49" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 49, true);">49</a>
<a class="line-number" data-cell="setup2" data-line="50" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 50, true);">50</a>
<a class="line-number" data-cell="setup2" data-line="51" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 51, true);">51</a>
<a class="line-number" data-cell="setup2" data-line="52" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 52, true);">52</a>
<a class="line-number" data-cell="setup2" data-line="53" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 53, true);">53</a>
<a class="line-number" data-cell="setup2" data-line="54" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 54, true);">54</a>
<a class="line-number" data-cell="setup2" data-line="55" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 55, true);">55</a>
<a class="line-number" data-cell="setup2" data-line="56" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 56, true);">56</a>
<a class="line-number" data-cell="setup2" data-line="57" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 57, true);">57</a>
<a class="line-number" data-cell="setup2" data-line="58" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 58, true);">58</a>
<a class="line-number" data-cell="setup2" data-line="59" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 59, true);">59</a>
<a class="line-number" data-cell="setup2" data-line="60" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 60, true);">60</a>
<a class="line-number" data-cell="setup2" data-line="61" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 61, true);">61</a>
<a class="line-number" data-cell="setup2" data-line="62" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 62, true);">62</a>
<a class="line-number" data-cell="setup2" data-line="63" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 63, true);">63</a>
<a class="line-number" data-cell="setup2" data-line="64" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 64, true);">64</a>
<a class="line-number" data-cell="setup2" data-line="65" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 65, true);">65</a>
<a class="line-number" data-cell="setup2" data-line="66" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 66, true);">66</a>
<a class="line-number" data-cell="setup2" data-line="67" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 67, true);">67</a>
<a class="line-number" data-cell="setup2" data-line="68" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 68, true);">68</a>
<a class="line-number" data-cell="setup2" data-line="69" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 69, true);">69</a>
<a class="line-number" data-cell="setup2" data-line="70" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 70, true);">70</a>
<a class="line-number" data-cell="setup2" data-line="71" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 71, true);">71</a>
<a class="line-number" data-cell="setup2" data-line="72" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 72, true);">72</a>
<a class="line-number" data-cell="setup2" data-line="73" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 73, true);">73</a>
<a class="line-number" data-cell="setup2" data-line="74" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 74, true);">74</a>
<a class="line-number" data-cell="setup2" data-line="75" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 75, true);">75</a>
<a class="line-number" data-cell="setup2" data-line="76" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 76, true);">76</a>
<a class="line-number" data-cell="setup2" data-line="77" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 77, true);">77</a>
<a class="line-number" data-cell="setup2" data-line="78" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 78, true);">78</a>
<a class="line-number" data-cell="setup2" data-line="79" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 79, true);">79</a>
<a class="line-number" data-cell="setup2" data-line="80" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 80, true);">80</a>
<a class="line-number" data-cell="setup2" data-line="81" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 81, true);">81</a>
<a class="line-number" data-cell="setup2" data-line="82" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 82, true);">82</a>
<a class="line-number" data-cell="setup2" data-line="83" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 83, true);">83</a>
<a class="line-number" data-cell="setup2" data-line="84" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 84, true);">84</a>
<a class="line-number" data-cell="setup2" data-line="85" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 85, true);">85</a>
<a class="line-number" data-cell="setup2" data-line="86" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 86, true);">86</a>
<a class="line-number" data-cell="setup2" data-line="87" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 87, true);">87</a>
<a class="line-number" data-cell="setup2" data-line="88" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 88, true);">88</a>
<a class="line-number" data-cell="setup2" data-line="89" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 89, true);">89</a>
<a class="line-number" data-cell="setup2" data-line="90" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 90, true);">90</a>
<a class="line-number" data-cell="setup2" data-line="91" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 91, true);">91</a>
<a class="line-number" data-cell="setup2" data-line="92" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 92, true);">92</a>
<a class="line-number" data-cell="setup2" data-line="93" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 93, true);">93</a>
<a class="line-number" data-cell="setup2" data-line="94" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 94, true);">94</a>
<a class="line-number" data-cell="setup2" data-line="95" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 95, true);">95</a>
<a class="line-number" data-cell="setup2" data-line="96" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 96, true);">96</a>
<a class="line-number" data-cell="setup2" data-line="97" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 97, true);">97</a>
<a class="line-number" data-cell="setup2" data-line="98" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 98, true);">98</a>
<a class="line-number" data-cell="setup2" data-line="99" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 99, true);">99</a>
<a class="line-number" data-cell="setup2" data-line="100" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 100, true);">100</a>
<a class="line-number" data-cell="setup2" data-line="101" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 101, true);">101</a>
<a class="line-number" data-cell="setup2" data-line="102" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 102, true);">102</a>
<a class="line-number" data-cell="setup2" data-line="103" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 103, true);">103</a>
<a class="line-number" data-cell="setup2" data-line="104" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 104, true);">104</a>
<a class="line-number" data-cell="setup2" data-line="105" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 105, true);">105</a>
<a class="line-number" data-cell="setup2" data-line="106" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 106, true);">106</a>
<a class="line-number" data-cell="setup2" data-line="107" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 107, true);">107</a>
<a class="line-number" data-cell="setup2" data-line="108" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 108, true);">108</a>
<a class="line-number" data-cell="setup2" data-line="109" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 109, true);">109</a>
<a class="line-number" data-cell="setup2" data-line="110" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 110, true);">110</a>
<a class="line-number" data-cell="setup2" data-line="111" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 111, true);">111</a>
<a class="line-number" data-cell="setup2" data-line="112" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 112, true);">112</a>
<a class="line-number" data-cell="setup2" data-line="113" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 113, true);">113</a>
<a class="line-number" data-cell="setup2" data-line="114" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 114, true);">114</a>
<a class="line-number" data-cell="setup2" data-line="115" href="#cell-setup2" onclick="event.preventDefault(); selectCellLine('setup2', 115, true);">115</a>
</div>
<div class="code-wrap">
<div class="highlight"><pre><span></span><span class="c1"># /// script</span>
<span class="c1"># requires-python = &quot;&gt;=3.12&quot;</span>
<span class="c1"># dependencies = [</span>
<span class="c1">#     &quot;accelerate&gt;=1.10.1&quot;,</span>
<span class="c1">#     &quot;torch&gt;=2.7.0&quot;,</span>
<span class="c1">#     &quot;kernels==0.10.0&quot;,</span>
<span class="c1">#     &quot;transformers@https://github.com/huggingface/transformers.git&quot;,</span>
<span class="c1">#     &quot;ipdb&gt;=0.13.13&quot;,</span>
<span class="c1">#     &quot;matplotlib&gt;=3.7.2&quot;,</span>
<span class="c1">#     &quot;numpy&gt;=1.24.3&quot;,</span>
<span class="c1"># ]</span>
<span class="c1"># ///</span>

<span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">transformers</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssForCausalLM</span><span class="p">,</span> <span class="n">PreTrainedTokenizerFast</span><span class="p">,</span> <span class="n">Mxfp4Config</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">time</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">nn</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">Mode</span><span class="p">,</span> <span class="n">LayerRepository</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">sys</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">torch.profiler</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">gc</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">logging</span>

<span class="c1"># set to info-level logging</span>
<span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>

<span class="k">def</span><span class="w"> </span><span class="nf">reset_peak_memory_stats</span><span class="p">():</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Clear CUDA cache and reset memory allocation counters.&quot;&quot;&quot;</span>
    <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">empty_cache</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
        <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">reset_peak_memory_stats</span><span class="p">()</span>
    <span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>

<span class="k">def</span><span class="w"> </span><span class="nf">get_memory_stats</span><span class="p">():</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Get current and peak CUDA memory usage.&quot;&quot;&quot;</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">():</span>
        <span class="k">return</span> <span class="p">{</span><span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">}</span>
    <span class="k">return</span> <span class="p">{</span>
        <span class="s2">&quot;allocated_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
        <span class="s2">&quot;peak_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">max_memory_allocated</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
        <span class="s2">&quot;reserved_gb&quot;</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">memory_reserved</span><span class="p">()</span> <span class="o">/</span> <span class="mf">1e9</span><span class="p">,</span>
    <span class="p">}</span>

<span class="k">def</span><span class="w"> </span><span class="nf">override_kernel_layer_name</span><span class="p">(</span><span class="n">cls_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Helper to dynamically override the kernel_layer_name in a model class.&quot;&quot;&quot;</span>
    <span class="k">for</span> <span class="n">mod</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">modules</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
        <span class="k">if</span> <span class="n">mod</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">obj</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">cls_name</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">issubclass</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
            <span class="nb">setattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s2">&quot;kernel_layer_name&quot;</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Overrode </span><span class="si">{</span><span class="n">cls_name</span><span class="si">}</span><span class="s2">.kernel_layer_name to </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
            <span class="k">return</span> <span class="kc">True</span>
    <span class="k">return</span> <span class="kc">False</span>


<span class="c1"># Init the model the normal way</span>
<span class="n">model_id</span> <span class="o">=</span> <span class="s2">&quot;openai/gpt-oss-20b&quot;</span>
<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">PreTrainedTokenizerFast</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="n">model_id</span><span class="p">)</span>
<span class="n">quantization_config</span> <span class="o">=</span> <span class="n">Mxfp4Config</span><span class="p">(</span><span class="n">dequantize</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>


<span class="kn">from</span><span class="w"> </span><span class="nn">kernels</span><span class="w"> </span><span class="kn">import</span> <span class="n">replace_kernel_forward_from_hub</span><span class="p">,</span> <span class="n">register_kernel_mapping</span><span class="p">,</span> <span class="n">LayerRepository</span><span class="p">,</span> <span class="n">Mode</span>

<span class="kn">from</span><span class="w"> </span><span class="nn">transformers.models.gpt_oss.modeling_gpt_oss</span><span class="w"> </span><span class="kn">import</span> <span class="n">GptOssMLP</span><span class="p">,</span> <span class="n">GptOssRMSNorm</span>

<span class="n">replace_kernel_forward_from_hub</span><span class="p">(</span><span class="n">GptOssRMSNorm</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>  <span class="c1"># direct, type-safe</span>
<span class="n">custom_mapping</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">&quot;Yamoe&quot;</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">&quot;cuda&quot;</span><span class="p">:</span> <span class="p">{</span>
            <span class="n">Mode</span><span class="o">.</span><span class="n">INFERENCE</span><span class="p">:</span> <span class="n">LayerRepository</span><span class="p">(</span>
                <span class="n">repo_id</span><span class="o">=</span><span class="s2">&quot;drbh/yamoe&quot;</span><span class="p">,</span>
                <span class="n">layer_name</span><span class="o">=</span><span class="s2">&quot;Yamoe&quot;</span><span class="p">,</span>
                <span class="n">revision</span><span class="o">=</span><span class="s2">&quot;v0.3.0&quot;</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
<span class="n">register_kernel_mapping</span><span class="p">(</span><span class="n">custom_mapping</span><span class="p">)</span>


<span class="n">model</span> <span class="o">=</span> <span class="n">GptOssForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
    <span class="n">model_id</span><span class="p">,</span>
    <span class="n">dtype</span><span class="o">=</span><span class="s2">&quot;bfloat16&quot;</span><span class="p">,</span>
    <span class="n">device_map</span><span class="o">=</span><span class="s2">&quot;auto&quot;</span><span class="p">,</span>
    <span class="n">use_kernels</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="n">quantization_config</span><span class="o">=</span><span class="n">quantization_config</span><span class="p">,</span>
<span class="p">)</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>

<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="s2">&quot;role&quot;</span><span class="p">:</span> <span class="s2">&quot;system&quot;</span><span class="p">,</span> <span class="s2">&quot;content&quot;</span><span class="p">:</span> <span class="s2">&quot;What is Tensor Parallelism?&quot;</span><span class="p">},</span>
<span class="p">]</span>

<span class="n">inputs</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">apply_chat_template</span><span class="p">(</span>
    <span class="n">messages</span><span class="p">,</span>
    <span class="n">add_generation_prompt</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="n">return_tensors</span><span class="o">=</span><span class="s2">&quot;pt&quot;</span><span class="p">,</span>
    <span class="n">return_dict</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="n">reasoning_effort</span><span class="o">=</span><span class="s2">&quot;low&quot;</span><span class="p">,</span>
<span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s2">&quot;cuda&quot;</span><span class="p">)</span>

<span class="n">max_tokens</span> <span class="o">=</span> <span class="mi">512</span>

<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">inference_mode</span><span class="p">():</span>
    <span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>
    <span class="n">generated</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span>
        <span class="o">**</span><span class="n">inputs</span><span class="p">,</span>
        <span class="n">max_new_tokens</span><span class="o">=</span><span class="n">max_tokens</span><span class="p">,</span>
        <span class="n">do_sample</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
        <span class="n">temperature</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">end_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">perf_counter</span><span class="p">()</span>

<span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">generated</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">False</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Generation took </span><span class="si">{</span><span class="n">end_time</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2"> seconds&quot;</span><span class="p">)</span>
</pre></div>

<div class="code-line-highlight" id="line-highlight-setup2"></div>
</div>
</div>
</div>
+ <div id="output-setup2" class="cell-output">
4391
+ <div class="cell-stdout">&lt;|start|&gt;system&lt;|message|&gt;You are ChatGPT, a large language model trained by OpenAI.
+ Knowledge cutoff: 2024-06
+ Current date: 2025-09-23
+
+ Reasoning: low
+
+ # Valid channels: analysis, commentary, final. Channel must be included for every message.&lt;|end|&gt;&lt;|start|&gt;developer&lt;|message|&gt;# Instructions
+
+ What is Tensor Parallelism?
+
+ &lt;|end|&gt;&lt;|start|&gt;assistant&lt;|channel|&gt;analysis&lt;|message|&gt;We need to explain what Tensor Parallelism is. It&#x27;s a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it&#x27;s used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of &quot;tensor model parallelism&quot; vs &quot;tensor parallelism&quot; synonyms. Provide mention of &quot;tensor parallelism&quot; in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in the &quot;DeepSpeed ZeRO-Offload&quot; or &quot;ZeRO-3&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&quot; and &quot;Megatron-LM&quot; and &quot;DeepSpeed&#x27;s ZeRO&quot; and &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the context of &quot;tensor parallelism&quot; in &quot;DeepSpeed&#x27;s ZeRO-3&quot; and &quot;DeepSpeed&#x27;s ZeRO-2&quot; etc. Provide mention of &quot;tensor parallelism&quot; in the
+ Generation took 57.93 seconds
+ </div>
+ <div class="uv-install-logs" id="uv-logs-setup2">
+ <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
+ <div class="uv-logs-content" style="display: none;">
+ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
+ Downloading cpython-3.13.7-linux-x86_64-gnu (download)
+ Updating https://github.com/huggingface/transformers.git (HEAD)
+ Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
+ Downloading numpy (15.9MiB)
  Downloading pygments (1.2MiB)
+ Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
+ Downloading pillow (6.3MiB)
+ Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
+ Downloading nvidia-cusolver-cu12 (255.1MiB)
+ Downloading networkx (1.9MiB)
+ Downloading nvidia-cufile-cu12 (1.1MiB)
+ Downloading tokenizers (3.1MiB)
+ Downloading hf-xet (3.0MiB)
+ Downloading nvidia-cublas-cu12 (566.8MiB)
+ Downloading nvidia-cudnn-cu12 (674.0MiB)
+ Downloading nvidia-cufft-cu12 (184.2MiB)
+ Downloading sympy (6.0MiB)
  Downloading nvidia-curand-cu12 (60.7MiB)
+ Downloading nvidia-cusparse-cu12 (274.9MiB)
+ Downloading jedi (1.5MiB)
+ Downloading nvidia-cusparselt-cu12 (273.9MiB)
+ Downloading nvidia-nvjitlink-cu12 (37.4MiB)
+ Downloading nvidia-nccl-cu12 (307.4MiB)
  Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
+ Downloading torch (846.8MiB)
  Downloading triton (148.4MiB)
  Downloading matplotlib (8.3MiB)

  Downloading kiwisolver (1.4MiB)
+ Downloading fonttools (4.7MiB)
  Downloading nvidia-cufile-cu12
  Downloading kiwisolver
  Downloading pygments

  Downloading nvidia-cublas-cu12
  Downloading nvidia-cudnn-cu12
  Downloading torch
+ Installed 69 packages in 460ms
  </div>
  </div>
  <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00&lt;?, ?it/s]
+ Fetching 3 files: 33%|███▎ | 1/3 [00:07&lt;00:14, 7.31s/it]
+ Fetching 3 files: 67%|██████▋ | 2/3 [00:08&lt;00:03, 3.67s/it]
+ Fetching 3 files: 100%|██████████| 3/3 [00:08&lt;00:00, 2.81s/it]
  You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False


  Loading checkpoint shards: 0%| | 0/3 [00:00&lt;?, ?it/s]
+ Loading checkpoint shards: 33%|███▎ | 1/3 [00:02&lt;00:04, 2.35s/it]
+ Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04&lt;00:02, 2.25s/it]
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.80s/it]
+ Loading checkpoint shards: 100%|██████████| 3/3 [00:05&lt;00:00, 1.93s/it]
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+
+ Fetching 66 files: 0%| | 0/66 [00:00&lt;?, ?it/s]
+ Fetching 66 files: 2%|▏ | 1/66 [00:00&lt;00:17, 3.80it/s]
+ Fetching 66 files: 9%|▉ | 6/66 [00:00&lt;00:03, 19.77it/s]
+ Fetching 66 files: 14%|█▎ | 9/66 [00:00&lt;00:02, 19.05it/s]
+ Fetching 66 files: 26%|██▌ | 17/66 [00:01&lt;00:04, 10.13it/s]
+ Fetching 66 files: 86%|████████▋ | 57/66 [00:01&lt;00:00, 47.70it/s]
+ Fetching 66 files: 100%|██████████| 66/66 [00:01&lt;00:00, 36.16it/s]
+ /tmp/uvnote-run-d2g9g4zl/home/.cache/uv/environments-v2/setup2-ea0d7cee95bc10c1/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
+ warnings.warn(
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ /tmp/uvnote-run-d2g9g4zl/home/.cache/uv/environments-v2/setup2-ea0d7cee95bc10c1/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
+ No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
+ warnings.warn(
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
+ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`</div>
  </div>
  </div>


  </div>

  </body>