Upload folder using huggingface_hub

Files changed:
- cells/nv.py (+3 -0)
- index.html (+132 -83)
- note_test_override.html (+132 -83)
cells/nv.py ADDED

@@ -0,0 +1,3 @@
+import subprocess
+
+print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
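The added cell shells out to `nvidia-smi` unconditionally, so it prints nothing useful on a host without an NVIDIA driver. A slightly more defensive variant (an illustrative sketch only, not part of this commit; `gpu_info` is a helper name invented here) would check for the binary and the return code:

```python
import shutil
import subprocess

def gpu_info() -> str:
    # Return nvidia-smi's output, or a short note when the tool is missing or fails.
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found (no NVIDIA driver on this host)"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return f"nvidia-smi failed: {result.stderr.strip()}"
    return result.stdout

print(gpu_info())
```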
index.html CHANGED

@@ -3711,14 +3711,81 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
 </div>

 <div class="main-content">
-<div class="cell" id="cell-
 <div class="cell-header">
 <span class="collapse-indicators">
 <span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
 <span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
 <span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
 </span> |
-Cell: setup |
 | <button class="run-btn" onclick="runCell('setup')">▶ run</button>
 <button class="copy-btn" onclick="copyCell('setup')">Copy</button>
 <a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
@@ -3977,26 +4044,8 @@ Reasoning: low

 What is Tensor Parallelism?

-<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It's used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather.
-
-Also mention that it's used for large transformer models like GPT-3, LLaMA, etc. Provide mention of "tensor parallelism" in the context of "DeepSpeed's ZeRO-Offload" or "ZeRO-3".
-
-Also mention that "tensor parallelism" can be combined with "pipeline parallelism" and "data parallelism" to achieve full scaling.
-
-Also mention that "tensor parallelism" can be implemented by splitting the weight matrix along the output dimension, performing local matrix multiplication, then all-reduce to sum partial outputs.
-
-Also mention that "tensor parallelism" can be used for linear layers, self-attention, feed-forward networks, etc.
-
-Also mention that "tensor parallelism" can be used for "embedding tables" by sharding them across devices.
-
-Also mention that "tensor parallelism" can be used for "attention heads" by splitting across heads.
-
-Also mention that "tensor parallelism" can be used for "parameter sharding".
-
-Also mention that "tensor parallelism" can be used for "model parallelism" to reduce memory usage.
-
-
-Generation took 51.90 seconds
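The captured analysis text describes the column-parallel scheme: split a linear layer's weight W of shape (out_features, in_features) along out_features across devices, do local matrix multiplications, then gather the slices. A toy pure-Python sketch of that idea (illustrative only; plain lists stand in for per-device shards, and `column_parallel_forward` is a name invented here):

```python
# Toy column-parallel linear layer: W (out_features x in_features) is split along
# out_features across "devices". Each shard computes its slice of the output;
# concatenating the slices plays the role of the all-gather.

def matvec(w, x):
    # y = W @ x for a weight matrix given as a list of rows.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def column_parallel_forward(w, x, num_devices):
    rows_per_device = len(w) // num_devices
    shards = [w[d * rows_per_device:(d + 1) * rows_per_device] for d in range(num_devices)]
    partial_outputs = [matvec(shard, x) for shard in shards]  # local matmuls
    return [v for part in partial_outputs for v in part]       # "all-gather" (concat)

w = [[1, 0], [0, 1], [2, 2], [3, -1]]  # out_features=4, in_features=2
x = [10, 1]
assert column_parallel_forward(w, x, num_devices=2) == matvec(w, x)
print(column_parallel_forward(w, x, num_devices=2))  # [10, 1, 22, 29]
```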
 </div>
 <div class="uv-install-logs" id="uv-logs-setup">
 <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
@@ -4006,31 +4055,31 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
 Updating https://github.com/huggingface/transformers.git (HEAD)
 Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
 Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
-Downloading nvidia-cusolver-cu12 (255.1MiB)
 Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
-Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
-Downloading sympy (6.0MiB)
 Downloading jedi (1.5MiB)
-Downloading fonttools (4.7MiB)
-Downloading pillow (6.3MiB)
 Downloading nvidia-cusparse-cu12 (274.9MiB)
 Downloading nvidia-curand-cu12 (60.7MiB)
-Downloading
 Downloading nvidia-cufft-cu12 (184.2MiB)
-Downloading
 Downloading pygments (1.2MiB)
 Downloading nvidia-cublas-cu12 (566.8MiB)
 Downloading kiwisolver (1.4MiB)
 Downloading nvidia-nccl-cu12 (307.4MiB)
-Downloading nvidia-cusparselt-cu12 (273.9MiB)
-Downloading nvidia-nvjitlink-cu12 (37.4MiB)
-Downloading networkx (1.9MiB)
-Downloading hf-xet (3.0MiB)
-Downloading nvidia-cudnn-cu12 (674.0MiB)
 Downloading torch (846.8MiB)
-Downloading tokenizers (3.1MiB)
-Downloading nvidia-cufile-cu12 (1.1MiB)
-Downloading triton (148.4MiB)
 Downloading nvidia-cufile-cu12
 Downloading kiwisolver
 Downloading pygments
@@ -4043,8 +4092,8 @@ Downloading triton (148.4MiB)
 Downloading nvidia-cuda-cupti-cu12
 Downloading numpy
 Downloading sympy
-Downloading nvidia-nvjitlink-cu12
 Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
 Downloading jedi
 Downloading nvidia-curand-cu12
 Downloading nvidia-cuda-nvrtc-cu12
@@ -4057,27 +4106,26 @@ Downloading triton (148.4MiB)
 Downloading nvidia-cublas-cu12
 Downloading nvidia-cudnn-cu12
 Downloading torch
-Installed 69 packages in
 </div>
 </div>
 <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
-Fetching 3 files: 33%|███▎ | 1/3 [00:
-Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.
-Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.
 You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False

 Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
 Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
-Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.
-Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.
-Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`

 Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
-Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:
-Fetching 6 files:
-
-/tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
@@ -4104,7 +4152,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
-/tmp/uvnote-run-
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
@@ -4141,7 +4189,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
 <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
 <span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
 </span> |
-Cell: setup2 |
 | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
 <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
 <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
@@ -4399,7 +4447,7 @@ Reasoning: low
 What is Tensor Parallelism?

 <|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it's used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in the "DeepSpeed ZeRO-Offload" or "ZeRO-3" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed" and "Megatron-LM" and "DeepSpeed's ZeRO" and "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the
-Generation took 57.
 </div>
 <div class="uv-install-logs" id="uv-logs-setup2">
 <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
@@ -4408,32 +4456,32 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
 Downloading cpython-3.13.7-linux-x86_64-gnu (download)
 Updating https://github.com/huggingface/transformers.git (HEAD)
 Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
-Downloading
 Downloading pygments (1.2MiB)
-Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
-Downloading pillow (6.3MiB)
-Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
-Downloading nvidia-cusolver-cu12 (255.1MiB)
-Downloading networkx (1.9MiB)
 Downloading nvidia-cufile-cu12 (1.1MiB)
-
 Downloading hf-xet (3.0MiB)
-Downloading nvidia-
-Downloading
-Downloading nvidia-cufft-cu12 (184.2MiB)
 Downloading sympy (6.0MiB)
-Downloading
 Downloading nvidia-cusparse-cu12 (274.9MiB)
-Downloading
 Downloading nvidia-cusparselt-cu12 (273.9MiB)
-Downloading nvidia-nvjitlink-cu12 (37.4MiB)
 Downloading nvidia-nccl-cu12 (307.4MiB)
-Downloading nvidia-cuda-
-Downloading
 Downloading triton (148.4MiB)
-Downloading matplotlib (8.3MiB)
 Downloading kiwisolver (1.4MiB)
-Downloading
 Downloading nvidia-cufile-cu12
 Downloading kiwisolver
 Downloading pygments
@@ -4446,8 +4494,8 @@ Downloading fonttools (4.7MiB)
 Downloading nvidia-cuda-cupti-cu12
 Downloading numpy
 Downloading sympy
-Downloading nvidia-nvjitlink-cu12
 Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
 Downloading jedi
 Downloading nvidia-curand-cu12
 Downloading nvidia-cuda-nvrtc-cu12
@@ -4460,30 +4508,31 @@ Downloading fonttools (4.7MiB)
 Downloading nvidia-cublas-cu12
 Downloading nvidia-cudnn-cu12
 Downloading torch
-Installed 69 packages in
 </div>
 </div>
 <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
-Fetching 3 files: 33%|███▎ | 1/3 [00:
-Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.
-Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.
 You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False

 Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
-Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.
 Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
 Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
 Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`

 Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
-Fetching 66 files: 2%|▏ | 1/66 [00:00<00:
-Fetching 66 files:
-Fetching 66 files:
-Fetching 66 files:
-Fetching 66 files:
-Fetching 66 files:
-/
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
@@ -4510,7 +4559,7 @@ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
-/tmp/uvnote-run-
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
@@ -3711,14 +3711,81 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
 </div>

 <div class="main-content">
+<div class="cell" id="cell-nv">
+<div class="cell-header">
+<span class="collapse-indicators">
+<span onclick="toggleCode('nv')" style="cursor: pointer;">▼ code</span>
+<span onclick="toggleOutput('nv')" style="cursor: pointer;">▼ output</span>
+<span id="uv-indicator-nv" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
+</span> |
+Cell: nv | 0.53s
+| <button class="run-btn" onclick="runCell('nv')">▶ run</button>
+<button class="copy-btn" onclick="copyCell('nv')">Copy</button>
+<a href="cells/nv.py" target="_blank" class="raw-btn">Raw</a>
+</div>
+<div id="code-nv" class="cell-code" data-lines="3">
+<div class="highlight-with-lines">
+<div class="line-numbers" id="lines-nv">
+<a class="line-number" data-cell="nv" data-line="1" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 1, true);">1</a>
+<a class="line-number" data-cell="nv" data-line="2" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 2, true);">2</a>
+<a class="line-number" data-cell="nv" data-line="3" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 3, true);">3</a>
+</div>
+<div class="code-wrap">
+<div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">subprocess</span>
+
+<span class="nb">print</span><span class="p">(</span><span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">"nvidia-smi"</span><span class="p">],</span> <span class="n">capture_output</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
+</pre></div>
+
+<div class="code-line-highlight" id="line-highlight-nv"></div>
+</div>
+</div>
+</div>
+<div id="output-nv" class="cell-output">
+<div class="cell-stdout">Tue Sep 23 19:46:07 2025
++-----------------------------------------------------------------------------------------+
+| NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
+|-----------------------------------------+------------------------+----------------------+
+| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
+| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
+| | | MIG M. |
+|=========================================+========================+======================|
+| 0 NVIDIA A10G On | 00000000:00:1B.0 Off | 0 |
+| 0% 42C P0 71W / 300W | 0MiB / 23028MiB | 0% Default |
+| | | N/A |
++-----------------------------------------+------------------------+----------------------+
+| 1 NVIDIA A10G On | 00000000:00:1C.0 Off | 0 |
+| 0% 43C P0 44W / 300W | 0MiB / 23028MiB | 0% Default |
+| | | N/A |
++-----------------------------------------+------------------------+----------------------+
+| 2 NVIDIA A10G On | 00000000:00:1D.0 Off | 0 |
+| 0% 42C P0 46W / 300W | 0MiB / 23028MiB | 0% Default |
+| | | N/A |
++-----------------------------------------+------------------------+----------------------+
+| 3 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
+| 0% 41C P0 43W / 300W | 0MiB / 23028MiB | 0% Default |
+| | | N/A |
++-----------------------------------------+------------------------+----------------------+
+
++-----------------------------------------------------------------------------------------+
+| Processes: |
+| GPU GI CI PID Type Process name GPU Memory |
+| ID ID Usage |
+|=========================================================================================|
+| No running processes found |
++-----------------------------------------------------------------------------------------+
+
+</div>
+</div>
+</div>
+
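The cell output above is the human-readable `nvidia-smi` table (four idle A10Gs). For programmatic checks, `nvidia-smi` also has a machine-readable CSV query interface; a hedged sketch (`gpu_memory_mib` is a helper name invented here, and it returns an empty list on hosts without an NVIDIA driver):

```python
import shutil
import subprocess

def gpu_memory_mib():
    # Per-GPU total memory in MiB via nvidia-smi's CSV query interface.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [int(line.strip()) for line in result.stdout.splitlines() if line.strip()]

print(gpu_memory_mib())
```

On the machine captured above this would be expected to report four entries of 23028 MiB.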
+
<div class="cell" id="cell-setup">
|
| 3782 |
<div class="cell-header">
|
| 3783 |
<span class="collapse-indicators">
|
| 3784 |
<span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
|
| 3785 |
<span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
|
| 3786 |
<span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3787 |
</span> |
|
| 3788 |
+
Cell: setup | 133.12s
|
| 3789 |
| <button class="run-btn" onclick="runCell('setup')">▶ run</button>
|
| 3790 |
<button class="copy-btn" onclick="copyCell('setup')">Copy</button>
|
| 3791 |
<a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
@@ -3977,26 +4044,8 @@ Reasoning: low

 What is Tensor Parallelism?

+<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It's used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather results. In backward, gradients are computed locally and then aggregated. Provide example: GPT-3 training uses tensor parallelism. Also mention frameworks: Megatron-LM, DeepSpeed, etc. Provide pros/cons. Provide code snippet maybe. Also mention that it's different from data parallelism. Provide explanation of how it works in practice. Provide mention of communication overhead. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate. Provide mention of "tensor parallelism" in context of DeepSpeed ZeRO stage 3. Provide mention of "tensor parallelism" in context of Megatron-LM. Provide mention of "tensor parallelism" in context of GPT-NeoX. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-Offload" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-2" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe.
+Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor
+Generation took 51.92 seconds
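The analysis text also mentions the complementary scheme: keep full output rows but split along the input dimension, so each device produces a full-size partial output and the partials are summed, which is the all-reduce step. A toy pure-Python sketch of that row-parallel variant (illustrative only; `row_parallel_forward` is a name invented here):

```python
# Toy row-parallel split: W's columns (the in_features axis) are partitioned across
# "devices". Each device multiplies its column block by the matching input slice,
# yielding a full-size partial output; summing the partials simulates the all-reduce.

def row_parallel_forward(w, x, num_devices):
    cols = len(x) // num_devices
    partials = []
    for d in range(num_devices):
        lo, hi = d * cols, (d + 1) * cols
        shard = [row[lo:hi] for row in w]   # column block held by device d
        x_shard = x[lo:hi]                   # the input slice it needs
        partials.append([sum(a * b for a, b in zip(row, x_shard)) for row in shard])
    # "all-reduce": elementwise sum of the per-device partial outputs
    return [sum(p[i] for p in partials) for i in range(len(w))]

w = [[1, 2, 3, 4], [0, 1, 0, 1]]  # out_features=2, in_features=4
x = [1, 1, 2, 2]
full = [sum(a * b for a, b in zip(row, x)) for row in w]
assert row_parallel_forward(w, x, num_devices=2) == full
print(full)  # [17, 3]
```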
 </div>
 <div class="uv-install-logs" id="uv-logs-setup">
 <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
@@ -4006,31 +4055,31 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
 Updating https://github.com/huggingface/transformers.git (HEAD)
 Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
 Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
 Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
 Downloading jedi (1.5MiB)
 Downloading nvidia-cusparse-cu12 (274.9MiB)
+Downloading hf-xet (3.0MiB)
+Downloading nvidia-cusparselt-cu12 (273.9MiB)
+Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
 Downloading nvidia-curand-cu12 (60.7MiB)
+Downloading nvidia-nvjitlink-cu12 (37.4MiB)
 Downloading nvidia-cufft-cu12 (184.2MiB)
+Downloading nvidia-cusolver-cu12 (255.1MiB)
+Downloading pillow (6.3MiB)
+Downloading sympy (6.0MiB)
+Downloading fonttools (4.7MiB)
+Downloading numpy (15.9MiB)
+Downloading triton (148.4MiB)
+Downloading networkx (1.9MiB)
+Downloading tokenizers (3.1MiB)
 Downloading pygments (1.2MiB)
+Downloading matplotlib (8.3MiB)
 Downloading nvidia-cublas-cu12 (566.8MiB)
+Downloading nvidia-cufile-cu12 (1.1MiB)
+Downloading nvidia-cudnn-cu12 (674.0MiB)
 Downloading kiwisolver (1.4MiB)
 Downloading nvidia-nccl-cu12 (307.4MiB)
 Downloading torch (846.8MiB)
 Downloading nvidia-cufile-cu12
 Downloading kiwisolver
 Downloading pygments
@@ -4043,8 +4092,8 @@ Downloading triton (148.4MiB)
 Downloading nvidia-cuda-cupti-cu12
 Downloading numpy
 Downloading sympy
 Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
+Downloading nvidia-nvjitlink-cu12
 Downloading jedi
 Downloading nvidia-curand-cu12
 Downloading nvidia-cuda-nvrtc-cu12
@@ -4057,27 +4106,26 @@ Downloading triton (148.4MiB)
 Downloading nvidia-cublas-cu12
 Downloading nvidia-cudnn-cu12
 Downloading torch
+Installed 69 packages in 467ms
 </div>
 </div>
 <div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
+Fetching 3 files: 33%|███▎ | 1/3 [00:06<00:13, 6.78s/it]
+Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.65s/it]
+Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.75s/it]
 You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False

 Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
 Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
+Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
+Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
+Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`

 Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
+Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:01, 3.89it/s]
+Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 17.67it/s]
+/tmp/uvnote-run-hvgovjfd/home/.cache/uv/environments-v2/setup-443c07e337d3be43/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
@@ -4104,7 +4152,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
+/tmp/uvnote-run-hvgovjfd/home/.cache/uv/environments-v2/setup-443c07e337d3be43/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
 No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
 warnings.warn(
 INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
@@ -4141,7 +4189,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
 <span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
 <span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
 </span> |
+Cell: setup2 | 139.97s
 | <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
 <button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
 <a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
@@ -4399,7 +4447,7 @@ Reasoning: low
 What is Tensor Parallelism?

 <|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it's used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in the "DeepSpeed ZeRO-Offload" or "ZeRO-3" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed" and "Megatron-LM" and "DeepSpeed's ZeRO" and "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the
+Generation took 57.98 seconds
 </div>
 <div class="uv-install-logs" id="uv-logs-setup2">
 <div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
| 4456 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 4457 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4458 |
Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
|
| 4459 |
+
Downloading jedi (1.5MiB)
|
| 4460 |
Downloading pygments (1.2MiB)
|
| 4461 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4462 |
+
Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
| 4463 |
Downloading hf-xet (3.0MiB)
|
| 4464 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4465 |
+
Downloading numpy (15.9MiB)
|
|
|
|
| 4466 |
Downloading sympy (6.0MiB)
|
| 4467 |
+
Downloading matplotlib (8.3MiB)
|
| 4468 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4469 |
+
Downloading networkx (1.9MiB)
|
| 4470 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4471 |
+
Downloading pillow (6.3MiB)
|
| 4472 |
+
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4473 |
+
Downloading tokenizers (3.1MiB)
|
| 4474 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4475 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4476 |
+
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4477 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
|
|
|
| 4478 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4479 |
+
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4480 |
+
Downloading fonttools (4.7MiB)
|
| 4481 |
+
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4482 |
Downloading triton (148.4MiB)
|
|
|
|
| 4483 |
Downloading kiwisolver (1.4MiB)
|
| 4484 |
+
Downloading torch (846.8MiB)
|
| 4485 |
Downloading nvidia-cufile-cu12
|
| 4486 |
Downloading kiwisolver
|
| 4487 |
Downloading pygments
|
|
|
|
| 4494 |
Downloading nvidia-cuda-cupti-cu12
|
| 4495 |
Downloading numpy
|
| 4496 |
Downloading sympy
|
|
|
|
| 4497 |
Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
| 4498 |
+
Downloading nvidia-nvjitlink-cu12
|
| 4499 |
Downloading jedi
|
| 4500 |
Downloading nvidia-curand-cu12
|
| 4501 |
Downloading nvidia-cuda-nvrtc-cu12
|
|
|
|
| 4508 |
Downloading nvidia-cublas-cu12
|
| 4509 |
Downloading nvidia-cudnn-cu12
|
| 4510 |
Downloading torch
|
| 4511 |
+
Installed 69 packages in 468ms
|
| 4512 |
</div>
|
| 4513 |
</div>
|
| 4514 |
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4515 |
+
Fetching 3 files: 33%|███▎ | 1/3 [00:06<00:12, 6.38s/it]
|
| 4516 |
+
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.61s/it]
|
| 4517 |
+
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.69s/it]
|
| 4518 |
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
|
| 4519 |
|
| 4520 |
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4521 |
+
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
|
| 4522 |
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
|
| 4523 |
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
|
| 4524 |
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
|
| 4525 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4526 |
|
| 4527 |
Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
|
| 4528 |
+
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:10, 6.10it/s]
|
| 4529 |
+
Fetching 66 files: 14%|█▎ | 9/66 [00:00<00:01, 30.47it/s]
|
| 4530 |
+
Fetching 66 files: 24%|██▍ | 16/66 [00:00<00:01, 37.56it/s]
|
| 4531 |
+
Fetching 66 files: 30%|███ | 20/66 [00:01<00:03, 14.24it/s]
|
| 4532 |
+
Fetching 66 files: 67%|██████▋ | 44/66 [00:01<00:00, 37.14it/s]
|
| 4533 |
+
Fetching 66 files: 91%|█████████ | 60/66 [00:01<00:00, 49.97it/s]
|
| 4534 |
+
Fetching 66 files: 100%|██████████| 66/66 [00:01<00:00, 36.02it/s]
|
| 4535 |
+
/tmp/uvnote-run-nw4e52ut/home/.cache/uv/environments-v2/setup2-69adf76231e4ab4f/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4536 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4537 |
warnings.warn(
|
| 4538 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
|
|
|
| 4559 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4560 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4561 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4562 |
+
/tmp/uvnote-run-nw4e52ut/home/.cache/uv/environments-v2/setup2-69adf76231e4ab4f/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4563 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4564 |
warnings.warn(
|
| 4565 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
note_test_override.html
CHANGED
|
@@ -3711,14 +3711,81 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
|
|
| 3711 |
</div>
|
| 3712 |
|
| 3713 |
<div class="main-content">
|
| 3714 |
-
<div class="cell" id="cell-
|
| 3715 |
<div class="cell-header">
|
| 3716 |
<span class="collapse-indicators">
|
| 3717 |
<span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
|
| 3718 |
<span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
|
| 3719 |
<span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3720 |
</span> |
|
| 3721 |
-
Cell: setup |
|
| 3722 |
| <button class="run-btn" onclick="runCell('setup')">▶ run</button>
|
| 3723 |
<button class="copy-btn" onclick="copyCell('setup')">Copy</button>
|
| 3724 |
<a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -3977,26 +4044,8 @@ Reasoning: low
|
|
| 3977 |
|
| 3978 |
What is Tensor Parallelism?
|
| 3979 |
|
| 3980 |
-
<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It's used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather.
|
| 3981 |
-
|
| 3982 |
-
Also mention that it's used for large transformer models like GPT-3, LLaMA, etc. Provide mention of "tensor parallelism" in the context of "DeepSpeed's ZeRO-Offload" or "ZeRO-3".
|
| 3983 |
-
|
| 3984 |
-
Also mention that "tensor parallelism" can be combined with "pipeline parallelism" and "data parallelism" to achieve full scaling.
|
| 3985 |
-
|
| 3986 |
-
Also mention that "tensor parallelism" can be implemented by splitting the weight matrix along the output dimension, performing local matrix multiplication, then all-reduce to sum partial outputs.
|
| 3987 |
-
|
| 3988 |
-
Also mention that "tensor parallelism" can be used for linear layers, self-attention, feed-forward networks, etc.
|
| 3989 |
-
|
| 3990 |
-
Also mention that "tensor parallelism" can be used for "embedding tables" by sharding them across devices.
|
| 3991 |
-
|
| 3992 |
-
Also mention that "tensor parallelism" can be used for "attention heads" by splitting across heads.
|
| 3993 |
-
|
| 3994 |
-
Also mention that "tensor parallelism" can be used for "parameter sharding".
|
| 3995 |
-
|
| 3996 |
-
Also mention that "tensor parallelism" can be used for "model parallelism" to reduce memory usage.
|
| 3997 |
-
|
| 3998 |
-
|
| 3999 |
-
Generation took 51.90 seconds
|
| 4000 |
</div>
|
| 4001 |
<div class="uv-install-logs" id="uv-logs-setup">
|
| 4002 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
|
@@ -4006,31 +4055,31 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
|
| 4006 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4007 |
Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
|
| 4008 |
Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
| 4009 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4010 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4011 |
-
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4012 |
-
Downloading sympy (6.0MiB)
|
| 4013 |
Downloading jedi (1.5MiB)
|
| 4014 |
-
Downloading fonttools (4.7MiB)
|
| 4015 |
-
Downloading pillow (6.3MiB)
|
| 4016 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
|
| 4017 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4018 |
-
Downloading
|
| 4019 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4020 |
-
Downloading
|
|
| 4021 |
Downloading pygments (1.2MiB)
|
|
|
|
| 4022 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
|
|
|
|
|
|
| 4023 |
Downloading kiwisolver (1.4MiB)
|
| 4024 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4025 |
-
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4026 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4027 |
-
Downloading networkx (1.9MiB)
|
| 4028 |
-
Downloading hf-xet (3.0MiB)
|
| 4029 |
-
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4030 |
Downloading torch (846.8MiB)
|
| 4031 |
-
Downloading tokenizers (3.1MiB)
|
| 4032 |
-
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4033 |
-
Downloading triton (148.4MiB)
|
| 4034 |
Downloading nvidia-cufile-cu12
|
| 4035 |
Downloading kiwisolver
|
| 4036 |
Downloading pygments
|
|
@@ -4043,8 +4092,8 @@ Downloading triton (148.4MiB)
|
|
| 4043 |
Downloading nvidia-cuda-cupti-cu12
|
| 4044 |
Downloading numpy
|
| 4045 |
Downloading sympy
|
| 4046 |
-
Downloading nvidia-nvjitlink-cu12
|
| 4047 |
Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
|
|
|
| 4048 |
Downloading jedi
|
| 4049 |
Downloading nvidia-curand-cu12
|
| 4050 |
Downloading nvidia-cuda-nvrtc-cu12
|
|
@@ -4057,27 +4106,26 @@ Downloading triton (148.4MiB)
|
|
| 4057 |
Downloading nvidia-cublas-cu12
|
| 4058 |
Downloading nvidia-cudnn-cu12
|
| 4059 |
Downloading torch
|
| 4060 |
-
Installed 69 packages in
|
| 4061 |
</div>
|
| 4062 |
</div>
|
| 4063 |
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4064 |
-
Fetching 3 files: 33%|███▎ | 1/3 [00:
|
| 4065 |
-
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.
|
| 4066 |
-
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.
|
| 4067 |
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
|
| 4068 |
|
| 4069 |
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4070 |
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
|
| 4071 |
-
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.
|
| 4072 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.
|
| 4073 |
-
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.
|
| 4074 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4075 |
|
| 4076 |
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
|
| 4077 |
-
Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:
|
| 4078 |
-
Fetching 6 files:
|
| 4079 |
-
|
| 4080 |
-
/tmp/uvnote-run-og9tszom/home/.cache/uv/environments-v2/setup-d9b6d9dd835772a9/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
|
| 4081 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4082 |
warnings.warn(
|
| 4083 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
|
@@ -4104,7 +4152,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
|
|
| 4104 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4105 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4106 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
| 4107 |
-
/tmp/uvnote-run-
|
| 4108 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4109 |
warnings.warn(
|
| 4110 |
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
|
|
@@ -4141,7 +4189,7 @@ INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for laye
|
|
| 4141 |
<span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
|
| 4142 |
<span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
|
| 4143 |
</span> |
|
| 4144 |
-
Cell: setup2 |
|
| 4145 |
| <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
|
| 4146 |
<button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
|
| 4147 |
<a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
|
|
@@ -4399,7 +4447,7 @@ Reasoning: low
|
|
| 4399 |
What is Tensor Parallelism?
|
| 4400 |
|
| 4401 |
<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it's used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in the "DeepSpeed ZeRO-Offload" or "ZeRO-3" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed" and "Megatron-LM" and "DeepSpeed's ZeRO" and "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the
|
| 4402 |
-
Generation took 57.
|
| 4403 |
</div>
|
| 4404 |
<div class="uv-install-logs" id="uv-logs-setup2">
|
| 4405 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
|
@@ -4408,32 +4456,32 @@ Downloading cpython-3.13.7-linux-x86_64-gnu (download) (32.0MiB)
|
|
| 4408 |
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
|
| 4409 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4410 |
Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
|
| 4411 |
-
Downloading
|
| 4412 |
Downloading pygments (1.2MiB)
|
| 4413 |
-
Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
| 4414 |
-
Downloading pillow (6.3MiB)
|
| 4415 |
-
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
| 4416 |
-
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4417 |
-
Downloading networkx (1.9MiB)
|
| 4418 |
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4419 |
-
|
| 4420 |
Downloading hf-xet (3.0MiB)
|
| 4421 |
-
Downloading nvidia-
|
| 4422 |
-
Downloading
|
| 4423 |
-
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4424 |
Downloading sympy (6.0MiB)
|
| 4425 |
-
Downloading
|
| 4426 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4427 |
-
Downloading
|
| 4428 |
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4429 |
-
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4430 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4431 |
-
Downloading nvidia-cuda-
|
| 4432 |
-
Downloading
|
|
|
|
| 4433 |
Downloading triton (148.4MiB)
|
| 4434 |
-
Downloading matplotlib (8.3MiB)
|
| 4435 |
Downloading kiwisolver (1.4MiB)
|
| 4436 |
-
Downloading
|
| 4437 |
Downloading nvidia-cufile-cu12
|
| 4438 |
Downloading kiwisolver
|
| 4439 |
Downloading pygments
|
|
@@ -4446,8 +4494,8 @@ Downloading fonttools (4.7MiB)
|
|
| 4446 |
Downloading nvidia-cuda-cupti-cu12
|
| 4447 |
Downloading numpy
|
| 4448 |
Downloading sympy
|
| 4449 |
-
Downloading nvidia-nvjitlink-cu12
|
| 4450 |
Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
|
|
|
| 4451 |
Downloading jedi
|
| 4452 |
Downloading nvidia-curand-cu12
|
| 4453 |
Downloading nvidia-cuda-nvrtc-cu12
|
|
@@ -4460,30 +4508,31 @@ Downloading fonttools (4.7MiB)
|
|
| 4460 |
Downloading nvidia-cublas-cu12
|
| 4461 |
Downloading nvidia-cudnn-cu12
|
| 4462 |
Downloading torch
|
| 4463 |
-
Installed 69 packages in
|
| 4464 |
</div>
|
| 4465 |
</div>
|
| 4466 |
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4467 |
-
Fetching 3 files: 33%|███▎ | 1/3 [00:
|
| 4468 |
-
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.
|
| 4469 |
-
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.
|
| 4470 |
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False
|
| 4471 |
|
| 4472 |
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
| 4473 |
-
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.
|
| 4474 |
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
|
| 4475 |
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
|
| 4476 |
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
|
| 4477 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4478 |
|
| 4479 |
Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
|
| 4480 |
-
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:
|
| 4481 |
-
Fetching 66 files:
|
| 4482 |
-
Fetching 66 files:
|
| 4483 |
-
Fetching 66 files:
|
| 4484 |
-
Fetching 66 files:
|
| 4485 |
-
Fetching 66 files:
|
| 4486 |
-
/
|
|
|
|
| 4487 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4488 |
warnings.warn(
|
| 4489 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
|
@@ -4510,7 +4559,7 @@ INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks
|
|
| 4510 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4511 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4512 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
| 4513 |
-
/tmp/uvnote-run-
|
| 4514 |
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
|
| 4515 |
warnings.warn(
|
| 4516 |
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
|
|
|
|
| 3711 |
</div>
|
| 3712 |
|
| 3713 |
<div class="main-content">
|
| 3714 |
+
<div class="cell" id="cell-nv">
|
| 3715 |
+
<div class="cell-header">
|
| 3716 |
+
<span class="collapse-indicators">
|
| 3717 |
+
<span onclick="toggleCode('nv')" style="cursor: pointer;">▼ code</span>
|
| 3718 |
+
<span onclick="toggleOutput('nv')" style="cursor: pointer;">▼ output</span>
|
| 3719 |
+
<span id="uv-indicator-nv" style="cursor: default; opacity: 0.3;">▶ uv-logs</span>
|
| 3720 |
+
</span> |
|
| 3721 |
+
Cell: nv | 0.53s
|
| 3722 |
+
| <button class="run-btn" onclick="runCell('nv')">▶ run</button>
|
| 3723 |
+
<button class="copy-btn" onclick="copyCell('nv')">Copy</button>
|
| 3724 |
+
<a href="cells/nv.py" target="_blank" class="raw-btn">Raw</a>
|
| 3725 |
+
</div>
|
| 3726 |
+
<div id="code-nv" class="cell-code" data-lines="3">
|
| 3727 |
+
<div class="highlight-with-lines">
|
| 3728 |
+
<div class="line-numbers" id="lines-nv">
|
| 3729 |
+
<a class="line-number" data-cell="nv" data-line="1" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 1, true);">1</a>
|
| 3730 |
+
<a class="line-number" data-cell="nv" data-line="2" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 2, true);">2</a>
|
| 3731 |
+
<a class="line-number" data-cell="nv" data-line="3" href="#cell-nv" onclick="event.preventDefault(); selectCellLine('nv', 3, true);">3</a>
|
| 3732 |
+
</div>
|
| 3733 |
+
<div class="code-wrap">
|
| 3734 |
+
<div class="highlight"><pre><span></span><span class="kn">import</span><span class="w"> </span><span class="nn">subprocess</span>
|
| 3735 |
+
|
| 3736 |
+
<span class="nb">print</span><span class="p">(</span><span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">"nvidia-smi"</span><span class="p">],</span> <span class="n">capture_output</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">stdout</span><span class="p">)</span>
|
| 3737 |
+
</pre></div>
|
| 3738 |
+
|
| 3739 |
+
<div class="code-line-highlight" id="line-highlight-nv"></div>
|
| 3740 |
+
</div>
|
| 3741 |
+
</div>
|
| 3742 |
+
</div>
|
| 3743 |
+
<div id="output-nv" class="cell-output">
|
| 3744 |
+
<div class="cell-stdout">Tue Sep 23 19:46:07 2025
|
| 3745 |
+
+-----------------------------------------------------------------------------------------+
|
| 3746 |
+
| NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
|
| 3747 |
+
|-----------------------------------------+------------------------+----------------------+
|
| 3748 |
+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|
| 3749 |
+
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|
| 3750 |
+
| | | MIG M. |
|
| 3751 |
+
|=========================================+========================+======================|
|
| 3752 |
+
| 0 NVIDIA A10G On | 00000000:00:1B.0 Off | 0 |
|
| 3753 |
+
| 0% 42C P0 71W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3754 |
+
| | | N/A |
|
| 3755 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 3756 |
+
| 1 NVIDIA A10G On | 00000000:00:1C.0 Off | 0 |
|
| 3757 |
+
| 0% 43C P0 44W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3758 |
+
| | | N/A |
|
| 3759 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 3760 |
+
| 2 NVIDIA A10G On | 00000000:00:1D.0 Off | 0 |
|
| 3761 |
+
| 0% 42C P0 46W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3762 |
+
| | | N/A |
|
| 3763 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 3764 |
+
| 3 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
|
| 3765 |
+
| 0% 41C P0 43W / 300W | 0MiB / 23028MiB | 0% Default |
|
| 3766 |
+
| | | N/A |
|
| 3767 |
+
+-----------------------------------------+------------------------+----------------------+
|
| 3768 |
+
|
| 3769 |
+
+-----------------------------------------------------------------------------------------+
|
| 3770 |
+
| Processes: |
|
| 3771 |
+
| GPU GI CI PID Type Process name GPU Memory |
|
| 3772 |
+
| ID ID Usage |
|
| 3773 |
+
|=========================================================================================|
|
| 3774 |
+
| No running processes found |
|
| 3775 |
+
+-----------------------------------------------------------------------------------------+
|
| 3776 |
+
|
| 3777 |
+
</div>
|
| 3778 |
+
</div>
|
| 3779 |
+
</div>
|
| 3780 |
+
|
| 3781 |
+
<div class="cell" id="cell-setup">
|
| 3782 |
<div class="cell-header">
|
| 3783 |
<span class="collapse-indicators">
|
| 3784 |
<span onclick="toggleCode('setup')" style="cursor: pointer;">▼ code</span>
|
| 3785 |
<span onclick="toggleOutput('setup')" style="cursor: pointer;">▼ output</span>
|
| 3786 |
<span id="uv-indicator-setup" onclick="toggleUvLogsFromHeader('setup')" style="cursor: pointer;">▶ uv-logs</span>
|
| 3787 |
</span> |
|
| 3788 |
+
Cell: setup | 133.12s
|
| 3789 |
| <button class="run-btn" onclick="runCell('setup')">▶ run</button>
|
| 3790 |
<button class="copy-btn" onclick="copyCell('setup')">Copy</button>
|
| 3791 |
<a href="cells/setup.py" target="_blank" class="raw-btn">Raw</a>
|
|
|
|
| 4044 |
|
| 4045 |
What is Tensor Parallelism?
|
| 4046 |
|
| 4047 |
+
<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices, so each device holds a slice of the matrix. During forward/backward passes, each device computes partial results and then they are aggregated. It's used to scale up models beyond single device memory. Also mention pipeline parallelism, data parallelism. Provide details: e.g., for a linear layer weight matrix W of shape (out_features, in_features), we can split along out_features dimension across devices. Each device computes its part of the output. Then gather results. In backward, gradients are computed locally and then aggregated. Provide example: GPT-3 training uses tensor parallelism. Also mention frameworks: Megatron-LM, DeepSpeed, etc. Provide pros/cons. Provide code snippet maybe. Also mention that it's different from data parallelism. Provide explanation of how it works in practice. Provide mention of communication overhead. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate. Provide mention of "tensor parallelism" in context of DeepSpeed ZeRO stage 3. Provide mention of "tensor parallelism" in context of Megatron-LM. Provide mention of "tensor parallelism" in context of GPT-NeoX. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-Offload" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-2" maybe. Provide mention of "tensor parallelism" in context of "DeepSpeed's ZeRO-3" maybe. Provide mention of "tensor
|
| 4048 |
+
Generation took 51.92 seconds
|
| 4049 |
</div>
|
| 4050 |
<div class="uv-install-logs" id="uv-logs-setup">
|
| 4051 |
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
|
|
|
|
| 4055 |
Updating https://github.com/huggingface/transformers.git (HEAD)
|
| 4056 |
Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
|
| 4057 |
Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
|
|
|
| 4058 |
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
|
|
|
|
|
|
|
| 4059 |
Downloading jedi (1.5MiB)
|
|
|
|
|
|
|
| 4060 |
Downloading nvidia-cusparse-cu12 (274.9MiB)
|
| 4061 |
+
Downloading hf-xet (3.0MiB)
|
| 4062 |
+
Downloading nvidia-cusparselt-cu12 (273.9MiB)
|
| 4063 |
+
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
|
| 4064 |
Downloading nvidia-curand-cu12 (60.7MiB)
|
| 4065 |
+
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
|
| 4066 |
Downloading nvidia-cufft-cu12 (184.2MiB)
|
| 4067 |
+
Downloading nvidia-cusolver-cu12 (255.1MiB)
|
| 4068 |
+
Downloading pillow (6.3MiB)
|
| 4069 |
+
Downloading sympy (6.0MiB)
|
| 4070 |
+
Downloading fonttools (4.7MiB)
|
| 4071 |
+
Downloading numpy (15.9MiB)
|
| 4072 |
+
Downloading triton (148.4MiB)
|
| 4073 |
+
Downloading networkx (1.9MiB)
|
| 4074 |
+
Downloading tokenizers (3.1MiB)
|
| 4075 |
Downloading pygments (1.2MiB)
|
| 4076 |
+
Downloading matplotlib (8.3MiB)
|
| 4077 |
Downloading nvidia-cublas-cu12 (566.8MiB)
|
| 4078 |
+
Downloading nvidia-cufile-cu12 (1.1MiB)
|
| 4079 |
+
Downloading nvidia-cudnn-cu12 (674.0MiB)
|
| 4080 |
Downloading kiwisolver (1.4MiB)
|
| 4081 |
Downloading nvidia-nccl-cu12 (307.4MiB)
|
| 4082 |
Downloading torch (846.8MiB)
|
| 4083 |
Downloading nvidia-cufile-cu12
|
| 4084 |
Downloading kiwisolver
|
| 4085 |
Downloading pygments
|
|
|
|
| 4092 |
Downloading nvidia-cuda-cupti-cu12
|
| 4093 |
Downloading numpy
|
| 4094 |
Downloading sympy
|
|
|
|
| 4095 |
Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
|
| 4096 |
+
Downloading nvidia-nvjitlink-cu12
|
| 4097 |
Downloading jedi
|
| 4098 |
Downloading nvidia-curand-cu12
|
| 4099 |
Downloading nvidia-cuda-nvrtc-cu12
|
|
|
|
| 4106 |
Downloading nvidia-cublas-cu12
|
| 4107 |
Downloading nvidia-cudnn-cu12
|
| 4108 |
Downloading torch
|
| 4109 |
+
Installed 69 packages in 467ms
|
| 4110 |
</div>
|
| 4111 |
</div>
|
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
Fetching 3 files: 33%|███▎ | 1/3 [00:06<00:13, 6.78s/it]
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.65s/it]
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.75s/it]
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`

Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
Fetching 6 files: 17%|█▋ | 1/6 [00:00<00:01, 3.89it/s]
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 17.67it/s]
/tmp/uvnote-run-hvgovjfd/home/.cache/uv/environments-v2/setup-443c07e337d3be43/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
warnings.warn(
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
/tmp/uvnote-run-hvgovjfd/home/.cache/uv/environments-v2/setup-443c07e337d3be43/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
warnings.warn(
INFO:root:Using layer `Yamoe` from repo `drbh/yamoe` (revision: v0.3.0) for layer `Yamoe`
<span onclick="toggleOutput('setup2')" style="cursor: pointer;">▼ output</span>
<span id="uv-indicator-setup2" onclick="toggleUvLogsFromHeader('setup2')" style="cursor: pointer;">▶ uv-logs</span>
</span> |
Cell: setup2 | 139.97s
| <button class="run-btn" onclick="runCell('setup2')">▶ run</button>
<button class="copy-btn" onclick="copyCell('setup2')">Copy</button>
<a href="cells/setup2.py" target="_blank" class="raw-btn">Raw</a>
What is Tensor Parallelism?

<|end|><|start|>assistant<|channel|>analysis<|message|>We need to explain what Tensor Parallelism is. It's a concept in distributed training of large language models. It refers to splitting the weight matrices (tensors) across multiple devices. Provide details: how it works, benefits, challenges, typical frameworks, etc. Also mention difference from data parallelism, pipeline parallelism. Provide example: splitting a weight matrix across GPUs, each GPU holds a slice, compute partial results, then gather. Provide mention of communication overhead, scaling, etc. Also mention that it's used in large models like GPT-3, Megatron-LM, DeepSpeed, etc. Provide explanation of how it reduces memory usage, increases throughput. Provide mention of "tensor model parallelism" vs "tensor parallelism" synonyms. Provide mention of "tensor parallelism" in context of huggingface accelerate, DeepSpeed, Megatron. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in the "DeepSpeed ZeRO-Offload" or "ZeRO-3" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed" and "Megatron-LM" and "DeepSpeed's ZeRO" and "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the context of "tensor parallelism" in "DeepSpeed's ZeRO-3" and "DeepSpeed's ZeRO-2" etc. Provide mention of "tensor parallelism" in the
Generation took 57.98 seconds
</div>
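The generated answer above describes the core mechanic of tensor parallelism: for a linear layer with weight W of shape (out_features, in_features), split W along the out_features dimension across devices, let each device compute its slice of the output, then gather. A minimal single-process NumPy simulation of that scheme (illustrative only; this is not one of the notebook's cells, and all names here are assumptions):

```python
import numpy as np

# Simulate the column-parallel linear layer sketched in the answer above.
# W (out_features x in_features) is split along out_features across "devices";
# each device computes its slice of the output independently, and concatenating
# the partial results stands in for the all-gather on real GPUs.
rng = np.random.default_rng(0)
in_features, out_features, world_size = 8, 6, 2

x = rng.standard_normal((4, in_features))             # a batch of activations
W = rng.standard_normal((out_features, in_features))  # full weight matrix

shards = np.split(W, world_size, axis=0)   # one slice per "device"
partials = [x @ w.T for w in shards]       # computed independently per device
y_tp = np.concatenate(partials, axis=1)    # the gather step

assert np.allclose(y_tp, x @ W.T)          # matches the unsharded layer
print("tensor-parallel output shape:", y_tp.shape)
```

On real hardware the `np.concatenate` step corresponds to a collective communication op (e.g. an all-gather), which is the communication overhead the answer alludes to.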
<div class="uv-install-logs" id="uv-logs-setup2">
<div class="uv-logs-header" onclick="toggleUvLogs(this)">▶ UV Install Logs</div>
Downloading cpython-3.13.7-linux-x86_64-gnu (download)
Updating https://github.com/huggingface/transformers.git (HEAD)
Updated https://github.com/huggingface/transformers.git (99b0995138c17ef953959c70f35cb2bdc41111a2)
Downloading jedi (1.5MiB)
Downloading pygments (1.2MiB)
Downloading nvidia-cufile-cu12 (1.1MiB)
Building transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
Downloading hf-xet (3.0MiB)
Downloading nvidia-cuda-cupti-cu12 (9.8MiB)
Downloading numpy (15.9MiB)
Downloading sympy (6.0MiB)
Downloading matplotlib (8.3MiB)
Downloading nvidia-cudnn-cu12 (674.0MiB)
Downloading networkx (1.9MiB)
Downloading nvidia-nvjitlink-cu12 (37.4MiB)
Downloading pillow (6.3MiB)
Downloading nvidia-cublas-cu12 (566.8MiB)
Downloading tokenizers (3.1MiB)
Downloading nvidia-cusolver-cu12 (255.1MiB)
Downloading nvidia-cusparse-cu12 (274.9MiB)
Downloading nvidia-curand-cu12 (60.7MiB)
Downloading nvidia-cusparselt-cu12 (273.9MiB)
Downloading nvidia-nccl-cu12 (307.4MiB)
Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB)
Downloading fonttools (4.7MiB)
Downloading nvidia-cufft-cu12 (184.2MiB)
Downloading triton (148.4MiB)
Downloading kiwisolver (1.4MiB)
Downloading torch (846.8MiB)
Downloading nvidia-cufile-cu12
Downloading kiwisolver
Downloading pygments
Downloading nvidia-cuda-cupti-cu12
Downloading numpy
Downloading sympy
Built transformers @ git+https://github.com/huggingface/transformers.git@99b0995138c17ef953959c70f35cb2bdc41111a2
Downloading nvidia-nvjitlink-cu12
Downloading jedi
Downloading nvidia-curand-cu12
Downloading nvidia-cuda-nvrtc-cu12
Downloading nvidia-cublas-cu12
Downloading nvidia-cudnn-cu12
Downloading torch
Installed 69 packages in 468ms
</div>
</div>
<div class="cell-stderr">Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
Fetching 3 files: 33%|███▎ | 1/3 [00:06<00:12, 6.38s/it]
Fetching 3 files: 67%|██████▋ | 2/3 [00:08<00:03, 3.61s/it]
Fetching 3 files: 100%|██████████| 3/3 [00:08<00:00, 2.69s/it]
You are using full precision kernels, we will dequantize the model to bf16. To use the quantized model with quantization kernels, please set use_kernels=False

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.34s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.25s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.80s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.93s/it]
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`

Fetching 66 files: 0%| | 0/66 [00:00<?, ?it/s]
Fetching 66 files: 2%|▏ | 1/66 [00:00<00:10, 6.10it/s]
Fetching 66 files: 14%|█▎ | 9/66 [00:00<00:01, 30.47it/s]
Fetching 66 files: 24%|██▍ | 16/66 [00:00<00:01, 37.56it/s]
Fetching 66 files: 30%|███ | 20/66 [00:01<00:03, 14.24it/s]
Fetching 66 files: 67%|██████▋ | 44/66 [00:01<00:00, 37.14it/s]
Fetching 66 files: 91%|█████████ | 60/66 [00:01<00:00, 49.97it/s]
Fetching 66 files: 100%|██████████| 66/66 [00:01<00:00, 36.02it/s]
/tmp/uvnote-run-nw4e52ut/home/.cache/uv/environments-v2/setup2-69adf76231e4ab4f/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
warnings.warn(
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`
/tmp/uvnote-run-nw4e52ut/home/.cache/uv/environments-v2/setup2-69adf76231e4ab4f/lib/python3.13/site-packages/kernels/layer.py:868: UserWarning:
No kernel mapping found for layer `None`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
warnings.warn(
INFO:root:Using layer `MegaBlocksMoeMLP` from repo `kernels-community/megablocks` (revision: main) for layer `MegaBlocksMoeMLP`