Update
- index.html +2 -2
- llm_conf.qmd +2 -2

index.html CHANGED
@@ -512,11 +512,11 @@
 <ul>
 <li>No distributed techniques at play</li>
 </ul></li>
-<li>DDP:
+<li>Distributed Data Parallelism (DDP):
 <ul>
 <li>A full copy of the model exists on each device, but data is chunked between each GPU</li>
 </ul></li>
-<li>FSDP & DeepSpeed:
+<li>Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
 <ul>
 <li>Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs</li>
 </ul></li>
    	
llm_conf.qmd CHANGED

@@ -61,9 +61,9 @@ What can we do?
 
 * Single GPU:
   * No distributed techniques at play
-* DDP:
+* Distributed Data Parallelism (DDP):
   * A full copy of the model exists on each device, but data is chunked between each GPU
-* FSDP & DeepSpeed:
+* Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
   * Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs
 
 
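For readers who want to see how the two techniques named in these bullets translate into code, below is a minimal, illustrative sketch (not taken from the changed files) of wrapping a model with PyTorch's DDP and FSDP APIs. It assumes PyTorch 2.x launched via `torchrun` with one process per GPU; the `Linear` layers are stand-ins for a real model.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main() -> None:
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE; one process per GPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for a real model.
    model = torch.nn.Linear(4096, 4096).to(local_rank)

    # DDP: every rank keeps a full replica of the model; only the data is
    # sharded (e.g. via DistributedSampler) and gradients are all-reduced.
    ddp_model = DDP(model, device_ids=[local_rank])

    # FSDP: parameters, gradients, and optimizer state are sharded across
    # ranks, so a model too large for a single GPU can still be trained.
    fsdp_model = FSDP(torch.nn.Linear(4096, 4096).to(local_rank))

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The contrast mirrors the bullets above: DDP replicates the model and splits the data, while FSDP (and DeepSpeed's ZeRO stages) additionally shard the model and optimizer states, which is what makes bigger models trainable across multiple smaller GPUs.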
