Distributed: Backend, mappings and strategies
Distributed API
Mappings
Broadcast tensor to all ranks in the tensor parallel group.
Broadcast tensor to all ranks in the data parallel group.
All-to-all gather tensors from ranks in the tensor parallel group.
All-to-all gather tensors from ranks in the data parallel group.
Scatter tensors to ranks in the tensor parallel group.
Scatter tensors to ranks in the data parallel group.
Gather groups of tensors from ranks of the pipeline group to pipeline rank 0.
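The collectives listed above can be illustrated without any distributed backend. The following is a minimal, single-process sketch in plain Python, not this library's API: each "rank" is a list index, and all function names (`tp_groups`, `dp_groups`, `broadcast`, `all_gather`, `scatter`, `gather`) are hypothetical. It also assumes a common layout in which consecutive ranks form a tensor parallel group and strided ranks form a data parallel group; the actual layout used here may differ.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def tp_groups(world_size: int, tp_size: int) -> List[List[int]]:
    """Assumed layout: consecutive ranks share a tensor parallel group."""
    return [list(range(s, s + tp_size)) for s in range(0, world_size, tp_size)]


def dp_groups(world_size: int, tp_size: int) -> List[List[int]]:
    """Assumed layout: ranks with the same TP offset share a data parallel group."""
    return [list(range(o, world_size, tp_size)) for o in range(tp_size)]


def broadcast(values: Sequence[T], src: int = 0) -> List[T]:
    """After a broadcast, every rank holds the value of rank `src`."""
    return [values[src] for _ in values]


def all_gather(values: Sequence[T]) -> List[List[T]]:
    """After an all-gather, every rank holds all values in rank order."""
    return [list(values) for _ in values]


def scatter(chunks: Sequence[T]) -> List[T]:
    """After a scatter, rank i holds chunks[i] from the source rank."""
    return list(chunks)


def gather(values: Sequence[T], dst: int = 0) -> List[Optional[List[T]]]:
    """After a gather, only rank `dst` holds the full list; others hold None."""
    return [list(values) if r == dst else None for r in range(len(values))]


if __name__ == "__main__":
    print("TP groups:", tp_groups(world_size=4, tp_size=2))
    print("DP groups:", dp_groups(world_size=4, tp_size=2))
    print("broadcast:", broadcast(["a", "b"], src=0))
    print("all_gather:", all_gather([1, 2]))
    print("gather to rank 0:", gather([1, 2], dst=0))
```

In each function the returned list is indexed by rank within one group, so the same helpers describe the tensor parallel, data parallel, and pipeline variants: only the group membership changes.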
Strategies
Defines a routing strategy, which every device participates in.
Defines a routing strategy for parameter tensors supporting TP sharding.
Defines a routing strategy which materializes the activation tensor on all TP and DP ranks via all-gather collectives.
Defines an offload strategy, which each device may or may not participate in.
Defines an editing function execution strategy.
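As a rough illustration of what TP sharding and all-gather materialization mean for a single tensor, the sketch below splits a flat parameter across tensor parallel ranks and then reconstructs the full tensor on every rank. It is a plain-Python simulation under stated assumptions: `shard` and `all_gather_concat` are hypothetical names, the parameter is a flat list split into equal contiguous chunks, and none of this reflects how the strategy classes above are actually implemented.

```python
from typing import List


def shard(param: List[float], tp_size: int) -> List[List[float]]:
    """Split a flat parameter into equal contiguous shards, one per TP rank."""
    if tp_size <= 0 or len(param) % tp_size != 0:
        raise ValueError("parameter length must be divisible by tp_size")
    step = len(param) // tp_size
    return [param[r * step:(r + 1) * step] for r in range(tp_size)]


def all_gather_concat(shards: List[List[float]]) -> List[List[float]]:
    """Simulate an all-gather: every rank ends up with the concatenated tensor."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0]
    per_rank = shard(weights, tp_size=2)
    print("shards per rank:", per_rank)
    print("after all-gather:", all_gather_concat(per_rank))
```

Sharding followed by the simulated all-gather is a round trip: every rank recovers the original parameter, which is the sense in which a tensor is "materialized" on all ranks.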