All-gather the input tensor across the model parallel group.
 
    
  Gather the input tensor across the model parallel group.
 
  Reduce-scatter the input tensor across the model parallel group.
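
The three operations differ in who receives what. The sketch below simulates them in pure Python, with one list per rank, so it can run without a distributed backend. It is an illustration only: the function names `all_gather`, `gather`, and `reduce_scatter`, the `dst` parameter, and the use of flat lists in place of tensors are all assumptions, not the helpers documented here. It also assumes the conventional collective meanings, in which a gather delivers the result to a single root rank while an all-gather delivers it to every rank; the helper described above may behave differently.

```python
from typing import List, Optional

Shard = List[float]


def all_gather(shards: List[Shard]) -> List[Shard]:
    """Every rank ends up with the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def gather(shards: List[Shard], dst: int = 0) -> List[Optional[Shard]]:
    """Only rank `dst` receives the concatenation; other ranks get None."""
    full = [x for shard in shards for x in shard]
    return [list(full) if rank == dst else None for rank in range(len(shards))]


def reduce_scatter(inputs: List[Shard]) -> List[Shard]:
    """Sum the inputs elementwise, then give rank r the r-th equal chunk."""
    world_size = len(inputs)
    length = len(inputs[0])
    assert all(len(t) == length for t in inputs), "inputs must share a length"
    assert length % world_size == 0, "length must divide evenly across ranks"
    summed = [sum(vals) for vals in zip(*inputs)]
    chunk = length // world_size
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world_size)]


# Two simulated ranks, each holding one shard of a tensor.
shards = [[1.0, 2.0], [3.0, 4.0]]
gathered_everywhere = all_gather(shards)
gathered_on_root = gather(shards, dst=0)

# Two simulated ranks, each holding a full-length partial result.
partials = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
scattered = reduce_scatter(partials)
```

In this simulation, all-gather and reduce-scatter are complementary: the first grows each rank's data from a shard to the full tensor, while the second sums full-length partial results and leaves each rank with only its own shard of the total.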