Rearranges a large matrix by splitting it into blocks and applying the block-scaling-factor rearrangement pattern to each block.
See https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| input_matrix | Tensor | Input tensor of shape (H, W) | required | 
| backend | Literal['torch', 'triton'] | "torch" (PyTorch path) or "triton" (Triton kernel) | 'triton' | 
Returns:
| Type | Description | 
|---|---|
| Tensor | Rearranged tensor of shape (32 * ceil_div(H, 128), 16 * ceil_div(W, 4)) | 
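The shape arithmetic in the table above can be sketched in plain Python. This is an illustration only, not the code in qutlass_utils.py: the helper names `blocked_shape` and `padded_shape` are hypothetical, and the 128 x 4 block size is inferred from the `ceil_div(H, 128)` and `ceil_div(W, 4)` terms in the documented return shape.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up."""
    return (a + b - 1) // b


def blocked_shape(h: int, w: int) -> tuple:
    """Documented output shape for an (H, W) input (hypothetical helper)."""
    return 32 * ceil_div(h, 128), 16 * ceil_div(w, 4)


def padded_shape(h: int, w: int) -> tuple:
    """(H, W) rounded up to whole 128 x 4 blocks (assumed block size)."""
    return 128 * ceil_div(h, 128), 4 * ceil_div(w, 4)


if __name__ == "__main__":
    for h, w in [(128, 4), (200, 10)]:
        print((h, w), "->", blocked_shape(h, w), "padded:", padded_shape(h, w))
```

Under these assumptions the output holds exactly as many elements as the padded input, since each 128 x 4 block (512 elements) becomes a 32 x 16 block.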
Source code in vllm/model_executor/layers/quantization/qutlass_utils.py
  
Rearranges an E8M0 scale tensor from row-major format to block-scaled swizzle format.
This format is suitable for Tmem, as described in the NVIDIA documentation: https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| scale_tensor | Tensor | Input tensor in row-major format with 8-bit elements | required | 
Returns:
| Type | Description | 
|---|---|
| Tensor | Rearranged tensor in block-scaled swizzle format | 
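The swizzle can be read as a per-element index mapping. The sketch below is plain Python for illustration, not the vLLM implementation. It assumes 128 x 4 blocks stored one after another in row-major block order, with the within-block offset taken to follow the linked cuBLAS layout section. `swizzled_offset` and its parameters are hypothetical names.

```python
BLOCK_ROWS, BLOCK_COLS = 128, 4  # assumed block size of the swizzled layout


def swizzled_offset(row: int, col: int, n_col_blocks: int) -> int:
    """Flat output offset of element (row, col) of the padded input.

    Assumes blocks are laid out row-major, each occupying
    BLOCK_ROWS * BLOCK_COLS consecutive elements.
    """
    block = (row // BLOCK_ROWS) * n_col_blocks + (col // BLOCK_COLS)
    r, c = row % BLOCK_ROWS, col % BLOCK_COLS
    # Each block is viewed as 32 rows of 16 elements: the row within a
    # 32-row group selects the output row, the group index and the
    # column select the position inside that output row.
    within = (r % 32) * 16 + (r // 32) * 4 + c
    return block * BLOCK_ROWS * BLOCK_COLS + within


if __name__ == "__main__":
    # Rows 0, 32, 64 and 96 of a block land next to each other.
    print([swizzled_offset(r, 0, 1) for r in (0, 32, 64, 96)])
```

Under this mapping, every element of a padded block is assigned a distinct offset, so the rearrangement is a permutation of the padded tensor.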
Source code in vllm/model_executor/layers/quantization/qutlass_utils.py
  
 triton_scale_swizzle(
    scale_ptr: Tensor,
    scale_rows: int,
    scale_cols: int,
    output_ptr: Tensor,
    input_row_stride: int,
    output_block_stride: int,
    BLOCK_ROWS: constexpr,
    BLOCK_COLS: constexpr,
)
Rearranges tensor data from row-major to block-scaled swizzle format.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| scale_ptr | Tensor | Pointer to the input scale tensor | required | 
| scale_rows | int | Number of rows in the scale tensor | required | 
| scale_cols | int | Number of columns in the scale tensor | required | 
| output_ptr | Tensor | Pointer to the output tensor | required | 
| input_row_stride | int | Stride between rows in the input tensor | required | 
| output_block_stride | int | Stride between blocks in the output tensor | required | 
| BLOCK_ROWS | constexpr | Number of rows in a tile (compile-time constant) | required | 
| BLOCK_COLS | constexpr | Number of columns in a tile (compile-time constant) | required |
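The parameters above suggest a launch with one program per tile. The following is a minimal sketch of how the launch values might be derived on the host, written in plain Python so it runs without a GPU. It is an assumption, not the code in qutlass_utils.py: `launch_params` and `tile_base` are hypothetical helpers, the input is assumed contiguous and row-major, and tiles are assumed to be written in row-major tile order.

```python
def launch_params(scale_rows: int, scale_cols: int,
                  block_rows: int = 128, block_cols: int = 4):
    """Return (grid, input_row_stride, output_block_stride).

    Assumes a contiguous row-major input and one program per
    block_rows x block_cols tile (hypothetical convention).
    """
    n_row_blocks = -(-scale_rows // block_rows)  # ceiling division
    n_col_blocks = -(-scale_cols // block_cols)
    grid = (n_row_blocks, n_col_blocks)
    input_row_stride = scale_cols
    # Elements spanned by one full row of tiles in the output.
    output_block_stride = block_rows * block_cols * n_col_blocks
    return grid, input_row_stride, output_block_stride


def tile_base(pid_row: int, pid_col: int, output_block_stride: int,
              block_rows: int = 128, block_cols: int = 4) -> int:
    """Output offset where the tile handled by program (pid_row, pid_col) starts."""
    return pid_row * output_block_stride + pid_col * block_rows * block_cols


if __name__ == "__main__":
    grid, in_stride, out_stride = launch_params(200, 10)
    print(grid, in_stride, out_stride)
    print(tile_base(1, 2, out_stride))
```

Because `scale_rows` and `scale_cols` need not be multiples of the tile size, a kernel following this scheme would have to mask loads that fall outside the input.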