Utility helpers for NVFP4 + FlashInfer fused-MoE path
 __all__  module-attribute
 __all__ = [
    "is_flashinfer_fp4_cutlass_moe_available",
    "reorder_w1w3_to_w3w1",
    "build_flashinfer_fp4_cutlass_moe_prepare_finalize",
]
 
 build_flashinfer_fp4_cutlass_moe_prepare_finalize(
    moe: FusedMoEConfig,
) -> FusedMoEPrepareAndFinalize
Create a FlashInfer CUTLASS fused-MoE prepare-and-finalize kernel.
Source code in vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
  
 is_flashinfer_fp4_cutlass_moe_available() -> bool
Return True when FlashInfer CUTLASS NV-FP4 kernels can be used.
Source code in vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
  
 reorder_w1w3_to_w3w1
Re-order the concatenated [w1, w3] tensors to [w3, w1].
Source code in vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
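The [w1, w3] to [w3, w1] re-ordering can be illustrated with a minimal, library-free sketch. It assumes the two projections are stacked along the row axis in equal halves, which this page does not state. The helper name `swap_halves` and the nested-list representation are hypothetical and exist only for illustration; the real function operates on tensors and its exact signature is not shown here.

```python
from typing import List, Sequence


def swap_halves(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reorder rows laid out as [w1; w3] into [w3; w1].

    Hypothetical illustration only: assumes w1 and w3 each occupy
    exactly half of the rows.
    """
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (w1 and w3 stacked)")
    half = len(rows) // 2
    w1, w3 = rows[:half], rows[half:]
    # Copy each row so the result does not alias the input.
    return [list(r) for r in w3] + [list(r) for r in w1]


# Toy 4x2 "weight": the first two rows stand in for w1, the last two for w3.
w13 = [[1.0, 1.0], [1.0, 1.0], [3.0, 3.0], [3.0, 3.0]]
w31 = swap_halves(w13)
```

Applying the swap twice returns the original layout, which gives a quick sanity check when converting checkpoints between the two conventions.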
  
 select_nvfp4_gemm_impl(
    moe: FusedMoEConfig,
    moe_quant_config: FusedMoEQuantConfig,
    allow_flashinfer: bool,
) -> FusedMoEPermuteExpertsUnpermute
Return a GEMM experts implementation for NV-FP4 fused-MoE layers