DashengConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/midashenglm.py
  
 __init__(
    embed_dim: int = 768,
    outputdim: int = 527,
    patch_size: int | tuple[int, int] = 16,
    patch_stride: int | tuple[int, int] = 16,
    input_channels: int = 1,
    target_length: int = 1012,
    depth: int = 12,
    num_heads: int = 12,
    mlp_ratio: float = 4.0,
    qkv_bias: bool = True,
    init_values: float | None = None,
    drop_rate: float = 0.0,
    attn_drop_rate: float = 0.0,
    f_min: float = 0.0,
    f_max: float = 8000.0,
    center: bool = True,
    win_length: int = 512,
    hop_length: int = 160,
    sample_rate: int = 16000,
    n_fft: int = 512,
    n_mels: int = 64,
    **kwargs,
)
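A minimal construction sketch for this encoder config. The import path is taken from the source location above, and every keyword argument comes from the signature; the values shown simply restate the documented defaults, so this is illustrative rather than the library's prescribed usage.

from vllm.transformers_utils.configs.midashenglm import DashengConfig

config = DashengConfig(
    embed_dim=768,      # transformer width (default shown above)
    depth=12,           # number of transformer blocks
    num_heads=12,       # attention heads per block
    n_mels=64,          # mel bins for the audio front end
    sample_rate=16000,  # input audio sample rate in Hz
    hop_length=160,     # STFT hop: 160 samples = 10 ms at 16 kHz

# PretrainedConfig subclasses serialize to a plain dict / JSON string.
print(config.to_dict().get("embed_dim"))

Any parameter omitted from the call keeps the default listed in the signature above.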
MiDashengLMConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/midashenglm.py
 audio_encoder_config  instance-attribute
 audio_encoder_config = DashengConfig(
    **(audio_encoder_config or {})
)
 __init__(
    audio_encoder_config: dict | None = None,
    subsample_factor: int = 5,
    text_config: dict | None = None,
    audio_token_id: int | None = None,
    **kwargs,
)
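A hedged usage sketch for the outer config. The class name MiDashengLMConfig and the import path are assumed from the module documented above; the parameters come from the signature, and the nested dict is wrapped into a DashengConfig exactly as the instance attribute shown earlier.

from vllm.transformers_utils.configs.midashenglm import MiDashengLMConfig

cfg = MiDashengLMConfig(
    audio_encoder_config={"n_mels": 64, "embed_dim": 768},  # plain dict, not a DashengConfig
    subsample_factor=5,   # default from the signature above
    text_config=None,     # left at the default
    audio_token_id=None,  # left at the default

# Per the instance attribute above, the dict is materialized as a DashengConfig,
# so nested fields are reachable as attributes (assuming DashengConfig stores its
# kwargs as attributes, as PretrainedConfig subclasses typically do).
assert cfg.audio_encoder_config.n_mels == 64

Passing audio_encoder_config=None (the default) produces a DashengConfig built entirely from its own defaults, per the instance attribute shown above.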