Bases: PretrainedConfig
This is the configuration class to store the configuration of an [AIMv2Model]. Instantiating a configuration with the defaults will yield a configuration similar to that of apple/aimv2-large-patch14-224.

Args:
    hidden_size: Dimension of the hidden representations.
    intermediate_size: Dimension of the SwiGLU representations.
    num_hidden_layers: Number of hidden layers in the Transformer.
    num_attention_heads: Number of attention heads for each attention layer in the Transformer.
    num_channels: Number of input channels.
    image_size: Image size.
    patch_size: Patch size.
    rms_norm_eps: Epsilon value used for the RMS normalization layer.
    attention_dropout: Dropout ratio for the attention probabilities.
    projection_dropout: Dropout ratio for the projection layer after the attention.
    qkv_bias: Whether to add a bias to the queries, keys and values.
    use_bias: Whether to add a bias in the feed-forward and projection layers.
    kwargs: Keyword arguments for the [PretrainedConfig].
Source code in vllm/transformers_utils/configs/ovis.py
  
 __init__(
    hidden_size: int = 1024,
    intermediate_size: int = 2816,
    num_hidden_layers: int = 24,
    num_attention_heads: int = 8,
    num_channels: int = 3,
    image_size: int = 224,
    patch_size: int = 14,
    rms_norm_eps: float = 1e-05,
    attention_dropout: float = 0.0,
    projection_dropout: float = 0.0,
    qkv_bias: bool = False,
    use_bias: bool = False,
    **kwargs: Any,
)
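As a rough illustration of how this configuration class behaves, the sketch below mirrors the defaults from the signature above with a plain dataclass. This is a toy stand-in, not the real `AIMv2Config` (which subclasses `PretrainedConfig` and forwards extra keyword arguments to it), and the `num_patches` helper is added here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyAIMv2Config:
    """Toy stand-in for AIMv2Config with the same defaults as above."""

    hidden_size: int = 1024
    intermediate_size: int = 2816
    num_hidden_layers: int = 24
    num_attention_heads: int = 8
    num_channels: int = 3
    image_size: int = 224
    patch_size: int = 14
    rms_norm_eps: float = 1e-05
    attention_dropout: float = 0.0
    projection_dropout: float = 0.0
    qkv_bias: bool = False
    use_bias: bool = False

    @property
    def num_patches(self) -> int:
        # Hypothetical helper, not part of the real class:
        # 224 / 14 = 16 patches per side, so 16 * 16 = 256 patches total.
        return (self.image_size // self.patch_size) ** 2


# The defaults reproduce the apple/aimv2-large-patch14-224 geometry;
# any field can be overridden at construction time.
cfg = ToyAIMv2Config(attention_dropout=0.1)
print(cfg.num_patches)  # -> 256
```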
Source code in vllm/transformers_utils/configs/ovis.py
  
  Bases: BaseVisualTokenizerConfig
Source code in vllm/transformers_utils/configs/ovis.py
  
    
  Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/ovis.py
  
 __init__(
    vocab_size=16384,
    tokenize_function="softmax",
    tau=1.0,
    depths=None,
    drop_cls_token=False,
    backbone_config: PretrainedConfig | dict | None = None,
    hidden_stride: int = 1,
    **kwargs,
)
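The `backbone_config` parameter above accepts a ready config object, a plain dict, or `None`. A common pattern for normalizing such a union is sketched below; the `FakeBackboneConfig` class and `resolve_backbone_config` helper are assumptions for illustration, not the actual code in ovis.py.

```python
from typing import Any


class FakeBackboneConfig:
    """Stand-in for a PretrainedConfig-like object built from keyword args."""

    def __init__(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            setattr(self, key, value)


def resolve_backbone_config(backbone_config):
    # None -> leave unset; dict -> build a config object; otherwise use as-is.
    if backbone_config is None:
        return None
    if isinstance(backbone_config, dict):
        return FakeBackboneConfig(**backbone_config)
    return backbone_config


cfg = resolve_backbone_config({"hidden_size": 1024, "patch_size": 14})
print(cfg.hidden_size)  # -> 1024
```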
Source code in vllm/transformers_utils/configs/ovis.py
  
  Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/ovis.py
 __init__(
    llm_config: PretrainedConfig | dict | None = None,
    visual_tokenizer_config: PretrainedConfig | dict | None = None,
    multimodal_max_length=8192,
    hidden_size=None,
    conversation_formatter_class=None,
    llm_attn_implementation=None,
    disable_tie_weight=False,
    **kwargs,
)
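This configuration composes two nested sub-configs, each of which may arrive as a config object, a dict, or `None`. The sketch below is an illustrative stand-in (assumed class name, simplified behavior): the real class converts dicts into `PretrainedConfig` instances, while this toy version stores them unchanged.

```python
class ToyOvisConfig:
    """Illustrative stand-in that stores two nested sub-configs."""

    def __init__(
        self,
        llm_config=None,
        visual_tokenizer_config=None,
        multimodal_max_length=8192,
        **kwargs,
    ):
        # Kept as-is here for simplicity; the real class would normalize
        # dicts into PretrainedConfig instances.
        self.llm_config = llm_config
        self.visual_tokenizer_config = visual_tokenizer_config
        self.multimodal_max_length = multimodal_max_length


cfg = ToyOvisConfig(
    llm_config={"model_type": "llama"},
    visual_tokenizer_config={"vocab_size": 16384},
)
print(cfg.multimodal_max_length)  # -> 8192
```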
Source code in vllm/transformers_utils/configs/ovis.py
  
  Bases: BaseVisualTokenizerConfig