Radio vision model configuration
VIT_TIMM_DIM_BY_NAME (module attribute)

```python
VIT_TIMM_DIM_BY_NAME: dict[str, tuple[int, int, int, int]] = {
    "vit_small_patch16_224": (384, 12, 6, 1536),
    "vit_base_patch16_224": (768, 12, 12, 3072),
    "vit_large_patch16_224": (1024, 24, 16, 4096),
    "vit_huge_patch16_224": (1280, 32, 16, 5120),
}
```
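Each entry appears to follow timm's ViT sizing convention. A minimal sketch of unpacking one, assuming the tuple order is (embed_dim, depth, num_heads, mlp_hidden_dim); these field names are illustrative, not taken from the source:

```python
# A minimal sketch, assuming the tuple order is
# (embed_dim, depth, num_heads, mlp_hidden_dim), which matches timm's
# published ViT variants (e.g. vit_base: 768-dim, 12 layers, 12 heads,
# 3072 MLP hidden units). Import path follows the source file noted below.
from vllm.transformers_utils.configs.radio import VIT_TIMM_DIM_BY_NAME

embed_dim, depth, num_heads, mlp_dim = VIT_TIMM_DIM_BY_NAME["vit_base_patch16_224"]
head_dim = embed_dim // num_heads  # 768 // 12 == 64
mlp_ratio = mlp_dim / embed_dim    # 3072 / 768 == 4.0
```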
 
RadioConfig

Bases: PretrainedConfig
This is the configuration class to store the configuration of a Radio vision model. It is used to instantiate a Radio model according to the specified arguments, defining the model architecture.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the vision transformer model (e.g., "vit_base_patch16_224"). Used to determine architecture dimensions from VIT_TIMM_DIM_BY_NAME. | required |
| image_size | int | The size (resolution) of each image. | 224 |
| patch_size | int | The size (resolution) of each patch. | 16 |
| qkv_bias | bool | Whether to add a bias to the queries, keys and values. | True |
| qk_normalization | bool | Whether to apply normalization to queries and keys. | False |
| norm_type | str | The normalization type to use. | 'layer_norm' |
| layer_norm_eps | float | The epsilon used by the layer normalization layers. | 1e-06 |
| initializer_factor | float | A factor for initializing all weight matrices. | 1.0 |
| hidden_act | str | The non-linear activation function in the encoder. | 'gelu' |
| max_img_size | int | Maximum image size for position embeddings. | 2048 |
| norm_mean | tuple[float, float, float] or list | Mean values for image normalization (RGB channels). Defaults to (0.48145466, 0.4578275, 0.40821073). | OPENAI_CLIP_MEAN |
| norm_std | tuple[float, float, float] or list | Standard deviation values for image normalization (RGB channels). Defaults to (0.26862954, 0.26130258, 0.27577711). | OPENAI_CLIP_STD |
| reg_tokens | int or None | Number of register tokens to use. | None |
Source code in vllm/transformers_utils/configs/radio.py
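A minimal construction sketch, assuming RadioConfig is importable from the source module noted above:

```python
# A minimal sketch: building a config for a base-size backbone. Assumption:
# RadioConfig is importable from the source module noted above.
from vllm.transformers_utils.configs.radio import RadioConfig

config = RadioConfig(
    model_name="vit_base_patch16_224",  # resolves dims via VIT_TIMM_DIM_BY_NAME
    image_size=224,
    patch_size=16,
    reg_tokens=4,  # illustrative value; the default is None
)
```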
norm_mean (instance attribute)

```python
# Tuples are converted to lists (e.g. so the config serializes cleanly to JSON).
norm_mean = list(norm_mean) if isinstance(norm_mean, (tuple, list)) else norm_mean
```

norm_std (instance attribute)

```python
norm_std = list(norm_std) if isinstance(norm_std, (tuple, list)) else norm_std
```
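The stored statistics are the standard OpenAI CLIP constants. A minimal sketch of how such per-channel values would be applied, assuming a float CHW tensor in [0, 1]; the model's actual preprocessing pipeline is not shown in this section:

```python
import torch

# A minimal sketch of per-channel image normalization with the config's
# statistics. Assumption: `pixel_values` is a float CHW tensor in [0, 1];
# this is illustrative, not the model's actual preprocessing code.
norm_mean = [0.48145466, 0.4578275, 0.40821073]  # OPENAI_CLIP_MEAN
norm_std = [0.26862954, 0.26130258, 0.27577711]  # OPENAI_CLIP_STD

pixel_values = torch.rand(3, 224, 224)  # placeholder image
mean = torch.tensor(norm_mean).view(-1, 1, 1)
std = torch.tensor(norm_std).view(-1, 1, 1)
normalized = (pixel_values - mean) / std
```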
 
```python
__init__(
    model_name: str,
    image_size: int = 224,
    patch_size: int = 16,
    qkv_bias: bool = True,
    qk_normalization: bool = False,
    norm_type: str = "layer_norm",
    layer_norm_eps: float = 1e-06,
    initializer_factor: float = 1.0,
    hidden_act: str = "gelu",
    max_img_size: int = 2048,
    norm_mean: tuple[float, float, float] | list = OPENAI_CLIP_MEAN,
    norm_std: tuple[float, float, float] | list = OPENAI_CLIP_STD,
    reg_tokens: int | None = None,
    **kwargs,
)
```
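Because the class inherits from PretrainedConfig, the usual Hugging Face serialization helpers are available. A dict round-trip sketch, assuming model_name is stored on the instance as the parameter table suggests:

```python
# A minimal sketch using the inherited PretrainedConfig helpers
# (to_dict / from_dict). Assumption: model_name is stored as an instance
# attribute, so it survives the round trip.
from vllm.transformers_utils.configs.radio import RadioConfig

config = RadioConfig(model_name="vit_large_patch16_224")
config_dict = config.to_dict()  # plain, JSON-serializable dict
restored = RadioConfig.from_dict(config_dict)
assert restored.model_name == "vit_large_patch16_224"
```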