swin_v2_base

lucid.models.swin_v2_base(img_size: int = 224, num_classes: int = 1000, **kwargs) → SwinTransformer_V2

The swin_v2_base function builds a base-sized Swin Transformer V2 model with a predefined architecture. Swin Transformer V2 revises the normalization scheme and the shifted-window attention of the original Swin Transformer. The model is designed for vision tasks such as image classification, object detection, and segmentation.
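The following minimal NumPy sketch makes the shifted-window idea concrete. It is not lucid's implementation. It partitions an (H, W, C) feature map into non-overlapping windows and applies a cyclic shift of half a window, so the next partition straddles the previous window borders. It then computes attention inside each window with scaled cosine similarity, the attention form proposed for Swin Transformer V2. The function names are hypothetical. The query/key/value projections, the learned temperature, the relative position bias, and the attention mask that real shifted-window blocks apply to wrapped-around regions are all omitted to keep the sketch short.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping windows.

    Returns shape (num_windows, window_size * window_size, C).
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, ws, ws, C)
    return x.reshape(-1, window_size * window_size, C)


def cyclic_shift(x: np.ndarray, shift: int) -> np.ndarray:
    """Roll the map up and to the left by `shift` pixels (wrapping around)."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))


def scaled_cosine_attention(tokens: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Self-attention inside each window using cosine similarity / tau.

    tokens: (num_windows, N, C). Here q = k = v = tokens (no projections),
    and tau is a fixed constant rather than a learned parameter.
    """
    norms = np.linalg.norm(tokens, axis=-1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)              # L2-normalize each token
    sim = unit @ unit.transpose(0, 2, 1) / tau             # (num_windows, N, N)
    sim = sim - sim.max(axis=-1, keepdims=True)            # softmax stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=-1, keepdims=True)         # rows sum to 1
    return weights @ tokens


# Toy 8x8 map with 4 channels and 4x4 windows.
H = W = 8
C = 4
ws = 4
x = np.random.default_rng(0).standard_normal((H, W, C))

regular = window_partition(x, ws)                          # (4, 16, 4)
shifted = window_partition(cyclic_shift(x, ws // 2), ws)   # (4, 16, 4)
out = scaled_cosine_attention(shifted)                     # (4, 16, 4)
print(regular.shape, shifted.shape, out.shape)
```

Shifting before partitioning lets a token near a window edge attend to neighbours that belonged to a different window in the previous block, while each attention computation stays confined to one fixed-size window.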

Total Parameters: 87,922,400

Function Signature

@register_model
def swin_v2_base(img_size: int = 224, num_classes: int = 1000, **kwargs) -> SwinTransformer_V2

Parameters

img_size (int, optional): The size of the input image (assumes square images). Default is 224.
num_classes (int, optional): The number of output classes for classification. Default is 1000.
kwargs (dict, optional): Additional parameters for customization, including:
- depths (list[int]): The number of transformer blocks in each stage. The base configuration uses [2, 2, 18, 2], i.e. four stages with 2, 2, 18, and 2 blocks respectively (24 blocks in total).
- num_heads (list[int]): The number of attention heads in each stage. The base configuration uses [4, 8, 16, 32], doubling the head count from one stage to the next as the feature maps become coarser.
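The two lists work together with img_size. The sketch below tabulates each stage under the standard Swin layout. That layout assumes a patch embedding with patch size 4 and an initial channel width (embed_dim) of 128, the values used for the base model in the Swin papers. Neither value is documented on this page, and swin_stage_layout is a hypothetical helper, not part of lucid.

```python
def swin_stage_layout(img_size=224, patch_size=4, embed_dim=128,
                      depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32)):
    """Return one dict per stage describing its grid size and width.

    Assumes (hypothetically, not from lucid) that the patch embedding divides
    the image side by patch_size and that each later stage halves the grid
    and doubles the channel width via patch merging.
    """
    stages = []
    res = img_size // patch_size   # token grid side after patch embedding
    dim = embed_dim
    for i, (depth, heads) in enumerate(zip(depths, num_heads)):
        if i > 0:
            res //= 2              # patch merging halves the spatial grid
            dim *= 2               # ...and doubles the channel width
        assert dim % heads == 0, "channel width must split evenly across heads"
        stages.append({
            "stage": i + 1,
            "resolution": res,
            "channels": dim,
            "blocks": depth,
            "heads": heads,
            "head_dim": dim // heads,
        })
    return stages


for stage in swin_stage_layout():
    print(stage)
```

Under these assumptions the grids are 56, 28, 14, and 7 tokens per side, and every stage keeps a per-head dimension of 32 because the number of heads doubles exactly as the channel width doubles.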

Returns

SwinTransformer_V2: An instance of the SwinTransformer_V2 class configured with the base-size architecture.

Examples

>>> import lucid.models as models
>>> model = models.swin_v2_base()
>>> print(model)
SwinTransformer_V2(img_size=224, num_classes=1000, ...)