swin_base

lucid.models.swin_base(img_size: int = 224, num_classes: int = 1000, **kwargs) → SwinTransformer

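The swin_base function instantiates the base variant of the Swin Transformer with a predefined architecture. Swin computes self-attention inside non-overlapping local windows and shifts the window grid between consecutive blocks, so information can flow across window boundaries while the attention cost stays linear in the number of image patches. The resulting hierarchical feature maps make the model suitable for image recognition and dense prediction tasks.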
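The effect of the window shift can be illustrated without any library code. The following is a minimal sketch, not lucid's implementation: `window_ids` is a hypothetical helper, and the grid size (8), window size (4), and shift (2) are illustrative values chosen to keep the example small. It also ignores the attention mask that the standard design applies to tokens wrapped around by the cyclic shift.

```python
def window_ids(size, window, shift=0):
    """Map each token (row, col) of a size x size grid to its window index.

    shift > 0 emulates the cyclic shift applied before window partitioning.
    (Hypothetical helper for illustration only.)
    """
    per_row = size // window  # number of windows along one side
    ids = []
    for r in range(size):
        row = []
        for c in range(size):
            # Position of this token after rolling the grid by -shift.
            sr, sc = (r - shift) % size, (c - shift) % size
            row.append((sr // window) * per_row + sc // window)
        ids.append(row)
    return ids


regular = window_ids(8, 4)
shifted = window_ids(8, 4, shift=2)

# Tokens (3, 3) and (4, 4) are diagonal neighbours in different regular windows.
print(regular[3][3], regular[4][4])
# After shifting by half a window they fall into the same window.
print(shifted[3][3], shifted[4][4])
```

Alternating the two partitions from block to block is what lets neighbouring tokens on opposite sides of a window boundary attend to each other.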

Total Parameters: 87,768,224
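The parameter total can be cross-checked by hand. The sketch below assumes the reference Swin layout: a 4x4 patch embedding followed by LayerNorm, 7x7 attention windows with a learned relative position bias table, biased QKV and projection layers, an MLP ratio of 4, and bias-free patch merging between stages. None of these hyperparameters are stated on this page, so treat them as assumptions; `swin_param_count` is a hypothetical helper and not part of lucid.

```python
def swin_param_count(embed_dim=128, depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32),
                     num_classes=1000, patch_size=4, window_size=7,
                     mlp_ratio=4, in_chans=3):
    """Count learnable parameters under the assumed reference Swin layout."""
    # Patch embedding: strided conv (weights + bias) followed by a LayerNorm.
    total = in_chans * patch_size ** 2 * embed_dim + embed_dim + 2 * embed_dim
    dim = embed_dim
    for stage, (depth, heads) in enumerate(zip(depths, num_heads)):
        # Window attention: relative position bias table, QKV, output projection.
        bias_table = (2 * window_size - 1) ** 2 * heads
        attn = bias_table + (3 * dim * dim + 3 * dim) + (dim * dim + dim)
        # Two-layer MLP with hidden width mlp_ratio * dim.
        hidden = mlp_ratio * dim
        mlp = (dim * hidden + hidden) + (hidden * dim + dim)
        # Two LayerNorms per block (weight + bias each).
        norms = 2 * (2 * dim)
        total += depth * (attn + mlp + norms)
        if stage < len(depths) - 1:
            # Patch merging: LayerNorm over 4*dim, then bias-free linear 4*dim -> 2*dim.
            total += 2 * (4 * dim) + (4 * dim) * (2 * dim)
            dim *= 2
    # Final LayerNorm and classification head.
    total += 2 * dim + (dim * num_classes + num_classes)
    return total


print(f"{swin_param_count():,}")
```

Under these assumptions the count comes out to the figure listed above, and changing num_classes only affects the final classification head.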

Function Signature

@register_model
def swin_base(img_size: int = 224, num_classes: int = 1000, **kwargs) -> SwinTransformer

Parameters

img_size (int, optional): The size of the input image (assumes square images). Default is 224.
num_classes (int, optional): The number of output classes for classification. Default is 1000.
kwargs (dict, optional): Additional parameters for customization, including:
- embed_dim (int): The embedding dimension of the first stage. The base model typically uses 128.
- depths (list[int]): The number of transformer blocks in each stage. The typical base configuration is [2, 2, 18, 2], i.e. four stages with 2, 2, 18, and 2 blocks respectively.
- num_heads (list[int]): The number of attention heads in each stage. The common base configuration is [4, 8, 16, 32]. In the standard Swin design the channel dimension doubles at every stage, so with an initial embedding dimension of 128 each head keeps 32 channels throughout.
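To see how these three settings interact, the per-stage layout can be tabulated directly. This is a rough sketch under the usual Swin assumptions (patch size 4, channel dimension doubling and spatial resolution halving between stages), which are not confirmed on this page; `swin_stage_layout` is a hypothetical helper, not a lucid function.

```python
def swin_stage_layout(img_size=224, embed_dim=128, depths=(2, 2, 18, 2),
                      num_heads=(4, 8, 16, 32), patch_size=4):
    """Return one dict per stage describing its width, heads, and token grid."""
    layout = []
    resolution = img_size // patch_size  # tokens per side after patch embedding
    dim = embed_dim
    for depth, heads in zip(depths, num_heads):
        layout.append({
            "blocks": depth,
            "dim": dim,
            "heads": heads,
            "head_dim": dim // heads,   # channels handled by each head
            "resolution": resolution,   # feature map is resolution x resolution
        })
        # Assumed patch merging between stages: 2x channels, half resolution.
        dim *= 2
        resolution //= 2
    return layout


for stage in swin_stage_layout():
    print(stage)
```

With the base defaults, every stage ends up with the same per-head width, and the last stage operates on a token grid that matches a single 7x7 window.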

Returns

SwinTransformer: An instance of the SwinTransformer class configured with the base-variant settings.

Examples

>>> import lucid.models as models
>>> model = models.swin_base()
>>> print(model)
SwinTransformer(img_size=224, num_classes=1000, embed_dim=128, ...)