pvt_tiny¶
The pvt_tiny function instantiates PVT-Tiny, the lightest variant of the Pyramid Vision Transformer (PVT). PVT-Tiny uses a hierarchical transformer architecture with progressive spatial reduction, which lets it learn global features at multiple scales while keeping computational cost low.
Total Parameters: 12,457,192
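The progressive spatial reduction that keeps PVT efficient comes from spatial-reduction attention (SRA): keys and values are spatially downsampled by a stage-specific ratio before attention, cutting the attention cost from O(N²) to roughly O(N²/r²). The NumPy sketch below illustrates the idea for a single head. It is a conceptual illustration only, not lucid's actual implementation; the reference PVT uses a learned strided convolution for the reduction, and average pooling here is a simplification.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sra_single_head(x, H, W, sr_ratio):
    # x: (N, C) token sequence with N = H * W.
    N, C = x.shape
    assert N == H * W
    q = x  # real models apply learned Q/K/V projections; omitted here

    # Downsample the token grid for keys and values by `sr_ratio`.
    r = sr_ratio
    grid = x.reshape(H, W, C)
    pooled = grid.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))
    kv = pooled.reshape(-1, C)  # (N / r**2, C)

    attn = softmax(q @ kv.T / np.sqrt(C))  # (N, N / r**2) instead of (N, N)
    return attn @ kv  # (N, C)

x = np.random.randn(56 * 56, 64)  # stage-1 token grid for a 224x224 input
out = sra_single_head(x, H=56, W=56, sr_ratio=8)
print(out.shape)  # (3136, 64)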
Function Signature¶
@register_model
def pvt_tiny(img_size: int = 224, num_classes: int = 1000, **kwargs) -> PVT
Parameters¶
img_size (int, optional): The input image size. Default is 224.
num_classes (int, optional): The number of output classes for classification. Default is 1000.
kwargs (dict, optional): Additional parameters for customization (a usage sketch follows this list), including:
embed_dims (list[int]): A list specifying the embedding dimensions for different stages. The default for PVT-Tiny is [64, 128, 320, 512], indicating the number of channels at each stage of the model.
depths (list[int]): A list specifying the number of transformer blocks in each stage. The default configuration for PVT-Tiny is [2, 2, 2, 2], representing the depth of the transformer at each stage.
num_heads (list[int]): A list specifying the number of attention heads in each stage. The default values for PVT-Tiny are [1, 2, 5, 8], enabling multi-head self-attention at different scales.
mlp_ratios (list[float]): A list specifying the MLP expansion ratio for each stage. The default for PVT-Tiny is [8.0, 8.0, 4.0, 4.0].
sr_ratios (list[float]): A list specifying the spatial reduction ratio applied to key and value projections in each stage. The default is [8.0, 4.0, 2.0, 1.0], so earlier, higher-resolution stages downsample keys and values more aggressively.
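The defaults above can be overridden through kwargs. A minimal sketch, using only the parameters documented in this list; whether a particular combination is supported depends on the PVT implementation:

>>> import lucid.models as models
>>> # Override the documented defaults: 10 output classes and a
>>> # narrower final stage (448 stays divisible by its 8 heads).
>>> model = models.pvt_tiny(
...     num_classes=10,
...     embed_dims=[64, 128, 320, 448],
... )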
Returns¶
PVT: An instance of the PVT class configured as a PVT-Tiny vision transformer.
Examples¶
>>> import lucid.models as models
>>> model = models.pvt_tiny()
>>> print(model)
PVT(img_size=224, num_classes=1000, embed_dims=[64, 128, 320, 512],
depths=[2, 2, 2, 2], num_heads=[1, 2, 5, 8], sr_ratios=[8.0, 4.0, 2.0, 1.0], ...)
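A forward pass on a dummy batch might look like the following; lucid.random.randn and the NCHW input layout are assumptions about lucid's tensor API rather than documented behavior:

>>> import lucid
>>> x = lucid.random.randn(1, 3, 224, 224)  # assumed NCHW batch of one image
>>> out = model(x)
>>> print(out.shape)
(1, 1000)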