MaxViTConfig

class lucid.models.MaxViTConfig(in_channels: int = 3, depths: tuple[int, ...] | list[int] = (2, 2, 5, 2), channels: tuple[int, ...] | list[int] = (64, 128, 256, 512), num_classes: int = 1000, embed_dim: int = 64, num_heads: int = 32, grid_window_size: tuple[int, int] | list[int] = (7, 7), attn_drop: float = 0.0, drop: float = 0.0, drop_path: float = 0.0, mlp_ratio: float = 4.0, act_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.activation.GELU'>, norm_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.BatchNorm2d'>, norm_layer_tf: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.LayerNorm'>)

MaxViTConfig stores the stage layout and attention settings used by lucid.models.MaxViT. It defines the stem width, per-stage depths and channels, shared attention head count, window size, dropout settings, and classifier size.

Class Signature

from dataclasses import dataclass

import lucid.nn as nn

@dataclass
class MaxViTConfig:
    in_channels: int = 3
    depths: tuple[int, ...] | list[int] = (2, 2, 5, 2)
    channels: tuple[int, ...] | list[int] = (64, 128, 256, 512)
    num_classes: int = 1000
    embed_dim: int = 64
    num_heads: int = 32
    grid_window_size: tuple[int, int] | list[int] = (7, 7)
    attn_drop: float = 0.0
    drop: float = 0.0
    drop_path: float = 0.0
    mlp_ratio: float = 4.0
    act_layer: type[nn.Module] = nn.GELU
    norm_layer: type[nn.Module] = nn.BatchNorm2d
    norm_layer_tf: type[nn.Module] = nn.LayerNorm

Parameters

in_channels (int): Number of input image channels.
depths (tuple[int, ...] | list[int]): Number of MaxViT blocks in each stage.
channels (tuple[int, ...] | list[int]): Output channel width for each stage.
num_classes (int): Number of output classes. Set to 0 to keep an identity classifier.
embed_dim (int): Width of the convolutional stem.
num_heads (int): Shared attention head count for window and grid attention.
grid_window_size (tuple[int, int] | list[int]): Window size used by both attention partitioning schemes (window and grid).
attn_drop, drop, drop_path (float): Dropout rates for attention, projection, and stochastic depth, respectively.
mlp_ratio (float): Hidden width multiplier for transformer MLP layers.
act_layer, norm_layer, norm_layer_tf: Activation and normalization modules used by the stem, MBConv path, and transformer blocks.
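
To make the relationship between depths, channels, and num_heads concrete, the standalone sketch below tabulates the default stage layout. It does not import lucid. describe_stages is a hypothetical helper, and the per-head width is computed as channels // num_heads, which is the usual multi-head attention convention and only an assumption about how MaxViT splits channels internally.

```python
def describe_stages(depths, channels, num_heads):
    """Pair each stage's block count with its width (illustrative helper, not part of lucid)."""
    rows = []
    for index, (depth, width) in enumerate(zip(depths, channels)):
        rows.append(
            {
                "stage": index,
                "blocks": depth,
                "channels": width,
                # Assumption: the heads split the channel width evenly.
                "head_dim": width // num_heads,
            }
        )
    return rows


# Default values taken from the MaxViTConfig signature.
layout = describe_stages((2, 2, 5, 2), (64, 128, 256, 512), num_heads=32)
for row in layout:
    print(row)
print("total blocks:", sum(row["blocks"] for row in layout))
```

Because num_heads is shared across stages, the narrowest stage determines the largest head count that still divides every channel width.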

Validation

in_channels, embed_dim, and num_heads must be greater than 0.
depths must contain at least one positive integer.
channels must contain one positive width per stage.
Each channel width must be divisible by num_heads.
num_classes must be greater than or equal to 0.
grid_window_size must contain exactly two positive integers.
attn_drop, drop, and drop_path must each be in [0, 1).
mlp_ratio must be greater than 0.
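
The rules above can be sketched as a plain Python check. This is an illustration only, not lucid's implementation: check_maxvit_settings is a hypothetical helper, it returns a list of messages instead of raising (the exception type lucid uses is not stated here), and it assumes that "one width per stage" means len(channels) == len(depths) and that every entry of depths must be positive.

```python
def check_maxvit_settings(
    in_channels=3,
    depths=(2, 2, 5, 2),
    channels=(64, 128, 256, 512),
    num_classes=1000,
    embed_dim=64,
    num_heads=32,
    grid_window_size=(7, 7),
    attn_drop=0.0,
    drop=0.0,
    drop_path=0.0,
    mlp_ratio=4.0,
):
    """Return a list of rule violations (hypothetical helper, not lucid's own check)."""
    problems = []

    for name, value in (
        ("in_channels", in_channels),
        ("embed_dim", embed_dim),
        ("num_heads", num_heads),
    ):
        if value <= 0:
            problems.append(f"{name} must be greater than 0")

    # Assumption: depths must be non-empty and every entry positive.
    if len(depths) == 0 or any(d <= 0 for d in depths):
        problems.append("depths must contain positive integers")

    # Assumption: "one width per stage" means the lengths match.
    if len(channels) != len(depths) or any(c <= 0 for c in channels):
        problems.append("channels must contain one positive width per stage")

    # Skip the divisibility rule when num_heads is already invalid.
    if num_heads > 0 and any(c % num_heads != 0 for c in channels):
        problems.append("each channel width must be divisible by num_heads")

    if num_classes < 0:
        problems.append("num_classes must be greater than or equal to 0")

    if len(grid_window_size) != 2 or any(s <= 0 for s in grid_window_size):
        problems.append("grid_window_size must contain exactly two positive integers")

    for name, rate in (("attn_drop", attn_drop), ("drop", drop), ("drop_path", drop_path)):
        if not 0.0 <= rate < 1.0:
            problems.append(f"{name} must be in [0, 1)")

    if mlp_ratio <= 0:
        problems.append("mlp_ratio must be greater than 0")

    return problems


# The defaults satisfy every rule; a width of 500 is not divisible by 32 heads.
print(check_maxvit_settings())
print(check_maxvit_settings(channels=(64, 128, 256, 500)))
```

Collecting all violations in one pass makes it easier to see every problem with a candidate setting at once, rather than fixing them one error at a time.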

Usage

import lucid.models as models

config = models.MaxViTConfig(
    in_channels=1,
    depths=(1, 1),
    channels=(16, 32),
    num_classes=10,
    embed_dim=16,
    num_heads=4,
    grid_window_size=(1, 1),
    mlp_ratio=2.0,
)
model = models.MaxViT(config)