MaxViTConfig¶
- class lucid.models.MaxViTConfig(in_channels: int = 3, depths: tuple[int, ...] | list[int] = (2, 2, 5, 2), channels: tuple[int, ...] | list[int] = (64, 128, 256, 512), num_classes: int = 1000, embed_dim: int = 64, num_heads: int = 32, grid_window_size: tuple[int, int] | list[int] = (7, 7), attn_drop: float = 0.0, drop: float = 0.0, drop_path: float = 0.0, mlp_ratio: float = 4.0, act_layer: type[nn.Module] = nn.GELU, norm_layer: type[nn.Module] = nn.BatchNorm2d, norm_layer_tf: type[nn.Module] = nn.LayerNorm)¶
MaxViTConfig stores the stage layout and attention settings used by
lucid.models.MaxViT. It defines the stem width, per-stage depths and
channels, shared attention head count, window size, dropout settings, and
classifier size.
Class Signature¶
@dataclass
class MaxViTConfig:
    in_channels: int = 3
    depths: tuple[int, ...] | list[int] = (2, 2, 5, 2)
    channels: tuple[int, ...] | list[int] = (64, 128, 256, 512)
    num_classes: int = 1000
    embed_dim: int = 64
    num_heads: int = 32
    grid_window_size: tuple[int, int] | list[int] = (7, 7)
    attn_drop: float = 0.0
    drop: float = 0.0
    drop_path: float = 0.0
    mlp_ratio: float = 4.0
    act_layer: type[nn.Module] = nn.GELU
    norm_layer: type[nn.Module] = nn.BatchNorm2d
    norm_layer_tf: type[nn.Module] = nn.LayerNorm
Parameters¶
in_channels (int): Number of input image channels.
depths (tuple[int, ...] | list[int]): Number of MaxViT blocks in each stage.
channels (tuple[int, ...] | list[int]): Output channel width of each stage; must have the same length as depths.
num_classes (int): Number of output classes. Set to 0 to keep an identity classifier.
embed_dim (int): Width of the convolutional stem.
num_heads (int): Shared attention head count for window and grid attention.
grid_window_size (tuple[int, int] | list[int]): Window size used by both attention partitioning schemes.
attn_drop, drop, drop_path (float): Dropout rates for attention weights and projections, and the stochastic depth rate.
mlp_ratio (float): Hidden width multiplier for transformer MLP layers.
act_layer, norm_layer, norm_layer_tf (type[nn.Module]): Activation and normalization modules used by the stem, MBConv path, and transformer blocks.
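The divisibility constraint between channels and num_heads matters because attention splits each stage's width across heads, and mlp_ratio scales the transformer MLP's hidden width. The arithmetic below is a hedged sketch of this conventional ViT-style interpretation, not lucid's verified internals:

```python
# Hedged sketch: how the defaults conventionally translate into
# per-block dimensions (an assumption, not lucid's exact code).
num_heads = 32
mlp_ratio = 4.0
channels = (64, 128, 256, 512)

for dim in channels:
    head_dim = dim // num_heads        # per-head feature width
    mlp_hidden = int(dim * mlp_ratio)  # transformer MLP hidden width
    print(dim, head_dim, mlp_hidden)
```

This is why each channel width must divide evenly by num_heads: otherwise head_dim would truncate.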
Validation¶
in_channels, embed_dim, and num_heads must be greater than 0.
depths must contain at least one positive integer.
channels must contain one positive width per stage (i.e., the same length as depths).
Each channel width must be divisible by num_heads.
num_classes must be greater than or equal to 0.
grid_window_size must contain exactly two positive integers.
attn_drop, drop, and drop_path must each be in [0, 1).
mlp_ratio must be greater than 0.
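The rules above can be condensed into a standalone check. This is an illustrative re-implementation for reference, not lucid's actual validation code:

```python
def check_config(depths, channels, num_heads, grid_window_size,
                 attn_drop, drop, drop_path, mlp_ratio):
    """Illustrative re-implementation of the validation rules above."""
    if num_heads <= 0:
        raise ValueError("num_heads must be > 0")
    if not depths or any(d <= 0 for d in depths):
        raise ValueError("depths must contain positive integers")
    if len(channels) != len(depths) or any(c <= 0 for c in channels):
        raise ValueError("channels needs one positive width per stage")
    if any(c % num_heads for c in channels):
        raise ValueError("each width must be divisible by num_heads")
    if len(grid_window_size) != 2 or any(s <= 0 for s in grid_window_size):
        raise ValueError("grid_window_size must be two positive integers")
    for p in (attn_drop, drop, drop_path):
        if not 0.0 <= p < 1.0:
            raise ValueError("drop rates must be in [0, 1)")
    if mlp_ratio <= 0:
        raise ValueError("mlp_ratio must be > 0")

# The defaults pass every check.
check_config((2, 2, 5, 2), (64, 128, 256, 512), 32, (7, 7),
             0.0, 0.0, 0.0, 4.0)
```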
Usage¶
import lucid.models as models
config = models.MaxViTConfig(
    in_channels=1,
    depths=(1, 1),
    channels=(16, 32),
    num_classes=10,
    embed_dim=16,
    num_heads=4,
    grid_window_size=(1, 1),
    mlp_ratio=2.0,
)
model = models.MaxViT(config)
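As a quick sanity check on the example config, plain arithmetic (no lucid import needed) confirms it satisfies the divisibility rule:

```python
# Both stage widths in the example divide evenly by num_heads=4,
# giving per-head widths of 4 and 8.
channels = (16, 32)
num_heads = 4
assert all(c % num_heads == 0 for c in channels)
```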