EfficientFormerConfig

class lucid.models.EfficientFormerConfig(depths: tuple[int, ...] | list[int], embed_dims: tuple[int, ...] | list[int], in_channels: int = 3, num_classes: int = 1000, global_pool: bool = True, downsamples: tuple[bool, ...] | list[bool] | None = None, num_vit: int = 0, mlp_ratios: float = 4.0, pool_size: int = 3, layer_scale_init_value: float = 1e-05, act_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.activation.GELU'>, norm_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.BatchNorm2d'>, norm_layer_cl: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.LayerNorm'>, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, drop_path_rate: float = 0.0)

EfficientFormerConfig stores the stage layout and classifier settings used by lucid.models.EfficientFormer. It defines the stage depths, embedding widths, downsampling schedule, number of transformer-style blocks in the final stage, and classifier/dropout behavior.

Class Signature

@dataclass
class EfficientFormerConfig:
    depths: tuple[int, ...] | list[int]
    embed_dims: tuple[int, ...] | list[int]
    in_channels: int = 3
    num_classes: int = 1000
    global_pool: bool = True
    downsamples: tuple[bool, ...] | list[bool] | None = None
    num_vit: int = 0
    mlp_ratios: float = 4.0
    pool_size: int = 3
    layer_scale_init_value: float = 1e-5
    act_layer: type[nn.Module] = nn.GELU
    norm_layer: type[nn.Module] = nn.BatchNorm2d
    norm_layer_cl: type[nn.Module] = nn.LayerNorm
    drop_rate: float = 0.0
    proj_drop_rate: float = 0.0
    drop_path_rate: float = 0.0

Parameters

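depths (tuple[int, ...] | list[int]): Number of blocks in each stage. Required.
embed_dims (tuple[int, ...] | list[int]): Embedding width for each stage, one entry per stage in depths. Required.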
in_channels (int): Number of input image channels.
num_classes (int): Number of output classes. Set to 0 to keep an identity classifier.
global_pool (bool): Whether to average the final token sequence before classification.
downsamples (tuple[bool, ...] | list[bool] | None): Optional explicit downsampling schedule with one flag per stage. Defaults to None.
num_vit (int): Number of transformer-style blocks used in the final stage.
mlp_ratios (float): Hidden width multiplier for MLP layers.
pool_size (int): Pooling kernel size used by convolutional MetaBlocks.
layer_scale_init_value (float): Initial value for layer scale parameters.
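act_layer, norm_layer, norm_layer_cl (type[nn.Module]): Activation and normalization modules used by the stem, convolutional blocks, and final token blocks. Defaults are nn.GELU, nn.BatchNorm2d, and nn.LayerNorm, respectively.
drop_rate, proj_drop_rate, drop_path_rate (float): Head dropout, projection dropout, and stochastic depth rates, respectively. Each defaults to 0.0.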
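The per-stage parameters line up index by index: entry i of depths, embed_dims, and (when given) downsamples all describe stage i. The sketch below tabulates that correspondence in plain Python. It does not use Lucid. The helper name describe_stages is hypothetical, and a schedule left as None is reported only as "default", because what the library substitutes in that case is not stated here.

```python
from typing import Optional, Sequence


def describe_stages(
    depths: Sequence[int],
    embed_dims: Sequence[int],
    downsamples: Optional[Sequence[bool]] = None,
) -> list[str]:
    """Return one human-readable line per stage (illustrative helper, not a Lucid API)."""
    if len(embed_dims) != len(depths):
        raise ValueError("embed_dims needs one width per stage")
    if downsamples is not None and len(downsamples) != len(depths):
        raise ValueError("downsamples needs one flag per stage")

    rows = []
    for i, (depth, width) in enumerate(zip(depths, embed_dims)):
        # The library's behavior for downsamples=None is not documented here,
        # so it is reported as "default" rather than guessed.
        flag = "default" if downsamples is None else str(bool(downsamples[i]))
        rows.append(f"stage {i}: blocks={depth}, width={width}, downsample={flag}")
    return rows


for row in describe_stages((1, 1, 1, 1), (16, 32, 48, 64)):
    print(row)
```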

Validation

depths must contain at least one positive integer.
embed_dims must contain one positive width per stage.
in_channels must be greater than 0.
num_classes must be greater than or equal to 0.
If provided, downsamples must match the number of stages.
num_vit must be greater than or equal to 0.
mlp_ratios and pool_size must be greater than 0.
layer_scale_init_value must be non-negative.
drop_rate, proj_drop_rate, and drop_path_rate must each be in [0, 1).
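The rules above can be sketched as a dataclass __post_init__ check. This is a standalone illustration and not Lucid's implementation: the class name ConfigSketch is hypothetical, the module-type fields are omitted, raising ValueError is an assumption, and the depths rule is read as "non-empty, with every entry positive".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ConfigSketch:
    """Hypothetical stand-in that mirrors only the documented validation rules."""

    depths: Sequence[int]
    embed_dims: Sequence[int]
    in_channels: int = 3
    num_classes: int = 1000
    downsamples: Optional[Sequence[bool]] = None
    num_vit: int = 0
    mlp_ratios: float = 4.0
    pool_size: int = 3
    layer_scale_init_value: float = 1e-5
    drop_rate: float = 0.0
    proj_drop_rate: float = 0.0
    drop_path_rate: float = 0.0

    def __post_init__(self) -> None:
        # Assumed reading: depths is non-empty and every entry is positive.
        if len(self.depths) == 0 or any(d <= 0 for d in self.depths):
            raise ValueError("depths must contain at least one positive integer")
        if len(self.embed_dims) != len(self.depths) or any(
            w <= 0 for w in self.embed_dims
        ):
            raise ValueError("embed_dims must contain one positive width per stage")
        if self.in_channels <= 0:
            raise ValueError("in_channels must be greater than 0")
        if self.num_classes < 0:
            raise ValueError("num_classes must be greater than or equal to 0")
        if self.downsamples is not None and len(self.downsamples) != len(self.depths):
            raise ValueError("downsamples must match the number of stages")
        if self.num_vit < 0:
            raise ValueError("num_vit must be greater than or equal to 0")
        if self.mlp_ratios <= 0 or self.pool_size <= 0:
            raise ValueError("mlp_ratios and pool_size must be greater than 0")
        if self.layer_scale_init_value < 0:
            raise ValueError("layer_scale_init_value must be non-negative")
        # Each rate must lie in the half-open interval [0, 1).
        for name in ("drop_rate", "proj_drop_rate", "drop_path_rate"):
            if not 0.0 <= getattr(self, name) < 1.0:
                raise ValueError(f"{name} must be in [0, 1)")
```

Under these assumptions, ConfigSketch(depths=(1, 1), embed_dims=(16,)) is rejected because the widths do not cover every stage, and drop_rate=1.0 is rejected because the upper bound is exclusive.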

Usage

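import lucid.models as models

# Minimal four-stage configuration with one block per stage.
config = models.EfficientFormerConfig(
    depths=(1, 1, 1, 1),
    embed_dims=(16, 32, 48, 64),  # one width per entry in depths
    in_channels=1,  # single-channel input
    num_classes=10,
    num_vit=1,  # one transformer-style block in the final stage
    mlp_ratios=2.0,
)
# Build the model from the configuration.
model = models.EfficientFormer(config)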