EfficientFormerConfig

class lucid.models.EfficientFormerConfig(depths: tuple[int, ...] | list[int], embed_dims: tuple[int, ...] | list[int], in_channels: int = 3, num_classes: int = 1000, global_pool: bool = True, downsamples: tuple[bool, ...] | list[bool] | None = None, num_vit: int = 0, mlp_ratios: float = 4.0, pool_size: int = 3, layer_scale_init_value: float = 1e-05, act_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.activation.GELU'>, norm_layer: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.BatchNorm2d'>, norm_layer_cl: type[lucid.nn.module.Module] = <class 'lucid.nn.modules.norm.LayerNorm'>, drop_rate: float = 0.0, proj_drop_rate: float = 0.0, drop_path_rate: float = 0.0)

EfficientFormerConfig stores the stage layout and classifier settings used by lucid.models.EfficientFormer. It defines the stage depths, embedding widths, downsampling schedule, number of transformer-style blocks in the final stage, and classifier/dropout behavior.

Class Signature

@dataclass
class EfficientFormerConfig:
    depths: tuple[int, ...] | list[int]
    embed_dims: tuple[int, ...] | list[int]
    in_channels: int = 3
    num_classes: int = 1000
    global_pool: bool = True
    downsamples: tuple[bool, ...] | list[bool] | None = None
    num_vit: int = 0
    mlp_ratios: float = 4.0
    pool_size: int = 3
    layer_scale_init_value: float = 1e-5
    act_layer: type[nn.Module] = nn.GELU
    norm_layer: type[nn.Module] = nn.BatchNorm2d
    norm_layer_cl: type[nn.Module] = nn.LayerNorm
    drop_rate: float = 0.0
    proj_drop_rate: float = 0.0
    drop_path_rate: float = 0.0

Parameters

  • depths: Number of blocks in each stage.

  • embed_dims: Embedding width for each stage.

  • in_channels (int): Number of input image channels.

  • num_classes (int): Number of output classes. Set to 0 to keep an identity classifier.

  • global_pool (bool): Whether to average the final token sequence before classification.

  • downsamples: Optional explicit per-stage downsampling schedule.

  • num_vit (int): Number of transformer-style blocks used in the final stage.

  • mlp_ratios (float): Hidden width multiplier for MLP layers.

  • pool_size (int): Pooling kernel size used by convolutional MetaBlocks.

  • layer_scale_init_value (float): Initial value for layer scale parameters.

  • act_layer, norm_layer, norm_layer_cl: Activation and normalization modules used by the stem, convolutional blocks, and final token blocks.

  • drop_rate, proj_drop_rate, drop_path_rate: Head dropout, projection dropout, and stochastic depth settings.

Validation

  • depths must contain at least one positive integer.

  • embed_dims must contain one positive width per stage.

  • in_channels must be greater than 0.

  • num_classes must be greater than or equal to 0.

  • If provided, downsamples must match the number of stages.

  • num_vit must be greater than or equal to 0.

  • mlp_ratios and pool_size must be greater than 0.

  • layer_scale_init_value must be non-negative.

  • drop_rate, proj_drop_rate, and drop_path_rate must each be in [0, 1).

Usage

import lucid.models as models

config = models.EfficientFormerConfig(
    depths=(1, 1, 1, 1),
    embed_dims=(16, 32, 48, 64),
    in_channels=1,
    num_classes=10,
    num_vit=1,
    mlp_ratios=2.0,
)
model = models.EfficientFormer(config)