MaskFormerConfig

class lucid.models.MaskFormerConfig(num_labels: int, fpn_feature_size: int, mask_feature_size: int, backbone_config: dict | None = None, num_channels: int = 3, num_queries: int = 100, encoder_layer: int = 6, encoder_ffn_dim: int = 2048, encoder_attention_heads: int = 8, decoder_config: dict | None = None, decoder_layers: int = 6, decoder_ffn_dim: int = 2048, decoder_attention_heads: int = 8, decoder_hidden_size: int | None = None, decoder_num_queries: int | None = None, encoder_layerdrop: float = 0.0, decoder_layerdrop: float = 0.0, is_encoder_decoder: bool = True, activation_function: str = 'relu', d_model: int = 256, dropout: float = 0.1, attention_dropout: float = 0.1, activation_dropout: float = 0.0, init_std: float = 0.02, init_xavier_std: float = 1.0, dilation: bool = False, class_cost: float = 1.0, mask_loss_coefficient: float = 1.0, dice_loss_coefficient: float = 1.0, eos_coefficient: float = 0.1, no_object_weight: float = 0.1, output_attentions: bool = False, output_hidden_states: bool = False)

MaskFormerConfig stores the full model setup used by lucid.models.MaskFormer. It covers the output space (labels and queries), the backbone, the transformer encoder/decoder shapes, and the loss coefficients.

Class Signature

@dataclass
class MaskFormerConfig:
    num_labels: int
    fpn_feature_size: int
    mask_feature_size: int
    backbone_config: dict | None = None
    num_channels: int = 3
    num_queries: int = 100
    encoder_layer: int = 6
    encoder_ffn_dim: int = 2048
    encoder_attention_heads: int = 8
    decoder_config: dict | None = None
    decoder_layers: int = 6
    decoder_ffn_dim: int = 2048
    decoder_attention_heads: int = 8
    decoder_hidden_size: int | None = None
    decoder_num_queries: int | None = None
    encoder_layerdrop: float = 0.0
    decoder_layerdrop: float = 0.0
    is_encoder_decoder: bool = True
    activation_function: str = "relu"
    d_model: int = 256
    dropout: float = 0.1
    attention_dropout: float = 0.1
    activation_dropout: float = 0.0
    init_std: float = 0.02
    init_xavier_std: float = 1.0
    dilation: bool = False
    class_cost: float = 1.0
    mask_loss_coefficient: float = 1.0
    dice_loss_coefficient: float = 1.0
    eos_coefficient: float = 0.1
    no_object_weight: float = 0.1
    output_attentions: bool = False
    output_hidden_states: bool = False

Parameters

  • num_labels (int): Number of semantic classes (foreground categories).

  • fpn_feature_size (int): Pyramid feature channel width.

  • mask_feature_size (int): Hidden width for mask embedding MLP head.

  • backbone_config (dict | None): Backbone metadata (model_type, depths, hidden_sizes).

  • num_channels (int): Input channel count.

  • num_queries (int): Number of segmentation queries.

  • encoder_layer (int): Number of encoder blocks.

  • encoder_ffn_dim (int): Encoder MLP hidden width.

  • encoder_attention_heads (int): Encoder attention heads.

  • decoder_config (dict | None): Decoder preset config (DETR-style).

  • decoder_layers (int): Number of decoder layers.

  • decoder_ffn_dim (int): Decoder MLP hidden width.

  • decoder_attention_heads (int): Decoder attention heads.

  • decoder_hidden_size (int | None): Optional decoder hidden size override.

  • decoder_num_queries (int | None): Optional query count override.

  • encoder_layerdrop (float): Layer drop probability for encoder.

  • decoder_layerdrop (float): Layer drop probability for decoder.

  • is_encoder_decoder (bool): If True, the model is treated as an encoder-decoder architecture.

  • activation_function (str): Activation function name.

  • d_model (int): Transformer model width.

  • dropout (float): Dropout probability.

  • attention_dropout (float): Attention dropout probability.

  • activation_dropout (float): Activation dropout probability.

  • init_std (float): Normal init standard deviation.

  • init_xavier_std (float): Xavier gain.

  • dilation (bool): Dilated backbone option (reserved in this implementation).

  • class_cost (float): Weight for classification loss in Hungarian matching.

  • mask_loss_coefficient (float): Weight for mask loss term.

  • dice_loss_coefficient (float): Weight for dice loss term.

  • eos_coefficient (float): Classification weight for the end-of-sequence (no-object) class.

  • no_object_weight (float): No-object weight for matcher/class losses.

  • output_attentions (bool): Whether to return attention maps.

  • output_hidden_states (bool): Whether to return hidden states.
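
The matching and loss coefficients above act as linear weights on their respective terms. As an illustration only (the function name and formula below are a sketch of the general pattern, not lucid's actual implementation), a Hungarian matching cost weights its per-pair terms like this:

```python
# Illustrative sketch: how class_cost, mask_loss_coefficient, and
# dice_loss_coefficient could weight per-pair cost terms during matching.
# This mirrors the general DETR-style pattern, not lucid's exact code.

def matching_cost(
    class_term: float,
    mask_term: float,
    dice_term: float,
    class_cost: float = 1.0,
    mask_coeff: float = 1.0,
    dice_coeff: float = 1.0,
) -> float:
    """Weighted sum of the three per-pair cost terms."""
    return class_cost * class_term + mask_coeff * mask_term + dice_coeff * dice_term

# With the default weights of 1.0, the terms simply sum.
cost = matching_cost(0.5, 0.2, 0.3)  # 1.0
```

Raising class_cost relative to the mask coefficients biases the matcher toward pairs with confident class predictions; the defaults weight all three terms equally.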

Usage

import lucid.models as models

cfg = models.MaskFormerConfig(
    num_labels=150,
    fpn_feature_size=256,
    mask_feature_size=256,
    backbone_config={
        "model_type": "resnet",
        "depths": [3, 4, 6, 3],
        "hidden_sizes": [256, 512, 1024, 2048],
    },
    num_queries=100,
    encoder_layer=6,
    decoder_layers=6,
    decoder_attention_heads=8,
)

model = models.MaskFormer(cfg)
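
One practical constraint when picking d_model and the attention-head counts: transformer layers split the model width evenly across heads. A quick standalone check of the values used above (this is a generic transformer requirement, not a documented lucid validation step):

```python
# Generic transformer sanity check (a common architectural constraint,
# not a documented lucid API): d_model must divide evenly by the head count.

def head_dim(d_model: int, num_heads: int) -> int:
    """Return the per-head width, failing loudly on an uneven split."""
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by num_heads={num_heads}"
        )
    return d_model // num_heads

# Defaults from the config above: d_model=256 with 8 attention heads.
print(head_dim(256, 8))  # 32
```

If you override d_model or either attention-head count, keeping this divisibility intact avoids shape errors inside the attention layers.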