Mask2Former¶
Transformer Segmentation Transformer
- class lucid.models.Mask2Former(config: Mask2FormerConfig, backbone: Module | None = None)¶
Mask2Former extends mask-classification segmentation with masked attention and multi-scale features. In this lucid implementation, it supports both ResNet and Swin backbones through preset builders.
%%{init: {"flowchart":{"curve":"monotoneX","nodeSpacing":50,"rankSpacing":50}} }%%
flowchart LR
linkStyle default stroke-width:2.0px
subgraph sg_m0["<span style='font-size:20px;font-weight:700'>Mask2Former</span>"]
style sg_m0 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
subgraph sg_m1["_Mask2FormerModel"]
style sg_m1 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
subgraph sg_m2["_Mask2FormerPixelLevelModule"]
direction TB;
style sg_m2 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
subgraph sg_m3["_Mask2FormerSwinBackbone"]
direction TB;
style sg_m3 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
m4["Module"];
m5["Dropout"];
m6(["Module x 2"]);
end
subgraph sg_m7["_Mask2FormerPixelDecoder"]
direction TB;
style sg_m7 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
m8["_Mask2FormerSinePositionEmbedding"];
m9["ModuleList"];
m10["_Mask2FormerPixelDecoderEncoderOnly"];
m11["Conv2d"];
m12(["Sequential x 2"]);
end
end
subgraph sg_m13["_Mask2FormerTransformerModule"]
direction TB;
style sg_m13 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
m14["_Mask2FormerSinePositionEmbedding"];
m15(["Embedding x 2"]);
subgraph sg_m16["_Mask2FormerMaskedAttentionDecoder"]
direction TB;
style sg_m16 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
m17["ModuleList"];
m18["LayerNorm"];
m19["_Mask2FormerMaskPredictor"];
end
m20["Embedding"];
end
end
m21["Linear"];
subgraph sg_m22["_Mask2FormerLoss"]
style sg_m22 fill:#000000,fill-opacity:0.05,stroke:#000000,stroke-opacity:0.75,stroke-width:1px
m23["_Mask2FormerHungarianMatcher"];
end
end
input["Input"];
output["Output"];
style input fill:#fff3cd,stroke:#a67c00,stroke-width:1px;
style output fill:#fff3cd,stroke:#a67c00,stroke-width:1px;
style m5 fill:#edf2f7,stroke:#4a5568,stroke-width:1px;
style m11 fill:#ffe8e8,stroke:#c53030,stroke-width:1px;
style m15 fill:#f1f5f9,stroke:#475569,stroke-width:1px;
style m18 fill:#e6fffa,stroke:#2c7a7b,stroke-width:1px;
style m20 fill:#f1f5f9,stroke:#475569,stroke-width:1px;
style m21 fill:#ebf8ff,stroke:#2b6cb0,stroke-width:1px;
input --> m4;
m10 --> m12;
m11 --> m14;
m12 --> m11;
m14 -.-> m18;
m17 -.-> m18;
m18 --> m19;
m19 --> m17;
m19 --> m21;
m21 --> output;
m4 --> m5;
m5 --> m6;
m6 -.-> m9;
m8 --> m10;
m8 -.-> m9;
m9 --> m8;
Class Signature¶
class Mask2Former(PreTrainedModelMixin, nn.Module):
def __init__(
self,
config: Mask2FormerConfig,
backbone: nn.Module | None = None,
) -> None
Parameters¶
config (Mask2FormerConfig): Model hyperparameters including backbone metadata, decoder depth, and losses.
backbone (nn.Module | None, optional): Feature extractor for the pixel-level module. If None, a supported backbone can be inferred from config.backbone_config.
Methods¶
- Mask2Former.forward(pixel_values: Tensor, mask_labels: list[Tensor] | None = None, class_labels: list[Tensor] | None = None, pixel_mask: Tensor | None = None, output_hidden_states: bool | None = None, output_auxiliary_logits: bool | None = None, output_attentions: bool | None = None, **kwargs) dict[str, Any]¶
- Mask2Former.predict(pixel_values: Tensor, pixel_mask: Tensor | None = None, output_size: tuple[int, int] | None = None, top_k_queries: int | None = None, return_logits: bool = False, return_scores: bool = False) Tensor | dict[str, Tensor]¶
- Mask2Former.get_auxiliary_logits(classes: tuple[Tensor, ...], output_masks: tuple[Tensor, ...]) list[dict[str, Tensor]]¶
- Mask2Former.get_loss_dict(masks_queries_logits: Tensor, class_queries_logits: Tensor, mask_labels: list[Tensor], class_labels: list[Tensor], auxiliary_predictions: list[dict[str, Tensor]] | None) dict[str, Tensor]¶
- Mask2Former.from_pretrained(weights: WeightEntry, strict: bool = True) Self¶
Examples¶
Build from Swin Preset
from lucid.models.vision.mask2former import mask2former_swin_small
import lucid
model = mask2former_swin_small(num_labels=150)
x = lucid.random.randn(1, 3, 224, 224)
out = model(x)
print(out["class_queries_logits"].shape)
print(out["masks_queries_logits"].shape)
Load Pretrained Lucid Weights
import lucid.models as models
import lucid.weights as W
weight = W.Mask2Former_Swin_Small_Weights.ADE20K_SEMANTIC
config = models.Mask2FormerConfig(**weight.config)
model = models.Mask2Former(config).from_pretrained(weight)
Load with Builder Shortcut
import lucid.models as models
import lucid.weights as W
model = models.mask2former_swin_tiny(
num_labels=150,
weights=W.Mask2Former_Swin_Tiny_Weights.ADE20K_SEMANTIC,
)
Swin-Base/Large Input Resolution
import lucid
import lucid.models as models
import lucid.weights as W
model = models.mask2former_swin_base(
num_labels=150,
weights=W.Mask2Former_Swin_Base_Weights.ADE20K_SEMANTIC,
)
x = lucid.random.randn(1, 3, 384, 384)
out = model(x)
print(out["masks_queries_logits"].shape)
Task-Specific ADE20K Tags
import lucid.weights as W
# semantic checkpoints (tiny/small/base/large)
sem = W.Mask2Former_Swin_Large_Weights.ADE20K_SEMANTIC
# panoptic checkpoint (currently large only)
pan = W.Mask2Former_Swin_Large_Weights.ADE20K_PANOPTIC