PVTV2Config

class lucid.models.PVTV2Config(img_size: int = 224, patch_size: int = 7, in_channels: int = 3, num_classes: int = 1000, embed_dims: tuple[int, ...] | list[int] = (64, 128, 256, 512), num_heads: tuple[int, ...] | list[int] = (1, 2, 4, 8), mlp_ratios: tuple[int, ...] | list[int] = (4, 4, 4, 4), qkv_bias: bool = False, qk_scale: float | None = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: type[lucid.nn.Module] = lucid.nn.LayerNorm, depths: tuple[int, ...] | list[int] = (3, 4, 6, 3), sr_ratios: tuple[int, ...] | list[int] = (8, 4, 2, 1), num_stages: int = 4, linear: bool = False)

PVTV2Config stores the stage layout and classifier settings used by lucid.models.PVT_V2. It defines the overlap patch embedding, stage depths, attention heads, spatial reduction ratios, and whether linear attention is enabled.

Class Signature

@dataclass
class PVTV2Config:
    img_size: int = 224
    patch_size: int = 7
    in_channels: int = 3
    num_classes: int = 1000
    embed_dims: tuple[int, ...] | list[int] = (64, 128, 256, 512)
    num_heads: tuple[int, ...] | list[int] = (1, 2, 4, 8)
    mlp_ratios: tuple[int, ...] | list[int] = (4, 4, 4, 4)
    qkv_bias: bool = False
    qk_scale: float | None = None
    drop_rate: float = 0.0
    attn_drop_rate: float = 0.0
    drop_path_rate: float = 0.0
    norm_layer: type[nn.Module] = nn.LayerNorm
    depths: tuple[int, ...] | list[int] = (3, 4, 6, 3)
    sr_ratios: tuple[int, ...] | list[int] = (8, 4, 2, 1)
    num_stages: int = 4
    linear: bool = False

Parameters

  • img_size (int): Input image size. PVT-v2 assumes square inputs.

  • patch_size (int): Patch size for the first overlap patch embedding stage.

  • in_channels (int): Number of input image channels.

  • num_classes (int): Number of output classes. Set to 0 to replace the classification head with an identity mapping.

  • embed_dims, num_heads, mlp_ratios, depths, sr_ratios: Per-stage embedding widths, head counts, feedforward ratios, block counts, and spatial reduction ratios.

  • qkv_bias (bool): Whether query, key, and value projections use bias.

  • qk_scale (float | None): Optional attention scaling override.

  • drop_rate, attn_drop_rate, drop_path_rate: Dropout rate, attention dropout rate, and stochastic depth (drop path) rate.

  • norm_layer (type[nn.Module]): Normalization layer used throughout the model.

  • num_stages (int): Number of hierarchical stages.

  • linear (bool): Whether to use the linear attention path in PVT-v2 blocks.

Validation

  • img_size, patch_size, in_channels, and num_stages must be greater than 0.

  • patch_size must be greater than 4 for the first overlap patch embedding.

  • num_classes must be greater than or equal to 0.

  • embed_dims, num_heads, mlp_ratios, depths, and sr_ratios must each contain exactly num_stages values.

  • Embedding widths, head counts, depths, and spatial reduction ratios must be positive.

  • Each embedding width must be divisible by the corresponding head count.

  • Dropout rates must each be in [0, 1).

  • The configured image size must leave enough spatial resolution for all stages.

Usage

import lucid.models as models

config = models.PVTV2Config(
    img_size=32,
    patch_size=7,
    in_channels=1,
    num_classes=10,
    embed_dims=(8, 16, 32, 64),
    num_heads=(1, 2, 4, 8),
    mlp_ratios=(2, 2, 2, 2),
    depths=(1, 1, 1, 1),
    sr_ratios=(8, 4, 2, 1),
    drop_path_rate=0.0,
)
model = models.PVT_V2(config)
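For the img_size=32 configuration above, the per-stage feature-map sizes can be worked out by hand. Assuming the usual PVT-v2 downsampling (stride 4 in the first overlap patch embedding, stride 2 in each later stage, an assumption about the architecture rather than a documented lucid detail), the sketch below computes them:

```python
def stage_resolutions(img_size: int, num_stages: int = 4) -> list[int]:
    """Feature-map side length after each stage, assuming strides 4, 2, 2, 2."""
    size = img_size // 4          # stage 1: overlap patch embedding, stride 4
    sizes = [size]
    for _ in range(num_stages - 1):
        size //= 2                # later stages: stride-2 downsampling
        sizes.append(size)
    return sizes

print(stage_resolutions(32))   # [8, 4, 2, 1]
print(stage_resolutions(224))  # [56, 28, 14, 7]
```

This also shows why the validation step rejects images that are too small: the final stage must still produce at least a 1x1 feature map.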