crossvit_9_dagger

lucid.models.crossvit_9_dagger(num_classes: int = 1000, **kwargs) CrossViT

The crossvit_9_dagger function provides a convenient way to create an instance of the CrossViT module with an enhanced 9-layer configuration, following the dagger (\(^\dagger\)) variant described in the CrossViT paper.

Total Parameters: 8,776,592

Function Signature

@register_model
def crossvit_9_dagger(num_classes: int = 1000, **kwargs) -> CrossViT

Parameters

  • num_classes (int, optional): The number of output classes for classification. Default is 1000.

  • kwargs (dict, optional): Additional keyword arguments to customize the CrossViT module.

Returns

  • CrossViT: An instance of the CrossViT module configured with the 9-layer dagger variant settings.

Specifications

  • img_size: [240, 224]

  • embed_dim: [128, 256]

  • depth: [[1, 3, 0], [1, 3, 0], [1, 3, 0]]

  • num_heads: [4, 4]

  • mlp_ratio: [3, 3, 1]

  • qkv_bias: True

  • multi_conv: True

About Dagger (\(^\dagger\)) Variants

The dagger (\(^\dagger\)) variant, as introduced in the CrossViT paper, refers to models that use an enhanced patch embedding process.

Instead of a single convolutional layer for patch embedding, these models employ multiple convolutional layers in a hierarchical fashion. Specifically:

  • Standard variants use a single large convolution to project image patches to embedding space

  • Dagger variants use multiple smaller convolutions with stride=2, creating a more CNN-like hierarchical feature extraction

  • This hierarchical approach helps the model capture both global patterns and local details more effectively

  • The multi-scale approach provides better inductive bias for image processing tasks, improving performance particularly for smaller models

The multi-convolution approach is activated by setting the multi_conv=True parameter, which replaces the standard patch embedding with a sequence of smaller convolutional operations, similar to the approach used in ResNet and other CNN architectures.

Examples

Creating a Default CrossViT-9-Dagger Model

import lucid.models as models

# Create a CrossViT-9-Dagger model with 1000 output classes
model = models.crossvit_9_dagger()

print(model)  # Displays the CrossViT-9-Dagger architecture

Custom Number of Classes

# Create a CrossViT-9-Dagger model with 10 output classes
model = models.crossvit_9_dagger(num_classes=10)

print(model)  # Displays the CrossViT-9-Dagger architecture with modified output