Conv2d
Module
Conv2d(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)
Applies a 2D convolution over a batch of images or feature maps.
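This module computes the 2D cross-correlation between the input and a set of learnable filters. For batch element n, output channel c_out and output position (i, j), the operation is:

$$
y_{n, c_{\text{out}}}(i, j) = b_{c_{\text{out}}} + \sum_{c=0}^{C_{\text{in}}/g - 1} \sum_{m=0}^{K_H - 1} \sum_{l=0}^{K_W - 1} w_{c_{\text{out}}, c}(m, l)\, \tilde{x}_{n,\, \gamma C_{\text{in}}/g + c}\left(s_h i + d_h m,\; s_w j + d_w l\right)
$$

where s = (s_h, s_w) is the stride, d = (d_h, d_w) is the dilation factor, g is the number of groups, γ = ⌊c_out · g / C_out⌋ is the index of the group that output channel c_out belongs to, x̃ is the input after padding by (p_h, p_w) according to padding_mode, and w and b are the weight and bias attributes (b is zero when bias=False).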
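To make the indexing concrete, here is a deliberately slow reference sketch in plain Python nested lists. It is not the lucid API: the function name conv2d_reference is hypothetical, stride/padding/dilation are assumed to be given as integer pairs, and only zero padding (padding_mode="zeros") is modelled.

```python
def conv2d_reference(x, weight, bias=None, stride=(1, 1), padding=(0, 0),
                     dilation=(1, 1), groups=1):
    """Naive 2D cross-correlation on nested lists (hypothetical helper).

    x:      [N][C_in][H][W]
    weight: [C_out][C_in // groups][K_H][K_W]
    bias:   [C_out] or None
    Only zero padding is modelled (padding_mode="zeros").
    """
    n_batch, c_in = len(x), len(x[0])
    h_in, w_in = len(x[0][0]), len(x[0][0][0])
    c_out, c_per_group = len(weight), len(weight[0])
    k_h, k_w = len(weight[0][0]), len(weight[0][0][0])
    (s_h, s_w), (p_h, p_w), (d_h, d_w) = stride, padding, dilation
    if c_in % groups or c_out % groups or c_per_group != c_in // groups:
        raise ValueError("channel counts are inconsistent with groups")

    h_out = (h_in + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w_in + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1
    out_per_group = c_out // groups

    def x_padded(n, c, r, q):
        # x~ at padded coordinates (r, q); positions in the border read as 0.
        r, q = r - p_h, q - p_w
        if 0 <= r < h_in and 0 <= q < w_in:
            return x[n][c][r][q]
        return 0

    y = []
    for n in range(n_batch):
        y_n = []
        for co in range(c_out):
            gamma = co // out_per_group  # group that output channel co belongs to
            plane = []
            for i in range(h_out):
                row = []
                for j in range(w_out):
                    acc = bias[co] if bias is not None else 0
                    for c in range(c_per_group):
                        ci = gamma * c_per_group + c  # matching input channel
                        for m in range(k_h):
                            for l in range(k_w):
                                acc += weight[co][c][m][l] * x_padded(
                                    n, ci, s_h * i + d_h * m, s_w * j + d_w * l)
                    row.append(acc)
                plane.append(row)
            y_n.append(plane)
        y.append(y_n)
    return y
```

For example, with a 3×3 input [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and a 2×2 kernel of ones, each output is the sum of one 2×2 window, giving [[12, 16], [24, 28]].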
Parameters
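in_channels : int
    Number of channels in the input, C_in.
out_channels : int
    Number of channels produced by the convolution, C_out.
kernel_size : int or tuple[int, int]
    Size (K_H, K_W) of the convolving kernel. An int is broadcast to (kernel_size, kernel_size).
stride : int or tuple[int, int], default 1
    Stride (s_h, s_w) of the convolution. Default: 1.
padding : int, tuple[int, int], or str, default 0
    Padding (p_h, p_w) added to both sides of each spatial dimension, or a string. "same" pads so the output spatial size equals the input size (requires stride=1); "valid" means no padding. Default: 0.
dilation : int or tuple[int, int], default 1
    Spacing (d_h, d_w) between kernel taps. Default: 1.
groups : int, default 1
    Number of groups the input and output channels are split into. in_channels and out_channels must both be divisible by groups. groups = in_channels gives depthwise convolution. Default: 1.
bias : bool, default True
    If True, adds a learnable bias to the output. Default: True.
padding_mode : str, default 'zeros'
    How the padded border is filled: "zeros", "reflect", "replicate", or "circular". Default: "zeros".
device : DeviceLike, default None
    Device on which the parameters are created. Default: None.
dtype : DTypeLike, default None
    Data type of the parameters. Default: None.
Attributes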
weight : Parameter
    Learnable filters of shape (out_channels, in_channels // groups, K_H, K_W). Initialized with Kaiming uniform initialization.
bias : Parameter or None
    Learnable bias of shape (out_channels,), or None when bias=False.
Notes
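Input shape: (N, C_in, H, W). Output shape: (N, C_out, H_out, W_out), where
$$
H_{\text{out}} = \left\lfloor \frac{H + 2p_h - d_h(K_H - 1) - 1}{s_h} + 1 \right\rfloor, \quad W_{\text{out}} = \left\lfloor \frac{W + 2p_w - d_w(K_W - 1) - 1}{s_w} + 1 \right\rfloor
$$
with padding (p_h, p_w), dilation (d_h, d_w), kernel size (K_H, K_W) and stride (s_h, s_w).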
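The shape formula is easy to check numerically. The helper below is a hypothetical plain-Python sketch (not part of lucid) that assumes integer padding; it reproduces the shapes shown in the examples.

```python
def conv2d_out_size(h, w, kernel_size, stride=1, padding=0, dilation=1):
    """(H_out, W_out) for integer padding; ints are broadcast to (v, v)."""
    def pair(v):
        return (v, v) if isinstance(v, int) else tuple(v)

    (k_h, k_w), (s_h, s_w) = pair(kernel_size), pair(stride)
    (p_h, p_w), (d_h, d_w) = pair(padding), pair(dilation)
    # floor(a / s + 1) == a // s + 1 for integers, so integer division suffices.
    h_out = (h + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1
    return h_out, w_out


print(conv2d_out_size(32, 32, 3, padding=1))               # (32, 32)
print(conv2d_out_size(32, 32, 3, stride=2, padding=1))     # (16, 16)
print(conv2d_out_size(16, 16, 3, padding=4, dilation=4))   # (16, 16)
```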
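Groups and depthwise convolution. With groups = g, the input channels
are split into g groups of in_channels / g channels, and each group is
convolved with its own out_channels / g filters. When groups = in_channels,
each input channel is convolved with its own filters, yielding
depthwise convolution, the building block of MobileNet-style
architectures. Following it with a groups=1 Conv2d with kernel size 1
(pointwise convolution) forms a depthwise-separable block.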
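Because the weight has shape (out_channels, in_channels // groups, K_H, K_W), grouping divides the number of weights by groups. The hypothetical plain-Python helper below counts parameters for the 32 → 64 channel, 3×3 setup used in the depthwise-separable example.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Parameters of a conv whose weight is (out, in // groups, K_H, K_W)."""
    if isinstance(kernel_size, int):
        k_h, k_w = kernel_size, kernel_size
    else:
        k_h, k_w = kernel_size
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    weight = out_channels * (in_channels // groups) * k_h * k_w
    return weight + (out_channels if bias else 0)


standard = conv2d_param_count(32, 64, 3)               # 64*32*9 + 64 = 18496
depthwise = conv2d_param_count(32, 32, 3, groups=32)   # 32*1*9 + 32 = 320
pointwise = conv2d_param_count(32, 64, 1)              # 64*32*1 + 64 = 2112
separable = depthwise + pointwise                      # 2432
```

The depthwise-separable pair needs 2432 parameters against 18496 for one dense 3×3 convolution, roughly 7.6× fewer.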
Dilated (atrous) convolution. dilation > 1 spaces the kernel taps d
apart, so each kernel spans d(K - 1) + 1 pixels per axis without adding
parameters and, with matching padding, without reducing the spatial
resolution. Widely used in semantic segmentation (DeepLab) and generative models.
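The span of one dilated kernel and the receptive field of a stack of such layers can be computed directly. This is a small sketch with hypothetical helper names; it assumes every layer uses stride 1, in which case each layer widens the receptive field by its kernel span minus one.

```python
def effective_kernel(k, d):
    """Span of one dilated kernel along an axis: d * (k - 1) + 1."""
    return d * (k - 1) + 1


def stacked_receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += effective_kernel(k, d) - 1  # each layer adds its span minus 1
    return rf


print(effective_kernel(3, 4))                          # 9
print(stacked_receptive_field([3, 3, 3], [1, 1, 1]))   # 7
print(stacked_receptive_field([3, 3, 3], [1, 2, 4]))   # 15
```

Three 3×3 layers with dilations 1, 2, 4 see 15 pixels per axis instead of 7, with the same 9 weights per filter in each layer.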
padding="same". Mimics the TensorFlow-style SAME padding convention: the
output has the same spatial size as the input. Along each axis the total
padding is pad_total = d(K - 1); when it is odd, the extra pixel is added
on the bottom/right side (the low side gets pad_total // 2). Requires stride=1.
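A minimal sketch of both padding steps along a single axis: how much "same" pads at stride 1, and how the border could be filled for each padding_mode. The function names are hypothetical, and the fill rules follow the common PyTorch-style meaning of "reflect", "replicate" and "circular"; that lucid uses identical semantics is an assumption.

```python
def same_padding(kernel_size, dilation=1):
    """(low, high) padding along one axis so that stride-1 output size == input size."""
    total = dilation * (kernel_size - 1)
    low = total // 2
    return low, total - low  # an odd leftover pixel goes to the high (bottom/right) side


def pad_1d(row, low, high, mode="zeros"):
    """Pad a 1D list by (low, high); mode rules are assumed PyTorch-style."""
    n = len(row)
    if mode == "reflect" and (low >= n or high >= n):
        raise ValueError("reflect padding must be smaller than the row length")

    def value_at(i):  # i indexes the unpadded row and may fall outside [0, n)
        if 0 <= i < n:
            return row[i]
        if mode == "zeros":
            return 0
        if mode == "replicate":
            return row[0] if i < 0 else row[n - 1]
        if mode == "circular":
            return row[i % n]
        if mode == "reflect":
            # Mirror about the edge sample without repeating it.
            return row[-i] if i < 0 else row[2 * (n - 1) - i]
        raise ValueError(f"unknown padding mode: {mode}")

    return [value_at(i) for i in range(-low, n + high)]


low, high = same_padding(4)  # (1, 2): the odd pixel goes to the high side
for mode in ("zeros", "reflect", "replicate", "circular"):
    print(mode, pad_1d([1, 2, 3, 4], low, high, mode))
```

With K = 4 the padded row has length 4 + 3 = 7, and a kernel of span 4 slides over it in exactly 4 positions, so the output length matches the input.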
Examples
Basic image convolution:
>>> import lucid
>>> import lucid.nn as nn
>>> conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
>>> x = lucid.zeros(8, 3, 32, 32) # (N, C_in, H, W)
>>> y = conv(x)
>>> y.shape
(8, 64, 32, 32)
Depthwise separable convolution block:
>>> import lucid
>>> import lucid.nn as nn
>>> depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
>>> pointwise = nn.Conv2d(32, 64, kernel_size=1)
>>> x = lucid.zeros(4, 32, 16, 16)
>>> y = pointwise(depthwise(x))
>>> y.shape
(4, 64, 16, 16)
Dilated convolution (receptive field 9×9 with only 3×3 parameters):
>>> import lucid
>>> import lucid.nn as nn
>>> dilated = nn.Conv2d(1, 1, kernel_size=3, padding=4, dilation=4)
>>> x = lucid.zeros(1, 1, 16, 16)
>>> y = dilated(x)
>>> y.shape
(1, 1, 16, 16)
Convolution without bias:
>>> import lucid
>>> import lucid.nn as nn
>>> conv_no_bias = nn.Conv2d(16, 32, kernel_size=1, bias=False)
>>> x = lucid.zeros(2, 16, 8, 8)
>>> y = conv_no_bias(x)
>>> y.shape
(2, 32, 8, 8)
Methods
__init__
__init__(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the Conv2d module. See the class docstring for parameter semantics.
forward
forward(x: Tensor) → Tensor
Apply the convolution to the input tensor.
Parameters
x : Tensor
    Input of shape (N, C_in, H, W).
Returns
Tensor
    Output tensor of shape (N, C_out, H_out, W_out), with spatial dimensions determined by stride, padding, dilation, and kernel size.
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.