class

Conv2d

extends Module
Conv2d(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)

Applies a 2D convolution over a batch of images or feature maps.

This module computes the 2D cross-correlation between the input and a set of learnable filters. For each output element, summing over the input channels and kernel taps, the operation is:

$$
y[n, c_{\text{out}}, h, w] = \sum_{c_{\text{in}}=0}^{C_{\text{in}}/g - 1} \sum_{k_h=0}^{K_H-1} \sum_{k_w=0}^{K_W-1} x\!\left[n,\; c_{\text{in}},\; h \cdot s_h + k_h \cdot d_h,\; w \cdot s_w + k_w \cdot d_w\right] \cdot W\!\left[c_{\text{out}},\; c_{\text{in}},\; k_h,\; k_w\right] + b\!\left[c_{\text{out}}\right]
$$

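where $(s_h, s_w)$ is the stride, $(d_h, d_w)$ is the dilation factor, and $g$ is the number of groups. Here $x$ denotes the input after padding, and for $g > 1$ the index $c_{\text{in}}$ runs over the $C_{\text{in}}/g$ input channels of the group that $c_{\text{out}}$ belongs to.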
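The formula above can be sketched as a naive nested-loop reference in pure Python. This is only an illustration of the arithmetic, not Lucid's implementation: `conv2d_reference` is a hypothetical helper, it works on nested lists rather than tensors, and it assumes the input has already been padded.

```python
def conv2d_reference(x, weight, bias=None, stride=(1, 1), dilation=(1, 1), groups=1):
    """Naive 2D cross-correlation on nested lists (hypothetical helper).

    x:      [N][C_in][H][W], assumed to be padded already
    weight: [C_out][C_in // groups][K_H][K_W]
    bias:   [C_out] or None
    """
    n_batch, c_in = len(x), len(x[0])
    h_in, w_in = len(x[0][0]), len(x[0][0][0])
    c_out, c_in_g = len(weight), len(weight[0])
    k_h, k_w = len(weight[0][0]), len(weight[0][0][0])
    (s_h, s_w), (d_h, d_w) = stride, dilation
    assert c_in == c_in_g * groups and c_out % groups == 0

    # Output size for an already-padded input (padding term is zero here).
    h_out = (h_in - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w_in - d_w * (k_w - 1) - 1) // s_w + 1
    c_out_g = c_out // groups

    y = [[[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
         for _ in range(n_batch)]
    for n in range(n_batch):
        for co in range(c_out):
            base = (co // c_out_g) * c_in_g  # first input channel of co's group
            for h in range(h_out):
                for w in range(w_out):
                    acc = bias[co] if bias is not None else 0.0
                    for ci in range(c_in_g):
                        for kh in range(k_h):
                            for kw in range(k_w):
                                acc += (x[n][base + ci][h * s_h + kh * d_h][w * s_w + kw * d_w]
                                        * weight[co][ci][kh][kw])
                    y[n][co][h][w] = acc
    return y


# One 3x3 single-channel image, one 2x2 filter that sums the main diagonal.
x = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]
w = [[[[1, 0], [0, 1]]]]
print(conv2d_reference(x, w, bias=[0.5]))  # [[[[6.5, 8.5], [12.5, 14.5]]]]
```

With `groups` greater than 1, each output channel only reads the input channels of its own group, which is what the `base` offset implements.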

Parameters

in_channels: int
Number of channels in the input image.
out_channels: int
Number of channels produced by the convolution.
kernel_size: int or tuple[int, int]
Size of the convolving kernel. A single int is broadcast to (kernel_size, kernel_size).
stride: int or tuple[int, int] = 1
Stride of the convolution. Default: 1.
padding: int, tuple[int, int], or str = 0
Padding added to all four sides of the input. "same" pads so that the output spatial size equals the input spatial size (requires stride=1); "valid" means no padding. Default: 0.
dilation: int or tuple[int, int] = 1
Spacing between kernel elements (atrous / dilated convolution). Default: 1.
groups: int = 1
Number of blocked connections from input channels to output channels. Both in_channels and out_channels must be divisible by groups. groups = in_channels gives depthwise convolution. Default: 1.
bias: bool = True
If True, adds a learnable bias to the output. Default: True.
padding_mode: str = 'zeros'
"zeros", "reflect", "replicate", or "circular". Default: "zeros".
device: DeviceLike = None
Device on which to allocate parameters. Default: None.
dtype: DTypeLike = None
Data type for the parameters. Default: None.

Attributes

weight: Parameter
Learnable filter tensor of shape (out_channels, in_channels // groups, K_H, K_W). Initialized with Kaiming uniform using $a = \sqrt{5}$:
$$
\text{fan\_in} = \frac{C_{\text{in}}}{g} \cdot K_H \cdot K_W, \quad W \sim \mathcal{U}\!\left[ -\sqrt{\tfrac{6}{\text{fan\_in}}},\; \sqrt{\tfrac{6}{\text{fan\_in}}} \right]
$$
bias: Parameter or None
Learnable bias of shape (out_channels,), or None if bias=False.

Notes

Input: $(N, C_{\text{in}}, H, W)$. Output: $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, where

$$
H_{\text{out}} = \left\lfloor \frac{H + 2p_h - d_h(K_H - 1) - 1}{s_h} + 1 \right\rfloor, \quad W_{\text{out}} = \left\lfloor \frac{W + 2p_w - d_w(K_W - 1) - 1}{s_w} + 1 \right\rfloor
$$
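As a quick check of the output-size formula, the sketch below evaluates it along one spatial axis. `conv2d_output_size` is a hypothetical helper written for this illustration, not a Lucid function; the first two calls use the settings from the Examples section.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, following the floor formula."""
    # Integer floor division matches the floor in the formula for these integer inputs.
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


print(conv2d_output_size(32, kernel=3, padding=1))              # 32
print(conv2d_output_size(16, kernel=3, padding=4, dilation=4))  # 16
print(conv2d_output_size(32, kernel=3, stride=2, padding=1))    # 16
```

Apply the function once per axis with the corresponding entries of kernel_size, stride, padding, and dilation when these are given as tuples.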

Groups and depthwise convolution. When groups = in_channels, each input channel is convolved with its own filter, yielding depthwise convolution, the building block of MobileNet-style architectures. Following it with a groups=1 Conv2d of kernel size 1 (pointwise convolution) forms a depthwise-separable block.
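To make the parameter saving concrete, the sketch below counts parameters from the weight shape (out_channels, in_channels // groups, K_H, K_W) plus an optional bias of length out_channels. `conv2d_param_count` is a hypothetical helper for illustration only; the channel sizes are those of the depthwise-separable example in this page.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable parameters implied by the documented weight/bias shapes."""
    k_h, k_w = kernel_size
    weights = out_channels * (in_channels // groups) * k_h * k_w
    return weights + (out_channels if bias else 0)


standard = conv2d_param_count(32, 64, (3, 3))              # full 3x3 convolution
depthwise = conv2d_param_count(32, 32, (3, 3), groups=32)  # one 3x3 filter per channel
pointwise = conv2d_param_count(32, 64, (1, 1))             # 1x1 channel mixing

print(standard)               # 18496
print(depthwise + pointwise)  # 2432
```

For these sizes the depthwise-separable pair uses roughly one eighth of the parameters of the single standard convolution.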

Dilated (atrous) convolution. dilation > 1 spreads the kernel taps apart, enlarging the kernel's receptive field to $d(K - 1) + 1$ along each axis without increasing the number of parameters or reducing the spatial resolution. It is widely used in semantic segmentation (DeepLab) and generative models.
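The span covered by a dilated kernel follows from the term $d(K - 1)$ in the output-size formula. A minimal sketch, using hypothetical helper names that are not part of Lucid:

```python
def effective_kernel_size(kernel, dilation=1):
    """Extent along one axis covered by a kernel whose taps are `dilation` apart."""
    return dilation * (kernel - 1) + 1


def tap_offsets(kernel, dilation=1):
    """Input offsets (relative to the window start) read by each kernel tap."""
    return [k * dilation for k in range(kernel)]


print(effective_kernel_size(3, dilation=4))  # 9
print(tap_offsets(3, dilation=4))            # [0, 4, 8]
```

A 3×3 kernel with dilation=4 therefore still has 9 weights per channel pair but reads from a 9×9 window.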

padding="same". Mimics the SAME padding convention: the output has the same spatial size as the input. When the required total padding is odd, the extra pixel is added on the bottom/right side (the top/left side gets pad_total // 2). Requires stride=1.
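The split described above can be sketched per axis as follows. `same_padding` is a hypothetical helper, and it assumes stride=1, in which case the total padding needed to preserve the size is $d(K - 1)$.

```python
def same_padding(kernel, dilation=1):
    """Return (low, high) padding for one axis so that output size == input size.

    Assumes stride == 1. When the total is odd, the extra pixel goes on the
    high (bottom/right) side, as described in the text.
    """
    total = dilation * (kernel - 1)
    low = total // 2
    return low, total - low


print(same_padding(3))              # (1, 1)
print(same_padding(4))              # (1, 2)
print(same_padding(3, dilation=4))  # (4, 4)
```

Odd kernel sizes give symmetric padding; the asymmetric case only arises when $d(K - 1)$ is odd, for example with an even kernel size and dilation=1.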

Examples

Basic image convolution:
>>> import lucid
>>> import lucid.nn as nn
>>> conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
>>> x = lucid.zeros(8, 3, 32, 32)   # (N, C_in, H, W)
>>> y = conv(x)
>>> y.shape
(8, 64, 32, 32)
Depthwise separable convolution block:
>>> import lucid
>>> import lucid.nn as nn
>>> depthwise  = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
>>> pointwise  = nn.Conv2d(32, 64, kernel_size=1)
>>> x = lucid.zeros(4, 32, 16, 16)
>>> y = pointwise(depthwise(x))
>>> y.shape
(4, 64, 16, 16)
Dilated convolution (9×9 receptive field using only the parameters of a 3×3 kernel):
>>> import lucid
>>> import lucid.nn as nn
>>> dilated = nn.Conv2d(1, 1, kernel_size=3, padding=4, dilation=4)
>>> x = lucid.zeros(1, 1, 16, 16)
>>> y = dilated(x)
>>> y.shape
(1, 1, 16, 16)
Convolution without bias:
>>> import lucid
>>> import lucid.nn as nn
>>> conv_no_bias = nn.Conv2d(16, 32, kernel_size=1, bias=False)
>>> x = lucid.zeros(2, 16, 8, 8)
>>> y = conv_no_bias(x)
>>> y.shape
(2, 32, 8, 8)

Methods (3)

dunder

__init__

None
__init__(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the Conv2d module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Apply the convolution to the input tensor.

Parameters

x: Tensor
Input tensor of shape $(N, C_{\text{in}}, H, W)$.

Returns

Tensor

Output tensor of shape $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, with spatial dimensions determined by stride, padding, dilation, and kernel size.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.