class Conv2d

extends Module

Conv2d(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: PaddingMode = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)


Implementing kernel: ConvNdBackward (C++ class)

Applies a 2D convolution over a batch of images or feature maps.

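This module computes the 2D cross-correlation between the input and a set of learnable filters. For sample $n$, output channel $c_{\text{out}}$, and output position $(h, w)$, the operation is:

y[n, c_{\text{out}}, h, w] = \sum_{c_{\text{in}}=0}^{C_{\text{in}}/g - 1} \sum_{k_h=0}^{K_H-1} \sum_{k_w=0}^{K_W-1} \tilde{x}\!\left[n,\; \gamma \cdot \tfrac{C_{\text{in}}}{g} + c_{\text{in}},\; h \cdot s_h + k_h \cdot d_h,\; w \cdot s_w + k_w \cdot d_w\right] \cdot W\!\left[c_{\text{out}},\; c_{\text{in}},\; k_h,\; k_w\right] + b\!\left[c_{\text{out}}\right]

where $\tilde{x}$ is the input after padding, $(s_h, s_w)$ is the stride, $(d_h, d_w)$ is the dilation factor, $g$ is the number of groups, and $\gamma = \lfloor c_{\text{out}} / (C_{\text{out}}/g) \rfloor$ is the group that output channel $c_{\text{out}}$ belongs to. With $g = 1$ the channel offset $\gamma \cdot C_{\text{in}}/g$ is zero; with bias=False the term $b[c_{\text{out}}]$ is omitted.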
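To make the indexing concrete, here is a naive NumPy sketch of the formula above. It is not lucid's implementation: `conv2d_reference` is a hypothetical helper that assumes NumPy arrays in (N, C, H, W) layout, symmetric zero padding (padding_mode="zeros"), and a weight of shape (C_out, C_in/g, K_H, K_W).

```python
import numpy as np


def conv2d_reference(x, weight, bias=None, stride=(1, 1), padding=(0, 0),
                     dilation=(1, 1), groups=1):
    """Naive grouped 2D cross-correlation (hypothetical helper, zero padding only)."""
    n, c_in, h, w = x.shape
    c_out, c_in_per_group, k_h, k_w = weight.shape
    assert c_in == c_in_per_group * groups and c_out % groups == 0
    s_h, s_w = stride
    p_h, p_w = padding
    d_h, d_w = dilation

    # x~ in the formula: the input zero-padded on both sides of H and W
    xp = np.pad(x, ((0, 0), (0, 0), (p_h, p_h), (p_w, p_w)))
    h_out = (h + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1
    out_per_group = c_out // groups

    y = np.zeros((n, c_out, h_out, w_out), dtype=np.result_type(x, weight))
    for co in range(c_out):
        gamma = co // out_per_group        # group of output channel co
        c0 = gamma * c_in_per_group        # first input channel of that group
        for i in range(h_out):
            for j in range(w_out):
                # K_H x K_W taps starting at (i*s_h, j*s_w), spaced d_h / d_w apart
                patch = xp[:, c0:c0 + c_in_per_group,
                           i * s_h:i * s_h + d_h * (k_h - 1) + 1:d_h,
                           j * s_w:j * s_w + d_w * (k_w - 1) + 1:d_w]
                y[:, co, i, j] = np.sum(patch * weight[co], axis=(1, 2, 3))
        if bias is not None:
            y[:, co] += bias[co]
    return y


# 4x4 input, 2x2 all-ones kernel, dilation 2: taps at offsets 0 and 2
x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
y = conv2d_reference(x, np.ones((1, 1, 2, 2)), dilation=(2, 2))
print(y[0, 0])  # expected values: [[20, 24], [36, 40]]
```

The explicit loops mirror the three sums term by term, so the sketch is only suitable for checking small cases by hand, not for real workloads.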

Parameters

in_channels : int

Number of channels in the input image.

out_channels : int

Number of channels produced by the convolution.

kernel_size : int or tuple[int, int]

Size of the convolving kernel. A single int is broadcast to (kernel_size, kernel_size).

stride : int or tuple[int, int] = 1

Stride of the convolution. Default: 1.

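padding : int, tuple[int, int], or str = 0

Implicit padding added to the input. An int p pads p pixels on every side; a tuple (p_h, p_w) pads p_h rows at the top and bottom and p_w columns at the left and right. "same" pads so that the output has the same spatial size as the input (requires stride=1); "valid" means no padding. Default: 0.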

dilation : int or tuple[int, int] = 1

Spacing between kernel elements (atrous / dilated convolution). Default: 1.

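groups : int = 1

Number of groups the channels are split into. The input channels are divided into groups blocks of in_channels // groups channels each, and each block is convolved only with its own out_channels // groups filters. Both in_channels and out_channels must be divisible by groups. groups = in_channels gives depthwise convolution. Default: 1.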

bias : bool = True

If True, adds a learnable bias to the output. Default: True.

padding_mode : str = 'zeros'

How the input border is filled when padding is non-zero: "zeros", "reflect", "replicate", or "circular". Default: "zeros".

device : DeviceLike = None

Device on which to allocate parameters. Default: None.

dtype : DTypeLike = None

Data type for the parameters. Default: None.
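The four padding modes can be illustrated with `numpy.pad`. This sketch assumes lucid's padding_mode values follow the usual semantics (as in PyTorch); the mapping and the helper `pad_spatial` are assumptions for illustration, not taken from lucid's source.

```python
import numpy as np

# Assumed mapping from padding_mode to np.pad modes (the usual semantics,
# e.g. as in PyTorch); not taken from lucid's source.
PADDING_MODE_TO_NUMPY = {
    "zeros": "constant",    # fill with 0
    "reflect": "reflect",   # mirror about the edge, edge value not repeated
    "replicate": "edge",    # repeat the edge value
    "circular": "wrap",     # wrap around from the opposite side
}


def pad_spatial(x, pad, padding_mode="zeros"):
    """Pad the last two (H, W) axes of an NCHW array by `pad` on every side."""
    widths = [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)]
    return np.pad(x, widths, mode=PADDING_MODE_TO_NUMPY[padding_mode])


x = np.arange(1, 10).reshape(1, 1, 3, 3)   # rows [1 2 3], [4 5 6], [7 8 9]
for mode in PADDING_MODE_TO_NUMPY:
    # middle row of the padded 5x5 map: original row [4 5 6] plus 1 pixel per side
    print(mode, pad_spatial(x, 1, mode)[0, 0, 2])
# zeros [0 4 5 6 0], reflect [5 4 5 6 5], replicate [4 4 5 6 6], circular [6 4 5 6 4]
```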

Attributes

weight : Parameter

Learnable filter tensor of shape (out_channels, in_channels // groups, K_H, K_W). Initialized with Kaiming uniform using negative slope $a = \sqrt{5}$, which gives

\text{fan\_in} = \frac{C_{\text{in}}}{g} \cdot K_H \cdot K_W, \quad \text{bound} = \sqrt{\frac{2}{1 + a^2}} \cdot \sqrt{\frac{3}{\text{fan\_in}}} = \frac{1}{\sqrt{\text{fan\_in}}}, \quad W \sim \mathcal{U}\!\left[ -\text{bound},\; \text{bound} \right]
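The bound can be checked numerically. This sketch assumes the standard Kaiming-uniform definition (leaky-ReLU gain $\sqrt{2/(1+a^2)}$, bound $= \text{gain} \cdot \sqrt{3/\text{fan\_in}}$); `kaiming_uniform_bound` is a hypothetical helper, not a lucid function. With $a = \sqrt{5}$ the bound reduces to $1/\sqrt{\text{fan\_in}}$, while $a = 0$ gives $\sqrt{6/\text{fan\_in}}$.

```python
import math

import numpy as np


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Bound of the Kaiming-uniform distribution for leaky-ReLU slope `a`."""
    gain = math.sqrt(2.0 / (1.0 + a * a))   # leaky-ReLU gain
    return gain * math.sqrt(3.0 / fan_in)   # U[-bound, bound] has std gain / sqrt(fan_in)


# Example: Conv2d(3, 64, kernel_size=3) with groups=1
c_in, groups, k_h, k_w = 3, 1, 3, 3
fan_in = (c_in // groups) * k_h * k_w       # 27
bound = kaiming_uniform_bound(fan_in)
rng = np.random.default_rng(0)
weight = rng.uniform(-bound, bound, size=(64, c_in // groups, k_h, k_w))
print(fan_in, bound, 1 / math.sqrt(fan_in))
```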

bias : Parameter or None

Learnable bias of shape (out_channels,), or None.

Notes

Input: $(N, C_{\text{in}}, H, W)$
Output: $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, where $(p_h, p_w)$ is the padding applied to each side and

H_{\text{out}} = \left\lfloor \frac{H + 2p_h - d_h(K_H - 1) - 1}{s_h} + 1 \right\rfloor, \quad W_{\text{out}} = \left\lfloor \frac{W + 2p_w - d_w(K_W - 1) - 1}{s_w} + 1 \right\rfloor
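The shape formula translates directly into integer arithmetic, since Python's `//` is floor division and so matches the floor in the formula. `conv2d_output_size` and `conv2d_output_shape` are hypothetical helpers for illustration, not part of lucid; the checks reproduce the shapes in the Examples.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, per the H_out / W_out formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv2d_output_shape(n, c_out, h, w, kernel_size, stride=(1, 1),
                        padding=(0, 0), dilation=(1, 1)):
    """Full (N, C_out, H_out, W_out) shape for per-axis (h, w) parameters."""
    h_out = conv2d_output_size(h, kernel_size[0], stride[0], padding[0], dilation[0])
    w_out = conv2d_output_size(w, kernel_size[1], stride[1], padding[1], dilation[1])
    return (n, c_out, h_out, w_out)


print(conv2d_output_shape(8, 64, 32, 32, (3, 3), padding=(1, 1)))  # (8, 64, 32, 32)
print(conv2d_output_size(32, 3, stride=2, padding=1))               # 16
```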

Groups and depthwise convolution. When groups = in_channels, each input channel is convolved only with its own set of out_channels // in_channels filters (a single filter when out_channels = in_channels), yielding depthwise convolution. This is the building block of MobileNet-style architectures: following it with a pointwise convolution (a groups=1 Conv2d with kernel_size=1) forms a depthwise-separable block.
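A quick parameter count shows why the depthwise-separable block is cheaper. `conv2d_param_count` is a hypothetical helper that multiplies out the weight shape (out_channels, in_channels // groups, K_H, K_W) listed under Attributes and adds out_channels bias terms; the channel sizes match the depthwise-separable example in Examples.

```python
def conv2d_param_count(c_in, c_out, k_h, k_w, groups=1, bias=True):
    """Weight (c_out, c_in // groups, k_h, k_w) entries plus optional bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    weight = c_out * (c_in // groups) * k_h * k_w
    return weight + (c_out if bias else 0)


standard = conv2d_param_count(32, 64, 3, 3)              # 18432 + 64 = 18496
depthwise = conv2d_param_count(32, 32, 3, 3, groups=32)  # 288 + 32 = 320
pointwise = conv2d_param_count(32, 64, 1, 1)             # 2048 + 64 = 2112
print(standard, depthwise + pointwise)                   # 18496 2432
```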

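Dilated (atrous) convolution. dilation > 1 spaces the kernel taps d pixels apart, so a kernel of size K spans d·(K − 1) + 1 input pixels along that axis. This enlarges the receptive field without increasing the number of parameters and, with matching padding, without reducing the spatial resolution. Widely used in semantic segmentation (DeepLab) and generative models.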
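The span of a dilated kernel, and the receptive field of a stack of such layers, follow from simple arithmetic. Both helpers below are hypothetical, and the stacking rule assumes every layer uses stride 1.

```python
def effective_kernel_size(kernel, dilation):
    """Input pixels spanned by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 layers given as (kernel, dilation) pairs."""
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel_size(kernel, dilation) - 1   # each layer widens the field
    return rf


print(effective_kernel_size(3, 4))                          # 9: 3x3 taps covering 9x9
print(stacked_receptive_field([(3, 1), (3, 2), (3, 4)]))    # 15
```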

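padding="same". Mimics the TensorFlow-style SAME padding convention: with stride=1 the output has the same spatial size as the input. Along each spatial dimension the total padding is pad_total = d·(K − 1) for dilation d and kernel size K; the low (top/left) side gets pad_total // 2, and when pad_total is odd the extra pixel goes to the high (bottom/right) side. Requires stride=1.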
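The split of the total "same" padding can be sketched as follows, assuming stride=1. `same_padding` is a hypothetical helper; the total d·(K − 1) follows from setting H_out = H in the output-size formula with s = 1. Note that an even kernel (odd pad_total) yields asymmetric padding, which cannot be expressed with the symmetric int or tuple form of padding.

```python
def same_padding(kernel, dilation=1):
    """(low, high) padding along one axis for padding="same" with stride=1."""
    pad_total = dilation * (kernel - 1)
    low = pad_total // 2          # top / left
    high = pad_total - low        # bottom / right gets the extra pixel when odd
    return low, high


print(same_padding(3))       # (1, 1)
print(same_padding(4))       # (1, 2)
print(same_padding(3, 4))    # (4, 4)
```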

Examples

Basic image convolution:
>>> import lucid
>>> import lucid.nn as nn
>>> conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
>>> x = lucid.zeros(8, 3, 32, 32)   # (N, C_in, H, W)
>>> y = conv(x)
>>> y.shape
(8, 64, 32, 32)

Depthwise separable convolution block:
>>> import lucid
>>> import lucid.nn as nn
>>> depthwise  = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
>>> pointwise  = nn.Conv2d(32, 64, kernel_size=1)
>>> x = lucid.zeros(4, 32, 16, 16)
>>> y = pointwise(depthwise(x))
>>> y.shape
(4, 64, 16, 16)

Dilated convolution (receptive field 9×9 with only 3×3 parameters):
>>> import lucid
>>> import lucid.nn as nn
>>> dilated = nn.Conv2d(1, 1, kernel_size=3, padding=4, dilation=4)
>>> x = lucid.zeros(1, 1, 16, 16)
>>> y = dilated(x)
>>> y.shape
(1, 1, 16, 16)

Convolution without bias:
>>> import lucid
>>> import lucid.nn as nn
>>> conv_no_bias = nn.Conv2d(16, 32, kernel_size=1, bias=False)
>>> x = lucid.zeros(2, 16, 8, 8)
>>> y = conv_no_bias(x)
>>> y.shape
(2, 32, 8, 8)


Constructors

__init__(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d | str = 0, dilation: _Size2d = 1, groups: int = 1, bias: bool = True, padding_mode: PaddingMode = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None) → None

Initialise the Conv2d module. See the class docstring for parameter semantics.

Instance methods

extra_repr() → str

Return a string representation of the layer's configuration.

forward(x: Tensor) → Tensor

Apply the convolution to the input tensor.

Parameters

x : Tensor

Input tensor of shape $(N, C_{\text{in}}, H, W)$.

Returns

Tensor

Output tensor of shape $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, with $H_{\text{out}}$ and $W_{\text{out}}$ determined by the stride, padding, dilation, and kernel size as given in the Notes.
