conv2d

conv2d(x: Tensor, weight: Tensor, bias: Tensor | None = None, stride: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, dilation: int | tuple[int, int] = 1, groups: int = 1) -> Tensor

2-D cross-correlation over batched 4-D input.

Despite the name, computes cross-correlation rather than strict mathematical convolution (no kernel flip). The kernel slides over the input applying a learned linear combination at each spatial position; channel mixing happens through the in_channels dimension of weight. This is the workhorse op of modern image CNNs (ResNet, ConvNeXt, EfficientNet, ...).
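The difference between cross-correlation and strict convolution can be seen in one dimension. The following is a minimal pure-Python sketch, not part of the library; the helper names `cross_correlate_1d` and `convolve_1d` are hypothetical and exist only for illustration.

```python
def cross_correlate_1d(signal, kernel):
    """Valid-mode cross-correlation: slide the kernel without flipping it."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve_1d(signal, kernel):
    """Valid-mode mathematical convolution: flip the kernel, then slide it."""
    return cross_correlate_1d(signal, kernel[::-1])


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]  # asymmetric, so flipping changes the result

corr = cross_correlate_1d(signal, kernel)
conv = convolve_1d(signal, kernel)
print(corr)  # [-2, -2]
print(conv)  # [2, 2]
```

For learned filters the distinction is mostly a matter of convention, since training can absorb the flip into the weights; it matters when porting fixed, hand-designed kernels.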

Parameters

x: Tensor
Input of shape (N, C_in, H, W).
weight: Tensor
Filters of shape (C_out, C_in / groups, kH, kW).
bias: Tensor | None = None
Optional per-output-channel bias of shape (C_out,).
stride: int or (int, int) = 1
Step between adjacent kernel positions.
padding: int or (int, int) = 0
Zero padding added to both sides of each spatial dimension.
dilation: int or (int, int) = 1
Spacing between kernel taps (atrous convolution).
groups: int = 1
Number of independent groups the channels are split into. Setting groups == C_in (with C_out == C_in) gives a depthwise convolution.

Returns

Tensor

Output of shape (N, C_out, H_out, W_out) where

$$
\begin{aligned}
H_{\text{out}} &= \left\lfloor \frac{H + 2 p_H - d_H (k_H - 1) - 1}{s_H} + 1 \right\rfloor \\
W_{\text{out}} &= \left\lfloor \frac{W + 2 p_W - d_W (k_W - 1) - 1}{s_W} + 1 \right\rfloor
\end{aligned}
$$
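The output-size formula can be checked by hand before allocating any tensors. This is a small sketch in plain Python; `conv2d_output_hw` and `_pair` are hypothetical helpers written for this page, not functions exported by the library.

```python
def _pair(value):
    """Expand an int to (value, value); pass a 2-tuple through unchanged."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_hw(h, w, kernel_size, stride=1, padding=0, dilation=1):
    """Return (H_out, W_out) using the floor formula from the Returns section."""
    k_h, k_w = _pair(kernel_size)
    s_h, s_w = _pair(stride)
    p_h, p_w = _pair(padding)
    d_h, d_w = _pair(dilation)
    # Integer floor division implements the floor; the +1 can sit outside it.
    h_out = (h + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1
    return h_out, w_out


same = conv2d_output_hw(32, 32, kernel_size=3, padding=1)
halved = conv2d_output_hw(32, 32, kernel_size=3, stride=2, padding=1)
dilated = conv2d_output_hw(32, 32, kernel_size=3, dilation=2)
print(same)     # (32, 32)
print(halved)   # (16, 16)
print(dilated)  # (28, 28)
```

With a 3x3 kernel, padding=1 and stride=1 the spatial size is preserved, which is the configuration used in the Examples section.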

Notes

Math:

$$
y_{i,\,c_o,\,h,\,w} = b_{c_o} + \sum_{c_i, m, n} w_{c_o,\,c_i,\,m,\,n} \cdot x_{i,\,c_i,\,s_H h + d_H m,\,s_W w + d_W n}
$$
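The sum above can be transcribed almost directly into NumPy. The following is an illustrative reference sketch, not the library's implementation: it assumes groups == 1, treats the spatial indices of x as indices into the zero-padded input, and uses a hypothetical function name `conv2d_reference`.

```python
import numpy as np


def conv2d_reference(x, weight, bias=None, stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Naive cross-correlation for groups == 1, looping over kernel taps."""
    n, c_in, h, w = x.shape
    c_out, c_in_w, k_h, k_w = weight.shape
    assert c_in == c_in_w, "this sketch only covers groups == 1"
    (s_h, s_w), (p_h, p_w), (d_h, d_w) = stride, padding, dilation

    # Zero-pad only the two spatial axes.
    x_pad = np.pad(x, ((0, 0), (0, 0), (p_h, p_h), (p_w, p_w)))
    h_out = (h + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    w_out = (w + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1

    y = np.zeros((n, c_out, h_out, w_out), dtype=np.result_type(x, weight))
    for m in range(k_h):
        for q in range(k_w):
            # Inputs touched by tap (m, q) for every output position:
            # rows s_H * h + d_H * m, columns s_W * w + d_W * q.
            patch = x_pad[
                :, :,
                d_h * m : d_h * m + s_h * (h_out - 1) + 1 : s_h,
                d_w * q : d_w * q + s_w * (w_out - 1) + 1 : s_w,
            ]
            # Mix input channels into output channels for this tap.
            y += np.einsum("nchw,oc->nohw", patch, weight[:, :, m, q])
    if bias is not None:
        y += bias.reshape(1, -1, 1, 1)
    return y


x_small = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
w_ones = np.ones((1, 1, 3, 3))
y_small = conv2d_reference(x_small, w_ones)
print(y_small[0, 0])  # 3x3 window sums: [[45. 54.] [81. 90.]]
```

Looping over the kH x kW taps rather than over output positions keeps the Python loop short; each iteration handles all batches, channels, and output positions at once.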

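Gradients w.r.t. x, weight, and bias are computed automatically during the backward pass. groups > 1 yields a grouped convolution, in which each block of C_in / groups input channels is processed independently to produce its own block of output channels; groups == C_in together with C_out == C_in is a depthwise convolution. Dilation enlarges the receptive field without increasing the parameter count: a kernel of size k with dilation d spans d(k - 1) + 1 input positions.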
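The effect of groups and dilation on parameter count and kernel extent is simple arithmetic. The sketch below uses hypothetical helpers (`conv2d_weight_shape`, `param_count`, `effective_kernel_extent`) and assumes, as is conventional for grouped convolution, that both C_in and C_out must be divisible by groups; the library's own validation may differ.

```python
def conv2d_weight_shape(c_in, c_out, kernel_size, groups=1):
    """Weight shape (C_out, C_in / groups, kH, kW) for the given configuration."""
    if c_in % groups or c_out % groups:
        # Assumed constraint: channels must split evenly across groups.
        raise ValueError("c_in and c_out must both be divisible by groups")
    k_h, k_w = kernel_size
    return (c_out, c_in // groups, k_h, k_w)


def param_count(shape):
    """Number of weight elements (bias excluded)."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def effective_kernel_extent(k, dilation):
    """Input positions spanned along one axis by a k-tap kernel."""
    return dilation * (k - 1) + 1


dense = param_count(conv2d_weight_shape(64, 64, (3, 3)))
grouped = param_count(conv2d_weight_shape(64, 64, (3, 3), groups=4))
depthwise = param_count(conv2d_weight_shape(64, 64, (3, 3), groups=64))
print(dense, grouped, depthwise)  # 36864 9216 576

# Dilation widens the span but leaves the 3x3 weight tensor unchanged.
print(effective_kernel_extent(3, 1), effective_kernel_extent(3, 2))  # 3 5
```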

Examples

>>> import lucid
>>> from lucid.nn.functional import conv2d
>>> x = lucid.randn(1, 3, 32, 32)
>>> w = lucid.randn(8, 3, 3, 3)
>>> y = conv2d(x, w, stride=1, padding=1)
>>> y.shape
(1, 8, 32, 32)