
conv3d

conv3d(x: Tensor, weight: Tensor, bias: Tensor | None = None, stride: int | tuple[int, int, int] = 1, padding: int | tuple[int, int, int] = 0, dilation: int | tuple[int, int, int] = 1, groups: int = 1) -> Tensor

3-D cross-correlation over batched 5-D input.

Extends conv2d to volumetric data (depth × height × width). Standard in medical imaging (CT, MRI), video understanding (I3D, 3D-ResNet, SlowFast) and any setting where the input has three spatial axes. As with the 1-D and 2-D variants, this is technically cross-correlation rather than strict convolution.

Parameters

x: Tensor
Input of shape (N, C_in, D, H, W).
weight: Tensor
Filters of shape (C_out, C_in/groups, kD, kH, kW).
bias: Tensor | None = None
Optional per-output-channel bias of shape (C_out,).
stride: int or (int, int, int) = 1
Step between adjacent kernel positions along each spatial axis.
padding: int or (int, int, int) = 0
Zero padding added to both sides of each spatial axis.
dilation: int or (int, int, int) = 1
Spacing between kernel taps (atrous convolution).
groups: int = 1
Number of independent groups into which the input channels are split; each filter sees C_in/groups input channels.
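
The relationship between groups and the weight shape can be sketched in plain Python. This is only an illustration: grouped_weight_shape is a hypothetical helper, not part of lucid, and the requirement that both C_in and C_out divide evenly by groups is an assumption borrowed from common deep-learning conventions rather than something this page states.

```python
def grouped_weight_shape(c_in, c_out, kernel, groups=1):
    """Expected weight shape (C_out, C_in/groups, kD, kH, kW)."""
    # Assumption: both channel counts must divide evenly by groups.
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    kD, kH, kW = kernel
    return (c_out, c_in // groups, kD, kH, kW)


print(grouped_weight_shape(8, 16, (3, 3, 3), groups=1))  # (16, 8, 3, 3, 3)
print(grouped_weight_shape(8, 16, (3, 3, 3), groups=4))  # (16, 2, 3, 3, 3)
print(grouped_weight_shape(8, 16, (3, 3, 3), groups=8))  # (16, 1, 3, 3, 3)
```

With groups equal to C_in, each filter sees a single input channel, which is the depthwise case.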

Returns

Tensor

Output of shape (N, C_out, D_out, H_out, W_out) where each spatial size obeys

D_{\text{out}} = \left\lfloor \frac{D + 2 p_D - d_D (k_D - 1) - 1}{s_D} + 1 \right\rfloor

(analogously for H and W).
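
The output-size formula is easy to check by hand or with a few lines of plain Python. The sketch below is a minimal illustration; conv_out_size and conv3d_out_shape are hypothetical helper names, not lucid functions, and the code only does shape arithmetic.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, per the floor formula."""
    # Adding the integer 1 inside or outside the floor gives the same result.
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def _triple(v):
    # Accept an int or a 3-tuple, mirroring the int-or-tuple parameters.
    return (v, v, v) if isinstance(v, int) else tuple(v)


def conv3d_out_shape(x_shape, w_shape, stride=1, padding=0, dilation=1):
    n, _, *spatial = x_shape
    c_out, _, *kernel = w_shape
    s, p, d = _triple(stride), _triple(padding), _triple(dilation)
    out = tuple(
        conv_out_size(spatial[i], kernel[i], s[i], p[i], d[i]) for i in range(3)
    )
    return (n, c_out, *out)


print(conv3d_out_shape((1, 1, 16, 32, 32), (4, 1, 3, 3, 3), padding=1))
# (1, 4, 16, 32, 32)
```

For a 3×3×3 kernel with stride 1 and dilation 1, padding=1 leaves every spatial size unchanged, since D + 2 - 2 - 1 + 1 = D.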

Notes

Math:

y_{i,\,c_o,\,d,\,h,\,w} = b_{c_o} + \sum_{c_i,\,m,\,n,\,p} w_{c_o,\,c_i,\,m,\,n,\,p} \cdot x_{i,\,c_i,\,s_D d + d_D m,\,s_H h + d_H n,\,s_W w + d_W p}
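
The sum above can be written out as a naive NumPy reference, which may help when checking a faster implementation. This is a sketch under stated assumptions: conv3d_reference is a hypothetical name, it covers only groups=1, it takes stride, padding and dilation as 3-tuples, and it says nothing about how lucid computes the result internally.

```python
import numpy as np


def conv3d_reference(x, w, b=None, stride=(1, 1, 1), padding=(0, 0, 0),
                     dilation=(1, 1, 1)):
    """Naive 3-D cross-correlation for groups=1; slow, for checking only."""
    (sD, sH, sW), (pD, pH, pW), (dD, dH, dW) = stride, padding, dilation
    N, C_in, D, H, W = x.shape
    C_out, _, kD, kH, kW = w.shape
    # Zero-pad the three spatial axes so every index in the sum is in range.
    xp = np.pad(x, ((0, 0), (0, 0), (pD, pD), (pH, pH), (pW, pW)))
    D_out = (D + 2 * pD - dD * (kD - 1) - 1) // sD + 1
    H_out = (H + 2 * pH - dH * (kH - 1) - 1) // sH + 1
    W_out = (W + 2 * pW - dW * (kW - 1) - 1) // sW + 1
    y = np.zeros((N, C_out, D_out, H_out, W_out), dtype=x.dtype)
    for od in range(D_out):
        for oh in range(H_out):
            for ow in range(W_out):
                # Input sampled at the dilated kernel taps: (N, C_in, kD, kH, kW).
                patch = xp[:, :,
                           od * sD: od * sD + dD * (kD - 1) + 1: dD,
                           oh * sH: oh * sH + dH * (kH - 1) + 1: dH,
                           ow * sW: ow * sW + dW * (kW - 1) + 1: dW]
                # Sum over c_i, m, n, p for each batch item and output channel.
                y[:, :, od, oh, ow] = np.einsum("ncdhw,ocdhw->no", patch, w)
    if b is not None:
        y += b.reshape(1, -1, 1, 1, 1)  # broadcast per-output-channel bias
    return y


x = np.ones((1, 1, 4, 4, 4))
w = np.ones((2, 1, 3, 3, 3))
y = conv3d_reference(x, w, padding=(1, 1, 1))
print(y.shape)           # (1, 2, 4, 4, 4)
print(y[0, 0, 1, 1, 1])  # 27.0, the full 3x3x3 window lies inside the input
print(y[0, 0, 0, 0, 0])  # 8.0, only a 2x2x2 corner overlaps non-padded input
```

With all-ones input and weights, each output value simply counts how many kernel taps land on real (non-padded) input, which makes the indexing easy to verify by eye.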

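The cost of a 3-D convolution grows cubically with the kernel size k: a k × k × k kernel has k³ taps per input channel for every output element. For large kernels, consider factorised variants (e.g. a (1, k, k) kernel followed by a (k, 1, 1) kernel), which trade expressivity for compute.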
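
To get a rough feel for the saving, the tap counts of a dense kernel and the factorised pair can be compared directly. The sketch below is a back-of-the-envelope count per input/output channel pair; the function names are hypothetical, and it ignores the number of intermediate channels between the two factorised stages, which affects the real cost.

```python
def full_kernel_taps(k):
    # A dense k x k x k kernel touches k**3 input positions per output element.
    return k ** 3


def factorised_taps(k):
    # A (1, k, k) spatial pass followed by a (k, 1, 1) depth pass.
    return k * k + k


for k in (3, 5, 7):
    print(k, full_kernel_taps(k), factorised_taps(k))
# 3 27 12
# 5 125 30
# 7 343 56
```

The gap widens quickly with k, which is why the factorised form is mostly worth considering for larger kernels.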

Examples

>>> import lucid
>>> from lucid.nn.functional import conv3d
>>> x = lucid.randn(1, 1, 16, 32, 32)
>>> w = lucid.randn(4, 1, 3, 3, 3)
>>> y = conv3d(x, w, padding=1)
>>> y.shape
(1, 4, 16, 32, 32)