class

Conv3d

extends Module
Conv3d(in_channels: int, out_channels: int, kernel_size: _Size3d, stride: _Size3d = 1, padding: _Size3d | str = 0, dilation: _Size3d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)

Applies a 3D convolution over volumetric data (e.g. video or medical scans).

Computes the 3D cross-correlation of the input with a bank of learnable 3D filters:

$$
y[n, c_{\text{out}}, d, h, w] = \sum_{c_{\text{in}}=0}^{C_{\text{in}}/g - 1} \sum_{k_d, k_h, k_w} x\!\left[n,\; c_{\text{in}},\; d \cdot s_d + k_d \cdot d_d,\; h \cdot s_h + k_h \cdot d_h,\; w \cdot s_w + k_w \cdot d_w\right] \cdot W\!\left[c_{\text{out}},\; c_{\text{in}},\; k_d,\; k_h,\; k_w\right] + b\!\left[c_{\text{out}}\right]
$$
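To make the indexing in the formula concrete, here is a minimal pure-Python sketch of the same cross-correlation on nested lists. It is not Lucid's implementation: the name `conv3d_naive` is hypothetical, padding is left out because the formula has no padding term, and the rule that each output channel reads only the input channels of its own group is an assumption about how `groups` is applied.

```python
from itertools import product


def conv3d_naive(x, weight, bias=None, stride=(1, 1, 1), dilation=(1, 1, 1), groups=1):
    """Unpadded 3D cross-correlation on nested lists (illustrative only).

    x:      [N][C_in][D][H][W]
    weight: [C_out][C_in // groups][K_D][K_H][K_W]
    bias:   [C_out] or None
    """
    n_batch, c_in = len(x), len(x[0])
    in_size = (len(x[0][0]), len(x[0][0][0]), len(x[0][0][0][0]))
    c_out, c_in_g = len(weight), len(weight[0])
    k_size = (len(weight[0][0]), len(weight[0][0][0]), len(weight[0][0][0][0]))
    if c_in != c_in_g * groups or c_out % groups:
        raise ValueError("channel counts are inconsistent with groups")
    c_out_g = c_out // groups

    # Without padding, each axis yields floor((X - d*(K-1) - 1) / s) + 1 outputs.
    out_d, out_h, out_w = (
        (size - dil * (k - 1) - 1) // s + 1
        for size, k, s, dil in zip(in_size, k_size, stride, dilation)
    )
    y = [[[[[0.0] * out_w for _ in range(out_h)] for _ in range(out_d)]
          for _ in range(c_out)] for _ in range(n_batch)]

    for n, co in product(range(n_batch), range(c_out)):
        # Assumed grouping rule: output channel co reads only its group's inputs.
        first_ci = (co // c_out_g) * c_in_g
        for d, h, w in product(range(out_d), range(out_h), range(out_w)):
            acc = 0.0 if bias is None else bias[co]
            for ci, kd, kh, kw in product(range(c_in_g), *map(range, k_size)):
                xd = d * stride[0] + kd * dilation[0]
                xh = h * stride[1] + kh * dilation[1]
                xw = w * stride[2] + kw * dilation[2]
                acc += x[n][first_ci + ci][xd][xh][xw] * weight[co][ci][kd][kh][kw]
            y[n][co][d][h][w] = acc
    return y


# One sample, one channel, a 1x1x4 "volume" and a 1x1x2 kernel.
x = [[[[[0.0, 1.0, 2.0, 3.0]]]]]
w = [[[[[1.0, 10.0]]]]]
print(conv3d_naive(x, w, bias=[0.5])[0][0][0][0])  # [10.5, 21.5, 32.5]
```

The kernel is applied without flipping, which is what distinguishes cross-correlation from a mathematical convolution.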

Parameters

in_channels: int
Number of channels in the input volume.
out_channels: int
Number of channels produced by the convolution.
kernel_size: int or tuple[int, int, int]
Size of the 3D convolving kernel (K_D, K_H, K_W). A single int is broadcast to all three dimensions.
stride: int or tuple[int, int, int] = 1
Stride along each spatial dimension. Default: 1.
padding: int, tuple[int, int, int], or str = 0
Zero-padding added on both sides along each spatial dimension. Accepts "same" (requires stride=1) or "valid". Default: 0.
dilation: int or tuple[int, int, int] = 1
Spacing between kernel elements. Default: 1.
groups: int = 1
Number of blocked connections. groups = in_channels gives depthwise 3D convolution. Default: 1.
bias: bool = True
If True, adds a learnable bias. Default: True.
padding_mode: str = 'zeros'
"zeros", "reflect", "replicate", or "circular". Default: "zeros".
device: DeviceLike = None
Device on which to allocate parameters. Default: None.
dtype: DTypeLike = None
Data type for the parameters. Default: None.
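The int-or-tuple and string forms accepted above can be normalised as in the sketch below. The helpers `to_triple` and `resolve_padding` are hypothetical names, not Lucid functions. Splitting an odd total for "same" so that the extra element goes on the trailing side is an assumption borrowed from other frameworks.

```python
def to_triple(value):
    """Broadcast an int to (v, v, v); pass a length-3 sequence through."""
    if isinstance(value, int):
        return (value, value, value)
    value = tuple(value)
    if len(value) != 3:
        raise ValueError(f"expected an int or 3 ints, got {value!r}")
    return value


def resolve_padding(padding, kernel_size, stride=1, dilation=1):
    """Return (before, after) padding amounts for the D, H and W axes."""
    kernel_size, stride, dilation = map(to_triple, (kernel_size, stride, dilation))
    if isinstance(padding, str):
        if padding == "valid":
            return ((0, 0),) * 3
        if padding != "same":
            raise ValueError(f"unknown padding string: {padding!r}")
        if stride != (1, 1, 1):
            raise ValueError("padding='same' requires stride=1")
        pads = []
        for k, d in zip(kernel_size, dilation):
            total = d * (k - 1)  # padding that keeps output size == input size
            pads.append((total // 2, total - total // 2))
        return tuple(pads)
    return tuple((p, p) for p in to_triple(padding))


print(resolve_padding(1, 3))               # ((1, 1), (1, 1), (1, 1))
print(resolve_padding("same", (3, 5, 2)))  # ((1, 1), (2, 2), (0, 1))
```

The second call shows why "same" is awkward for even kernel sizes: the required total is odd, so the two sides cannot be padded equally.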

Attributes

weight: Parameter
Learnable filter tensor of shape (out_channels, in_channels // groups, K_D, K_H, K_W). Initialized with Kaiming uniform:
$$
\text{fan\_in} = \frac{C_{\text{in}}}{g} \cdot K_D \cdot K_H \cdot K_W, \quad W \sim \mathcal{U}\!\left[ -\sqrt{\tfrac{6}{\text{fan\_in}}},\; \sqrt{\tfrac{6}{\text{fan\_in}}} \right]
$$
bias: Parameter or None
Learnable bias of shape (out_channels,), or None when bias=False.
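As a worked example of the shapes and the initialisation bound above, the sketch below computes the weight shape, `fan_in`, the uniform bound, and the parameter count for a given configuration. The helper name `conv3d_weight_stats` is hypothetical, and the check that both channel counts are divisible by `groups` is an assumption rather than documented behaviour.

```python
import math


def conv3d_weight_stats(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Return (weight_shape, fan_in, bound, n_params) for a Conv3d-style layer."""
    if in_channels % groups or out_channels % groups:
        # Assumed constraint: channels must split evenly across groups.
        raise ValueError("in_channels and out_channels must be divisible by groups")
    k_d, k_h, k_w = kernel_size
    weight_shape = (out_channels, in_channels // groups, k_d, k_h, k_w)
    fan_in = (in_channels // groups) * k_d * k_h * k_w
    bound = math.sqrt(6.0 / fan_in)  # weights are drawn from U[-bound, bound]
    n_params = math.prod(weight_shape) + (out_channels if bias else 0)
    return weight_shape, fan_in, bound, n_params


shape, fan_in, bound, n_params = conv3d_weight_stats(16, 32, (3, 3, 3))
print(shape, fan_in, round(bound, 4), n_params)
# (32, 16, 3, 3, 3) 432 0.1179 13856

shape, fan_in, bound, n_params = conv3d_weight_stats(16, 32, (3, 3, 3), groups=16)
print(shape, fan_in, round(bound, 4), n_params)
# (32, 1, 3, 3, 3) 27 0.4714 896
```

With groups = in_channels, each filter sees a single input channel, so `fan_in` shrinks and the initialisation range widens accordingly.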

Notes

Input: $(N, C_{\text{in}}, D, H, W)$. Output: $(N, C_{\text{out}}, D_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, where

$$
X_{\text{out}} = \left\lfloor \frac{X + 2p_x - d_x(K_X - 1) - 1}{s_x} + 1 \right\rfloor \quad \text{for } X \in \{D, H, W\}
$$
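The output-size formula can be checked without building a layer. The helper below is a hypothetical sketch, not part of Lucid. It applies the formula to each of D, H, W with integer floor division and assumes symmetric integer padding.

```python
def conv3d_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Apply floor((X + 2p - d*(K - 1) - 1) / s + 1) to each of D, H, W."""
    def triple(v):
        return (v, v, v) if isinstance(v, int) else tuple(v)

    return tuple(
        (x + 2 * p - d * (k - 1) - 1) // s + 1
        for x, k, s, p, d in zip(
            input_size, triple(kernel_size), triple(stride),
            triple(padding), triple(dilation),
        )
    )


print(conv3d_output_size((16, 32, 32), 3, padding=1))            # (16, 32, 32)
print(conv3d_output_size((16, 32, 32), 3, stride=2, padding=1))  # (8, 16, 16)
```

These two calls reproduce the spatial sizes of the layers with kernel_size=3, padding=1 and with kernel_size=3, stride=2, padding=1 applied to a 16 x 32 x 32 volume.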

Typical use cases. Conv3d is the standard building block for video understanding (3D ResNets, SlowFast), medical image analysis (CT/MRI volumetric segmentation), and point-cloud processing. It costs roughly $K_D$ times as many multiply-accumulates per output element as a Conv2d with the same $K_H \times K_W$ kernel, because each output sums over $K_D$ kernel slices instead of one; factorised (2+1)D convolutions are a common approximation.
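A rough cost comparison, as a sketch under stated assumptions: the functions below count multiply-accumulates per output element and weights only (bias is ignored). The (2+1)D variant is modelled as a 1 x k_s x k_s spatial convolution followed by a k_t x 1 x 1 temporal convolution. All function names are hypothetical, and `mid_channels` is a free design choice, not a Lucid parameter.

```python
from math import prod


def macs_per_output(in_channels, kernel_size, groups=1):
    """Multiply-accumulates needed for one output element (bias ignored)."""
    return (in_channels // groups) * prod(kernel_size)


def weights_full_3d(c_in, c_out, k_t, k_s):
    """Weight count of a full k_t x k_s x k_s convolution."""
    return c_out * c_in * k_t * k_s * k_s


def weights_2plus1d(c_in, c_out, mid_channels, k_t, k_s):
    """Weight count of a spatial conv followed by a temporal conv."""
    spatial = mid_channels * c_in * 1 * k_s * k_s   # 1 x k_s x k_s kernel
    temporal = c_out * mid_channels * k_t * 1 * 1   # k_t x 1 x 1 kernel
    return spatial + temporal


print(macs_per_output(64, (3, 3, 3)) / macs_per_output(64, (3, 3)))  # 3.0
print(weights_full_3d(64, 64, 3, 3))                                  # 110592
print(weights_2plus1d(64, 64, 64, 3, 3))                              # 49152
```

The first line recovers the factor $K_D = 3$; the last two show how the factorised form trades a single 27-tap kernel for a 9-tap and a 3-tap kernel.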

Memory. A single 3D feature map can be large; consider groups > 1 or smaller kernel_size when memory is a concern.
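For a sense of scale, the size of one activation tensor can be estimated from its shape alone. This sketch assumes 4-byte (float32) elements and ignores gradients and any workspace the backend allocates; `feature_map_bytes` is a hypothetical helper.

```python
def feature_map_bytes(shape, bytes_per_element=4):
    """Bytes needed to store one dense tensor of the given shape."""
    n = bytes_per_element  # assumption: float32 unless stated otherwise
    for dim in shape:
        n *= dim
    return n


MIB = 1024 ** 2
print(feature_map_bytes((2, 16, 16, 32, 32)) / MIB)    # 2.0
print(feature_map_bytes((2, 64, 32, 112, 112)) / MIB)  # 196.0
```

The second shape is a hypothetical video-sized activation; a single such tensor already takes about 196 MiB before anything is stored for the backward pass.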

Examples

Basic volumetric convolution:
>>> import lucid
>>> import lucid.nn as nn
>>> conv3 = nn.Conv3d(in_channels=1, out_channels=16,
...                   kernel_size=3, padding=1)
>>> x = lucid.zeros(2, 1, 16, 32, 32)   # (N, C, D, H, W)
>>> y = conv3(x)
>>> y.shape
(2, 16, 16, 32, 32)
Strided 3D convolution for spatial downsampling:
>>> import lucid
>>> import lucid.nn as nn
>>> conv3_stride = nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1)
>>> x = lucid.zeros(2, 16, 16, 32, 32)
>>> y = conv3_stride(x)
>>> y.shape
(2, 32, 8, 16, 16)

Methods (3)

dunder

__init__

None
__init__(in_channels: int, out_channels: int, kernel_size: _Size3d, stride: _Size3d = 1, padding: _Size3d | str = 0, dilation: _Size3d = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the Conv3d module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Apply the convolution to the input tensor.

Parameters

x: Tensor
Input tensor of shape $(N, C_{\text{in}}, *)$.

Returns

Tensor

Output tensor of shape $(N, C_{\text{out}}, *)$ with spatial dimensions determined by stride, padding, dilation, and kernel size.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.