class

ConvTranspose2d

extends Module

ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d = 0, output_padding: _Size2d = 0, groups: int = 1, bias: bool = True, dilation: _Size2d = 1, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel

C++ ConvTransposeNdBackward class

Applies a 2D transposed convolution (fractionally-strided convolution).

This module is commonly used as the spatial upsampling primitive in generative models (VAEs, GANs), dense-prediction decoders (U-Net), and super-resolution networks. It is the transpose (adjoint) of Conv2d.

The output spatial dimensions satisfy:

$$H_{\text{out}} = (H_{\text{in}} - 1) \cdot s_h - 2p_h + d_h(K_H - 1) + p^{\text{out}}_h + 1$$

$$W_{\text{out}} = (W_{\text{in}} - 1) \cdot s_w - 2p_w + d_w(K_W - 1) + p^{\text{out}}_w + 1$$
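
As a quick sanity check on the formula above, the following pure-Python sketch evaluates it along one axis. The helper name conv_transpose_out_size is hypothetical and not part of lucid; it only mirrors the arithmetic.

```python
def conv_transpose_out_size(size_in, kernel, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length along one spatial axis (hypothetical helper, not a lucid API)."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# kernel_size=4, stride=2, padding=1 doubles a 4-pixel axis.
h_out = conv_transpose_out_size(4, kernel=4, stride=2, padding=1)
# kernel_size=2, stride=2 doubles a 16-pixel axis.
w_out = conv_transpose_out_size(16, kernel=2, stride=2)
print(h_out, w_out)
```

The same function applies independently to height and width, with the per-axis entries of kernel_size, stride, padding, output_padding, and dilation.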

Parameters

in_channels: int

Number of channels in the input feature map.

out_channels: int

Number of channels produced by the transposed convolution.

kernel_size: int or tuple[int, int]

Size of the convolving kernel.

stride: int or tuple[int, int] = 1

Stride. Values > 1 upsample the spatial dimensions. Default: 1.

padding: int or tuple[int, int] = 0

Controls the implicit zero-padding of the input: dilation * (kernel_size - 1) - padding zeros are added to both sides of each spatial dimension, so a larger padding value produces a smaller output. Default: 0.
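
One way to read the padding rule above: a transposed convolution can be computed either by scattering each input element through the kernel and cropping padding elements from each side, or by zero-stuffing the input, padding it with dilation * (kernel_size - 1) - padding zeros, and correlating with the flipped kernel. The 1D pure-Python sketch below checks that the two views agree. Both function names are hypothetical and not part of lucid; output_padding and groups are omitted for brevity, and lucid's actual kernel may be implemented differently.

```python
def conv_transpose1d_scatter(x, w, stride=1, padding=0, dilation=1):
    """Scatter view: each x[i] adds x[i] * w[j] at position i*stride + j*dilation."""
    n, k = len(x), len(w)
    full = [0.0] * ((n - 1) * stride + dilation * (k - 1) + 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i * stride + j * dilation] += xi * wj
    # `padding` crops that many elements from each side of the full output.
    return full[padding:len(full) - padding]


def conv_transpose1d_padded(x, w, stride=1, padding=0, dilation=1):
    """Padded view: zero-stuff, pad by dilation*(k-1) - padding, correlate with flipped w."""
    n, k = len(x), len(w)
    pad = dilation * (k - 1) - padding
    assert pad >= 0, "this sketch only covers padding <= dilation * (k - 1)"
    stuffed = [0.0] * ((n - 1) * stride + 1)
    stuffed[::stride] = x                      # insert stride - 1 zeros between elements
    padded = [0.0] * pad + stuffed + [0.0] * pad
    flipped = w[::-1]
    span = dilation * (k - 1)
    return [
        sum(padded[o + j * dilation] * flipped[j] for j in range(k))
        for o in range(len(padded) - span)
    ]


x = [1.0, 2.0, 3.0]
w = [1.0, 0.0, -1.0]
a = conv_transpose1d_scatter(x, w, stride=2, padding=1)
b = conv_transpose1d_padded(x, w, stride=2, padding=1)
print(a == b, len(a))
```

With stride=2, padding=1 and a 3-tap kernel, both views produce an output of length (3 - 1) * 2 - 2 * 1 + (3 - 1) + 1 = 5, matching the output-size formula.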

output_padding: int or tuple[int, int] = 0

Additional size added to one side of each spatial dimension of the output. Must satisfy 0 <= output_padding < max(stride, dilation) along each axis. Default: 0.

groups: int = 1

Number of blocked connections from input channels to output channels. Default: 1.

bias: bool = True

If True, adds a learnable bias. Default: True.

dilation: int or tuple[int, int] = 1

Spacing between kernel elements. Default: 1.

device: DeviceLike = None

Device on which to allocate parameters. Default: None.

dtype: DTypeLike = None

Data type for the parameters. Default: None.

Attributes

weight: Parameter

Learnable kernel of shape (in_channels, out_channels // groups, K_H, K_W). The leading axis is in_channels, the reverse of Conv2d. Initialized with Kaiming uniform ($a = \sqrt{5}$).

bias: Parameter or None

Learnable bias of shape (out_channels,), or None.
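
To make the parameter layout concrete, here is a small pure-Python sketch that derives the weight and bias shapes stated above from the constructor arguments. The helper name is hypothetical and not part of lucid, and the divisibility check on groups is an assumption based on the usual grouped-convolution convention rather than something this page states.

```python
def conv_transpose2d_param_shapes(in_channels, out_channels, kernel_size,
                                  groups=1, bias=True):
    """Return (weight_shape, bias_shape) as plain tuples (hypothetical helper)."""
    # Assumption: both channel counts must be divisible by groups.
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    if isinstance(kernel_size, int):
        k_h, k_w = kernel_size, kernel_size
    else:
        k_h, k_w = kernel_size
    # Leading axis is in_channels, the reverse of Conv2d.
    weight_shape = (in_channels, out_channels // groups, k_h, k_w)
    bias_shape = (out_channels,) if bias else None
    return weight_shape, bias_shape


dense = conv_transpose2d_param_shapes(128, 64, kernel_size=4)
grouped = conv_transpose2d_param_shapes(128, 64, kernel_size=(3, 5), groups=4, bias=False)
print(dense, grouped)
```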

Notes

Input: $(N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})$

Output: $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$, as given by the formulas above.

Checkerboard artefacts. Transposed convolutions with stride > 1 can produce characteristic checkerboard patterns in the output when the kernel size is not divisible by the stride, because neighbouring output positions then receive contributions from different numbers of kernel taps. A common mitigation is to use kernel_size = stride * n for some integer n, or to replace the transposed convolution with bilinear upsampling followed by a regular convolution.
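
The uneven overlap behind checkerboard artefacts can be seen by counting, in 1D, how many (input element, kernel tap) pairs land on each output position. This is a minimal illustrative sketch with padding 0 and dilation 1; overlap_counts is a hypothetical helper, not a lucid function.

```python
def overlap_counts(n_in, kernel, stride):
    """Count contributions per output position for a 1D transposed convolution."""
    out_len = (n_in - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(n_in):          # each input element ...
        for j in range(kernel):    # ... is spread over `kernel` output positions
            counts[i * stride + j] += 1
    return counts


uneven = overlap_counts(4, kernel=3, stride=2)  # kernel not divisible by stride
even = overlap_counts(4, kernel=4, stride=2)    # kernel = stride * 2
print(uneven)  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(even)    # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

Away from the borders, the kernel=3 case alternates between one and two contributions per position, while the kernel=4 case is uniform.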

output_padding. When stride > 1, the output size of the corresponding Conv2d is not injective in its input size: several input sizes map to the same output size, so inverting the mapping is ambiguous. output_padding resolves this ambiguity and must be chosen to match the encoder's stride and input size so that the exact spatial dimensions are reconstructed.
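
The ambiguity can be illustrated with plain arithmetic. The sketch below assumes the usual floor-based output-size formula for a strided convolution (an assumption about Conv2d, which this page does not spell out) together with the transposed formula given above; both helper names are hypothetical.

```python
def conv_out_size(size_in, kernel, stride=1, padding=0, dilation=1):
    """Assumed output length of a regular strided convolution along one axis."""
    return (size_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose_out_size(size_in, kernel, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of the transposed convolution, per the formula above."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


enc = dict(kernel=3, stride=2, padding=1)

# Two different encoder input sizes collapse to the same encoded size.
encoded = [conv_out_size(h, **enc) for h in (7, 8)]

# output_padding selects which of the two sizes the decoder reconstructs.
restored = [conv_transpose_out_size(4, output_padding=op, **enc) for op in (0, 1)]
print(encoded, restored)
```

Here output_padding=0 recovers the 7-pixel axis and output_padding=1 recovers the 8-pixel axis from the same 4-pixel encoding.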

Examples

VAE decoder: upsample 4×4 latent to 8×8:
>>> import lucid
>>> import lucid.nn as nn
>>> decoder = nn.ConvTranspose2d(
...     in_channels=128, out_channels=64,
...     kernel_size=4, stride=2, padding=1
... )
>>> z = lucid.zeros(8, 128, 4, 4)
>>> y = decoder(z)
>>> y.shape
(8, 64, 8, 8)

U-Net upsampling block:
>>> import lucid
>>> import lucid.nn as nn
>>> up = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
>>> x = lucid.zeros(4, 256, 16, 16)
>>> y = up(x)
>>> y.shape
(4, 128, 32, 32)

Used by 1

lucid.nn.modules

Constructors

__init__

→None

__init__(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d = 0, output_padding: _Size2d = 0, groups: int = 1, bias: bool = True, dilation: _Size2d = 1, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the ConvTranspose2d module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)

Apply the transposed convolution to the input tensor.

Parameters

x: Tensor

Input tensor of shape $(N, C_{\text{in}}, *)$.

Returns

Tensor

Output tensor of shape $(N, C_{\text{out}}, *)$ with spatial dimensions determined by stride, padding, output_padding, dilation, and kernel size.
