ConvTranspose2d
ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d = 0, output_padding: _Size2d = 0, groups: int = 1, bias: bool = True, dilation: _Size2d = 1, device: DeviceLike = None, dtype: DTypeLike = None)
Applies a 2D transposed convolution, also known as a fractionally-strided convolution.
This module is
commonly used as the spatial upsampling primitive in generative
models (VAEs, GANs), dense prediction decoders (U-Net), and
super-resolution networks. It is the transpose (adjoint) of
Conv2d.
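The adjoint relationship can be checked numerically. The following is a minimal pure-Python sketch in 1D (a simplification of the 2D case, with no padding, dilation, or channels); the helper names conv1d, conv_transpose1d, and dot are illustrative and not part of lucid.

```python
def conv1d(x, w, stride=1):
    """Valid (unpadded) 1D cross-correlation of the list x with kernel w."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return [sum(x[i * stride + j] * w[j] for j in range(k)) for i in range(n_out)]


def conv_transpose1d(y, w, stride=1):
    """Transposed 1D convolution: each input element scatters a scaled copy of w."""
    k = len(w)
    out = [0] * ((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        for j in range(k):
            out[i * stride + j] += v * w[j]
    return out


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


w = [1, -2, 3]
x = [4, 0, -1, 2, 5, 1, 3]  # length 7
y = [2, -1, 3]              # same length as conv1d(x, w, stride=2)

forward = conv1d(x, w, stride=2)
backward = conv_transpose1d(y, w, stride=2)

# Adjoint identity: <conv(x), y> == <x, conv_transpose(y)>
lhs = dot(forward, y)
rhs = dot(x, backward)
print(lhs, rhs)
```

With integer inputs both inner products agree exactly, and the transposed output has the same length as x, which is what makes the operation usable as an upsampling step.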
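The output spatial dimensions satisfy:
H_{\text{out}} = (H_{\text{in}} - 1) \cdot s_h - 2p_h + d_h(K_H - 1) + p^{\text{out}}_h + 1
W_{\text{out}} = (W_{\text{in}} - 1) \cdot s_w - 2p_w + d_w(K_W - 1) + p^{\text{out}}_w + 1
where s is the stride, p the padding, d the dilation, K the kernel size, and p^{\text{out}} the output_padding along the corresponding axis.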
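As a quick sanity check, the size formula can be evaluated per axis with a small helper. This is a sketch only; conv_transpose_out_size is a hypothetical name and not a lucid function.

```python
def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length along one spatial axis of a transposed convolution."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# The two configurations used in the Examples section.
vae = conv_transpose_out_size(4, kernel_size=4, stride=2, padding=1)
unet = conv_transpose_out_size(16, kernel_size=2, stride=2)
print(vae, unet)
```

Both values match the shapes shown in the examples (8 and 32 respectively).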
Parameters
in_channels (int)
out_channels (int)
kernel_size (int or tuple[int, int])
stride (int or tuple[int, int]): Values > 1 upsample the spatial dimensions. Default: 1.
padding (int or tuple[int, int]): dilation * (kernel_size - 1) - padding zero-padding is added to both sides of each spatial dimension. Default: 0.
output_padding (int or tuple[int, int]): Must satisfy 0 <= output_padding < max(stride, dilation) along each axis. Default: 0.
groups (int): Default: 1.
bias (bool): If True, adds a learnable bias. Default: True.
dilation (int or tuple[int, int]): Default: 1.
device (DeviceLike): Default: None.
dtype (DTypeLike): Default: None.
Attributes
weight (Parameter): Shape (in_channels, out_channels // groups, K_H, K_W). The leading axis is in_channels, the reverse of Conv2d. Initialized with Kaiming uniform.
bias (Parameter or None): Shape (out_channels,), or None when bias is False.
Notes
Input: (N, C_in, H_in, W_in). Output: (N, C_out, H_out, W_out), with the spatial dimensions as given by the formulas above.
Checkerboard artefacts. Transposed convolutions with
stride > 1 can produce characteristic checkerboard patterns in
the output when kernel size is not divisible by stride. A common
mitigation is to use kernel_size = stride * n for some integer
n, or to replace the transposed conv with bilinear upsampling
followed by a regular convolution.
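One way to see where the pattern comes from is to count how many kernel taps contribute to each output position. The sketch below does this in 1D without padding; overlap_counts is an illustrative helper, not a lucid function.

```python
def overlap_counts(n_in, kernel_size, stride):
    """Count how many kernel taps land on each output position (1D, no padding)."""
    counts = [0] * ((n_in - 1) * stride + kernel_size)
    for i in range(n_in):
        for j in range(kernel_size):
            counts[i * stride + j] += 1
    return counts


# kernel_size=3 is not divisible by stride=2: interior counts alternate.
uneven = overlap_counts(n_in=4, kernel_size=3, stride=2)
# kernel_size=4 is divisible by stride=2: interior counts are uniform.
even = overlap_counts(n_in=4, kernel_size=4, stride=2)
print(uneven)
print(even)
```

The alternating interior counts in the first case mean that neighbouring output pixels accumulate different numbers of terms, which is the uneven overlap associated with checkerboard patterns. In the second case every interior position receives the same number of contributions.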
output_padding. When stride > 1 the Conv2d output size formula is
not injective: multiple input sizes map to the same output size, so the
transposed convolution alone cannot tell which size to reconstruct.
output_padding resolves this ambiguity and must be set
consistently with the encoder stride to reconstruct the exact spatial
dimensions.
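The ambiguity can be illustrated with plain size arithmetic. This sketch assumes the encoder follows the standard strided-convolution size formula (floor division by the stride); both helper names are hypothetical and not part of lucid.

```python
def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a regular strided convolution (assumed standard formula)."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


k, s, p = 3, 2, 1

# Two different encoder input sizes collapse to the same encoded size.
enc7 = conv_out_size(7, k, s, p)
enc8 = conv_out_size(8, k, s, p)

# The decoder picks between them via output_padding.
dec0 = conv_transpose_out_size(enc7, k, s, p, output_padding=0)
dec1 = conv_transpose_out_size(enc8, k, s, p, output_padding=1)
print(enc7, enc8, dec0, dec1)
```

Inputs of size 7 and 8 both encode to size 4; output_padding=0 reconstructs 7 and output_padding=1 reconstructs 8.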
Examples
VAE decoder: upsample 4×4 latent to 8×8:
>>> import lucid
>>> import lucid.nn as nn
>>> decoder = nn.ConvTranspose2d(
... in_channels=128, out_channels=64,
... kernel_size=4, stride=2, padding=1
... )
>>> z = lucid.zeros(8, 128, 4, 4)
>>> y = decoder(z)
>>> y.shape
(8, 64, 8, 8)
U-Net upsampling block:
>>> import lucid
>>> import lucid.nn as nn
>>> up = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
>>> x = lucid.zeros(4, 256, 16, 16)
>>> y = up(x)
>>> y.shape
(4, 128, 32, 32)
Methods (3)
__init__
__init__(in_channels: int, out_channels: int, kernel_size: _Size2d, stride: _Size2d = 1, padding: _Size2d = 0, output_padding: _Size2d = 0, groups: int = 1, bias: bool = True, dilation: _Size2d = 1, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the ConvTranspose2d module. See the class docstring for parameter semantics.
forward
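forward(x: Tensor) → Tensor
Apply the transposed convolution to the input tensor.
Parameters
x (Tensor): Input tensor of shape (N, C_in, H_in, W_in).
Returns
Tensor: Output tensor of shape (N, C_out, H_out, W_out), with spatial dimensions determined by stride, padding, output_padding, dilation, and kernel size.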
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.