fold
fold(x: Tensor, output_size: tuple[int, int], kernel_size: int | tuple[int, int], dilation: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, stride: int | tuple[int, int] = 1) → Tensor
Combine an array of sliding local blocks back into an image (col2im).
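Counterpart of unfold. Given an (N, C·kH·kW, L) tensor of
column-vectorised blocks, scatter-adds each block into its place in
a fresh (N, C, outH, outW) canvas. Overlapping positions are
summed, so fold(unfold(x)) reproduces x only where blocks do not
overlap; elsewhere each value is scaled by the number of blocks that
cover it (zero for positions no block reaches). This summation is
precisely what is needed for the gradient of convolution and for
transposed-convolution-style upsampling.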
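The scatter-add behaviour described above can be sketched in plain NumPy. This is an illustrative reference, not lucid's implementation: `fold_reference` and `_pair` are hypothetical names, and the layout (channel axis ordered as C, then kH, then kW; blocks ordered row-major over positions) is an assumption based on the usual col2im convention.

```python
import numpy as np


def _pair(v):
    # Accept an int or a 2-tuple, as the fold signature does.
    return (v, v) if isinstance(v, int) else tuple(v)


def fold_reference(cols, output_size, kernel_size, dilation=1, padding=0, stride=1):
    """Loop-based col2im on NumPy arrays (illustrative sketch only)."""
    out_h, out_w = output_size
    k_h, k_w = _pair(kernel_size)
    d_h, d_w = _pair(dilation)
    p_h, p_w = _pair(padding)
    s_h, s_w = _pair(stride)

    n, ckk, num_blocks = cols.shape
    c = ckk // (k_h * k_w)
    # Number of block positions along each axis of the padded canvas.
    n_h = (out_h + 2 * p_h - d_h * (k_h - 1) - 1) // s_h + 1
    n_w = (out_w + 2 * p_w - d_w * (k_w - 1) - 1) // s_w + 1
    assert ckk == c * k_h * k_w and num_blocks == n_h * n_w

    blocks = cols.reshape(n, c, k_h, k_w, n_h, n_w)
    canvas = np.zeros((n, c, out_h + 2 * p_h, out_w + 2 * p_w), dtype=cols.dtype)
    # One strided add per kernel offset; contributions from overlapping
    # blocks accumulate because different offsets hit the same pixels.
    for i in range(k_h):
        for j in range(k_w):
            r0, c0 = i * d_h, j * d_w
            canvas[:, :, r0:r0 + s_h * n_h:s_h, c0:c0 + s_w * n_w:s_w] += blocks[:, :, i, j]
    # Drop the implicit padding border.
    return canvas[:, :, p_h:p_h + out_h, p_w:p_w + out_w]


# Folding all-ones blocks yields the per-pixel coverage count.
coverage = fold_reference(np.ones((1, 4, 9)), output_size=(4, 4), kernel_size=2)
```

With a 2×2 kernel at stride 1 on a 4×4 canvas, `coverage[0, 0]` has 1 in the corners, 2 on the edges, and 4 in the interior. Dividing a folded result by this count is one way to turn the sum into an average where blocks overlap.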
Parameters
x : Tensor
    Block tensor of shape (N, C·kH·kW, L).
output_size : (int, int)
    Spatial size (outH, outW) of the destination canvas.
kernel_size : int or (int, int)
    Spatial size of each block. Must match the kernel_size used
    to produce x.
dilation : int or (int, int), default 1
    Spacing between elements within a block.
padding : int or (int, int), default 0
    Implicit zero padding to subtract from the destination canvas
    (mirrors the padding argument of unfold).
stride : int or (int, int), default 1
    Step between block positions on the destination canvas.
Returns
Tensor
    Reconstructed tensor of shape (N, C, outH, outW).
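The block count L of the input has to agree with the geometry arguments. The helper below is a small sketch of that consistency check; `expected_num_blocks` is a hypothetical name, and the formula is the usual sliding-window count, assumed (not confirmed by this page) to be the convention lucid follows.

```python
def expected_num_blocks(output_size, kernel_size, dilation=1, padding=0, stride=1):
    """Number of block positions L implied by the fold geometry (assumed convention)."""
    def pair(v):
        return (v, v) if isinstance(v, int) else tuple(v)

    total = 1
    for out, k, d, p, s in zip(output_size, pair(kernel_size), pair(dilation),
                               pair(padding), pair(stride)):
        # Positions per axis: floor((out + 2p - d(k - 1) - 1) / s) + 1
        total *= (out + 2 * p - d * (k - 1) - 1) // s + 1
    return total


# An 8x8 canvas with 3x3 blocks at stride 3 has 2 positions per axis.
n_blocks = expected_num_blocks((8, 8), kernel_size=3, stride=3)  # 4
```

Under this convention, an input whose last dimension differs from the computed value cannot be placed on the requested canvas.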
Notes
The CPU path uses a scatter-add loop; the GPU path emits a single
scatter_add_axis over precomputed flat destination indices, with
no host round-trip. In conjunction with unfold,
fold lets you implement arbitrary local linear operators as
plain matrix multiplications.
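The "single scatter-add over precomputed flat destination indices" idea can be illustrated in NumPy, with `np.add.at` standing in for the device scatter-add. This is a sketch under assumptions, not lucid's kernel: `flat_destination_indices` and `fold_scatter` are hypothetical names, and the (C, kH, kW) channel layout and row-major block order are assumed.

```python
import numpy as np


def _pair(v):
    # Accept an int or a 2-tuple.
    return (v, v) if isinstance(v, int) else tuple(v)


def flat_destination_indices(output_size, kernel_size, dilation=1, padding=0, stride=1):
    """Flat index into the padded canvas for every (kernel offset, block) pair."""
    (k_h, k_w), (d_h, d_w) = _pair(kernel_size), _pair(dilation)
    (p_h, p_w), (s_h, s_w) = _pair(padding), _pair(stride)
    h_pad, w_pad = output_size[0] + 2 * p_h, output_size[1] + 2 * p_w
    n_h = (h_pad - d_h * (k_h - 1) - 1) // s_h + 1
    n_w = (w_pad - d_w * (k_w - 1) - 1) // s_w + 1
    # row_idx[i, a]: canvas row touched by kernel row i of the block at row position a.
    row_idx = d_h * np.arange(k_h)[:, None] + s_h * np.arange(n_h)[None, :]
    col_idx = d_w * np.arange(k_w)[:, None] + s_w * np.arange(n_w)[None, :]
    # Broadcast to (k_h, k_w, n_h, n_w), then flatten to (k_h * k_w, L).
    flat = row_idx[:, None, :, None] * w_pad + col_idx[None, :, None, :]
    return flat.reshape(k_h * k_w, n_h * n_w), (h_pad, w_pad)


def fold_scatter(cols, output_size, kernel_size, dilation=1, padding=0, stride=1):
    """col2im as one scatter-add over flat indices (illustrative sketch only)."""
    idx, (h_pad, w_pad) = flat_destination_indices(
        output_size, kernel_size, dilation, padding, stride)
    n = cols.shape[0]
    kk, num_blocks = idx.shape
    c = cols.shape[1] // kk
    src = cols.reshape(n, c, kk * num_blocks)
    canvas = np.zeros((n, c, h_pad * w_pad), dtype=cols.dtype)
    # np.add.at accumulates repeated indices; plain fancy-index += would not.
    np.add.at(canvas, (slice(None), slice(None), idx.reshape(-1)), src)
    canvas = canvas.reshape(n, c, h_pad, w_pad)
    p_h, p_w = _pair(padding)
    return canvas[:, :, p_h:p_h + output_size[0], p_w:p_w + output_size[1]]


folded = fold_scatter(np.ones((1, 4, 9)), output_size=(4, 4), kernel_size=2)
```

Because the index table depends only on the geometry arguments, it can be built once and reused for every batch that shares them.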
Examples
>>> import lucid
>>> from lucid.nn.functional import unfold, fold
>>> x = lucid.randn(1, 3, 8, 8)
>>> u = unfold(x, kernel_size=3, stride=3)
>>> y = fold(u, output_size=(8, 8), kernel_size=3, stride=3)
>>> y.shape
(1, 3, 8, 8)