fold

→Tensor

fold(x: Tensor, output_size: tuple[int, int], kernel_size: int | tuple[int, int], dilation: int | tuple[int, int] = 1, padding: int | tuple[int, int] = 0, stride: int | tuple[int, int] = 1)


Implementing kernel: C++ fold_op (free fn)

Combine an array of sliding local blocks back into an image (col2im).

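Inverse of unfold in the shape sense. Given an (N, C·kH·kW, L) tensor of column-vectorised blocks, fold scatter-adds each block into its place in a zero-initialised (N, C, outH, outW) canvas. Overlapping positions are summed, so fold(unfold(x)) reproduces x only where exactly one block covers a position; where several blocks overlap, the value is scaled by the number of covering blocks, and positions that no block reaches stay zero. This summing behaviour is precisely what is needed for the gradient of convolution and for transposed-convolution-style upsampling.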
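To make the scatter-add concrete, here is a minimal pure-Python sketch of col2im for a single sample. It is not the library's C++ kernel. The function name fold_reference, the nested-list layout, and the row ordering (channel-major, then kernel row, then kernel column, with blocks in row-major order) are assumptions borrowed from the usual im2col convention.

```python
def fold_reference(cols, channels, output_size, kernel_size,
                   stride=1, padding=0, dilation=1):
    """Scatter-add column blocks into a (C, outH, outW) canvas of nested lists.

    Assumed layout: cols[r][l] is entry r = (c * kH + i) * kW + j of block l.
    """
    out_h, out_w = output_size
    k_h, k_w = kernel_size
    # Number of block positions along each axis of the padded canvas.
    n_h = (out_h + 2 * padding - dilation * (k_h - 1) - 1) // stride + 1
    n_w = (out_w + 2 * padding - dilation * (k_w - 1) - 1) // stride + 1
    canvas = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):
        for i in range(k_h):
            for j in range(k_w):
                row = cols[(c * k_h + i) * k_w + j]
                for l in range(n_h * n_w):
                    bh, bw = divmod(l, n_w)
                    y = bh * stride - padding + i * dilation
                    x = bw * stride - padding + j * dilation
                    # Taps that land in the padded border are dropped.
                    if 0 <= y < out_h and 0 <= x < out_w:
                        canvas[c][y][x] += row[l]
    return canvas


# Four overlapping 2x2 blocks of ones on a 3x3 canvas (stride 1).
cols = [[1.0] * 4 for _ in range(4)]
overlap = fold_reference(cols, 1, (3, 3), (2, 2))
# overlap[0] == [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

The result counts how many blocks cover each position, which is the divisor needed to turn fold(unfold(x)) back into x when blocks overlap.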

Parameters

x: Tensor

Block tensor of shape (N, C·kH·kW, L).

output_size: (int, int)

Spatial size (outH, outW) of the destination canvas.

kernel_size: int or (int, int)

Spatial size of each block. Must match the kernel_size used to produce x.

dilation: int or (int, int) = 1

Spacing between elements within a block.

padding: int or (int, int) = 0

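Implicit zero padding on each side of the destination canvas. Block positions are laid out on the padded canvas, and the padded border is cropped from the result (mirrors the padding argument of unfold).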

stride: int or (int, int) = 1

Step between block positions on the destination canvas.

Returns

Tensor

Reconstructed tensor of shape (N, C, outH, outW).

Notes

The CPU path uses a scatter-add loop; the GPU path emits a single scatter_add_axis over precomputed flat destination indices, with no host round-trip. Combined with unfold, fold lets you implement arbitrary local linear operators as plain matrix multiplications.
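The idea of precomputing flat destination indices can be illustrated in pure Python for one channel. This sketch does not reproduce scatter_add_axis or the library's actual index layout: both function names are hypothetical, and the -1 sentinel for taps that fall in the padded border is an assumption made for illustration.

```python
def flat_destination_indices(output_size, kernel_size,
                             stride=1, padding=0, dilation=1):
    """idx[r][l]: flat canvas index hit by kernel tap r of block l, or -1."""
    out_h, out_w = output_size
    k_h, k_w = kernel_size
    n_h = (out_h + 2 * padding - dilation * (k_h - 1) - 1) // stride + 1
    n_w = (out_w + 2 * padding - dilation * (k_w - 1) - 1) // stride + 1
    idx = []
    for i in range(k_h):
        for j in range(k_w):
            row = []
            for bh in range(n_h):
                for bw in range(n_w):
                    y = bh * stride - padding + i * dilation
                    x = bw * stride - padding + j * dilation
                    inside = 0 <= y < out_h and 0 <= x < out_w
                    row.append(y * out_w + x if inside else -1)
            idx.append(row)
    return idx


def scatter_add_flat(values, idx, size):
    """Accumulate values[r][l] into a flat buffer at idx[r][l]."""
    out = [0.0] * size
    for v_row, i_row in zip(values, idx):
        for v, i in zip(v_row, i_row):
            if i >= 0:  # skip taps that landed in the padding
                out[i] += v
    return out


idx = flat_destination_indices((3, 3), (2, 2))
flat = scatter_add_flat([[1.0] * 4 for _ in range(4)], idx, 9)
# flat == [1, 2, 1, 2, 4, 2, 1, 2, 1]
```

Because the indices depend only on the geometry, they can be built once and reused for every sample and channel, which is what makes a single scatter pass possible.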

Examples

>>> import lucid
>>> from lucid.nn.functional import unfold, fold
>>> x = lucid.randn(1, 3, 8, 8)
>>> u = unfold(x, kernel_size=3, stride=3)
>>> y = fold(u, output_size=(8, 8), kernel_size=3, stride=3)
>>> y.shape
(1, 3, 8, 8)
