class Unfold extends Module
Unfold(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1)
Extract sliding local blocks (patches) from a batched 4-D input tensor.
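Unfold performs the im2col operation: it tiles a sliding window of
shape $(k_H, k_W)$ across the spatial dimensions of the input and
stacks all window contents into columns, producing a 3-D output of shape
$(N, C \cdot k_H \cdot k_W, L)$,
where $L = L_H \cdot L_W$ is the total number of windows (blocks):
$$L_d = \left\lfloor \frac{\text{size}_d + 2 \cdot \text{padding}_d - \text{dilation}_d \cdot (k_d - 1) - 1}{\text{stride}_d} \right\rfloor + 1, \quad d \in \{H, W\}$$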
The Fold module performs the inverse operation (col2im).
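As a quick sanity check on output shapes, the block count can be computed without any tensor library. The sketch below assumes the standard sliding-window formula (floor division per spatial dimension); `num_blocks` and `_pair` are hypothetical helper names for illustration, not part of lucid.

```python
def _pair(v):
    """Broadcast a single int to (v, v); pass 2-tuples through."""
    return (v, v) if isinstance(v, int) else tuple(v)


def num_blocks(spatial, kernel_size, dilation=1, padding=0, stride=1):
    """Return (L_H, L_W, L) for an input with spatial size (H, W).

    Assumes the window fits inside the padded input in both dimensions.
    """
    k, d, p, s = map(_pair, (kernel_size, dilation, padding, stride))
    per_dim = [
        (size + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i, size in enumerate(spatial)
    ]
    return per_dim[0], per_dim[1], per_dim[0] * per_dim[1]


# The two configurations used in the Examples section of this page.
print(num_blocks((224, 224), kernel_size=16, stride=16))  # (14, 14, 196)
print(num_blocks((5, 5), kernel_size=3, padding=1))       # (5, 5, 25)
```

Both results agree with the last dimension of the shapes shown in the Examples section (196 and 25).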
Parameters
kernel_size (int or tuple[int, int]): Size of the sliding window $(k_H, k_W)$. A single int is broadcast to both dimensions.
dilation (int or tuple[int, int], default 1): Spacing between kernel elements. Dilation > 1 corresponds to an atrous (dilated) window.
padding (int or tuple[int, int], default 0): Zero-padding added to both sides of each spatial dimension.
stride (int or tuple[int, int], default 1): Stride of the sliding window.
Attributes
kernel_size (int or tuple[int, int]): Stored value of the kernel_size constructor argument.
dilation (int or tuple[int, int]): Stored value of the dilation constructor argument.
padding (int or tuple[int, int]): Stored value of the padding constructor argument.
stride (int or tuple[int, int]): Stored value of the stride constructor argument.
Notes
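- Input: $(N, C, H, W)$.
- Output: $(N, C \cdot k_H \cdot k_W, L)$, where $L$ is the number of blocks.
- This module wraps nn.functional.unfold.
- A manual convolution can be implemented as (weight.view(C_out, -1) @ Unfold((kH, kW))(x)).view(N, C_out, L_H, L_W). The kernel size must be passed as a single tuple, because the second positional argument is dilation.
- In Vision Transformers (ViT), Unfold is used to extract non-overlapping patches before the patch-embedding linear layer (stride == kernel_size).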
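The manual-convolution note can be made concrete with a dependency-free reference. The following is a minimal sketch on nested Python lists, not lucid's implementation: `im2col` and `conv2d_via_im2col` are hypothetical names, dilation is fixed to 1, a single image (no batch dimension) is handled, and the row order (channel, then kernel row, then kernel column) is an assumption chosen to match a row-major flattening of a `(C_out, C, kH, kW)` weight.

```python
def im2col(x, kernel_size, padding=0, stride=1):
    """Unfold one image x[C][H][W] into C*kH*kW rows of L entries each."""
    kH, kW = kernel_size
    C, H, W = len(x), len(x[0]), len(x[0][0])
    L_H = (H + 2 * padding - kH) // stride + 1
    L_W = (W + 2 * padding - kW) // stride + 1

    def at(c, i, j):
        # Read from the zero-padded image without materialising the padding.
        i, j = i - padding, j - padding
        return x[c][i][j] if 0 <= i < H and 0 <= j < W else 0

    rows = []
    for c in range(C):              # assumed order: channel, kernel row, kernel col
        for di in range(kH):
            for dj in range(kW):
                rows.append([
                    at(c, oi * stride + di, oj * stride + dj)
                    for oi in range(L_H)
                    for oj in range(L_W)
                ])
    return rows, (L_H, L_W)


def conv2d_via_im2col(x, weight, padding=0, stride=1):
    """Cross-correlate x[C][H][W] with weight[C_out][C][kH][kW]."""
    kH, kW = len(weight[0][0]), len(weight[0][0][0])
    cols, (L_H, L_W) = im2col(x, (kH, kW), padding, stride)
    out = []
    for w in weight:
        # Row-major flattening of one filter, like one row of weight.view(C_out, -1).
        flat = [v for ch in w for row in ch for v in row]
        vals = [
            sum(f * col[l] for f, col in zip(flat, cols))
            for l in range(L_H * L_W)
        ]
        # Reshape the L outputs back to an (L_H, L_W) grid.
        out.append([vals[r * L_W:(r + 1) * L_W] for r in range(L_H)])
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # C=1, H=W=3
weight = [[[[1, 1], [1, 1]]]]             # C_out=1, 2x2 box filter
print(conv2d_via_im2col(x, weight))       # [[[12, 16], [24, 28]]]
```

Each output value is the sum of one 2x2 window, which is what a direct convolution with an all-ones kernel would produce; the matrix product over unfolded columns and the direct sliding-window sum are the same computation arranged differently.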
Examples
**Patch extraction for a Vision Transformer-style encoder:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> patch_size = 16
>>> unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
>>> x = lucid.zeros(2, 3, 224, 224)
>>> patches = unfold(x)
>>> patches.shape
(2, 768, 196) # 768 = 3*16*16, 196 = (224//16)^2
**Im2col for manual convolution:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> unfold = nn.Unfold(kernel_size=3, padding=1)
>>> x = lucid.randn(1, 1, 5, 5)
>>> cols = unfold(x)
>>> cols.shape
(1, 9, 25) # 9 = 1*3*3, 25 = 5*5 windows (stride=1, pad=1)
Methods (3)
dunder __init__ → None
__init__(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1)
Initialise the Unfold module. See the class docstring for parameter semantics.
fn forward → Tensor
forward(x: Tensor)
Extract sliding local blocks from the input and stack them into columns.
Parameters
x (Tensor): Input tensor of shape $(N, C, H, W)$.
Returns
Tensor: Unfolded tensor of shape $(N, C \cdot k_H \cdot k_W, L)$, where $L$ is the number of blocks.
fn extra_repr → str
extra_repr()
Return a string representation of the layer's configuration.