class Unfold extends Module

Unfold(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1)


Implementing kernel: UnfoldBackward (C++ class)

Extract sliding local blocks (patches) from a batched 4-D input tensor.

Unfold performs the im2col operation: it tiles a sliding window of shape $(k_H, k_W)$ across the spatial dimensions of the input and stacks all window contents into columns, producing a 3-D output.

$$\text{input:} \quad (N,\, C,\, H,\, W) \;\longrightarrow\; \text{output:} \quad (N,\, C \cdot k_H \cdot k_W,\, L)$$

where $L$ is the total number of windows (blocks):

$$L = \left\lfloor \frac{H + 2p_H - d_H(k_H - 1) - 1}{s_H} + 1 \right\rfloor \times \left\lfloor \frac{W + 2p_W - d_W(k_W - 1) - 1}{s_W} + 1 \right\rfloor$$

The Fold module performs the inverse operation (col2im).
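The block-count formula above can be checked by hand with a few lines of plain Python. This is a minimal sketch that does not call lucid at all; `unfold_output_shape` and its inner helper `pair` are hypothetical names introduced here for illustration only.

```python
def unfold_output_shape(input_shape, kernel_size, dilation=1, padding=0, stride=1):
    """Return (N, C*kH*kW, L) for an (N, C, H, W) input, following the formula for L."""

    def pair(v):
        # Broadcast a single int to both spatial dimensions.
        return (v, v) if isinstance(v, int) else tuple(v)

    n, c, h, w = input_shape
    kh, kw = pair(kernel_size)
    dh, dw = pair(dilation)
    ph, pw = pair(padding)
    sh, sw = pair(stride)

    # Number of window positions along height and width (integer floor division).
    l_h = (h + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    l_w = (w + 2 * pw - dw * (kw - 1) - 1) // sw + 1
    return (n, c * kh * kw, l_h * l_w)


print(unfold_output_shape((2, 3, 224, 224), kernel_size=16, stride=16))  # (2, 768, 196)
print(unfold_output_shape((1, 1, 5, 5), kernel_size=3, padding=1))       # (1, 9, 25)
```

Both printed shapes match the shapes reported in the Examples section of this page.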

Parameters

kernel_size: int or tuple[int, int]

Size of the sliding window $(k_H, k_W)$. A single int is broadcast to both dimensions.

dilation: int or tuple[int, int] = 1

Spacing between kernel elements $(d_H, d_W)$ (default 1). Dilation > 1 corresponds to an atrous (dilated) window.

padding: int or tuple[int, int] = 0

Zero-padding added to both sides of each spatial dimension $(p_H, p_W)$ (default 0).

stride: int or tuple[int, int] = 1

Stride of the sliding window $(s_H, s_W)$ (default 1).
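To make the dilation parameter concrete, the sketch below lists which input positions a window samples along one axis and how many positions it spans. It is plain Python, independent of lucid; `dilated_taps` and `effective_extent` are hypothetical helper names used only for this illustration.

```python
def dilated_taps(kernel_size, dilation):
    """Offsets along one axis sampled by a window, relative to its first tap."""
    return [i * dilation for i in range(kernel_size)]


def effective_extent(kernel_size, dilation):
    """Number of input positions the window spans along one axis."""
    return dilation * (kernel_size - 1) + 1


print(dilated_taps(3, 1), effective_extent(3, 1))  # [0, 1, 2] 3
print(dilated_taps(3, 2), effective_extent(3, 2))  # [0, 2, 4] 5
```

The extent $d(k - 1) + 1$ is the quantity subtracted in the numerator of the formula for $L$, written there as $d(k - 1)$ followed by $-1$.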

Attributes

kernel_size: int or tuple[int, int]

Stored value of the kernel_size constructor argument.

dilation: int or tuple[int, int]

Stored value of the dilation constructor argument.

padding: int or tuple[int, int]

Stored value of the padding constructor argument.

stride: int or tuple[int, int]

Stored value of the stride constructor argument.

Notes

Input: $(N, C, H, W)$.
Output: $(N, C \cdot k_H \cdot k_W, L)$.

This module wraps nn.functional.unfold.
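A manual convolution can be implemented as (weight.view(C_out, -1) @ Unfold((kH, kW))(x)).view(N, C_out, L_H, L_W), where L_H and L_W are the window counts along height and width (the two factors of $L$). The kernel size must be passed as a single tuple: Unfold(kH, kW) would set dilation to kW.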
In Vision Transformers (ViT), Unfold is used with stride == kernel_size to extract non-overlapping patches before the patch-embedding linear layer.
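The manual-convolution identity in the notes above can be sketched without any tensor library. The code below is a pure-Python reference on nested lists, not lucid's implementation; `im2col` and `conv2d_via_im2col` are hypothetical names, and the column ordering (channel, then kernel row, then kernel column, with windows enumerated row by row) is an assumption based on the conventional im2col layout.

```python
def im2col(x, kh, kw, stride=1, padding=0, dilation=1):
    """x is a nested list [C][H][W]. Returns (columns [C*kh*kw][L], (L_H, L_W))."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    l_h = (h + 2 * padding - dilation * (kh - 1) - 1) // stride + 1
    l_w = (w + 2 * padding - dilation * (kw - 1) - 1) // stride + 1

    def at(ch, i, j):
        # Zero padding: anything outside the image reads as 0.
        return x[ch][i][j] if 0 <= i < h and 0 <= j < w else 0

    cols = []
    for ch in range(c):
        for ki in range(kh):
            for kj in range(kw):
                # One row per (channel, kernel row, kernel col); one entry per window.
                cols.append([
                    at(ch, oi * stride - padding + ki * dilation,
                       oj * stride - padding + kj * dilation)
                    for oi in range(l_h)
                    for oj in range(l_w)
                ])
    return cols, (l_h, l_w)


def conv2d_via_im2col(x, weight, stride=1, padding=0):
    """weight is [C_out][C][kh][kw]. Returns the convolution as [C_out][L_H][L_W]."""
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    cols, (l_h, l_w) = im2col(x, kh, kw, stride, padding)
    out = []
    for filt in weight:
        # Flatten one filter, mirroring one row of weight.view(C_out, -1).
        flat = [v for ch in filt for row in ch for v in row]
        vals = [sum(f * col[l] for f, col in zip(flat, cols)) for l in range(l_h * l_w)]
        # Reshape the L outputs back to (L_H, L_W).
        out.append([vals[i * l_w:(i + 1) * l_w] for i in range(l_h)])
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one channel, 3x3
w = [[[[1, 1], [1, 1]]]]                 # one 2x2 filter of ones
print(im2col(x, 2, 2)[0])       # [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
print(conv2d_via_im2col(x, w))  # [[[12, 16], [24, 28]]]
```

Each output value is the sum over one 2x2 window, which is what a direct convolution with an all-ones filter would produce.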

Examples

**Patch extraction for a Vision Transformer-style encoder:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> patch_size = 16
>>> unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
>>> x = lucid.zeros(2, 3, 224, 224)
>>> patches = unfold(x)
>>> patches.shape
(2, 768, 196)   # 768 = 3*16*16, 196 = (224 // 16) ** 2

**Im2col for manual convolution:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> unfold = nn.Unfold(kernel_size=3, padding=1)
>>> x = lucid.randn(1, 1, 5, 5)
>>> cols = unfold(x)
>>> cols.shape
(1, 9, 25)   # 9 = 1*3*3, 25 = 5*5 windows (stride=1, pad=1)


Constructors

__init__ → None

__init__(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1)


Initialise the Unfold module. See the class docstring for parameter semantics.

Instance methods

extra_repr → str

extra_repr()


Return a string representation of the layer's configuration.

forward → Tensor

forward(x: Tensor)


Extract sliding local blocks from the input and stack them as columns.

Parameters

x: Tensor

Input tensor of shape $(N, C, H, W)$.

Returns

Tensor

Tensor of shape $(N, C \cdot k_H \cdot k_W, L)$ holding one flattened block per column.
