class Unfold

extends Module

Unfold(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1)

Extract sliding local blocks (patches) from a batched 4-D input tensor.

Unfold performs the im2col operation: it slides a window of shape $(k_H, k_W)$ over the spatial dimensions of the input and flattens the contents of each window position into one column, producing a 3-D output.

$$\text{input:} \quad (N,\, C,\, H,\, W) \;\longrightarrow\; \text{output:} \quad (N,\, C \cdot k_H \cdot k_W,\, L)$$

where $L$ is the total number of windows (blocks):

$$L = \left\lfloor \frac{H + 2p_H - d_H(k_H - 1) - 1}{s_H} + 1 \right\rfloor \times \left\lfloor \frac{W + 2p_W - d_W(k_W - 1) - 1}{s_W} + 1 \right\rfloor$$
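To make the layout concrete, here is a minimal NumPy reference sketch of the im2col computation and the formula for $L$ above. NumPy stands in for lucid here, and the sketch assumes a PyTorch-style layout (each column is one block flattened in $(C, k_H, k_W)$ order; blocks are numbered row-major over the $L_H \times L_W$ grid of window positions, where $L_H$ and $L_W$ are the two floor factors of $L$). The names `to_pair` and `unfold_naive` are hypothetical; this is not lucid's implementation.

```python
import numpy as np

def to_pair(v):
    # An int is used for both spatial dimensions; a 2-tuple is taken as (H, W).
    return (v, v) if isinstance(v, int) else tuple(v)

def unfold_naive(x, kernel_size, dilation=1, padding=0, stride=1):
    """Reference im2col: (N, C, H, W) -> (N, C*kH*kW, L), one column per block."""
    (kh, kw), (dh, dw) = to_pair(kernel_size), to_pair(dilation)
    (ph, pw), (sh, sw) = to_pair(padding), to_pair(stride)
    n, c, h, w = x.shape
    # Window positions per axis: the two floor factors whose product is L.
    lh = (h + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    lw = (w + 2 * pw - dw * (kw - 1) - 1) // sw + 1
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))  # zero-pad H and W
    out = np.empty((n, c * kh * kw, lh * lw), dtype=x.dtype)
    for a in range(lh):
        for b in range(lw):
            top, left = a * sh, b * sw
            # Dilated window: every dh-th row and dw-th column starting at (top, left).
            block = xp[:, :, top : top + dh * (kh - 1) + 1 : dh,
                       left : left + dw * (kw - 1) + 1 : dw]
            out[:, :, a * lw + b] = block.reshape(n, -1)  # flatten in (C, kH, kW) order
    return out

print(unfold_naive(np.zeros((2, 3, 224, 224)), 16, stride=16).shape)  # (2, 768, 196)
print(unfold_naive(np.zeros((1, 1, 5, 5)), 3, padding=1).shape)       # (1, 9, 25)
```

The explicit loop over window positions is slow but mirrors the definition directly, which makes it a convenient reference for checking shapes and element positions.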

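The Fold module performs the reverse operation (col2im). In the usual col2im formulation, values that fall in overlapping blocks are summed, so Fold undoes Unfold exactly only when the blocks do not overlap.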
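A minimal NumPy sketch of both directions follows. It assumes the conventional col2im rule that contributions from overlapping blocks are summed (PyTorch's `nn.Fold` documents this behaviour; that lucid's Fold matches it is an assumption). Columns are taken to be flattened in $(C, k_H, k_W)$ order with blocks numbered row-major, and the helpers `_pair`, `_grid`, `_tap`, `unfold_ref` and `fold_ref` are hypothetical names.

```python
import numpy as np

def _pair(v):
    # An int applies to both spatial dimensions.
    return (v, v) if isinstance(v, int) else tuple(v)

def _grid(h, w, k, d, p, s):
    # Window positions per axis: the two floor factors of L.
    lh = (h + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
    lw = (w + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
    return lh, lw

def _tap(i, j, lh, lw, d, s):
    # Index into the padded image: the pixel that kernel tap (i, j) reads in every window.
    row_sl = slice(i * d[0], i * d[0] + s[0] * (lh - 1) + 1, s[0])
    col_sl = slice(j * d[1], j * d[1] + s[1] * (lw - 1) + 1, s[1])
    return (slice(None), slice(None), row_sl, col_sl)

def unfold_ref(x, kernel_size, dilation=1, padding=0, stride=1):
    """im2col: (N, C, H, W) -> (N, C*kH*kW, L)."""
    k, d, p, s = map(_pair, (kernel_size, dilation, padding, stride))
    n, c, h, w = x.shape
    lh, lw = _grid(h, w, k, d, p, s)
    xp = np.pad(x, ((0, 0), (0, 0), (p[0], p[0]), (p[1], p[1])))
    blocks = np.empty((n, c, k[0], k[1], lh, lw), dtype=x.dtype)
    for i in range(k[0]):
        for j in range(k[1]):
            blocks[:, :, i, j] = xp[_tap(i, j, lh, lw, d, s)]
    return blocks.reshape(n, c * k[0] * k[1], lh * lw)

def fold_ref(cols, output_size, kernel_size, dilation=1, padding=0, stride=1):
    """col2im: (N, C*kH*kW, L) -> (N, C, H, W), summing overlapping blocks."""
    k, d, p, s = map(_pair, (kernel_size, dilation, padding, stride))
    h, w = _pair(output_size)
    n, c = cols.shape[0], cols.shape[1] // (k[0] * k[1])
    lh, lw = _grid(h, w, k, d, p, s)
    blocks = cols.reshape(n, c, k[0], k[1], lh, lw)
    out = np.zeros((n, c, h + 2 * p[0], w + 2 * p[1]), dtype=cols.dtype)
    for i in range(k[0]):
        for j in range(k[1]):
            # Within one tap the target pixels are distinct, so += accumulates correctly.
            out[_tap(i, j, lh, lw, d, s)] += blocks[:, :, i, j]
    return out[:, :, p[0] : p[0] + h, p[1] : p[1] + w]  # crop away the padding

x = np.random.default_rng(0).random((1, 2, 6, 6))
# Non-overlapping blocks (stride == kernel_size): fold undoes unfold exactly.
print(np.allclose(fold_ref(unfold_ref(x, 2, stride=2), 6, 2, stride=2), x))  # True
# Overlapping 3x3 blocks: each pixel is scaled by the number of blocks covering it.
count = fold_ref(unfold_ref(np.ones_like(x), 3, padding=1), 6, 3, padding=1)
print(count[0, 0, 0, 0], count[0, 0, 0, 2], count[0, 0, 3, 3])  # 4.0 6.0 9.0
```

Dividing a fold of overlapping blocks by such a coverage count is one way to turn the sum back into an average.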

Parameters

kernel_size : int or tuple[int, int]
Size of the sliding window $(k_H, k_W)$. A single int is broadcast to both dimensions.
dilation : int or tuple[int, int] = 1
Spacing between kernel elements $(d_H, d_W)$. A dilation greater than 1 gives an atrous (dilated) window.
padding : int or tuple[int, int] = 0
Zero-padding $(p_H, p_W)$ added to both sides of each spatial dimension.
stride : int or tuple[int, int] = 1
Stride of the sliding window $(s_H, s_W)$.
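As a small illustration of how these parameters interact (plain Python; `to_pair`, `tap_offsets` and `positions` are hypothetical helpers), a kernel axis of size $k$ with dilation $d$ reads the offsets $0, d, \dots, d(k-1)$ from the window origin, so it spans $d(k-1)+1$ input positions. That span is exactly the quantity subtracted in the numerator of the formula for $L$, since $H + 2p_H - d_H(k_H - 1) - 1 = H + 2p_H - (d_H(k_H - 1) + 1)$.

```python
def to_pair(v):
    """Broadcast an int to (v, v); pass a 2-tuple through unchanged."""
    return (v, v) if isinstance(v, int) else tuple(v)

def tap_offsets(k, d):
    """Offsets, relative to the window origin, read by a kernel axis of size k with dilation d."""
    return [i * d for i in range(k)]

def positions(size, k, d=1, p=0, s=1):
    """Window positions along one axis: floor((size + 2p - (d(k-1) + 1)) / s) + 1."""
    extent = d * (k - 1) + 1  # span of the dilated window
    return (size + 2 * p - extent) // s + 1

print(to_pair(3), to_pair((3, 5)))  # (3, 3) (3, 5)
print(tap_offsets(3, 2))            # [0, 2, 4]
print(positions(7, 3, d=2))         # 3
```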

Attributes

kernel_size : int or tuple[int, int]
Stored value of the kernel_size constructor argument.
dilation : int or tuple[int, int]
Stored value of the dilation constructor argument.
padding : int or tuple[int, int]
Stored value of the padding constructor argument.
stride : int or tuple[int, int]
Stored value of the stride constructor argument.

Notes

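  • Input: $(N, C, H, W)$.
  • Output: $(N, C \cdot k_H \cdot k_W, L)$.
  • This module wraps nn.functional.unfold.
  • A bias-free convolution can be implemented manually as (weight.view(C_out, -1) @ Unfold((kH, kW))(x)).view(N, C_out, L_H, L_W), where weight has shape (C_out, C, kH, kW) and L_H, L_W are the two floor factors in the formula for L (so L = L_H · L_W). Pass the convolution's dilation, padding and stride to Unfold if they differ from the defaults.
  • In Vision Transformers (ViT), Unfold with stride == kernel_size is used to extract non-overlapping patches before the patch-embedding linear layer.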
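The manual-convolution identity in the notes can be checked with a NumPy sketch. NumPy stands in for lucid, and `unfold_sw`, `conv2d_via_unfold` and `conv2d_direct` are hypothetical names; `weight.reshape(c_out, -1) @ cols` plays the role of `weight.view(C_out, -1) @ Unfold(...)(x)`. Here im2col is built with NumPy's `sliding_window_view` (NumPy 1.20 or later) and compared against a direct loop. Like deep-learning convolution layers, both compute cross-correlation; bias and dilation are omitted for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def unfold_sw(x, k, padding=0, stride=1):
    """im2col for a square k x k kernel without dilation; returns cols and (L_H, L_W)."""
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    # View of every k x k window, shape (N, C, H', W', k, k); keep every stride-th one.
    win = sliding_window_view(xp, (k, k), axis=(2, 3))[:, :, ::stride, ::stride]
    n, c, lh, lw = win.shape[:4]
    # Reorder to (N, C, kH, kW, L_H, L_W), then flatten to (N, C*k*k, L_H*L_W).
    cols = win.transpose(0, 1, 4, 5, 2, 3).reshape(n, c * k * k, lh * lw)
    return cols, (lh, lw)

def conv2d_via_unfold(x, weight, padding=0, stride=1):
    """Bias-free 2-D convolution (cross-correlation) as one matrix product."""
    c_out, _, k, _ = weight.shape
    cols, (lh, lw) = unfold_sw(x, k, padding, stride)
    # (C_out, C*k*k) @ (N, C*k*k, L) broadcasts over N -> (N, C_out, L)
    out = weight.reshape(c_out, -1) @ cols
    return out.reshape(x.shape[0], c_out, lh, lw)

def conv2d_direct(x, weight, padding=0, stride=1):
    """Reference: dot product of each filter with each padded k x k window."""
    c_out, _, k, _ = weight.shape
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    lh = (xp.shape[2] - k) // stride + 1
    lw = (xp.shape[3] - k) // stride + 1
    out = np.zeros((x.shape[0], c_out, lh, lw))
    for a in range(lh):
        for b in range(lw):
            patch = xp[:, :, a * stride : a * stride + k, b * stride : b * stride + k]
            out[:, :, a, b] = np.einsum("nchw,ochw->no", patch, weight)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
print(np.allclose(conv2d_via_unfold(x, w, padding=1, stride=2),
                  conv2d_direct(x, w, padding=1, stride=2)))  # True
```

The trade-off is memory for speed: the column matrix repeats each input pixel once per window that covers it, but the whole convolution becomes a single (batched) matrix multiplication.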

Examples

**Patch extraction for a Vision Transformer-style encoder:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> patch_size = 16
>>> unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
>>> x = lucid.zeros(2, 3, 224, 224)
>>> patches = unfold(x)
>>> patches.shape
(2, 768, 196)   # 768 = 3*16*16, 196 = (224//16)^2
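
When stride == kernel_size (and padding is 0) the blocks tile the image without overlap, so the same columns can be produced by a reshape and a transpose. The NumPy sketch below uses a hypothetical `patchify` helper and assumes lucid follows PyTorch's layout (rows flattened in $(C, k_H, k_W)$ order, patches numbered row-major). Border pixels that do not fill a whole patch are dropped, which matches the floor in the formula for $L$.

```python
import numpy as np

def patchify(x, p):
    """Non-overlapping p x p patches as columns: (N, C, H, W) -> (N, C*p*p, (H//p)*(W//p))."""
    n, c, h, w = x.shape
    gh, gw = h // p, w // p  # patch grid; leftover border pixels are dropped
    # Split H into (gh, p) and W into (gw, p): t[n, c, a, i, b, j] = x[n, c, a*p + i, b*p + j]
    t = x[:, :, : gh * p, : gw * p].reshape(n, c, gh, p, gw, p)
    # Move the in-patch axes (c, i, j) to the rows and the grid axes (a, b) to the columns.
    return t.transpose(0, 1, 3, 5, 2, 4).reshape(n, c * p * p, gh * gw)

x = np.random.default_rng(0).random((2, 3, 224, 224))
patches = patchify(x, 16)
print(patches.shape)  # (2, 768, 196)
```

Column l of the result holds the patch at grid position (l // (W // p), l % (W // p)).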

**Im2col for manual convolution:**
>>> import lucid
>>> import lucid.nn as nn
>>>
>>> unfold = nn.Unfold(kernel_size=3, padding=1)
>>> x = lucid.randn(1, 1, 5, 5)
>>> cols = unfold(x)
>>> cols.shape
(1, 9, 25)   # 9 = 1*3*3, 25 = 5*5 windows (stride=1, pad=1)

Methods (3)

__init__

__init__(kernel_size: _Size2d, dilation: _Size2d = 1, padding: _Size2d = 0, stride: _Size2d = 1) -> None

Initialise the Unfold module. See the class docstring for parameter semantics.

forward

forward(x: Tensor) -> Tensor

Extract sliding local blocks from the input.

Parameters

x : Tensor
Input tensor of shape $(N, C, H, W)$.

Returns

Tensor

Tensor of shape $(N, C \cdot k_H \cdot k_W, L)$ in which each column is one flattened block.

extra_repr

extra_repr() -> str

Return a string representation of the layer's configuration.