GRU
GRU(input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)
Bases: _CellNamingMixin, Module
Multi-layer Gated Recurrent Unit (GRU) recurrent layer.
Applies a stack of GRU cells over an input sequence. At each time
step, every layer evaluates the GRU gate equations (see
GRUCell for the full derivation).
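For orientation, the conventional GRU update, in the form used by PyTorch's GRU, is sketched below. That this module matches it term for term (gate order, bias placement, and the direction of the final interpolation) is an assumption; GRUCell remains the authoritative reference. Here $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

```latex
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
n_t &= \tanh\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
```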
The output of layer $l$ is used as the input of layer
$l + 1$. When bidirectional=True, two GRUs process the
sequence in opposite directions and their outputs are concatenated
along the feature axis at every time step.
Inter-layer dropout (probability dropout) is applied between
adjacent layers during training, but not after the final layer.
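The data flow described above (stacking, bidirectional concatenation, and dropout between layers only) can be sketched in plain Python. This is an illustration, not the module's implementation: `run_direction`, `run_stack`, and the `Cell` callable type are hypothetical names, a single unbatched sequence of float lists stands in for tensors, and both the inverted-dropout scaling and the ordering of `h_n` (forward before reverse within each layer) are assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]  # hypothetical: (x_t, h_prev) -> h_t


def run_direction(cell: Cell, seq: List[Vector], h0: Vector,
                  reverse: bool = False) -> Tuple[List[Vector], Vector]:
    """Run one cell over the sequence; outputs stay aligned to time steps."""
    order = range(len(seq) - 1, -1, -1) if reverse else range(len(seq))
    outs: List[Vector] = [[] for _ in seq]
    h = h0
    for t in order:
        h = cell(seq[t], h)
        outs[t] = h
    return outs, h


def run_stack(seq: List[Vector],
              layers: List[Tuple[Cell, Optional[Cell]]],
              hidden_size: int,
              dropout: float = 0.0,
              training: bool = False,
              rng: Optional[random.Random] = None) -> Tuple[List[Vector], List[Vector]]:
    """layers holds one (forward_cell, reverse_cell_or_None) pair per layer."""
    rng = rng or random.Random(0)
    h_n: List[Vector] = []
    layer_in = seq
    for i, (fwd, bwd) in enumerate(layers):
        h0 = [0.0] * hidden_size  # initial hidden state defaults to zeros
        out, h_f = run_direction(fwd, layer_in, h0)
        h_n.append(h_f)
        if bwd is not None:
            out_b, h_b = run_direction(bwd, layer_in, h0, reverse=True)
            h_n.append(h_b)
            # Concatenate along the feature axis at every time step.
            out = [f + b for f, b in zip(out, out_b)]
        # Dropout sits between adjacent layers only, never after the last one.
        if training and dropout > 0.0 and i < len(layers) - 1:
            keep = 1.0 - dropout  # assumed inverted-dropout scaling
            out = [[v / keep if rng.random() < keep else 0.0 for v in vec]
                   for vec in out]
        layer_in = out  # the output of layer l feeds layer l + 1
    return layer_in, h_n
```

With one bidirectional layer, each output vector has `2 * hidden_size` features and `h_n` holds two entries, one per direction.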
Parameters
input_size : int
hidden_size : int
num_layers : int = 1
    Default: 1.
bias : bool = True
    If False, all bias parameters are omitted. Default: True.
batch_first : bool = False
    If True, the input/output tensors have shape (N, L, *) instead of the default (L, N, *). Default: False.
dropout : float = 0.0
    A value of 0.0 disables dropout. Default: 0.0.
bidirectional : bool = False
    If True, a bidirectional GRU is used; the output feature dimension becomes 2 * hidden_size. Default: False.
device : DeviceLike = None
dtype : DTypeLike = None
Notes
Here L is the sequence length, N the batch size, and H = hidden_size.
- Input x: (L, N, input_size), or (N, L, input_size) when batch_first=True.
- h_0 (optional): (D * num_layers, N, H), where D = 2 if bidirectional else 1. Defaults to zeros.
- output: (L, N, D * H), or (N, L, D * H) when batch_first=True.
- h_n: (D * num_layers, N, H), the hidden state at the final time step for each layer and direction.
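The shape rules listed above can be checked without the library. The helper below is a hypothetical convenience (`gru_shapes` is not part of the API) that applies those rules to a given configuration.

```python
from typing import Dict, Tuple


def gru_shapes(seq_len: int, batch: int, input_size: int, hidden_size: int,
               num_layers: int = 1, batch_first: bool = False,
               bidirectional: bool = False) -> Dict[str, Tuple[int, ...]]:
    """Return the expected shapes of x, h_0, output and h_n."""
    d = 2 if bidirectional else 1

    def seq_shape(features: int) -> Tuple[int, ...]:
        # batch_first only affects the sequence tensors, not the hidden state.
        if batch_first:
            return (batch, seq_len, features)
        return (seq_len, batch, features)

    state = (d * num_layers, batch, hidden_size)
    return {
        "x": seq_shape(input_size),
        "h_0": state,
        "output": seq_shape(d * hidden_size),
        "h_n": state,
    }
```

For a two-layer, batch-first configuration with N=2, L=5, input_size=8 and hidden_size=16, this gives an output shape of (2, 5, 16) and an h_n shape of (2, 2, 16).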
Internally this module stores one GRUCell sub-module per
layer per direction (named cell_l{layer} and
cell_l{layer}_reverse). The _CellNamingMixin flattens
these into weight_ih_l{layer} etc. for checkpoint compatibility
with the reference framework.
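The renaming described above might look like the following key mapping. This is a guess at the scheme, not the mixin's actual code: `flatten_key` is a hypothetical helper, and both the dotted `cell_l{layer}.{param}` key format and the per-cell parameter names (`weight_ih`, `bias_hh`, and so on) are assumptions.

```python
import re

# Assumed nested key format: "cell_l{layer}[_reverse].{param}"
_CELL_KEY = re.compile(r"^cell_l(\d+)(_reverse)?\.(\w+)$")


def flatten_key(key: str) -> str:
    """Map a per-cell parameter key to the flat, checkpoint-style name."""
    match = _CELL_KEY.match(key)
    if match is None:
        return key  # not a cell parameter; leave the key untouched
    layer, reverse, param = match.groups()
    return f"{param}_l{layer}{reverse or ''}"
```

Under these assumptions, `cell_l0.weight_ih` becomes `weight_ih_l0` and `cell_l1_reverse.bias_hh` becomes `bias_hh_l1_reverse`.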
flatten_parameters is a no-op retained for API
compatibility.
PackedSequence input is not yet supported.
See Also
GRUCell : Single time-step GRU cell.
LSTM : Long Short-Term Memory (carries a separate cell state).
RNN : Vanilla Elman RNN (no gating).
Examples
Two-layer GRU, batch-first:
>>> import lucid, lucid.nn as nn
>>> gru = nn.GRU(8, 16, num_layers=2, batch_first=True)
>>> x = lucid.randn(2, 5, 8) # (N=2, L=5, I=8)
>>> out, h_n = gru(x)
>>> out.shape, h_n.shape
((2, 5, 16), (2, 2, 16))
Bidirectional GRU:
>>> gru_bi = nn.GRU(8, 16, bidirectional=True, batch_first=True)
>>> x2 = lucid.randn(3, 10, 8)
>>> out2, h_n2 = gru_bi(x2)
>>> out2.shape # D*H = 2*16 = 32
(3, 10, 32)
>>> h_n2.shape # D*num_layers = 2*1 = 2
(2, 3, 16)
Methods
__init__
__init__(input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the GRU module. See the class docstring for parameter semantics.
flatten_parameters
flatten_parameters() → None
No-op for API compatibility (see LSTM.flatten_parameters).
forward
forward(x: Tensor, hx: Tensor | None = None) → Tensor or tuple of Tensor
Run the recurrent forward pass.
Parameters
x : Tensor
hx : Tensor = None
Returns
Tensor or tuple of Tensor
    Output and (optionally) the new hidden state; see the class docstring.
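One common use of the optional hx argument is to process a long sequence in chunks, feeding each chunk's final hidden state in as the next chunk's initial state. The sketch below shows why this is equivalent to a single pass. It uses a scalar toy recurrence (`toy_cell` and `run` are hypothetical stand-ins, not library calls); with the real module the corresponding pattern would presumably be `out, h = gru(chunk, h)`.

```python
import math
from typing import List, Tuple


def toy_cell(x_t: float, h: float) -> float:
    # Stand-in recurrence; a real GRU cell would apply its gates here.
    return math.tanh(0.5 * x_t + 0.8 * h)


def run(seq: List[float], h: float = 0.0) -> Tuple[List[float], float]:
    """Return per-step outputs and the final hidden state."""
    outs = []
    for x_t in seq:
        h = toy_cell(x_t, h)
        outs.append(h)
    return outs, h


seq = [0.1, -0.4, 0.7, 0.2, -0.3, 0.9]

# One pass over the whole sequence.
full_out, full_h = run(seq)

# Two chunks, carrying the hidden state across the boundary.
out_a, h_a = run(seq[:3])
out_b, h_b = run(seq[3:], h_a)
```

Here `out_a + out_b` equals `full_out` and `h_b` equals `full_h`, because the hidden state is the only information carried from one time step to the next.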
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.