class GRUCell extends Module

GRUCell(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Single time-step Gated Recurrent Unit (GRU) cell.

Computes one recurrent update using the three-gate GRU equations:

$$
\begin{aligned}
r_t &= \sigma(W_{ir}\,x_t + b_{ir} + W_{hr}\,h_{t-1} + b_{hr}) & &\text{(reset gate)}\\
z_t &= \sigma(W_{iz}\,x_t + b_{iz} + W_{hz}\,h_{t-1} + b_{hz}) & &\text{(update gate)}\\
n_t &= \tanh\!\left(W_{in}\,x_t + b_{in} + r_t \odot (W_{hn}\,h_{t-1} + b_{hn})\right) & &\text{(new gate)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot n_t
\end{aligned}
$$

The reset gate $r_t$ controls how much of the previous hidden state enters the candidate $n_t$; when it is near zero, the hidden-state term is suppressed and the candidate is computed almost entirely from the current input $x_t$. The update gate $z_t$ interpolates between the old hidden state and the candidate, allowing the cell to retain information over many steps without an explicit cell state.
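The interpolation in the last equation can be checked by hand. The following is a minimal sketch in plain Python, not part of the library; the helper name `blend` is hypothetical and only mirrors $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot n_t$ elementwise.

```python
def blend(h_prev, n, z):
    """Elementwise h_t = (1 - z) * h_prev + z * n (hypothetical helper)."""
    return [(1.0 - zi) * hi + zi * ni for hi, ni, zi in zip(h_prev, n, z)]


h_prev = [0.8, -0.3]
candidate = [0.1, 0.9]

kept = blend(h_prev, candidate, [0.0, 0.0])      # z = 0: old state is kept
replaced = blend(h_prev, candidate, [1.0, 1.0])  # z = 1: candidate takes over
mixed = blend(h_prev, candidate, [0.0, 1.0])     # gates act per hidden unit
```

With this convention an update gate near zero preserves the previous state, which is how the cell carries information across many steps.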

The three gate weight matrices are stacked row-wise into single parameters of shape (3H, *), where H is hidden_size, in gate order [r; z; n].
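To make the stacked layout concrete, here is a rough reference sketch of one update step in plain Python for a single unbatched vector. It follows the equations and the [r; z; n] order stated above, but it is not the library's implementation; `gru_step`, `_sigmoid` and `_matvec` are hypothetical names, and the sketch assumes bias vectors are present.

```python
import math


def _sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def _matvec(W, x):
    # W is a list of rows; returns the matrix-vector product as a list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gru_step(x, h_prev, weight_ih, weight_hh, bias_ih, bias_hh):
    """One GRU update for a single input vector (hypothetical sketch)."""
    H = len(h_prev)
    gi = [a + b for a, b in zip(_matvec(weight_ih, x), bias_ih)]
    gh = [a + b for a, b in zip(_matvec(weight_hh, h_prev), bias_hh)]
    # Split the stacked 3H-vectors into per-gate chunks in [r; z; n] order.
    i_r, i_z, i_n = gi[:H], gi[H:2 * H], gi[2 * H:]
    h_r, h_z, h_n = gh[:H], gh[H:2 * H], gh[2 * H:]
    r = [_sigmoid(a + b) for a, b in zip(i_r, h_r)]
    z = [_sigmoid(a + b) for a, b in zip(i_z, h_z)]
    # The reset gate scales only the hidden-state contribution.
    n = [math.tanh(a + ri * b) for a, ri, b in zip(i_n, r, h_n)]
    return [(1.0 - zi) * hp + zi * ni for zi, hp, ni in zip(z, h_prev, n)]


I, H = 3, 2
zeros_ih = [[0.0] * I for _ in range(3 * H)]  # shape (3H, I)
zeros_hh = [[0.0] * H for _ in range(3 * H)]  # shape (3H, H)
zero_b = [0.0] * (3 * H)                      # shape (3H,)

# With all-zero parameters r = z = 0.5 and n = 0, so h_next is h_prev / 2.
h_next = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zeros_ih, zeros_hh, zero_b, zero_b)
```

Stacking lets the input and hidden projections each be computed with one matrix product before the result is sliced per gate.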

Parameters

input_size : int

Number of features in the input vector $x_t$.

hidden_size : int

Number of features in the hidden state $h_t$.

bias : bool = True

If False, no bias terms are used. Default: True.

device : DeviceLike = None

Device for weight allocation.

dtype : DTypeLike = None

Data type for weight tensors.

Attributes

weight_ih : Parameter, shape ``(3 * hidden_size, input_size)``

Stacked input–hidden weight matrices $[W_{ir}; W_{iz}; W_{in}]$.

weight_hh : Parameter, shape ``(3 * hidden_size, hidden_size)``

Stacked hidden–hidden weight matrices $[W_{hr}; W_{hz}; W_{hn}]$.

bias_ih : Parameter or None, shape ``(3 * hidden_size,)``

Stacked input–hidden biases $[b_{ir}; b_{iz}; b_{in}]$. None when bias=False.

bias_hh : Parameter or None, shape ``(3 * hidden_size,)``

Stacked hidden–hidden biases $[b_{hr}; b_{hz}; b_{hn}]$. None when bias=False.

Notes

x: (N, input_size) — batch of input vectors.
hx (optional): (N, hidden_size) — initial hidden state. Defaults to zeros when None.
Output h_t: (N, hidden_size).

Weights are initialised from $\mathcal{U}(-1/\sqrt{H},\, 1/\sqrt{H})$, where $H$ is hidden_size.
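As an illustration of that distribution, the sketch below draws a weight matrix with the standard library only. The function name `init_uniform` is hypothetical, and the sketch covers the weight matrices only, since the text above does not say how the biases are initialised.

```python
import math
import random


def init_uniform(rows, cols, hidden_size, rng):
    """Sample a rows x cols matrix from U(-1/sqrt(H), 1/sqrt(H)) (hypothetical helper)."""
    bound = 1.0 / math.sqrt(hidden_size)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
input_size, hidden_size = 8, 16
bound = 1.0 / math.sqrt(hidden_size)  # 0.25 for hidden_size = 16

# Same shape as weight_ih: (3 * hidden_size, input_size).
weight_ih = init_uniform(3 * hidden_size, input_size, hidden_size, rng)
```

The bound shrinks as the hidden size grows, which keeps the initial pre-activations in a similar range for wider cells.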

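The GRU has fewer parameters than the LSTM: it has no separate cell state and uses three stacked gate blocks instead of four, so a cell with the same input and hidden sizes carries roughly three quarters of the parameters. It often converges faster on shorter sequences and matches LSTM quality on many benchmarks, though results are task-dependent.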
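The parameter count follows directly from the attribute shapes listed above. The sketch below counts them for a GRU cell and, for comparison, for an LSTM cell under the assumption that it uses the same stacked layout with four gate blocks and two bias vectors; `gated_cell_params` is a hypothetical helper, not a library function.

```python
def gated_cell_params(input_size, hidden_size, gates, bias=True):
    """Count parameters of a stacked gated cell (hypothetical helper)."""
    # Per gate: one (H, I) input block and one (H, H) hidden block.
    per_gate = hidden_size * input_size + hidden_size * hidden_size
    if bias:
        # Per gate: one input-hidden and one hidden-hidden bias of length H.
        per_gate += 2 * hidden_size
    return gates * per_gate


gru_params = gated_cell_params(8, 16, gates=3)   # 3 * (128 + 256 + 32) = 1248
lstm_params = gated_cell_params(8, 16, gates=4)  # 4 * (128 + 256 + 32) = 1664
ratio = gru_params / lstm_params                 # 0.75
```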

See Also

GRU : Multi-layer, multi-step GRU module.
LSTMCell : Single-step LSTM cell with separate cell state.
RNNCell : Vanilla single-step cell without gating.

Examples

Manual sequence loop:
>>> import lucid, lucid.nn as nn
>>> cell = nn.GRUCell(input_size=8, hidden_size=16)
>>> x_seq = lucid.randn(6, 4, 8)    # (L=6, N=4, I=8)
>>> h = lucid.zeros(4, 16)
>>> for t in range(6):
...     h = cell(x_seq[t], h)
>>> h.shape
(4, 16)
No explicit initial state (defaults to zeros):
>>> cell2 = nn.GRUCell(4, 12)
>>> h2 = cell2(lucid.randn(3, 4))
>>> h2.shape
(3, 12)

Methods (3)

__init__ → None

__init__(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the GRUCell module. See the class docstring for parameter semantics.

forward → Tensor or tuple of Tensor

forward(x: Tensor, hx: Tensor | None = None)

Run the recurrent forward pass.

Parameters

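x : Tensor

Batch of input vectors, shape (N, input_size). See the class docstring.

hx : Tensor | None = None

Initial hidden state, shape (N, hidden_size). Defaults to zeros when None. See the class docstring.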

Returns

Tensor or tuple of Tensor

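Output and (optionally) the new hidden state. Per the class docstring, the output is $h_t$ with shape (N, hidden_size); the class examples receive a single tensor of that shape.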

extra_repr → str

extra_repr()

Return a string representation of the layer's configuration.