class GRUCell(Module)

GRUCell(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Single time-step Gated Recurrent Unit (GRU) cell.

Computes one recurrent update using the three-gate GRU equations:

$$
\begin{aligned}
r_t &= \sigma(W_{ir}\,x_t + b_{ir} + W_{hr}\,h_{t-1} + b_{hr}) & &\text{(reset gate)}\\
z_t &= \sigma(W_{iz}\,x_t + b_{iz} + W_{hz}\,h_{t-1} + b_{hz}) & &\text{(update gate)}\\
n_t &= \tanh\!\left(W_{in}\,x_t + b_{in} + r_t \odot (W_{hn}\,h_{t-1} + b_{hn})\right) & &\text{(new gate)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot n_t
\end{aligned}
$$
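To make the equations concrete, here is a minimal NumPy sketch of one GRU step for a single (unbatched) input vector, with one matrix and one bias per gate exactly as written above. It illustrates the math only and is not lucid's implementation; the function name `gru_step` and the dictionary-of-parameters layout are hypothetical.

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x: np.ndarray, h_prev: np.ndarray, p: dict) -> np.ndarray:
    """One GRU update. p holds W_i*, W_h*, b_i*, b_h* for * in {r, z, n} (hypothetical layout)."""
    # Reset gate: how much of h_{t-1} enters the candidate.
    r = sigmoid(p["W_ir"] @ x + p["b_ir"] + p["W_hr"] @ h_prev + p["b_hr"])
    # Update gate: interpolation weight between h_{t-1} and the candidate.
    z = sigmoid(p["W_iz"] @ x + p["b_iz"] + p["W_hz"] @ h_prev + p["b_hz"])
    # Candidate ("new gate"): r_t scales the whole hidden-side term, including b_hn.
    n = np.tanh(p["W_in"] @ x + p["b_in"] + r * (p["W_hn"] @ h_prev + p["b_hn"]))
    # Element-wise convex combination of the previous state and the candidate.
    return (1.0 - z) * h_prev + z * n

# Usage: input size 3, hidden size 2, every parameter set to zero.
I, H = 3, 2
params = {}
for g in "rzn":
    params[f"W_i{g}"] = np.zeros((H, I))
    params[f"W_h{g}"] = np.zeros((H, H))
    params[f"b_i{g}"] = np.zeros(H)
    params[f"b_h{g}"] = np.zeros(H)
# With zero parameters r = z = 0.5 and n = 0, so h_t = 0.5 * h_{t-1} = [0.5, -1.0].
h = gru_step(np.ones(I), np.array([1.0, -2.0]), params)
```

Because $r_t$ multiplies $W_{hn} h_{t-1} + b_{hn}$ as a whole, $b_{in}$ and $b_{hn}$ cannot be merged into a single bias, which is one reason the input-side and hidden-side biases are kept as separate parameters.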

The reset gate $r_t$ controls how much of the previous hidden state leaks into the candidate $n_t$; setting it near zero makes the cell ignore past context. The update gate $z_t$ interpolates between the old hidden state and the candidate, allowing the cell to retain information over many steps without an explicit cell state.
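A quick calculation shows why the update gate lets information persist. If $z_t$ stays near zero, each step multiplies the old state by $(1 - z_t)$, so after $T$ steps roughly $(1 - z)^T$ of it survives (ignoring the small $z_t n_t$ contribution). The sketch below assumes a constant update-gate pre-activation, with the values -10 and 0 chosen purely for illustration; `retained_fraction` is a hypothetical helper.

```python
import math

def retained_fraction(z_preactivation: float, steps: int) -> float:
    """Fraction of h_{t-1} kept after `steps` updates with a constant update gate."""
    z = 1.0 / (1.0 + math.exp(-z_preactivation))  # sigmoid
    return (1.0 - z) ** steps

gate_nearly_closed = retained_fraction(-10.0, 100)  # z ≈ 4.5e-5, about 0.9955 survives
gate_half_open = retained_fraction(0.0, 100)        # z = 0.5, 0.5**100 ≈ 7.9e-31 survives
```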

The weight matrices (and bias vectors) of the three gates are stacked row-wise into single parameters of shape (3H, *), where H = hidden_size, with rows grouped in gate order [r; z; n].

Parameters

input_size : int
Number of features in the input vector $x_t$.
hidden_size : int
Number of features in the hidden state $h_t$.
bias : bool = True
If False, no bias terms are used. Default: True.
device : DeviceLike = None
Device for weight allocation.
dtype : DTypeLike = None
Data type for weight tensors.

Attributes

weight_ih : Parameter, shape (3 * hidden_size, input_size)
Stacked input–hidden weight matrices $[W_{ir}; W_{iz}; W_{in}]$.
weight_hh : Parameter, shape (3 * hidden_size, hidden_size)
Stacked hidden–hidden weight matrices $[W_{hr}; W_{hz}; W_{hn}]$.
bias_ih : Parameter or None, shape (3 * hidden_size,)
Stacked input–hidden biases $[b_{ir}; b_{iz}; b_{in}]$. None when bias=False.
bias_hh : Parameter or None, shape (3 * hidden_size,)
Stacked hidden–hidden biases $[b_{hr}; b_{hz}; b_{hn}]$. None when bias=False.

Notes

  • x: (N, input_size) — batch of input vectors.
  • hx (optional): (N, hidden_size) — initial hidden state. Defaults to zeros when None.
  • Output h_t: (N, hidden_size).
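The shapes above, together with the stacked [r; z; n] layout of `weight_ih` and `weight_hh`, can be sketched as a batched forward pass. NumPy arrays stand in for lucid tensors, and the function name `gru_cell_forward` is hypothetical; this mirrors the documented shapes and equations but is not lucid's code.

```python
import numpy as np

def gru_cell_forward(x, weight_ih, weight_hh, bias_ih=None, bias_hh=None, hx=None):
    """Batched GRU step. x: (N, I); weights stacked as (3H, I) and (3H, H) in order [r; z; n]."""
    H = weight_hh.shape[1]
    if hx is None:
        # Default initial state: zeros of shape (N, H).
        hx = np.zeros((x.shape[0], H), dtype=x.dtype)
    gi = x @ weight_ih.T   # (N, 3H) input-side pre-activations
    gh = hx @ weight_hh.T  # (N, 3H) hidden-side pre-activations
    if bias_ih is not None:
        gi = gi + bias_ih  # (3H,) broadcasts over the batch
    if bias_hh is not None:
        gh = gh + bias_hh
    # Split the 3H columns into the r, z, n blocks.
    i_r, i_z, i_n = np.split(gi, 3, axis=1)
    h_r, h_z, h_n = np.split(gh, 3, axis=1)
    r = 1.0 / (1.0 + np.exp(-(i_r + h_r)))
    z = 1.0 / (1.0 + np.exp(-(i_z + h_z)))
    n = np.tanh(i_n + r * h_n)  # r gates the hidden-side term, bias included
    return (1.0 - z) * hx + z * n

rng = np.random.default_rng(0)
N, I, H = 4, 8, 16
# 0.25 = 1/sqrt(16), the initialisation bound for H = 16.
w_ih = rng.uniform(-0.25, 0.25, (3 * H, I))
w_hh = rng.uniform(-0.25, 0.25, (3 * H, H))
h1 = gru_cell_forward(rng.standard_normal((N, I)), w_ih, w_hh)  # shape (4, 16)
```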

Weights are initialised from $\mathcal{U}(-1/\sqrt{H},\, 1/\sqrt{H})$, with H = hidden_size.
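A minimal sketch of that initialisation using NumPy's `Generator.uniform`, covering only the two weight matrices described above. The helper name `init_gru_weights` and the fixed seed are hypothetical.

```python
import numpy as np

def init_gru_weights(input_size: int, hidden_size: int, seed: int = 0):
    """Draw stacked weights from U(-1/sqrt(H), 1/sqrt(H)), with H = hidden_size."""
    rng = np.random.default_rng(seed)
    bound = 1.0 / np.sqrt(hidden_size)
    weight_ih = rng.uniform(-bound, bound, size=(3 * hidden_size, input_size))
    weight_hh = rng.uniform(-bound, bound, size=(3 * hidden_size, hidden_size))
    return weight_ih, weight_hh

w_ih, w_hh = init_gru_weights(8, 16)  # bound = 1/sqrt(16) = 0.25
```

Since each row of `weight_hh` sums H terms, each weight having variance $1/(3H)$, scaling the bound by $1/\sqrt{H}$ keeps the hidden-side pre-activations on a similar scale as the hidden width grows.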

The GRU has fewer parameters than the LSTM: it keeps no separate cell state, and its weight matrices stack three gate blocks instead of four. It often converges faster on shorter sequences and matches LSTM quality on many benchmarks.
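The parameter saving follows directly from the attribute shapes: a GRU cell holds $3H(I + H)$ weights plus $2 \cdot 3H$ biases. The LSTM count below assumes an LSTM cell with the same layout but four stacked blocks and two bias vectors (an assumption about LSTMCell, not taken from this page); `recurrent_cell_params` is a hypothetical helper.

```python
def recurrent_cell_params(input_size: int, hidden_size: int, n_blocks: int, bias: bool = True) -> int:
    """Parameters in a cell whose weight_ih / weight_hh stack `n_blocks` blocks of H rows."""
    rows = n_blocks * hidden_size
    count = rows * input_size + rows * hidden_size  # weight_ih + weight_hh
    if bias:
        count += 2 * rows                           # bias_ih + bias_hh
    return count

gru = recurrent_cell_params(8, 16, n_blocks=3)   # 1248
lstm = recurrent_cell_params(8, 16, n_blocks=4)  # 1664 under the stated assumption
```

Under that assumption the GRU cell always has exactly three quarters of the LSTM cell's parameters, for any input and hidden size.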

See Also

GRU : Multi-layer, multi-step GRU module.
LSTMCell : Single-step LSTM cell with separate cell state.
RNNCell : Vanilla single-step cell without gating.

Examples

Manual sequence loop:
>>> import lucid, lucid.nn as nn
>>> cell = nn.GRUCell(input_size=8, hidden_size=16)
>>> x_seq = lucid.randn(6, 4, 8)    # (L=6, N=4, I=8)
>>> h = lucid.zeros(4, 16)
>>> for t in range(6):
...     h = cell(x_seq[t], h)
>>> h.shape
(4, 16)

No explicit initial state (defaults to zeros):
>>> cell2 = nn.GRUCell(4, 12)
>>> h2 = cell2(lucid.randn(3, 4))
>>> h2.shape
(3, 12)

Methods

__init__

__init__(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the GRUCell module. See the class docstring for parameter semantics.

forward

forward(x: Tensor, hx: Tensor | None = None)

Run the recurrent forward pass.

Parameters

x : Tensor
Input batch of shape (N, input_size).
hx : Tensor | None = None
Initial hidden state of shape (N, hidden_size). Defaults to zeros when None.

Returns

Tensor

The new hidden state $h_t$, of shape (N, hidden_size).

extra_repr

extra_repr() -> str

Return a string representation of the layer's configuration.