class RNN

extends _CellNamingMixin, Module

RNN(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Multi-layer Elman recurrent neural network (RNN).

Applies a stack of Elman RNN cells over an input sequence. At each time step $t$ the hidden state is updated by:

$$h_t = \phi\!\left(W_{ih}\,x_t + b_{ih} + W_{hh}\,h_{t-1} + b_{hh}\right)$$

where $\phi$ is either $\tanh$ (default) or $\text{ReLU}$, selected by the nonlinearity argument.
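The update rule above can be sketched in NumPy as follows. This is an illustration of the formula, not the module's actual implementation: the helper name elman_step, the (H, I) and (H, H) weight layouts, and the random initialisation are assumptions made for the example.

```python
import numpy as np

def elman_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh, nonlinearity="tanh"):
    """One Elman update: h_t = phi(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)."""
    phi = np.tanh if nonlinearity == "tanh" else (lambda z: np.maximum(z, 0.0))
    # x_t: (N, I), h_prev: (N, H); weights assumed stored as (H, I) and (H, H).
    return phi(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)

rng = np.random.default_rng(0)
L, N, I, H = 5, 2, 8, 16            # sequence length, batch, input and hidden sizes
x = rng.standard_normal((L, N, I))
W_ih = rng.standard_normal((H, I)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
b_ih = np.zeros(H)
b_hh = np.zeros(H)

h = np.zeros((N, H))                # h_0 defaults to zeros
outputs = []
for t in range(L):
    h = elman_step(x[t], h, W_ih, W_hh, b_ih, b_hh)
    outputs.append(h)

output = np.stack(outputs)          # (L, N, H): hidden state at every time step
h_n = h                             # (N, H): hidden state after the last step
```

For a single unidirectional layer, the last time step of output and h_n hold the same values.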

The output of layer $\ell$ is fed as the input of layer $\ell + 1$. When bidirectional=True, two RNNs process the sequence in opposite directions and their outputs are concatenated along the feature axis at each time step.
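A minimal NumPy sketch of stacking and bidirectional concatenation, assuming bias-free tanh layers and freshly sampled weights. The helpers run_direction and stacked_bidirectional are hypothetical names used only for illustration; they are not part of the library.

```python
import numpy as np

def run_direction(x, W_ih, W_hh, reverse=False):
    """Unroll a bias-free tanh Elman layer over x of shape (L, N, I)."""
    L, N, _ = x.shape
    H = W_hh.shape[0]
    h = np.zeros((N, H))
    out = np.empty((L, N, H))
    steps = range(L - 1, -1, -1) if reverse else range(L)
    for t in steps:
        h = np.tanh(x[t] @ W_ih.T + h @ W_hh.T)
        out[t] = h                  # store at the original time index
    return out

def stacked_bidirectional(x, hidden_size, num_layers, rng):
    layer_input = x
    for _ in range(num_layers):
        in_features = layer_input.shape[-1]
        outs = []
        for reverse in (False, True):
            W_ih = rng.standard_normal((hidden_size, in_features)) * 0.1
            W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
            outs.append(run_direction(layer_input, W_ih, W_hh, reverse))
        # Forward and backward outputs are joined along the feature axis,
        # so the next layer sees 2 * hidden_size input features.
        layer_input = np.concatenate(outs, axis=-1)
    return layer_input

rng = np.random.default_rng(0)
y = stacked_bidirectional(rng.standard_normal((7, 3, 8)), hidden_size=16, num_layers=2, rng=rng)
print(y.shape)
```

Note that the backward pass writes each hidden state at its original time index, which is what makes per-step concatenation meaningful.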

Inter-layer dropout (probability dropout) is applied between adjacent layers during training.
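The placement of inter-layer dropout can be sketched as below. The helper dropout_between_layers is hypothetical, and the inverted-dropout scaling by 1 / (1 - p) is an assumption borrowed from common practice; the source does not state how this module scales the kept activations.

```python
import numpy as np

def dropout_between_layers(h, p, layer, num_layers, training, rng):
    """Drop activations of every layer except the last, and only in training.

    Assumes 0 <= p < 1 and inverted-dropout scaling (an assumption, see above).
    """
    if not training or p == 0.0 or layer == num_layers - 1:
        return h                    # eval mode, disabled, or last layer: untouched
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((4, 4))
dropped = dropout_between_layers(h, p=0.5, layer=0, num_layers=2, training=True, rng=rng)
final = dropout_between_layers(h, p=0.5, layer=1, num_layers=2, training=True, rng=rng)
```

With num_layers=1 there are no adjacent layers, so under this sketch a non-zero dropout value has no effect.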

.. warning::

Vanilla RNNs are prone to the vanishing gradient problem: backpropagation through time multiplies the gradient by $W_{hh}^\top$ (scaled by the derivative of $\phi$) at every time step, so it tends to shrink exponentially and often becomes negligible for sequences longer than ~10–20 steps. For longer sequences, prefer LSTM or GRU, which use gating mechanisms to maintain gradient flow.
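To make the effect concrete, the sketch below uses a scalar simplification, $h_t = \tanh(w\,h_{t-1} + x_t)$, for which the chain rule gives $\partial h_T / \partial h_0 = \prod_{t=1}^{T} w\,(1 - h_t^2)$. The scalar setup, the value w = 0.9, and the constant input are assumptions chosen for illustration; a real RNN uses matrices, but the product structure is the same.

```python
import math

def grad_wrt_h0(w, xs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x_t in xs:
        h = math.tanh(w * h + x_t)
        grad *= w * (1.0 - h * h)   # chain rule: tanh'(z) = 1 - tanh(z)^2
    return grad

w = 0.9
for T in (5, 20, 50):
    g = grad_wrt_h0(w, [0.1] * T)
    print(f"T={T:3d}  |dh_T/dh_0| = {abs(g):.3e}")
```

Because each factor has magnitude at most |w|, the gradient magnitude is bounded by |w|^T whenever |w| < 1, which is the exponential decay described above.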

Parameters

input_size : int

Number of expected features in the input $x_t$.

hidden_size : int

Number of features in the hidden state $h_t$ (denoted $H$ below).

num_layers : int = 1

Number of stacked RNN layers. Default: 1.

nonlinearity : {'tanh', 'relu'} = 'tanh'

Activation function $\phi$. 'tanh' is recommended for most use cases; 'relu' can help when gradients vanish with tanh. Default: 'tanh'.

bias : bool = True

If False, all bias parameters are omitted. Default: True.

batch_first : bool = False

If True, input/output tensors are (N, L, *) instead of the default (L, N, *). Default: False.

dropout : float = 0.0

Dropout probability applied after each layer except the last. 0.0 disables dropout. Default: 0.0.

bidirectional : bool = False

If True, use a bidirectional RNN; the output feature dimension becomes 2 * hidden_size. Default: False.

device : DeviceLike = None

Device for weight allocation.

dtype : DTypeLike = None

Data type for weight tensors.

Notes

Input x: (L, N, input_size), or (N, L, input_size) when batch_first=True, where L is the sequence length and N is the batch size.
h_0 (optional): (D * num_layers, N, H) where D = 2 if bidirectional else 1. Defaults to zeros.
output: (L, N, D * H), or (N, L, D * H) when batch_first=True.
h_n: (D * num_layers, N, H), the hidden state at the final time step for each layer and direction.
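The shape rules above can be checked with a small pure-Python helper. The function rnn_shapes is hypothetical and exists only to restate the rules; it does not call the library.

```python
def rnn_shapes(seq_len, batch, hidden_size, num_layers=1,
               bidirectional=False, batch_first=False):
    """Return the expected (output, h_n) shapes for the given configuration."""
    d = 2 if bidirectional else 1
    if batch_first:
        output = (batch, seq_len, d * hidden_size)
    else:
        output = (seq_len, batch, d * hidden_size)
    # h_n keeps the (D * num_layers, N, H) layout regardless of batch_first.
    h_n = (d * num_layers, batch, hidden_size)
    return output, h_n

print(rnn_shapes(5, 2, 16, num_layers=2, batch_first=True))
print(rnn_shapes(7, 3, 16, bidirectional=True, batch_first=True))
```

The two calls use the same configurations as the two-layer and bidirectional cases in the Examples section.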

Internally this module stores one RNNCell sub-module per layer per direction. The _CellNamingMixin flattens these into weight_ih_l{layer} etc. for checkpoint compatibility with the reference framework.
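As an illustration of what such a flat naming scheme might look like, the sketch below enumerates names of the form weight_ih_l{layer}. Only that one pattern is confirmed by the text above; the weight_hh, bias_ih, and bias_hh names and the _reverse suffix for the backward direction are assumptions that follow the PyTorch convention, and flat_param_names is a hypothetical helper.

```python
def flat_param_names(num_layers, bidirectional=False, bias=True):
    """List flat parameter names, one group per layer and direction.

    Assumes PyTorch-style naming; only weight_ih_l{layer} is confirmed
    by the documentation.
    """
    kinds = ["weight_ih", "weight_hh"]
    if bias:
        kinds += ["bias_ih", "bias_hh"]
    suffixes = ["", "_reverse"] if bidirectional else [""]
    return [
        f"{kind}_l{layer}{suffix}"
        for layer in range(num_layers)
        for suffix in suffixes
        for kind in kinds
    ]

print(flat_param_names(2, bidirectional=True, bias=False))
```

Inspecting the module's own parameter names is the reliable way to confirm the actual scheme before loading an external checkpoint.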

flatten_parameters is a no-op kept for API compatibility.

PackedSequence input is not yet supported.

See Also

RNNCell : Single time-step Elman cell.
LSTM : Gated RNN with a separate cell state (better for long sequences).
GRU : Gated RNN without a separate cell state.

Examples

Two-layer RNN, batch-first:
>>> import lucid, lucid.nn as nn
>>> rnn = nn.RNN(8, 16, num_layers=2, batch_first=True)
>>> x = lucid.randn(2, 5, 8)       # (N=2, L=5, I=8)
>>> out, h_n = rnn(x)
>>> out.shape, h_n.shape
((2, 5, 16), (2, 2, 16))

Bidirectional RNN with ReLU activation:
>>> rnn_bi = nn.RNN(
...     8, 16, nonlinearity='relu',
...     bidirectional=True, batch_first=True,
... )
>>> x2 = lucid.randn(3, 7, 8)
>>> out2, h_n2 = rnn_bi(x2)
>>> out2.shape    # D*H = 2*16 = 32
(3, 7, 32)
>>> h_n2.shape    # D*num_layers = 2*1 = 2
(2, 3, 16)

Methods (4)

__init__

→None

__init__(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the RNN module. See the class docstring for parameter semantics.

flatten_parameters

→None

flatten_parameters()

No-op for API compatibility (see LSTM.flatten_parameters).

forward

→Tensor or tuple of Tensor

forward(x: Tensor, hx: Tensor | None = None)

Run the recurrent forward pass.

Parameters

x : Tensor

See the class docstring.

hx : Tensor | None = None

See the class docstring.

Returns

Tensor or tuple of Tensor

Output and (optionally) the new hidden state; see the class docstring.

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.