class RNN

extends _CellNamingMixin, Module
RNN(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Multi-layer Elman recurrent neural network (RNN).

Applies a stack of Elman RNN cells over an input sequence. At each time step $t$ the hidden state is updated by:

$$h_t = \phi\left(W_{ih}\,x_t + b_{ih} + W_{hh}\,h_{t-1} + b_{hh}\right)$$

where $\phi$ is either $\tanh$ (default) or $\text{ReLU}$, selected by the nonlinearity argument.
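The update rule can be traced by hand with a minimal pure-Python sketch. The function name `elman_step` and the toy weights are hypothetical, chosen only for illustration; this is not the library's implementation.

```python
import math

def elman_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh, nonlinearity="tanh"):
    """One Elman update: h_t = phi(W_ih x_t + b_ih + W_hh h_prev + b_hh)."""
    phi = math.tanh if nonlinearity == "tanh" else (lambda v: max(0.0, v))
    h_t = []
    for i in range(len(h_prev)):
        pre = b_ih[i] + b_hh[i]
        pre += sum(w * x for w, x in zip(W_ih[i], x_t))      # input contribution
        pre += sum(w * h for w, h in zip(W_hh[i], h_prev))   # recurrent contribution
        h_t.append(phi(pre))
    return h_t

# Toy sizes: input_size = 2, hidden_size = 2 (values are arbitrary).
W_ih = [[0.5, -0.5], [1.0, 0.0]]
W_hh = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
h1 = elman_step([1.0, 1.0], [0.0, 0.0], W_ih, W_hh, b, b)
print(h1)  # [tanh(0), tanh(1)]
```

Passing `nonlinearity="relu"` swaps the activation for `max(0, v)` and leaves the rest of the update unchanged.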

The output of layer $\ell$ is fed as the input to layer $\ell + 1$. When bidirectional=True, two RNNs process the sequence in opposite directions and their outputs are concatenated along the feature axis at each time step.
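The stacking and concatenation described above can be sketched for a single unbatched sequence in pure Python. The helpers `run_direction`, `bidirectional_layer`, and `make_params` are hypothetical, and the constant-valued weights are for illustration only. The sketch assumes the second layer consumes the concatenated 2 * hidden_size features, which follows from the description but is not taken from the library's source.

```python
import math

def run_direction(seq, W_ih, W_hh, b, reverse=False):
    """Run one tanh Elman cell over seq; return hidden states in time order."""
    H = len(W_hh)
    h = [0.0] * H
    steps = reversed(range(len(seq))) if reverse else range(len(seq))
    out = [None] * len(seq)
    for t in steps:
        h = [
            math.tanh(
                b[i]
                + sum(w * x for w, x in zip(W_ih[i], seq[t]))
                + sum(w * v for w, v in zip(W_hh[i], h))
            )
            for i in range(H)
        ]
        out[t] = h  # store at position t so both directions align per time step
    return out

def bidirectional_layer(seq, fwd, bwd):
    """Concatenate forward and backward outputs along the feature axis."""
    out_f = run_direction(seq, *fwd)
    out_b = run_direction(seq, *bwd, reverse=True)
    return [f + r for f, r in zip(out_f, out_b)]

def make_params(in_features, hidden, value):
    # Constant-valued weights, for illustration only.
    W_ih = [[value] * in_features for _ in range(hidden)]
    W_hh = [[value] * hidden for _ in range(hidden)]
    return W_ih, W_hh, [0.0] * hidden

I, H, L = 3, 2, 4
seq = [[0.1 * (t + 1)] * I for t in range(L)]
layer0 = bidirectional_layer(seq, make_params(I, H, 0.1), make_params(I, H, -0.1))
# Layer 1 reads the 2*H concatenated features produced by layer 0.
layer1 = bidirectional_layer(layer0, make_params(2 * H, H, 0.1), make_params(2 * H, H, -0.1))
print(len(layer1), len(layer1[0]))  # 4 4
```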

Inter-layer dropout (probability dropout) is applied between adjacent layers during training.

Warning

Vanilla RNNs are prone to the vanishing gradient problem: during backpropagation through time, gradients are multiplied by $W_{hh}$ (and by the derivative of $\phi$) at every time step, so they tend to shrink exponentially with sequence length. In practice this becomes limiting for sequences longer than ~10–20 steps. For longer sequences, prefer LSTM or GRU, which use gating mechanisms to maintain gradient flow.
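The effect is easy to observe numerically on a scalar tanh RNN. The sketch below is illustrative only: the recurrent weight 0.9, the constant input 0.5, and the helper name `gradient_through_time` are arbitrary assumptions, not values from the library.

```python
import math

def gradient_through_time(w_hh, steps, x=0.5):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w_hh * h_{t-1} + x)."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        h = math.tanh(w_hh * h + x)
        grad *= w_hh * (1.0 - h * h)  # chain rule factor d h_t / d h_{t-1}
    return abs(grad)

for T in (1, 5, 10, 20, 50):
    print(T, gradient_through_time(0.9, T))
```

Each factor is below 1 in magnitude here, so the printed gradient magnitude decreases monotonically as the number of steps grows.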

Parameters

input_size : int
Number of expected features in the input $x_t$.
hidden_size : int
Number of features in the hidden state $h_t$ (denoted $H$ below).
num_layers : int = 1
Number of stacked RNN layers. Default: 1.
nonlinearity : {'tanh', 'relu'} = 'tanh'
Activation function $\phi$. 'tanh' is recommended for most use cases; 'relu' can help when gradients vanish with tanh. Default: 'tanh'.
bias : bool = True
If False, all bias parameters are omitted. Default: True.
batch_first : bool = False
If True, input/output tensors are (N, L, *) instead of the default (L, N, *). Default: False.
dropout : float = 0.0
Dropout probability applied after each layer except the last. 0.0 disables dropout. Default: 0.0.
bidirectional : bool = False
If True, use a bidirectional RNN; the output feature dimension becomes 2 * hidden_size. Default: False.
device : DeviceLike = None
Device for weight allocation.
dtype : DTypeLike = None
Data type for weight tensors.

Notes

  • Input x: (L, N, input_size) or (N, L, input_size) when batch_first=True.
  • h_0 (optional): (D * num_layers, N, H) where D = 2 if bidirectional else 1. Defaults to zeros.
  • output: (L, N, D * H) or (N, L, D * H).
  • h_n: (D * num_layers, N, H) — hidden state at the final time step for each layer and direction.
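The shape rules above can be collected into a small helper for checking tensors before calling the module. `rnn_shapes` is a hypothetical utility written for this page, not part of the library.

```python
def rnn_shapes(L, N, input_size, hidden_size, num_layers=1,
               bidirectional=False, batch_first=False):
    """Return the expected shapes of x, h_0, output, and h_n as tuples."""
    D = 2 if bidirectional else 1
    if batch_first:
        x = (N, L, input_size)
        output = (N, L, D * hidden_size)
    else:
        x = (L, N, input_size)
        output = (L, N, D * hidden_size)
    # Per the rules above, h_0 and h_n share one layout in both cases.
    h = (D * num_layers, N, hidden_size)
    return {"x": x, "h_0": h, "output": output, "h_n": h}

shapes = rnn_shapes(L=7, N=3, input_size=8, hidden_size=16,
                    bidirectional=True, batch_first=True)
print(shapes["output"], shapes["h_n"])  # (3, 7, 32) (2, 3, 16)
```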

Internally this module stores one RNNCell sub-module per layer per direction. The _CellNamingMixin flattens these into weight_ih_l{layer} etc. for checkpoint compatibility with the reference framework.
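As a rough illustration of the flattened naming scheme, the sketch below enumerates the names one might expect in a checkpoint. Only the `weight_ih_l{layer}` pattern comes from the text above. The other prefixes (`weight_hh`, `bias_ih`, `bias_hh`), the `_reverse` suffix for the backward direction, and the helper `flat_param_names` are assumptions modeled on a common convention, so verify them against an actual state dict.

```python
def flat_param_names(num_layers, bidirectional=False, bias=True):
    """List assumed flattened parameter names, one group per layer and direction."""
    # ASSUMPTION: the backward direction uses a "_reverse" suffix.
    suffixes = ["", "_reverse"] if bidirectional else [""]
    # ASSUMPTION: only "weight_ih" is confirmed by the documentation.
    kinds = ["weight_ih", "weight_hh"]
    if bias:
        kinds += ["bias_ih", "bias_hh"]
    names = []
    for layer in range(num_layers):
        for suffix in suffixes:
            for kind in kinds:
                names.append(f"{kind}_l{layer}{suffix}")
    return names

print(flat_param_names(2)[:4])
```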

flatten_parameters is a no-op kept for API compatibility.

PackedSequence input is not yet supported.

See Also

RNNCell : Single time-step Elman cell.
LSTM : Gated RNN with a separate cell state (better for long sequences).
GRU : Gated RNN without a separate cell state.

Examples

Two-layer RNN, batch-first:
>>> import lucid, lucid.nn as nn
>>> rnn = nn.RNN(8, 16, num_layers=2, batch_first=True)
>>> x = lucid.randn(2, 5, 8)       # (N=2, L=5, I=8)
>>> out, h_n = rnn(x)
>>> out.shape, h_n.shape
((2, 5, 16), (2, 2, 16))
Bidirectional RNN with ReLU activation:
>>> rnn_bi = nn.RNN(
...     8, 16, nonlinearity='relu',
...     bidirectional=True, batch_first=True,
... )
>>> x2 = lucid.randn(3, 7, 8)
>>> out2, h_n2 = rnn_bi(x2)
>>> out2.shape    # D*H = 2*16 = 32
(3, 7, 32)
>>> h_n2.shape    # D*num_layers = 2*1 = 2
(2, 3, 16)

Methods (4)

dunder

__init__

None
__init__(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the RNN module. See the class docstring for parameter semantics.

fn

flatten_parameters

None
flatten_parameters()

No-op for API compatibility (see LSTM.flatten_parameters).

fn

forward

Tensor or tuple of Tensor
forward(x: Tensor, hx: Tensor | None = None)

Run the recurrent forward pass.

Parameters

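x : Tensor
Input sequence of shape (L, N, input_size), or (N, L, input_size) when batch_first=True.
hx : Tensor | None = None
Initial hidden state h_0 of shape (D * num_layers, N, H), where D = 2 if bidirectional else 1. Defaults to zeros.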

Returns

Tensor or tuple of Tensor

Output and (optionally) the new hidden state; see the class docstring.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.