RNN
RNN(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)
Bases: _CellNamingMixin, Module

Multi-layer Elman recurrent neural network (RNN).
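Applies a stack of Elman RNN cells over an input sequence. At each time step t the hidden state is updated by:

.. math::

   h_t = \sigma\left(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}\right)

where x_t is the input at time t, h_{t-1} is the previous hidden state, and σ is either tanh (default) or ReLU, controlled by nonlinearity.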
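A minimal pure-Python sketch of the Elman update h_t = σ(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh), using nested lists in place of tensors. The helper names (`matvec`, `elman_step`, `elman_sequence`) are hypothetical and this is not lucid's implementation; it assumes separate input-hidden and hidden-hidden biases, as in PyTorch's `nn.RNN`.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product m @ v on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(z: float) -> float:
    return max(0.0, z)


def elman_step(x_t: Vector, h_prev: Vector, w_ih: Matrix, w_hh: Matrix,
               b_ih: Vector, b_hh: Vector,
               act: Callable[[float], float] = math.tanh) -> Vector:
    """One Elman update: h_t = act(W_ih x_t + b_ih + W_hh h_prev + b_hh)."""
    pre = [a + b + c + d for a, b, c, d in
           zip(matvec(w_ih, x_t), b_ih, matvec(w_hh, h_prev), b_hh)]
    return [act(z) for z in pre]


def elman_sequence(xs: List[Vector], h0: Vector, w_ih: Matrix, w_hh: Matrix,
                   b_ih: Vector, b_hh: Vector,
                   act: Callable[[float], float] = math.tanh
                   ) -> Tuple[List[Vector], Vector]:
    """Apply elman_step left to right; return every hidden state and the last one."""
    h = h0
    outputs = []
    for x_t in xs:
        h = elman_step(x_t, h, w_ih, w_hh, b_ih, b_hh, act)
        outputs.append(h)
    return outputs, h
```

With a single ReLU unit (W_ih = 1, W_hh = 0.5, zero biases) and inputs [1, 0], the hidden states are [1.0] and then [0.5]: the second step has no input, so only 0.5 times the previous state survives.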
The output of layer l is fed as the input of layer l + 1. When bidirectional=True, two RNNs process the
sequence in opposite directions and their outputs are concatenated
along the feature axis at each time step.
Inter-layer dropout (probability dropout) is applied between
adjacent layers during training.
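The stacking, bidirectional concatenation and inter-layer dropout described above can be sketched in pure Python as follows. This is a hypothetical illustration, not lucid's code: each cell is a tuple (W_ih, W_hh, b_ih, b_hh), tanh is hard-coded, and the dropout uses inverted scaling by 1 / (1 - p), which is an assumption borrowed from common practice.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Cell = Tuple[Matrix, Matrix, Vector, Vector]  # (W_ih, W_hh, b_ih, b_hh)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def run_direction(xs: List[Vector], cell: Cell, reverse: bool = False):
    """Run one tanh Elman cell over xs; outputs are stored in time order either way."""
    w_ih, w_hh, b_ih, b_hh = cell
    h = [0.0] * len(b_ih)  # zero initial state
    outs: List[Optional[Vector]] = [None] * len(xs)
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    for t in order:
        pre = [a + b + c + d for a, b, c, d in
               zip(matvec(w_ih, xs[t]), b_ih, matvec(w_hh, h), b_hh)]
        h = [math.tanh(z) for z in pre]
        outs[t] = h
    return outs, h


def stacked_rnn(xs: List[Vector], cells: List[List[Cell]], bidirectional: bool = False,
                dropout: float = 0.0, training: bool = False,
                rng: Optional[random.Random] = None):
    """cells[layer][direction] holds one cell; returns (top-layer outputs, list of final states)."""
    rng = rng or random.Random(0)
    h_n: List[Vector] = []
    layer_in = xs
    for layer, dirs in enumerate(cells):
        fwd, h_f = run_direction(layer_in, dirs[0])
        h_n.append(h_f)
        if bidirectional:
            bwd, h_b = run_direction(layer_in, dirs[1], reverse=True)
            h_n.append(h_b)
            # Concatenate forward and backward features at each time step (size D * H).
            out = [f + b for f, b in zip(fwd, bwd)]
        else:
            out = fwd
        is_last = layer == len(cells) - 1
        if training and dropout > 0.0 and not is_last:
            # Dropout only between layers; the top layer's output is never dropped.
            keep = 1.0 - dropout
            out = [[v / keep if rng.random() < keep else 0.0 for v in step] for step in out]
        layer_in = out
    return layer_in, h_n
```

Because the next layer reads the concatenated outputs, every layer after the first has an input size of D * hidden_size rather than input_size.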
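.. warning::
   Vanilla RNNs are prone to the vanishing gradient problem.
   Backpropagating through one time step multiplies the gradient by
   the transposed Jacobian (∂h_t/∂h_{t-1})^T = W_hh^T diag(σ'(a_t)),
   where a_t is the pre-activation inside σ, so the gradient
   typically shrinks exponentially with the time lag. In practice,
   dependencies more than roughly 10–20 steps apart become hard to
   learn. For longer sequences, prefer LSTM or GRU, whose gating
   mechanisms help preserve gradient flow.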
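A small numeric illustration of the warning, using a hypothetical scalar RNN h_t = tanh(w_hh · h_{t-1}) with zero input. The chain rule gives dh_T/dh_0 as a product of per-step factors w_hh · (1 - h_t²); with |w_hh| < 1 each factor is at most |w_hh| in magnitude, so the gradient decays at least as fast as |w_hh|^T.

```python
import math


def scalar_rnn_gradient(w_hh: float, h0: float, steps: int) -> float:
    """Return d h_T / d h_0 for h_t = tanh(w_hh * h_{t-1}) with zero input."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w_hh * h)
        # d h_t / d h_{t-1} = tanh'(w_hh * h_{t-1}) * w_hh = (1 - h_t**2) * w_hh
        grad *= w_hh * (1.0 - h * h)
    return grad


for T in (1, 5, 10, 20):
    print(T, scalar_rnn_gradient(0.5, 0.5, T))
```

The printed values fall roughly geometrically with T, which is the exponential decay the warning describes.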
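Parameters
input_size : int
    Number of expected features in the input x.
hidden_size : int
    Number of features in the hidden state h.
num_layers : int = 1
    Number of stacked recurrent layers. Default: 1.
nonlinearity : {'tanh', 'relu'} = 'tanh'
    Activation function σ. 'tanh' is recommended for most use cases; 'relu' can help when gradients vanish with tanh. Default: 'tanh'.
bias : bool = True
    If False, all bias parameters are omitted. Default: True.
batch_first : bool = False
    If True, input/output tensors are (N, L, *) instead of the default (L, N, *). Hidden-state tensors keep the (D * num_layers, N, H) layout either way. Default: False.
dropout : float = 0.0
    Dropout probability applied between adjacent layers during training; 0.0 disables dropout. Default: 0.0.
bidirectional : bool = False
    If True, use a bidirectional RNN; the output feature dimension becomes 2 * hidden_size. Default: False.
device : DeviceLike = None
dtype : DTypeLike = None

Notes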
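Shapes, with L the sequence length, N the batch size, H = hidden_size and D = 2 if bidirectional else 1:

- x (input): (L, N, input_size), or (N, L, input_size) when batch_first=True.
- h_0 (optional): (D * num_layers, N, H). Defaults to zeros.
- output: (L, N, D * H), or (N, L, D * H) when batch_first=True.
- h_n: (D * num_layers, N, H); the hidden state after the last step processed by each layer and direction (for the reverse direction, that is time step 0).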
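The shape rules can be encoded in a hypothetical helper (`rnn_shapes` is not part of lucid) and checked against the two examples on this page: a 2-layer batch-first RNN with N=2, L=5, hidden_size=16, and a bidirectional one with N=3, L=7.

```python
from typing import Dict, Tuple


def rnn_shapes(seq_len: int, batch: int, input_size: int, hidden_size: int,
               num_layers: int = 1, bidirectional: bool = False,
               batch_first: bool = False) -> Dict[str, Tuple[int, int, int]]:
    """Return the expected tensor shapes for the documented RNN layout."""
    d = 2 if bidirectional else 1
    if batch_first:
        x = (batch, seq_len, input_size)
        out = (batch, seq_len, d * hidden_size)
    else:
        x = (seq_len, batch, input_size)
        out = (seq_len, batch, d * hidden_size)
    # Hidden-state tensors are laid out the same way regardless of batch_first.
    h = (d * num_layers, batch, hidden_size)
    return {"x": x, "h_0": h, "output": out, "h_n": h}
```

For example, `rnn_shapes(5, 2, 8, 16, num_layers=2, batch_first=True)` gives an output shape of (2, 5, 16) and an h_n shape of (2, 2, 16).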
Internally this module stores one RNNCell sub-module per
layer per direction. The _CellNamingMixin exposes the cells'
parameters under flat names such as weight_ih_l{layer}, for
checkpoint compatibility with the reference framework.
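A sketch of what a flat parameter table could look like. The source only names weight_ih_l{layer}; the `weight_hh_*` and `bias_*` entries, the `_reverse` suffix for the backward direction, and the (H, in_features) weight orientation are assumptions taken from PyTorch's `nn.RNN` convention, and `flat_param_shapes` / `param_count` are hypothetical helpers.

```python
from typing import Dict, Tuple


def flat_param_shapes(input_size: int, hidden_size: int, num_layers: int = 1,
                      bias: bool = True, bidirectional: bool = False
                      ) -> Dict[str, Tuple[int, ...]]:
    """Map assumed PyTorch-style flat parameter names to their shapes."""
    suffixes = ["", "_reverse"] if bidirectional else [""]
    d = len(suffixes)
    shapes: Dict[str, Tuple[int, ...]] = {}
    for layer in range(num_layers):
        # Layer 0 reads the raw input; deeper layers read the D * H concatenated outputs.
        layer_in = input_size if layer == 0 else d * hidden_size
        for sfx in suffixes:
            shapes[f"weight_ih_l{layer}{sfx}"] = (hidden_size, layer_in)
            shapes[f"weight_hh_l{layer}{sfx}"] = (hidden_size, hidden_size)
            if bias:
                shapes[f"bias_ih_l{layer}{sfx}"] = (hidden_size,)
                shapes[f"bias_hh_l{layer}{sfx}"] = (hidden_size,)
    return shapes


def param_count(shapes: Dict[str, Tuple[int, ...]]) -> int:
    """Total number of scalar parameters across all named tensors."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total
```

Under these assumptions, `nn.RNN(8, 16, num_layers=2)` would hold 416 parameters in layer 0 (16·8 + 16·16 + 2·16) and 544 in layer 1, for 960 in total.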
flatten_parameters is a no-op kept for API compatibility.
PackedSequence input is not yet supported.
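Without PackedSequence, one common workaround (not a lucid feature) is to right-pad the batch and read each sequence's output at index length - 1. For a unidirectional RNN, trailing padding cannot affect earlier steps, so that output matches an unpadded run; h_n, by contrast, would include the padding steps. The trick does not carry over to the reverse direction of a bidirectional RNN, which reads the padding first. The scalar RNN and helper names below are hypothetical.

```python
import math
from typing import List


def scalar_rnn(xs: List[float], w_ih: float = 1.0, w_hh: float = 0.5) -> List[float]:
    """Toy unidirectional tanh RNN over scalars; returns every hidden state."""
    h, outs = 0.0, []
    for x in xs:
        h = math.tanh(w_ih * x + w_hh * h)
        outs.append(h)
    return outs


def last_valid_outputs(batch_outputs: List[List[float]], lengths: List[int]) -> List[float]:
    """Pick the output at the last real (non-padded) step of each sequence."""
    return [seq[n - 1] for seq, n in zip(batch_outputs, lengths)]


seqs = [[1.0, -0.5, 0.25], [0.3]]
lengths = [len(s) for s in seqs]
max_len = max(lengths)
# Right-pad with zeros so every sequence has the same length.
padded = [s + [0.0] * (max_len - len(s)) for s in seqs]
batch_outputs = [scalar_rnn(s) for s in padded]
finals = last_valid_outputs(batch_outputs, lengths)
```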
See Also
RNNCell : Single time-step Elman cell.
LSTM : Gated RNN with a separate cell state (better suited to long sequences).
GRU : Gated RNN without a separate cell state.
Examples
Two-layer RNN, batch-first:
>>> import lucid, lucid.nn as nn
>>> rnn = nn.RNN(8, 16, num_layers=2, batch_first=True)
>>> x = lucid.randn(2, 5, 8) # (N=2, L=5, I=8)
>>> out, h_n = rnn(x)
>>> out.shape, h_n.shape
((2, 5, 16), (2, 2, 16))
Bidirectional RNN with ReLU activation:
>>> rnn_bi = nn.RNN(
... 8, 16, nonlinearity='relu',
... bidirectional=True, batch_first=True,
... )
>>> x2 = lucid.randn(3, 7, 8)
>>> out2, h_n2 = rnn_bi(x2)
>>> out2.shape # D*H = 2*16 = 32
(3, 7, 32)
>>> h_n2.shape # D*num_layers = 2*1 = 2
(2, 3, 16)

Methods
__init__
__init__(input_size: int, hidden_size: int, num_layers: int = 1, nonlinearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the RNN module. See the class docstring for parameter semantics.

flatten_parameters
flatten_parameters() → None
No-op for API compatibility (see LSTM.flatten_parameters).
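
forward
forward(x: Tensor, hx: Tensor | None = None) → Tensor or tuple of Tensor
Run the recurrent forward pass.
Parameters
x : Tensor
    Input sequence; see the class docstring for its shape.
hx : Tensor | None = None
    Initial hidden state h_0. If None, zeros are used.
Returns
Tensor or tuple of Tensor
    Output and (optionally) the new hidden state; see the class docstring.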

extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.