class LSTMCell(Module)
LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Single time-step Long Short-Term Memory (LSTM) cell.

Computes one recurrent update using the full LSTM gating equations:

$$
\begin{aligned}
i_t &= \sigma(W_{ii}\,x_t + b_{ii} + W_{hi}\,h_{t-1} + b_{hi}) & &\text{(input gate)}\\
f_t &= \sigma(W_{if}\,x_t + b_{if} + W_{hf}\,h_{t-1} + b_{hf}) & &\text{(forget gate)}\\
g_t &= \tanh(W_{ig}\,x_t + b_{ig} + W_{hg}\,h_{t-1} + b_{hg}) & &\text{(cell gate)}\\
o_t &= \sigma(W_{io}\,x_t + b_{io} + W_{ho}\,h_{t-1} + b_{ho}) & &\text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.

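For each of the two projections (input–hidden and hidden–hidden), the four per-gate weight matrices are stored as a single vertically stacked parameter of shape (4H, *), where H = hidden_size and * is input_size or hidden_size respectively. The gate order is [i; f; g; o]: the first H rows belong to the input gate, the next H to the forget gate, and so on. The biases are stacked in the same order.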
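As a cross-check of the equations and the [i; f; g; o] stacking, here is a minimal NumPy sketch of one step. It is not lucid's implementation: the names `sigmoid` and `lstm_cell_step` are hypothetical, inputs are assumed to be batched row vectors of shape (N, input_size), and the stacked weights are applied as `x @ weight_ih.T`, the usual convention for a (4H, input_size) matrix.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, weight_ih, weight_hh, bias_ih=None, bias_hh=None):
    """One LSTM step with weights stacked in gate order [i; f; g; o].

    Shapes: x (N, I), h_prev and c_prev (N, H), weight_ih (4H, I),
    weight_hh (4H, H), biases (4H,) or None.
    """
    # Pre-activations of all four gates in one pass: shape (N, 4H).
    gates = x @ weight_ih.T + h_prev @ weight_hh.T
    if bias_ih is not None:
        gates = gates + bias_ih
    if bias_hh is not None:
        gates = gates + bias_hh
    # Split the 4H columns into H-wide blocks, in the stacking order i, f, g, o.
    i_pre, f_pre, g_pre, o_pre = np.split(gates, 4, axis=1)
    i_t = sigmoid(i_pre)  # input gate
    f_t = sigmoid(f_pre)  # forget gate
    g_t = np.tanh(g_pre)  # cell gate
    o_t = sigmoid(o_pre)  # output gate
    c_t = f_t * c_prev + i_t * g_t  # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


# Usage with the same sizes as the class example: N=3, I=10, H=20.
rng = np.random.default_rng(0)
N, I, H = 3, 10, 20
params = dict(
    weight_ih=rng.standard_normal((4 * H, I)) * 0.1,
    weight_hh=rng.standard_normal((4 * H, H)) * 0.1,
    bias_ih=np.zeros(4 * H),
    bias_hh=np.zeros(4 * H),
)
h, c = np.zeros((N, H)), np.zeros((N, H))
for t in range(7):
    h, c = lstm_cell_step(rng.standard_normal((N, I)), h, c, **params)
print(h.shape, c.shape)  # (3, 20) (3, 20)
```

Computing all four gates with one stacked product and then splitting is exactly why the parameters are stored as (4H, *) blocks.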

Parameters

input_size : int
Number of features in the input vector $x_t$.
hidden_size : int
Number of features in the hidden and cell states $h_t$, $c_t$.
bias : bool = True
If False, all bias terms are omitted. Default: True.
device : DeviceLike = None
Device for weight allocation.
dtype : DTypeLike = None
Data type for weight tensors.

Attributes

weight_ih : Parameter, shape (4 * hidden_size, input_size)
Stacked input–hidden weight matrices $[W_{ii}; W_{if}; W_{ig}; W_{io}]$.
weight_hh : Parameter, shape (4 * hidden_size, hidden_size)
Stacked hidden–hidden weight matrices $[W_{hi}; W_{hf}; W_{hg}; W_{ho}]$.
bias_ih : Parameter or None, shape (4 * hidden_size,)
Stacked input–hidden biases. None when bias=False.
bias_hh : Parameter or None, shape (4 * hidden_size,)
Stacked hidden–hidden biases. None when bias=False.
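The attribute shapes fix both the per-gate layout and the parameter count. The NumPy sketch below uses random arrays as stand-ins for the four attributes (it does not touch lucid) to show that gate k occupies rows k*H to (k+1)*H - 1, that one stacked matrix product equals the four per-gate products side by side, and how many scalars the cell holds when bias=True.

```python
import numpy as np

input_size, hidden_size = 10, 20
H = hidden_size
rng = np.random.default_rng(0)

# Stand-ins with the documented attribute shapes (bias=True).
weight_ih = rng.standard_normal((4 * H, input_size))
weight_hh = rng.standard_normal((4 * H, H))
bias_ih = rng.standard_normal(4 * H)
bias_hh = rng.standard_normal(4 * H)

# Gate k in the order [i; f; g; o] owns rows k*H .. (k+1)*H - 1.
gate_names = ["i", "f", "g", "o"]
W_ih = {name: weight_ih[k * H:(k + 1) * H] for k, name in enumerate(gate_names)}
W_hh = {name: weight_hh[k * H:(k + 1) * H] for k, name in enumerate(gate_names)}

# One stacked product equals the four per-gate products placed side by side.
x = rng.standard_normal((3, input_size))
stacked = x @ weight_ih.T  # (3, 4H)
per_gate = np.concatenate([x @ W_ih[n].T for n in gate_names], axis=1)
print(np.allclose(stacked, per_gate))  # True

# Total parameters: 4H*input_size + 4H*H + 2*4H.
n_params = sum(p.size for p in (weight_ih, weight_hh, bias_ih, bias_hh))
print(n_params)  # 2560 for input_size=10, hidden_size=20
```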

Notes

  • x: (N, input_size) — batch of input vectors.
  • hx (optional): tuple (h_0, c_0) each of shape (N, hidden_size). Defaults to zero tensors when None.
  • Output: tuple (h_t, c_t) each of shape (N, hidden_size).
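The shape contract and the zero default for hx can be expressed as a small validation helper. This is a hypothetical NumPy sketch: `resolve_state` is not part of lucid, and whether `LSTMCell.forward` checks shapes this strictly is an assumption.

```python
import numpy as np


def resolve_state(x, hx, hidden_size):
    """Return (h_0, c_0) for a batch x of shape (N, input_size).

    Hypothetical helper: zeros when hx is None, otherwise the given
    pair after checking that each has shape (N, hidden_size).
    """
    if x.ndim != 2:
        raise ValueError(f"x must have shape (N, input_size), got {x.shape}")
    n = x.shape[0]
    expected = (n, hidden_size)
    if hx is None:
        # Default initial state: one zero row per batch element.
        return np.zeros(expected, dtype=x.dtype), np.zeros(expected, dtype=x.dtype)
    h0, c0 = hx
    for name, state in (("h_0", h0), ("c_0", c0)):
        if state.shape != expected:
            raise ValueError(f"{name} must have shape {expected}, got {state.shape}")
    return h0, c0


h0, c0 = resolve_state(np.ones((5, 4)), None, hidden_size=8)
print(h0.shape, c0.shape)  # (5, 8) (5, 8)
```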

Weights are initialised from $\mathcal{U}(-1/\sqrt{H},\, 1/\sqrt{H})$, where $H$ = hidden_size.
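The initialisation bound depends only on H = hidden_size, not on input_size. A minimal NumPy sketch of that rule for the two weight matrices follows; `init_lstm_weights` is a hypothetical name, and since only the weights are specified here, the sketch leaves the biases out.

```python
import math

import numpy as np


def init_lstm_weights(input_size, hidden_size, rng):
    """Draw weight_ih and weight_hh from U(-1/sqrt(H), 1/sqrt(H))."""
    bound = 1.0 / math.sqrt(hidden_size)
    # Generator.uniform samples from the half-open interval [-bound, bound).
    weight_ih = rng.uniform(-bound, bound, size=(4 * hidden_size, input_size))
    weight_hh = rng.uniform(-bound, bound, size=(4 * hidden_size, hidden_size))
    return weight_ih, weight_hh


rng = np.random.default_rng(0)
weight_ih, weight_hh = init_lstm_weights(10, 20, rng)
print(round(1.0 / math.sqrt(20), 4))  # 0.2236, the bound for hidden_size=20
```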

The forget-gate bias is not initialised to 1 by default, unlike in some other implementations. If you observe vanishing gradients on medium-length sequences, consider setting the forget-gate slice of the bias to 1 after construction (this requires bias=True), e.g. cell.bias_ih.data[H:2*H] = 1.0 with H = hidden_size.
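Because the biases follow the [i; f; g; o] order, the forget-gate entries are the second block of H values. The NumPy sketch below applies that slice to zero stand-ins for `cell.bias_ih` and `cell.bias_hh` (plain arrays, not lucid tensors) and shows the resulting forget-gate activation when the weights contribute nothing. The effective forget bias is the sum of both bias vectors, so a non-zero `bias_hh` would shift it.

```python
import numpy as np

H = 20                     # hidden_size
bias_ih = np.zeros(4 * H)  # stand-in for cell.bias_ih, order [i; f; g; o]
bias_hh = np.zeros(4 * H)  # stand-in for cell.bias_hh

# Second block of H entries = forget gate.
bias_ih[H:2 * H] = 1.0

# With zero weights, the forget-gate pre-activation is just the summed bias.
forget_pre = bias_ih[H:2 * H] + bias_hh[H:2 * H]
f_t = 1.0 / (1.0 + np.exp(-forget_pre))
print(round(float(f_t[0]), 4))  # 0.7311, versus 0.5 with an all-zero bias
```

Starting the forget gate near 0.73 instead of 0.5 means the cell state decays more slowly early in training, which is the motivation for this tweak.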

See also

LSTM : Multi-layer, multi-step LSTM module.
RNNCell : Simpler single-step cell without gating.
GRUCell : Single-step GRU cell (fewer gates, no separate cell state).

Examples

Single-step update, carrying state across a manual loop:
>>> import lucid, lucid.nn as nn
>>> cell = nn.LSTMCell(input_size=10, hidden_size=20)
>>> h, c = lucid.zeros(3, 20), lucid.zeros(3, 20)
>>> x_seq = lucid.randn(7, 3, 10)    # (L=7, N=3, I=10)
>>> for t in range(7):
...     h, c = cell(x_seq[t], (h, c))
>>> h.shape, c.shape
((3, 20), (3, 20))

Without an explicit initial state (hx defaults to zeros):
>>> cell2 = nn.LSTMCell(4, 8)
>>> h2, c2 = cell2(lucid.randn(5, 4))
>>> h2.shape
(5, 8)

Methods

__init__(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) -> None

Initialise the LSTMCell module. See the class docstring for parameter semantics.

forward(x: Tensor, hx: tuple[Tensor, Tensor] | None = None)

Run the recurrent forward pass.

Parameters

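x : Tensor
Input batch of shape (N, input_size).
hx : tuple[Tensor, Tensor] or None = None
Initial states (h_0, c_0), each of shape (N, hidden_size). Zero tensors are used when None.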

Returns

tuple[Tensor, Tensor]

The new states (h_t, c_t), each of shape (N, hidden_size).

extra_repr() -> str

Return a string representation of the layer's configuration.