class LSTMCell

extends Module

LSTMCell(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Single time-step Long Short-Term Memory (LSTM) cell.

Computes one recurrent update using the full LSTM gating equations:

$$
\begin{aligned}
i_t &= \sigma(W_{ii}\,x_t + b_{ii} + W_{hi}\,h_{t-1} + b_{hi}) & &\text{(input gate)}\\
f_t &= \sigma(W_{if}\,x_t + b_{if} + W_{hf}\,h_{t-1} + b_{hf}) & &\text{(forget gate)}\\
g_t &= \tanh(W_{ig}\,x_t + b_{ig} + W_{hg}\,h_{t-1} + b_{hg}) & &\text{(cell gate)}\\
o_t &= \sigma(W_{io}\,x_t + b_{io} + W_{ho}\,h_{t-1} + b_{ho}) & &\text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

For each of the input–hidden and hidden–hidden paths, the four gate weight matrices are stored as a single vertically stacked parameter of shape (4H, *), where H = hidden_size, in gate order [i; f; g; o] (the first H rows correspond to the input gate, the next H to the forget gate, and so on).
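The gating equations and the stacked [i; f; g; o] layout can be sketched in plain NumPy. This is an illustrative reimplementation under stated assumptions, not lucid's actual code: `sigmoid` and `lstm_cell_step` are hypothetical helper names, and the arrays stand in for the `weight_ih`, `weight_hh`, `bias_ih` and `bias_hh` parameters.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, weight_ih, weight_hh,
                   bias_ih=None, bias_hh=None):
    """One LSTM update on NumPy arrays (hypothetical helper, not lucid's code).

    x: (N, I); h_prev, c_prev: (N, H)
    weight_ih: (4H, I); weight_hh: (4H, H); biases: (4H,) or None
    """
    H = h_prev.shape[1]
    # Pre-activations for all four gates at once: shape (N, 4H).
    gates = x @ weight_ih.T + h_prev @ weight_hh.T
    if bias_ih is not None:
        gates = gates + bias_ih
    if bias_hh is not None:
        gates = gates + bias_hh
    # Split along the stacked axis in gate order [i; f; g; o].
    i = sigmoid(gates[:, 0 * H:1 * H])   # input gate
    f = sigmoid(gates[:, 1 * H:2 * H])   # forget gate
    g = np.tanh(gates[:, 2 * H:3 * H])   # cell gate
    o = sigmoid(gates[:, 3 * H:4 * H])   # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
N, I, H = 3, 10, 20
h, c = lstm_cell_step(
    rng.standard_normal((N, I)),       # x_t
    np.zeros((N, H)),                  # h_{t-1}
    np.zeros((N, H)),                  # c_{t-1}
    rng.uniform(-1, 1, (4 * H, I)),    # stand-in for weight_ih
    rng.uniform(-1, 1, (4 * H, H)),    # stand-in for weight_hh
)
print(h.shape, c.shape)
```

Computing all four gates with one matrix product per path is the usual reason for stacking the weights; the per-gate slices are then just views into the (N, 4H) result.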

Parameters

input_size : int

Number of features in the input vector x_t.

hidden_size : int

Number of features in the hidden and cell states h_t, c_t.

bias : bool = True

If False, all bias terms are omitted. Default: True.

device : DeviceLike = None

Device for weight allocation.

dtype : DTypeLike = None

Data type for weight tensors.

Attributes

weight_ih : Parameter, shape ``(4 * hidden_size, input_size)``

Stacked input–hidden weight matrices [W_{ii}; W_{if}; W_{ig}; W_{io}].

weight_hh : Parameter, shape ``(4 * hidden_size, hidden_size)``

Stacked hidden–hidden weight matrices [W_{hi}; W_{hf}; W_{hg}; W_{ho}].

bias_ih : Parameter or None, shape ``(4 * hidden_size,)``

Stacked input–hidden biases. None when bias=False.

bias_hh : Parameter or None, shape ``(4 * hidden_size,)``

Stacked hidden–hidden biases. None when bias=False.
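As a quick check on the attribute shapes listed above, the total number of trainable values follows directly from them. The helper below is a hypothetical illustration (`lstm_cell_param_count` is not part of lucid); it only does the arithmetic implied by the shapes.

```python
def lstm_cell_param_count(input_size, hidden_size, bias=True):
    """Count scalar parameters implied by the documented shapes (hypothetical helper)."""
    # weight_ih: (4H, I) and weight_hh: (4H, H)
    weights = 4 * hidden_size * (input_size + hidden_size)
    # bias_ih and bias_hh: (4H,) each, present only when bias=True
    biases = 2 * 4 * hidden_size if bias else 0
    return weights + biases


print(lstm_cell_param_count(10, 20))              # 2560
print(lstm_cell_param_count(10, 20, bias=False))  # 2400
```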

Notes

x: (N, input_size), a batch of N input vectors.
hx (optional): tuple (h_0, c_0), each of shape (N, hidden_size). Defaults to zero tensors when None.
Output: tuple (h_t, c_t), each of shape (N, hidden_size).

Weights are initialised from $\mathcal{U}(-1/\sqrt{H},\, 1/\sqrt{H})$, where $H$ is hidden_size.
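The initialisation rule can be sketched in NumPy as follows, with H standing for hidden_size. This is an assumption-laden illustration rather than lucid's initialiser: `uniform_lstm_init` is a hypothetical helper, and only the two weight matrices are shown because the text above does not say how the biases are drawn.

```python
import math

import numpy as np


def uniform_lstm_init(shape, hidden_size, rng):
    """Sample from U(-1/sqrt(H), 1/sqrt(H)) (hypothetical helper)."""
    bound = 1.0 / math.sqrt(hidden_size)
    return rng.uniform(-bound, bound, size=shape)


rng = np.random.default_rng(0)
I, H = 10, 20
weight_ih = uniform_lstm_init((4 * H, I), H, rng)  # same shape as weight_ih
weight_hh = uniform_lstm_init((4 * H, H), H, rng)  # same shape as weight_hh
print(weight_ih.shape, weight_hh.shape)
```

Note that the bound depends only on hidden_size, so both matrices share the same range even though their second dimensions differ.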

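The forget-gate bias is not initialised to 1 by default (unlike some implementations). If you observe vanishing gradients on medium-length sequences, consider manually setting cell.bias_ih.data[H:2*H] = 1.0 after construction, where H = hidden_size; this slice is the forget-gate block of the stacked bias. Because b_{if} and b_{hf} are summed in the forget-gate equation, the effective forget-gate bias is then 1 plus the corresponding bias_hh entries.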
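To make the forget-gate slice concrete, here is a minimal NumPy sketch of the same indexing on a stand-in array. The array `bias_ih` below is a plain NumPy vector used for illustration, not a lucid Parameter, and the value H = 4 is arbitrary.

```python
import numpy as np

H = 4                        # stand-in for hidden_size
bias_ih = np.zeros(4 * H)    # stacked in gate order [i; f; g; o]

# Rows H .. 2H-1 are the forget-gate block.
bias_ih[H:2 * H] = 1.0

# Splitting into four equal chunks recovers the per-gate biases.
b_i, b_f, b_g, b_o = np.split(bias_ih, 4)
print(b_f)   # [1. 1. 1. 1.]
print(b_i)   # [0. 0. 0. 0.]
```

Only the second chunk changes; the input, cell and output gate biases are left untouched.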

See Also

LSTM : Multi-layer, multi-step LSTM module.
RNNCell : Simpler single-step cell without gating.
GRUCell : Single-step GRU cell (fewer gates, no separate cell state).

Examples

Single-step update, carrying state across a manual loop:

>>> import lucid, lucid.nn as nn
>>> cell = nn.LSTMCell(input_size=10, hidden_size=20)
>>> h, c = lucid.zeros(3, 20), lucid.zeros(3, 20)
>>> x_seq = lucid.randn(7, 3, 10)    # (L=7, N=3, I=10)
>>> for t in range(7):
...     h, c = cell(x_seq[t], (h, c))
>>> h.shape, c.shape
((3, 20), (3, 20))

Without providing an explicit initial state (zeros used):

>>> cell2 = nn.LSTMCell(4, 8)
>>> h2, c2 = cell2(lucid.randn(5, 4))
>>> h2.shape
(5, 8)

Methods (3)

__init__

→None

__init__(input_size: int, hidden_size: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the LSTMCell module. See the class docstring for parameter semantics.

forward

→tuple[Tensor, Tensor]

forward(x: Tensor, hx: tuple[Tensor, Tensor] | None = None)

Run the recurrent forward pass.

Parameters

x : Tensor

See the class docstring.

hx : tuple[Tensor, Tensor] | None = None

See the class docstring.

Returns

tuple[Tensor, Tensor]

The new hidden and cell states (h_t, c_t), each of shape (N, hidden_size); see the class docstring.

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.