
ctc_loss

ctc_loss(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor, blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False) -> Tensor

Connectionist Temporal Classification (CTC) loss.

The standard training objective for unaligned sequence prediction. It is used in speech recognition, handwriting recognition, and any task where the input sequence is typically much longer than the target and no per-frame alignment is provided. Introduced by Graves et al. (2006).

Internally, the loss marginalises over every valid alignment of a T-frame prediction onto an S-symbol target. Blank symbols may be inserted between target symbols, and each target symbol may span one or more consecutive frames. The negative log of the total probability of these paths is computed by dynamic programming.
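The sketch below illustrates that dynamic program in pure Python. It implements the standard log-domain CTC forward (alpha) recursion for a single, unbatched sequence, using plain nested lists of shape (T, C) instead of Tensors. It is not lucid's implementation. The helper names `ctc_nll` and `logsumexp` are hypothetical, and the code assumes T ≥ 1.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `target`; log_probs[t][c] is log p_t(c) (T x C, T >= 1)."""
    # Extended target: blank, y1, blank, y2, ..., yS, blank (length 2S + 1)
    ext: List[int] = [blank]
    for y in target:
        ext += [y, blank]
    T, L = len(log_probs), len(ext)

    # alpha[s]: log-prob of all path prefixes that end in ext[s] at the current frame
    alpha = [NEG_INF] * L
    alpha[0] = log_probs[0][ext[0]]
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * L
        for s in range(L):
            cands = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])      # advance by one position
            # jump over a blank, allowed only between two distinct labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # valid paths end on the last label or on the trailing blank
    total = alpha[L - 1] if L == 1 else logsumexp(alpha[L - 1], alpha[L - 2])
    return -total  # +inf when no valid alignment exists


# Usage: uniform predictions over C=3 classes for T=3 frames, target [1].
uniform = [[math.log(1 / 3)] * 3 for _ in range(3)]
print(ctc_nll(uniform, [1]))  # 6 of the 27 paths collapse to [1]: -log(6/27) ≈ 1.504
```

The recursion visits each of the 2S + 1 extended positions once per frame, so it costs O(T·S) instead of enumerating all C^T paths. An infeasible target (here, [1, 1] with only T=2 frames) comes out as an infinite loss.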

Parameters

log_probs: Tensor
Log-probabilities of shape (T, N, C), where T is the input sequence length, N is the batch size, and C is the number of classes (including the blank). Typically produced by lucid.nn.functional.log_softmax over the class axis.
targets: Tensor
Target indices, int32. Accepts two layouts: shape (N, S) (padded) or shape (sum(target_lengths),) (all targets concatenated).
input_lengths: Tensor
Effective (unpadded) length of each input sequence, shape (N,), int32. Each value must not exceed T. This enables padding-aware batching.
target_lengths: Tensor
Effective (unpadded) length of each target sequence, shape (N,), int32.
blank: int = 0
Index of the blank symbol (default 0).
reduction: str = 'mean'
"mean" (default), "sum", or "none". Under "mean", the per-sample losses are averaged across the batch. Under "sum", they are summed. Under "none", they are returned unreduced.
zero_infinity: bool = False
When True, infinite losses (which arise when a target cannot fit in the available input frames) and their gradients are set to zero, effectively skipping those samples (default False).
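The sketch below makes the two target layouts and the "target cannot fit" condition concrete. CTC can only align a target when the input has enough frames: one per label, plus one extra blank frame for every pair of identical adjacent labels. Without that blank, the repeats would be collapsed into one. The code works on plain Python lists rather than lucid Tensors. The helpers `prepare_ctc_targets`, `min_frames`, and `infeasible` are hypothetical. The padded layout assumes the loss ignores entries beyond each sample's target length.

```python
from typing import List, Sequence, Tuple


def min_frames(target: Sequence[int]) -> int:
    """Fewest input frames that can emit `target` under CTC."""
    # one frame per label, plus a separating blank between identical adjacent labels
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def prepare_ctc_targets(
    labels: Sequence[Sequence[int]], pad_value: int = 0
) -> Tuple[List[List[int]], List[int], List[int]]:
    """Build the padded (N, S) layout, the concatenated layout, and target_lengths."""
    target_lengths = [len(seq) for seq in labels]
    S = max(target_lengths, default=0)
    # Padded layout: entries past target_lengths[i] are padding (assumed ignored by the loss).
    padded = [list(seq) + [pad_value] * (S - len(seq)) for seq in labels]
    # Concatenated layout: all targets back to back, length sum(target_lengths).
    concatenated = [y for seq in labels for y in seq]
    return padded, concatenated, target_lengths


def infeasible(labels: Sequence[Sequence[int]], input_lengths: Sequence[int]) -> List[int]:
    """Indices of samples with no valid alignment (their loss is infinite)."""
    return [i for i, (seq, n) in enumerate(zip(labels, input_lengths)) if n < min_frames(seq)]


labels = [[1, 2], [3, 3, 1], [2]]
padded, concat, tl = prepare_ctc_targets(labels)
print(padded)                          # [[1, 2, 0], [3, 3, 1], [2, 0, 0]]
print(concat, tl)                      # [1, 2, 3, 3, 1, 2] [2, 3, 1]
print(infeasible(labels, [4, 3, 1]))   # [1]: [3, 3, 1] needs 4 frames but has 3
```

The resulting lists could then be wrapped with `lucid.tensor(..., dtype=lucid.int32)` as in the Examples section. Any sample reported by `infeasible` is one whose loss and gradient `zero_infinity=True` would set to zero.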

Returns

Tensor

A scalar when reduction is "mean" or "sum", or a per-sample tensor of shape (N,) when reduction is "none".
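The sketch below shows how the documented reductions map per-sample losses to the returned value. It uses plain floats and a hypothetical helper `reduce_ctc`; it is not lucid's code. It assumes that samples zeroed by `zero_infinity` still count toward the batch size under "mean". That is an assumption, since "skipping" could also mean they are excluded from the average.

```python
import math
from typing import List, Sequence, Union


def reduce_ctc(
    per_sample: Sequence[float], reduction: str = "mean", zero_infinity: bool = False
) -> Union[float, List[float]]:
    """Apply the documented reduction to per-sample CTC losses (plain floats)."""
    # zero_infinity: replace infinite losses (infeasible samples) with 0
    losses = [0.0 if zero_infinity and math.isinf(x) else x for x in per_sample]
    if reduction == "none":
        return losses                        # per-sample, shape (N,)
    if reduction == "sum":
        return sum(losses)                   # scalar
    if reduction == "mean":
        return sum(losses) / len(losses)     # scalar: average over the batch
    raise ValueError(f"unknown reduction: {reduction!r}")


print(reduce_ctc([1.0, 3.0]))                                   # 2.0
print(reduce_ctc([1.0, math.inf, 2.0], "sum", zero_infinity=True))  # 3.0
```

For comparison, PyTorch's `torch.nn.functional.ctc_loss` with `reduction='mean'` first divides each per-sample loss by its target length and then averages over the batch. Results from the two libraries may therefore differ under "mean" if lucid performs only the batch average described here.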

Notes

The CTC objective is the negative log of the total alignment probability:

$$L = -\log \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{t=1}^{T} p_t(\pi_t),$$

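where $\mathcal{B}$ is the many-to-one alignment map that first collapses repeated symbols and then removes blanks. For example, with blank 0, the path 1 1 0 1 2 2 maps to 1 1 2. $\mathcal{B}^{-1}(\mathbf{y})$ is the set of all length-$T$ paths that map to the target $\mathbf{y}$, and $p_t(\pi_t)$ is the predicted probability of symbol $\pi_t$ at frame $t$. The forward dynamic program runs in the log domain, using Accelerate arithmetic on CPU. On GPU, the computation currently falls back to the CPU implementation.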
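The definition can be checked directly on tiny inputs. Enumerate all C^T paths, apply $\mathcal{B}$ to each, and sum the probabilities of those that collapse to the target. The sketch below does exactly that in pure Python. It takes plain lists of probabilities (not log-probabilities), and `collapse` and `ctc_loss_bruteforce` are hypothetical names. Its cost is exponential in T, so it is only useful as a reference check against a dynamic-programming implementation.

```python
import itertools
import math
from typing import Sequence, Tuple


def collapse(path: Sequence[int], blank: int = 0) -> Tuple[int, ...]:
    """The map B: merge runs of repeated symbols, then drop blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return tuple(s for s in merged if s != blank)


def ctc_loss_bruteforce(probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """-log of the summed probability of every path pi with B(pi) == target."""
    T, C = len(probs), len(probs[0])
    y = tuple(target)
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path, blank) == y:
            total += math.prod(probs[t][path[t]] for t in range(T))
    return -math.log(total) if total > 0 else math.inf


print(collapse((1, 1, 0, 1, 2, 2)))  # (1, 1, 2): the blank keeps the two 1s apart
uniform = [[1 / 3] * 3 for _ in range(3)]
print(ctc_loss_bruteforce(uniform, [1]))  # -log(6/27) ≈ 1.504
```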

Examples

>>> import lucid
>>> from lucid.nn.functional import ctc_loss, log_softmax
>>> # T=4 frames, N=1 batch, C=3 classes (blank=0)
>>> logits = lucid.randn(4, 1, 3)
>>> log_p = log_softmax(logits, dim=2)
>>> targets = lucid.tensor([[1, 2]], dtype=lucid.int32)
>>> il = lucid.tensor([4], dtype=lucid.int32)
>>> tl = lucid.tensor([2], dtype=lucid.int32)
>>> ctc_loss(log_p, targets, il, tl)
Tensor(...)