ctc_loss

ctc_loss(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor, blank: int = 0, reduction: Reduction = 'mean', zero_infinity: bool = False) -> Tensor


Implementing kernel: ctc_loss_op (C++ free function)

Connectionist Temporal Classification (CTC) loss.

The standard training objective for unaligned sequence prediction. It is used in speech recognition, handwriting recognition, and any task where the input sequence is typically much longer than the target and no per-frame alignment is provided. Introduced by Graves et al. (2006).

Internally, it marginalises over every valid alignment of a $T$-frame prediction onto an $S$-symbol target. An alignment inserts "blank" symbols and lets each target symbol span one or more consecutive frames. The negative log of the total path probability is then computed by dynamic programming.
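The dynamic program can be sketched in plain Python for a single, unbatched sample. This is an illustrative reference under stated assumptions, not lucid's kernel:

- `ctc_nll` and `logsumexp` are hypothetical helper names.
- `log_probs` is a nested list of shape $(T, C)$.
- The recursion follows the classic formulation of Graves et al. over the extended label sequence (blanks interleaved around the $S$ targets, length $2S + 1$).

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_nll(log_probs: List[List[float]], target: Sequence[int], blank: int = 0) -> float:
    """Hypothetical single-sample CTC loss (sketch, not lucid's implementation).

    log_probs: T rows, each holding C log-probabilities (no batch axis).
    Returns +inf when no valid alignment exists.
    """
    T = len(log_probs)
    # Extended labels: blank, y1, blank, y2, ..., yS, blank (length 2S + 1).
    ext = [blank]
    for label in target:
        ext += [label, blank]
    L = len(ext)

    # alpha[s]: log-probability of all path prefixes ending at ext[s] at the current frame.
    alpha = [NEG_INF] * L
    alpha[0] = log_probs[0][blank]
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * L
        for s in range(L):
            candidates = [alpha[s]]                # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])    # advance by one position
            # Skipping a blank is allowed only between two *different* labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    tail = [alpha[L - 1]] + ([alpha[L - 2]] if L > 1 else [])
    return -logsumexp(tail)
```

Worked example: take $T = 2$ uniform frames over {blank, 1} with target `[1]`. The paths (1, 1), (blank, 1) and (1, blank) each have probability 0.25, so `ctc_nll` returns $-\log 0.75 \approx 0.288$.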

Parameters

log_probs : Tensor

Log-probabilities of shape $(T, N, C)$, where $T$ is the input sequence length, $N$ is the batch size, and $C$ is the number of classes (including the blank). Typically produced by lucid.nn.functional.log_softmax over the class axis.

targets : Tensor

Target indices (int32), either padded to a common length $S$ with shape $(N, S)$, or concatenated with shape $(\sum_i \text{target\_lengths}_i,)$.

input_lengths : Tensor

Effective (unpadded) input length of each sample, shape $(N,)$, int32. This lets sequences shorter than $T$ share one padded batch.

target_lengths : Tensor

Effective target length of each sample, shape $(N,)$, int32.

blank : int = 0

Index of the blank symbol (default 0).

reduction : str = 'mean'

"mean" (default), "sum", or "none". Under "mean", the per-sample loss is averaged across the batch.

zero_infinity : bool = False

When True, infinite losses (which arise when a target cannot fit in the available input frames) and their gradients are set to zero, effectively skipping those samples (default False).
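When preparing a batch, you can pack variable-length label sequences into either accepted `targets` layout. You can also flag, ahead of time, the samples whose targets cannot fit in their input frames; these are the samples whose losses become infinite. This is a plain-list sketch with these assumptions:

- `pack_targets` and `min_frames_required` are hypothetical helpers, not part of lucid.
- The padding value is an assumption, on the premise that entries past each sample's `target_lengths` are ignored.

```python
from typing import List, Sequence, Tuple


def pack_targets(seqs: Sequence[Sequence[int]], pad: int = 0) -> Tuple[List[List[int]], List[int], List[int]]:
    """Hypothetical helper: return (padded (N, S), concatenated (sum of lengths,), target_lengths (N,))."""
    lengths = [len(s) for s in seqs]
    s_max = max(lengths, default=0)
    # Padding value is arbitrary here; positions past each length are assumed to be ignored.
    padded = [list(s) + [pad] * (s_max - len(s)) for s in seqs]
    concatenated = [label for s in seqs for label in s]
    return padded, concatenated, lengths


def min_frames_required(target: Sequence[int]) -> int:
    """Fewest input frames that can emit `target` under CTC.

    Each label needs one frame, and each adjacent repeated pair needs one extra
    blank frame between them (otherwise the repeat would be collapsed away).
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


if __name__ == "__main__":
    seqs = [[1, 2], [3, 3, 3], [2]]
    input_lengths = [4, 4, 1]
    padded, concatenated, target_lengths = pack_targets(seqs)
    print(padded)          # [[1, 2, 0], [3, 3, 3], [2, 0, 0]]
    print(concatenated)    # [1, 2, 3, 3, 3, 2]
    print(target_lengths)  # [2, 3, 1]
    # Sample 1 needs 5 frames but has only 4, so its loss would be +inf.
    print([i for i, (s, n) in enumerate(zip(seqs, input_lengths)) if min_frames_required(s) > n])  # [1]
```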

Returns

Tensor

A scalar for "mean" or "sum"; a per-sample tensor of shape $(N,)$ for "none".
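The effect of `reduction` and `zero_infinity` on the returned value can be sketched on a plain list of per-sample losses. `reduce_losses` is a hypothetical illustration, not lucid code. It assumes that samples zeroed by `zero_infinity` still count in the "mean" denominator; the description above does not specify this.

```python
import math
from typing import List, Union


def reduce_losses(per_sample: List[float], reduction: str = "mean",
                  zero_infinity: bool = False) -> Union[float, List[float]]:
    """Hypothetical reduction over per-sample CTC losses (illustrative only)."""
    if zero_infinity:
        # Infinite losses (infeasible samples) are replaced by zero.
        per_sample = [0.0 if math.isinf(v) else v for v in per_sample]
    if reduction == "none":
        return per_sample                          # one loss per sample, shape (N,)
    if reduction == "sum":
        return sum(per_sample)                     # scalar
    if reduction == "mean":
        # Assumption: zeroed samples still count toward N in the denominator.
        return sum(per_sample) / len(per_sample)   # scalar, averaged over the batch
    raise ValueError(f"unknown reduction: {reduction!r}")
```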

Notes

The CTC objective is the negative log of the total alignment probability:

$$L = -\log \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{t=1}^{T} p_t(\pi_t),$$

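where $p_t(k)$ is the probability of class $k$ at frame $t$, and $\mathcal{B}$ is the many-to-one alignment map that collapses consecutive repeated labels and then removes blanks. The forward DP runs in the log domain, using Accelerate arithmetic on CPU. Inputs on a GPU stream currently fall back to the CPU path.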
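For tiny inputs the formula can be evaluated literally:

1. Enumerate all $C^T$ paths $\pi$.
2. Keep those that $\mathcal{B}$ maps to $\mathbf{y}$.
3. Sum their probabilities.

This exponential-time sketch is only useful as a cross-check for a dynamic-programming implementation. `collapse` and `ctc_nll_bruteforce` are hypothetical names, and inputs are assumed to be plain lists of log-probabilities of shape $(T, C)$.

```python
import itertools
import math
from typing import List, Sequence


def collapse(path: Sequence[int], blank: int = 0) -> List[int]:
    """The map B: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k  # track the raw symbol, so a blank separates repeated labels
    return out


def ctc_nll_bruteforce(log_probs: List[List[float]], target: Sequence[int], blank: int = 0) -> float:
    """-log of the sum over pi in B^{-1}(y) of prod_t p_t(pi_t), by exhaustive enumeration."""
    T, C = len(log_probs), len(log_probs[0])
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.exp(sum(log_probs[t][k] for t, k in enumerate(path)))
    return -math.log(total) if total > 0 else math.inf
```

For example, `collapse([1, 1, 0, 1, 2])` gives `[1, 1, 2]`: the blank is what keeps the two 1s distinct.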

Examples

>>> import lucid
>>> from lucid.nn.functional import ctc_loss, log_softmax
>>> # T=4 frames, N=1 batch, C=3 classes (blank=0)
>>> logits = lucid.randn(4, 1, 3)
>>> log_p = log_softmax(logits, dim=2)
>>> targets = lucid.tensor([[1, 2]], dtype=lucid.int32)
>>> il = lucid.tensor([4], dtype=lucid.int32)
>>> tl = lucid.tensor([2], dtype=lucid.int32)
>>> ctc_loss(log_p, targets, il, tl)
Tensor(...)
