ctc_loss
ctc_loss(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor, blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False) → Tensor
Connectionist Temporal Classification (CTC) loss.
The standard training objective for unaligned sequence prediction, used in speech recognition, handwriting recognition, and other tasks where the input sequence is at least as long as (and usually much longer than) the target and no per-frame alignment is provided. Introduced by Graves et al. (2006).
Internally, the function marginalises over every valid alignment of a $T$-frame prediction onto an $S$-symbol target. Alignments are formed by inserting "blank" symbols and letting each target symbol span one or more frames; the negative log of the total path probability is then computed by dynamic programming.
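The dynamic programme described above can be sketched in pure Python for a single sequence. This is a reference illustration of the standard CTC forward recursion, not lucid's actual implementation; the names `ctc_nll_reference` and `logsumexp` are hypothetical helpers introduced here, and the sketch assumes at least one input frame and a target that does not contain the blank index.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf terms."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll_reference(log_probs, target, blank=0):
    """Negative log-likelihood of `target` for one sequence (hypothetical helper).

    log_probs: list of T rows, each a list of C log-probabilities.
    target:    list of S label indices, none equal to `blank`.
    """
    T = len(log_probs)
    # Interleave blanks: [b, y1, b, y2, ..., yS, b], length 2S + 1.
    ext = [blank]
    for y in target:
        ext += [y, blank]
    L = len(ext)

    # alpha[s] = log-probability of all partial paths that end in ext[s].
    alpha = [NEG_INF] * L
    alpha[0] = log_probs[0][ext[0]]
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * L
        for s in range(L):
            terms = [alpha[s]]  # stay on the same extended symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    total = alpha[L - 1] if L == 1 else logsumexp(alpha[L - 1], alpha[L - 2])
    return -total
```

With uniform probabilities over three classes and two frames, the target `[1]` has three valid alignments (`1 1`, `0 1`, `1 0`), so the sketch returns `log 3`. A target that cannot fit, such as `[1, 1]` in two frames, yields an infinite loss.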
Parameters
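log_probs: Tensor
    Log-probabilities of shape $(T, N, C)$ (frames, batch, classes), typically the output of lucid.nn.functional.log_softmax over the class axis.
targets: Tensor
    Target label sequences of shape $(N, S)$, dtype int32.
input_lengths: Tensor
    Valid length of each input sequence, shape $(N,)$, dtype int32. Enables padding-aware batching.
target_lengths: Tensor
    Valid length of each target sequence, shape $(N,)$, dtype int32.
blank: int = 0
    Index of the blank symbol (default 0).
reduction: str = 'mean'
    "mean" (default), "sum", or "none". Under "mean", the per-sample loss is averaged across the batch.
zero_infinity: bool = False
    If True, infinite losses (which arise when a target cannot fit in the available input frames) and their gradients are set to zero, effectively skipping those samples (default False).
Returns
Tensor
    Scalar ("mean" / "sum") or per-sample tensor of shape $(N,)$ ("none").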
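To make the zero_infinity condition concrete: under standard CTC, a target fits only if there is one frame per symbol plus one extra blank frame between every pair of identical adjacent symbols. The sketch below checks this before a batch is built. `min_frames_required` and `is_feasible` are hypothetical helpers, not part of lucid, and whether lucid performs this exact check internally is not stated in this page.

```python
def min_frames_required(target):
    """Smallest number of frames that can emit `target` under CTC.

    Each symbol needs one frame, and each adjacent repeat needs one
    separating blank frame so the two copies are not merged.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_feasible(input_length, target):
    """True if `target` has at least one valid alignment in `input_length` frames."""
    return input_length >= min_frames_required(target)
```

Samples for which `is_feasible` is false are the ones that would produce an infinite loss. Filtering them out during data preparation is an alternative to relying on zero_infinity=True.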
Notes
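The CTC objective is the negative log of the total alignment probability:
$$\mathcal{L} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p_t(\pi_t \mid x)$$
where $\pi$ ranges over the length-$T$ frame-level alignments of the target $y$, $p_t(\pi_t \mid x)$ is the predicted probability of symbol $\pi_t$ at frame $t$, and $\mathcal{B}$ is the "many-to-one" alignment map that collapses repeats and removes blanks.
The forward DP runs in the log domain (Accelerate arithmetic on CPU); the GPU stream currently falls back to CPU.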
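As an illustration of the alignment map and the objective, the sketch below collapses a frame-level path (merge repeats, then drop blanks) and evaluates the loss by brute-force enumeration of every path. It is exponential in the number of frames and is meant only for checking tiny cases by hand. `collapse` and `ctc_nll_bruteforce` are hypothetical names and do not come from lucid.

```python
import math
from itertools import groupby, product


def collapse(path, blank=0):
    """Apply the many-to-one map: merge adjacent repeats, then remove blanks."""
    return [label for label, _ in groupby(path) if label != blank]


def ctc_nll_bruteforce(log_probs, target, blank=0):
    """Sum the probability of every length-T path that collapses to `target`.

    log_probs: list of T rows, each a list of C log-probabilities.
    """
    T, C = len(log_probs), len(log_probs[0])
    total = 0.0
    for path in product(range(C), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.exp(sum(log_probs[t][k] for t, k in enumerate(path)))
    return -math.log(total) if total > 0 else math.inf
```

For example, `collapse([0, 1, 1, 0, 1, 2, 2, 0])` gives `[1, 1, 2]`: the blank between the two runs of `1` keeps them distinct, while each run of repeated symbols merges into a single label.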
Examples
>>> import lucid
>>> from lucid.nn.functional import ctc_loss, log_softmax
>>> # T=4 frames, N=1 batch, C=3 classes (blank=0)
>>> logits = lucid.randn(4, 1, 3)
>>> log_p = log_softmax(logits, dim=2)
>>> targets = lucid.tensor([[1, 2]], dtype=lucid.int32)
>>> il = lucid.tensor([4], dtype=lucid.int32)
>>> tl = lucid.tensor([2], dtype=lucid.int32)
>>> ctc_loss(log_p, targets, il, tl)
Tensor(...)