class

CTCLoss

extends Module
CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)

Connectionist Temporal Classification (CTC) loss.

Computes the CTC loss for sequence-to-sequence learning where the alignment between the input and the target is unknown. The loss marginalises over all valid monotonic alignments between the input sequence and the target sequence:

$$\mathcal{L} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p\!\left(\pi_t \mid x_t\right)$$

where $\mathcal{B}$ is the CTC collapsing function (it merges consecutive repeated tokens, then removes blanks) and the sum is over all valid paths $\pi$ that decode to the target sequence $y$.
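To make the definition concrete, the following is a minimal brute-force sketch in plain Python, not the library's implementation. The helper names `collapse` and `ctc_loss_bruteforce` are hypothetical, and the input is assumed to be a list of per-frame probability rows rather than a tensor. It enumerates every path of length T, keeps those that collapse to the target, and sums their probabilities, so it is only usable for very small T and C.

```python
import itertools
import math


def collapse(path, blank=0):
    """Collapsing function B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_loss_bruteforce(probs, target, blank=0):
    """Negative log of the summed probability of all paths that collapse to target.

    probs is a list of T rows, each holding C probabilities (not log-probabilities).
    """
    num_frames, num_classes = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=num_frames):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    # No valid path means the target is unreachable and the loss is infinite.
    return -math.log(total) if total > 0 else math.inf
```

With uniform probabilities over 3 classes and 3 frames, exactly 5 of the 27 paths collapse to `[1, 2]`, so the sketch returns `-log(5 / 27)`.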

Parameters

blank: int = 0
Index of the blank label. Default 0.
reduction: str = 'mean'
How the per-sample losses are reduced: 'none' | 'mean' (default) | 'sum'.
zero_infinity: bool = False
If True, infinite losses and their gradients are set to zero. This prevents instability from very long sequences or mismatched lengths, such as an input too short to align with its target. Default False.
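The interaction between reduction and zero_infinity can be sketched in plain Python as below. This is an illustration under stated assumptions, not the library's code: `reduce_ctc_losses` is a hypothetical helper, it operates on a list of per-sample losses, and 'mean' is assumed to be a plain arithmetic mean over the batch (the library may normalise differently, for example by target length).

```python
import math


def reduce_ctc_losses(losses, reduction="mean", zero_infinity=False):
    """Reduce per-sample CTC losses; optionally zero out infinite entries first."""
    if zero_infinity:
        # Samples with no valid alignment contribute nothing instead of inf.
        losses = [0.0 if math.isinf(x) else x for x in losses]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Assumption: plain mean over the batch.
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

Without zero_infinity, a single infeasible sample makes the 'sum' and 'mean' results infinite, which is the instability the flag is meant to avoid.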

Attributes

blank: int
The blank label index.
reduction: str
The reduction mode.
zero_infinity: bool
Whether to zero out infinite-valued losses.

Notes

  • log_probs: $(T, N, C)$. Log-probabilities over the alphabet (including blank), typically the output of LogSoftmax. $T$ = input length, $N$ = batch size, $C$ = number of classes.
  • targets: $(N, S)$ or $(\sum S_n,)$. Target sequences (without blank labels).
  • input_lengths: $(N,)$. Length of each input sequence.
  • target_lengths: $(N,)$. Length of each target sequence.
  • Output: scalar for 'mean' / 'sum'; $(N,)$ for 'none'.
  • CTC is widely used in automatic speech recognition (ASR) and optical character recognition (OCR) because it does not require aligned training data.
  • The input length for each sample must satisfy $T_n \geq S_n$ (the input cannot be shorter than the target). Each pair of adjacent identical target labels needs one additional frame, because a blank must separate them.
  • input_lengths and target_lengths must be integer tensors.
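The per-sample loss described in these notes is usually computed with the CTC forward (dynamic-programming) recursion rather than by enumerating paths. The following is a minimal plain-Python sketch of that recursion for one sample, offered as an illustration and not as the library's implementation. The names `ctc_forward_loss` and `logsumexp` are hypothetical, and log_probs is assumed to be a list of T rows of C log-probabilities with T at least 1.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_forward_loss(log_probs, target, blank=0):
    """CTC negative log-likelihood for a single sample via the forward recursion."""
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_frames, num_states = len(log_probs), len(ext)

    # A path may start in the leading blank or in the first label.
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        prev = alpha
        alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [prev[s]]
            if s >= 1:
                candidates.append(prev[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(prev[s - 2])
            alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]

    # A valid path ends in the last label or in the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if num_states > 1 else alpha[-1]
    return -total
```

For uniform log-probabilities with 5 frames, 6 classes, and target `[1, 2, 3]`, 28 of the 6^5 paths are valid, so the sketch returns `-log(28 / 6**5)`. With only 2 frames and target `[1, 1]` no alignment exists and the result is infinite, which is the case zero_infinity addresses.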

Examples

Single-sample sequence with 5 frames, 3 target characters, 6 classes:
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> T, N, C = 5, 1, 6
>>> criterion = nn.CTCLoss(blank=0)
>>> log_probs = F.log_softmax(
...     lucid.zeros(T, N, C), dim=2
... )
>>> targets        = lucid.tensor([[1, 2, 3]])
>>> input_lengths  = lucid.tensor([T])
>>> target_lengths = lucid.tensor([3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)
With zero_infinity=True for robustness:
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.CTCLoss(blank=0, zero_infinity=True, reduction="sum")
>>> # Normalise so that each frame holds valid log-probabilities
>>> log_probs = F.log_softmax(lucid.zeros(10, 2, 8), dim=2)
>>> targets        = lucid.tensor([[1, 2, 3], [4, 5, 6]])
>>> input_lengths  = lucid.tensor([10, 10])
>>> target_lengths = lucid.tensor([ 3,  3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)

Methods (3)

dunder

__init__

None
__init__(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)

Initialise the CTCLoss module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor)

Compute the CTC loss between log_probs and targets.

Parameters

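log_probs: Tensor
Log-probabilities of shape $(T, N, C)$, including the blank class.
targets: Tensor
Target sequences of shape $(N, S)$ or $(\sum S_n,)$, without blank labels.
input_lengths: Tensor
Integer tensor of shape $(N,)$ with the length of each input sequence.
target_lengths: Tensor
Integer tensor of shape $(N,)$ with the length of each target sequence.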

Returns

Tensor

Scalar loss for reduction 'mean' or 'sum'; tensor of shape $(N,)$ for 'none'.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.