class CTCLoss extends Module
CTCLoss(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False)
Connectionist Temporal Classification (CTC) loss.
Computes the CTC loss for sequence-to-sequence learning where the alignment between the input and the target is unknown. The loss marginalises over all valid monotonic alignments between the input sequence $x$ and the target sequence $y$:

$$\mathcal{L}(x, y) = -\log p(y \mid x) = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x)$$

where $\mathcal{B}$ is the CTC collapsing function (it merges consecutive repeated tokens, then removes blanks) and the sum is over all length-$T$ paths $\pi$ that decode to the target sequence $y$.
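To make the definition concrete, the pure-Python sketch below implements the collapsing function and evaluates the sum over paths by brute-force enumeration on a toy input. The function names are hypothetical and not part of lucid, and the inputs are plain lists rather than lucid tensors. Enumeration visits all C^T paths, so this only illustrates the formula.

```python
import itertools
import math


def ctc_collapse(path, blank=0):
    """CTC collapsing map: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for token in path:
        # Keep a token only if it starts a new run and is not the blank.
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return out


def ctc_nll_brute_force(probs, target, blank=0):
    """-log p(target | x) by summing over every length-T path.

    probs[t][c] is the probability of class c at frame t. The loop visits
    all C**T paths, so this is only usable on tiny toy inputs.
    """
    T, C = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if ctc_collapse(path, blank) == list(target):
            p = 1.0
            for t, c in enumerate(path):
                p *= probs[t][c]   # product over frames
            total += p             # sum over valid paths
    return -math.log(total) if total > 0 else math.inf


uniform = [[1 / 3] * 3 for _ in range(3)]   # T = 3 frames, C = 3 classes
loss = ctc_nll_brute_force(uniform, [1, 2])
```

With three uniform classes and T = 3, exactly five paths collapse to [1, 2] (0 1 2, 1 0 2, 1 2 0, 1 1 2, 1 2 2), so the loss is log(27/5).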
Parameters
blank : int = 0
    Index of the blank label. Default 0.
reduction : str = 'mean'
    'none' | 'mean' (default) | 'sum'.
zero_infinity : bool = False
    If True, infinite losses and their gradients are set to zero. Prevents instability for very long sequences or mismatched lengths (for example, an input too short to be aligned with its target). Default False.
Attributes
blank : int
    The blank label index.
reduction : str
    The reduction mode.
zero_infinity : bool
    Whether to zero out infinite-valued losses.
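The effect of `reduction` and `zero_infinity` on a batch of per-sample losses can be sketched as follows. `reduce_ctc_losses` is a hypothetical helper, not lucid code, and it only models loss values (per the parameter description, `zero_infinity` also zeroes the matching gradients). It assumes 'mean' is a plain average over the batch; PyTorch's `CTCLoss` instead divides each loss by its target length before averaging, so lucid's exact behaviour should be checked against its source.

```python
import math


def reduce_ctc_losses(losses, reduction="mean", zero_infinity=False):
    """Hypothetical sketch of reducing per-sample CTC losses (values only).

    Assumption: 'mean' is a plain batch average. PyTorch's CTCLoss divides
    each loss by its target length first; lucid's choice is not shown here.
    """
    if zero_infinity:
        # Replace +inf losses (unalignable samples) with 0.
        losses = [0.0 if math.isinf(x) else x for x in losses]
    if reduction == "none":
        return list(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


batch = [1.5, math.inf, 0.5]   # the second sample cannot be aligned
print(reduce_ctc_losses(batch, "sum"))                       # inf
print(reduce_ctc_losses(batch, "sum", zero_infinity=True))   # 2.0
print(reduce_ctc_losses(batch, "mean", zero_infinity=True))  # 0.6666666666666666
```

A single unalignable sample turns a summed or averaged loss into inf, which is why zeroing it can matter for training stability.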
Notes
- log_probs : (T, N, C), log-probabilities over the alphabet (including blank), typically the output of LogSoftmax. T = input length, N = batch size, C = number of classes.
- targets : (N, S) or (sum(target_lengths),), target sequences (without blank labels), where S = maximum target length.
- input_lengths : (N,), length of each input sequence.
- target_lengths : (N,), length of each target sequence.
- Output : scalar for 'mean'/'sum'; (N,) for 'none'.
- CTC is widely used in automatic speech recognition (ASR) and optical character recognition (OCR) because it does not require aligned training data.
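- The input length for each sample must satisfy input_lengths[n] >= target_lengths[n] (the input cannot be shorter than the target). Because identical adjacent labels must be separated by a blank, the actual minimum is target_lengths[n] plus the number of such repeated pairs in the target.
- input_lengths and target_lengths must be integer tensors.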
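The length requirement can be checked before calling the loss. The sketch below is a hypothetical pre-flight check on plain Python lists (not a lucid function): it computes the minimum number of frames a target needs and raises if a sample cannot be aligned or if its target contains the blank index.

```python
def min_ctc_input_length(target):
    """Smallest number of frames that can emit `target` under CTC.

    One frame per label, plus one blank frame between each pair of identical
    adjacent labels (otherwise they would be merged by the collapsing map).
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def check_ctc_lengths(input_lengths, targets, blank=0):
    """Hypothetical pre-flight check on plain Python lists; raises ValueError."""
    for n, (t_len, target) in enumerate(zip(input_lengths, targets)):
        if blank in target:
            raise ValueError(f"sample {n}: target contains the blank index {blank}")
        need = min_ctc_input_length(target)
        if t_len < need:
            raise ValueError(f"sample {n}: input length {t_len} < required {need}")


print(min_ctc_input_length([1, 2, 3]))  # 3
print(min_ctc_input_length([1, 1, 2]))  # 4
check_ctc_lengths([5, 10], [[1, 2, 3], [4, 5, 6]])  # passes silently
```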
Examples
Single-sample sequence with 5 frames, 3 target characters, 6 classes:
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> T, N, C = 5, 1, 6
>>> criterion = nn.CTCLoss(blank=0)
>>> log_probs = F.log_softmax(
... lucid.zeros(T, N, C), dim=2
... )
>>> targets = lucid.tensor([[1, 2, 3]])
>>> input_lengths = lucid.tensor([T])
>>> target_lengths = lucid.tensor([3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)
With ``zero_infinity=True`` for robustness:
>>> import lucid
>>> import lucid.nn as nn
>>> criterion = nn.CTCLoss(blank=0, zero_infinity=True, reduction="sum")
>>> import lucid.nn.functional as F
>>> log_probs = F.log_softmax(lucid.zeros(10, 2, 8), dim=2)
>>> targets = lucid.tensor([[1, 2, 3], [4, 5, 6]])
>>> input_lengths = lucid.tensor([10, 10])
>>> target_lengths = lucid.tensor([3, 3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)
Methods (3)
__init__(blank: int = 0, reduction: str = 'mean', zero_infinity: bool = False) → None
Initialise the CTCLoss module. See the class docstring for parameter semantics.
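forward(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor) → Tensor
Compute the CTC loss between the log-probabilities and the targets.
Parameters
log_probs : Tensor
    Log-probabilities of shape (T, N, C).
targets : Tensor
    Target label indices of shape (N, S) or (sum(target_lengths),), without blank labels.
input_lengths : Tensor
    Integer tensor of shape (N,) giving the length of each input sequence.
target_lengths : Tensor
    Integer tensor of shape (N,) giving the length of each target sequence.
Returns
Tensor
    Scalar loss for 'mean'/'sum', or a tensor of shape (N,) for 'none'.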
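The internals of `forward` are not documented here. Rather than enumerating paths, CTC is conventionally computed with the forward (alpha) dynamic program over the blank-extended target. The pure-Python sketch below applies that standard recursion in log space to one batch element. It illustrates the technique and is not lucid's implementation; `ctc_forward_nll` and `_logsumexp` are hypothetical names.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_forward_nll(log_probs, target, blank=0):
    """-log p(target | x) via the CTC forward (alpha) recursion in log space.

    log_probs[t][c] is the log-probability of class c at frame t, i.e. one
    batch element (shape (T, C)) of the (T, N, C) input.
    """
    # Blank-extended target: blank, y1, blank, y2, ..., yS, blank (length 2S+1).
    ext = [blank]
    for label in target:
        ext += [label, blank]
    L = len(ext)
    alpha = [-math.inf] * L
    alpha[0] = log_probs[0][ext[0]]          # a path may start with a blank
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]      # ... or with the first label
    for t in range(1, len(log_probs)):
        new = [-math.inf] * L
        for s in range(L):
            cands = [alpha[s]]                   # repeat the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])       # move to the next symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])       # skip a blank between distinct labels
            new[s] = _logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    ends = [alpha[L - 1]] + ([alpha[L - 2]] if L > 1 else [])
    return -_logsumexp(ends)


# Same setting as the first example: uniform log_softmax(zeros), C = 6, T = 5.
T, C = 5, 6
uniform = [[-math.log(C)] * C for _ in range(T)]
loss = ctc_forward_nll(uniform, [1, 2, 3])
```

For this uniform input, 28 of the 6^5 paths collapse to [1, 2, 3], so the sketch returns 5 log 6 - log 28, roughly 5.63 for this one unreduced sample. The recursion costs O(T·S) instead of O(C^T), and working in log space avoids underflow when many small probabilities are multiplied.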
extra_repr() → str
Return a string representation of the layer's configuration.