class CTCLoss

extends Module

CTCLoss(blank: int = 0, reduction: Reduction = 'mean', zero_infinity: bool = False)


Connectionist Temporal Classification (CTC) loss.

Computes the CTC loss for sequence-to-sequence learning where the alignment between the input and the target is unknown. The loss marginalises over all valid monotonic alignments between the input sequence and the target sequence:

$$\mathcal{L} = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p\!\left(\pi_t \mid x_t\right)$$

where $\mathcal{B}$ is the CTC collapsing function, which first merges consecutive repeated tokens and then removes blanks, and the sum runs over all length-$T$ paths $\pi$ that collapse to the target sequence $y$.
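To make the collapsing function and the sum over paths concrete, here is a minimal pure-Python sketch that applies $\mathcal{B}$ and evaluates the loss by brute-force enumeration of every length-$T$ path. It uses nested lists of probabilities in place of lucid tensors, and `ctc_collapse` and `ctc_loss_brute_force` are hypothetical helpers for illustration only; the enumeration is exponential in $T$ and is not how the loss is computed in practice.

```python
import itertools
import math


def ctc_collapse(path, blank=0):
    """Apply the CTC collapsing function B: merge repeats, then drop blanks."""
    merged = [tok for i, tok in enumerate(path) if i == 0 or tok != path[i - 1]]
    return [tok for tok in merged if tok != blank]


def ctc_loss_brute_force(probs, target, blank=0):
    """Negative log of the summed probability of all paths that collapse to target.

    probs[t][c] is p(c | x_t) for a single sample; cost is C**T, illustration only.
    """
    T, C = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if ctc_collapse(list(path), blank) == list(target):
            # probability of one path = product of per-frame probabilities
            total += math.prod(probs[t][c] for t, c in enumerate(path))
    return -math.log(total)
```

For example, `ctc_collapse([1, 1, 0, 1, 2, 2, 0])` gives `[1, 1, 2]`: the blank between the two runs of `1` is what allows a repeated label to survive the merge step.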

Parameters

blank : int = 0

Index of the blank label. Default 0.

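reduction : str = 'mean'

Reduction applied to the per-sample losses: 'none' returns them unreduced with shape $(N,)$, 'mean' (default) reduces them to a scalar by averaging, and 'sum' reduces them to a scalar by summation.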

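zero_infinity : bool = False

If True, infinite losses and their gradients are set to zero. Infinite losses occur when a sample admits no valid alignment, typically because its input is too short for its target; zeroing them keeps training numerically stable. Default False.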
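The reduction and zero_infinity options compose as in the following minimal pure-Python sketch. `apply_reduction` is a hypothetical helper operating on a list of per-sample loss values; it mirrors the documented options and is not lucid's implementation. PyTorch's `CTCLoss` divides each loss by its target length before averaging for 'mean'; whether lucid does the same is not stated here, so the sketch makes that step optional through a `target_lengths` argument. It handles loss values only, not gradients.

```python
import math


def apply_reduction(losses, reduction="mean", zero_infinity=False, target_lengths=None):
    """Hypothetical helper mirroring the reduction / zero_infinity options.

    losses: per-sample losses as floats (may contain math.inf).
    target_lengths: if given, 'mean' first divides each loss by its target
    length (PyTorch's convention); otherwise a plain batch mean is used.
    """
    if zero_infinity:
        # replace infeasible (infinite) losses by zero
        losses = [0.0 if math.isinf(x) else x for x in losses]
    if reduction == "none":
        return list(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        if target_lengths is not None:
            # guard against zero-length targets
            losses = [x / max(n, 1) for x, n in zip(losses, target_lengths)]
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

A single infinite sample turns a 'sum' or 'mean' result into infinity unless zero_infinity is set, which is why the flag matters for batches containing infeasible samples.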

Attributes

blank : int

The blank label index.

reduction : str

The reduction mode.

zero_infinity : bool

Whether to zero out infinite-valued losses.

Notes

log_probs : $(T, N, C)$. Log-probabilities over the alphabet (including blank), typically the output of LogSoftmax over the class dimension. $T$ = input length, $N$ = batch size, $C$ = number of classes including blank.
targets : $(N, S)$ or $(\sum_n S_n,)$. Target sequences, either padded to a common length $S$ or concatenated into one 1-D tensor; they must not contain the blank label.
input_lengths : $(N,)$. Length of each input sequence.
target_lengths : $(N,)$. Length of each target sequence.
Output : scalar for 'mean' and 'sum'; $(N,)$ for 'none'.
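The two target layouts hold the same labels. The following minimal pure-Python sketch converts between them, using nested lists in place of lucid tensors; `pad_to_concat` and `concat_to_padded` are hypothetical helpers. It assumes, as is usual for CTC implementations, that only the first `target_lengths[n]` entries of each padded row are read, so the padding value does not matter.

```python
def pad_to_concat(padded_targets, target_lengths):
    """Convert (N, S) padded targets into the 1-D (sum S_n,) layout."""
    flat = []
    for row, n in zip(padded_targets, target_lengths):
        flat.extend(row[:n])  # keep the first n labels; the rest is padding
    return flat


def concat_to_padded(flat_targets, target_lengths, pad_value=0):
    """Split the 1-D layout back into rows padded to the longest target."""
    max_len = max(target_lengths, default=0)
    rows, start = [], 0
    for n in target_lengths:
        row = list(flat_targets[start:start + n])
        rows.append(row + [pad_value] * (max_len - n))
        start += n
    return rows
```

For instance, padded targets `[[1, 2, 3], [4, 5, 0]]` with lengths `[3, 2]` correspond to the concatenated targets `[1, 2, 3, 4, 5]`.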

CTC is widely used in automatic speech recognition (ASR) and optical character recognition (OCR) because it does not require aligned training data.
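The input length of each sample must be long enough to admit at least one alignment: $T_n \geq S_n + R_n$, where $R_n$ is the number of adjacent repeated labels in target $n$ (each repeat needs a blank frame between its two copies). In particular $T_n \geq S_n$. Otherwise the loss for that sample is infinite; see zero_infinity.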
input_lengths and target_lengths must be integer tensors.
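The length condition can be checked before calling the loss. This is a minimal pure-Python sketch with padded targets as nested lists rather than lucid tensors; `min_input_length` and `find_infeasible` are hypothetical helpers.

```python
def min_input_length(target):
    """Smallest T for which at least one CTC alignment of `target` exists.

    Each label needs one frame, and each pair of equal adjacent labels needs an
    extra blank frame between them (otherwise the collapse would merge them).
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def find_infeasible(input_lengths, targets, target_lengths):
    """Return the indices of samples whose CTC loss would be infinite."""
    bad = []
    for i, (t_len, s_len) in enumerate(zip(input_lengths, target_lengths)):
        if t_len < min_input_length(targets[i][:s_len]):
            bad.append(i)
    return bad
```

With 3 frames, the target `[1, 2, 3]` is feasible but `[1, 1, 2]` is not, because it needs a fourth frame for the blank between the two `1` labels.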

Examples

Single-sample sequence with 5 frames, 3 target characters, 6 classes:
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> T, N, C = 5, 1, 6
>>> criterion = nn.CTCLoss(blank=0)
>>> log_probs = F.log_softmax(
...     lucid.zeros(T, N, C), dim=2
... )
>>> targets        = lucid.tensor([[1, 2, 3]])
>>> input_lengths  = lucid.tensor([T])
>>> target_lengths = lucid.tensor([3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)
With zero_infinity=True for robustness:
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.CTCLoss(blank=0, zero_infinity=True, reduction="sum")
>>> log_probs = F.log_softmax(
...     lucid.zeros(10, 2, 8), dim=2
... )  # T=10 frames, N=2 samples, C=8 classes
>>> targets        = lucid.tensor([[1, 2, 3], [4, 5, 6]])
>>> input_lengths  = lucid.tensor([10, 10])
>>> target_lengths = lucid.tensor([ 3,  3])
>>> loss = criterion(log_probs, targets, input_lengths, target_lengths)


Constructors

__init__(blank: int = 0, reduction: Reduction = 'mean', zero_infinity: bool = False) → None

Initialise the CTCLoss module. See the class docstring for parameter semantics.

Instance methods

extra_repr() → str

Return a string representation of the layer's configuration.

forward(log_probs: Tensor, targets: Tensor, input_lengths: Tensor, target_lengths: Tensor) → Tensor

Compute the CTC loss of the log-probabilities with respect to the target sequences.

Parameters

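log_probs : Tensor

Log-probabilities of shape $(T, N, C)$.

targets : Tensor

Target label sequences of shape $(N, S)$, or concatenated into shape $(\sum_n S_n,)$.

input_lengths : Tensor

Integer tensor of shape $(N,)$ holding the length of each input sequence.

target_lengths : Tensor

Integer tensor of shape $(N,)$ holding the length of each target sequence.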

Returns

Tensor

Scalar loss when reduction is 'mean' or 'sum'; per-sample losses of shape $(N,)$ when reduction is 'none'.
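Enumerating all paths is intractable for realistic $T$; the standard way to evaluate the CTC sum is the forward (alpha) recursion over the blank-extended target $l' = (\text{blank}, y_1, \text{blank}, \dots, y_S, \text{blank})$, run in log space. The following is a minimal pure-Python sketch of that textbook recursion for one sample, with plain lists in place of lucid tensors. It is not lucid's implementation, and `logsumexp` and `ctc_forward_loss` are hypothetical names.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_forward_loss(log_probs, target, blank=0):
    """Per-sample CTC loss via the alpha (forward) recursion.

    log_probs: T rows of C log-probabilities for one sample (T >= 1).
    target: label sequence without blanks.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # l' = (blank, y1, blank, y2, ..., blank)
    L, T = len(ext), len(log_probs)
    alpha = [-math.inf] * L
    alpha[0] = log_probs[0][blank]  # start with a blank ...
    if L > 1:
        alpha[1] = log_probs[0][ext[1]]  # ... or with the first label
    for t in range(1, T):
        new = [-math.inf] * L
        for s in range(L):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one symbol
            # skip a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new
    # valid paths end on the last label or the trailing blank
    final = [alpha[L - 1]] + ([alpha[L - 2]] if L > 1 else [])
    return -logsumexp(final)
```

With uniform log-probabilities as in the first example (T=5, C=6, target `[1, 2, 3]`), every path has probability $6^{-5}$ and exactly 28 paths collapse to the target, so the unreduced per-sample loss is $5\ln 6 - \ln 28 \approx 5.627$. How lucid's 'mean' reduction normalises this value is not specified here.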
