KLDivLoss
Module: KLDivLoss(reduction: str = 'mean', log_target: bool = False)
Kullback–Leibler divergence loss.
Measures how one probability distribution diverges from a reference
distribution. The input x must be log-probabilities (e.g.
output of LogSoftmax), and the target y must be probabilities
(or log-probabilities when log_target=True).
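Because x has to be in log-space, it usually comes from a log-softmax over raw logits. The plain-Python sketch below shows what a numerically stable log-softmax computes. The helper name `log_softmax` is hypothetical and this is not lucid's implementation; in lucid code you would call `F.log_softmax` as in the examples further down.

```python
import math

def log_softmax(logits):
    """Hypothetical helper: numerically stable log-softmax over a flat list.

    Subtracting the max before exponentiating avoids overflow; the result
    is a vector of log-probabilities (exponentiating it sums to 1).
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

x = log_softmax([0.5, 1.0, 0.2])
print(x)                              # every entry is <= 0
print(sum(math.exp(v) for v in x))    # approximately 1.0
```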
With log_target=False (default):
$$\ell_n = y_n \left( \log y_n - x_n \right)$$
With log_target=True (target already in log-space):
$$\ell_n = e^{y_n} \left( y_n - x_n \right)$$
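The two pointwise forms, $y(\log y - x)$ for probability targets and $e^{y}(y - x)$ for log-space targets, describe the same quantity. The plain-Python sketch below checks that they agree. `kl_pointwise` is a hypothetical helper, not part of lucid, and the `0 * log 0 = 0` handling is an assumed convention.

```python
import math

def kl_pointwise(x, y, log_target=False):
    """Hypothetical helper: elementwise KL terms for flat lists.

    x: log-probabilities (the model output passed to the loss).
    y: probabilities, or log-probabilities when log_target=True.
    """
    out = []
    for xi, yi in zip(x, y):
        if log_target:
            # l = exp(y) * (y - x), with y already in log-space
            out.append(math.exp(yi) * (yi - xi))
        else:
            # l = y * (log y - x); assumed convention: 0 * log 0 = 0
            out.append(yi * (math.log(yi) - xi) if yi > 0 else 0.0)
    return out

p = [0.2, 0.5, 0.3]                 # target distribution
q = [0.1, 0.6, 0.3]                 # model distribution
log_q = [math.log(v) for v in q]    # the loss expects log q as input

terms = kl_pointwise(log_q, p)
terms_log = kl_pointwise(log_q, [math.log(v) for v in p], log_target=True)
print(sum(terms))       # KL(p || q)
print(sum(terms_log))   # same value, computed from a log-space target
```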
The per-element losses $\ell_n$ are reduced to a scalar according to
reduction. Note that 'batchmean' divides the summed loss by the
batch size $N$, which matches the mathematical definition of KL
divergence averaged over the samples in the batch.
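The difference between the reduction modes can be sketched in plain Python. This is illustrative only: `reduce_kl` and `kl_row` are hypothetical helpers, and the sketch assumes that 'mean' averages over all $N \times C$ elements (the PyTorch convention), which this page does not state for lucid.

```python
import math

def reduce_kl(terms, reduction="mean"):
    """Hypothetical reduction over per-element KL terms of shape N x C.

    Assumes 'mean' averages over all N*C elements, while 'batchmean'
    divides the total by the batch size N only.
    """
    n = len(terms)
    total = sum(sum(row) for row in terms)
    if reduction == "none":
        return terms
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / sum(len(row) for row in terms)
    if reduction == "batchmean":
        return total / n
    raise ValueError(f"unknown reduction: {reduction!r}")

def kl_row(p, q):
    # KL(p || q) terms for one sample: p * (log p - log q)
    return [pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q)]

targets = [[0.2, 0.5, 0.3], [0.6, 0.2, 0.2]]
preds = [[0.1, 0.6, 0.3], [0.5, 0.25, 0.25]]
terms = [kl_row(p, q) for p, q in zip(targets, preds)]

per_sample_kl = [sum(row) for row in terms]
print(reduce_kl(terms, "batchmean"))             # average KL per sample
print(sum(per_sample_kl) / len(per_sample_kl))   # same value
print(reduce_kl(terms, "mean"))                  # smaller by a factor of C = 3
```

Under this assumption, 'mean' understates the per-sample KL by a factor equal to the number of classes, which is why 'batchmean' is the mode that matches the textbook quantity.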
Parameters
reduction : str = 'mean'
    'none' | 'mean' (default) | 'sum' | 'batchmean'.
log_target : bool = False
    If True, target is interpreted as log-probabilities. Default False.
Attributes
reduction : str
log_target : bool
Notes
- Input x: $(N, *)$, where $*$ is any number of additional dimensions; log-probabilities.
- Target y: same shape as x; probabilities (or log-probabilities when log_target=True).
- Output: scalar for 'mean'/'sum'/'batchmean'; same shape as x for 'none'.
- KL divergence is asymmetric: in general $D_{\mathrm{KL}}(P \parallel Q) \neq D_{\mathrm{KL}}(Q \parallel P)$.
- Common applications include variational autoencoders (VAE), knowledge distillation, and training language models.
- Passing raw probabilities (non-log) as x is a common mistake and will produce incorrect and potentially negative values.
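Both the asymmetry and the raw-probability pitfall can be seen numerically. The plain-Python sketch below uses a hypothetical `kl` helper (not a lucid function) for strictly positive discrete distributions, then evaluates the pointwise term $y(\log y - x)$ once with the correct input $x = \log q$ and once with raw $q$ passed by mistake.

```python
import math

def kl(p, q):
    # Hypothetical helper: D_KL(p || q) for strictly positive discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl(p, q), kl(q, p))   # two different values: KL is not symmetric

# Pitfall: the loss expects x = log q, but raw probabilities q are passed.
wrong = sum(pi * (math.log(pi) - qi) for pi, qi in zip(p, q))
right = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
print(wrong)   # negative here, which a true KL divergence can never be
print(right)   # equals kl(p, q)
```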
Examples
Comparing two discrete distributions ('batchmean' follows the
mathematical definition of KL divergence):
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.KLDivLoss(reduction="batchmean")
>>> log_pred = F.log_softmax(lucid.tensor([[0.5, 1.0, 0.2]]), dim=1)
>>> target = F.softmax(lucid.tensor([[0.3, 0.9, 0.4]]), dim=1)
>>> loss = criterion(log_pred, target)
With log-space target (knowledge distillation style):
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_p = F.log_softmax(lucid.tensor([[1.0, 2.0, 0.5]]), dim=1)
>>> log_q = F.log_softmax(lucid.tensor([[0.8, 1.5, 0.9]]), dim=1)
>>> loss = criterion(log_p, log_q)
Methods (3)
__init__
__init__(reduction: str = 'mean', log_target: bool = False) → None
Initialise the KLDivLoss module. See the class docstring for parameter semantics.
forward
forward(x: Tensor, target: Tensor) → Tensor
Compute the loss between predictions and targets.
Parameters
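x : Tensor
    Log-probabilities predicted by the model.
target : Tensor
    Target probabilities (or log-probabilities when log_target=True).
Returns
Tensor
    Scalar loss (or an unreduced tensor, depending on reduction).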
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.