class KLDivLoss

extends Module

KLDivLoss(reduction: str = 'mean', log_target: bool = False)

Kullback–Leibler divergence loss.

Measures how one probability distribution diverges from a reference distribution. The input x must be log-probabilities (e.g. output of LogSoftmax), and the target y must be probabilities (or log-probabilities when log_target=True).

With log_target=False (default):

$$\ell(x, y) = y \cdot (\log y - x)$$

With log_target=True (target already in log-space):

$$\ell(x, y) = e^{y} \cdot (y - x)$$
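The two pointwise formulas describe the same quantity: passing the target as log y with log_target=True gives the same value as passing y with log_target=False. The sketch below is a plain-Python illustration of the formulas, not Lucid's implementation; the helper name kl_pointwise is hypothetical, and it assumes y > 0 in the non-log case (log 0 is undefined).

```python
import math


def kl_pointwise(x: float, y: float, log_target: bool = False) -> float:
    """Hypothetical helper: one element of the KL loss.

    x is a log-probability; y is a probability (log_target=False)
    or a log-probability (log_target=True).
    """
    if log_target:
        # l(x, y) = e^y * (y - x)
        return math.exp(y) * (y - x)
    # l(x, y) = y * (log y - x); assumes y > 0
    return y * (math.log(y) - x)


# Prediction q = 0.2 stored as a log-probability, target p = 0.5.
x = math.log(0.2)
y = 0.5
plain = kl_pointwise(x, y)
logspace = kl_pointwise(x, math.log(y), log_target=True)
print(plain, logspace)  # both equal 0.5 * log(0.5 / 0.2) up to rounding
```

Storing the target in log space avoids computing log y inside the loss, which is convenient when the target already comes out of a log_softmax.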

The scalar loss is obtained by reducing $\ell$ according to the reduction argument. Note that 'batchmean' sums $\ell$ and divides by the batch size $N$, which corresponds to the mathematical definition of KL divergence (averaged per sample).
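To make the difference between the reduction modes concrete, here is a plain-Python sketch over a batch of N rows with C entries each. The helpers reduce_kl and kl_terms are hypothetical, and the behaviour of 'mean' (averaging over all N * C elements) is an assumption borrowed from the PyTorch convention rather than something this page states.

```python
import math
from typing import List, Union


def kl_terms(log_pred: List[List[float]], target: List[List[float]]) -> List[List[float]]:
    # Pointwise y * (log y - x) for every element; assumes all targets are > 0.
    return [[y * (math.log(y) - x) for x, y in zip(xr, yr)]
            for xr, yr in zip(log_pred, target)]


def reduce_kl(terms: List[List[float]],
              reduction: str = "mean") -> Union[float, List[List[float]]]:
    # Hypothetical reduction helper. Assumption: 'mean' divides by N * C,
    # while 'batchmean' divides the same sum by the batch size N only.
    if reduction == "none":
        return terms
    total = sum(sum(row) for row in terms)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / sum(len(row) for row in terms)
    if reduction == "batchmean":
        return total / len(terms)
    raise ValueError(f"unknown reduction: {reduction!r}")


p = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]  # targets (probabilities), N = 2, C = 3
q = [[0.4, 0.4, 0.2], [0.2, 0.5, 0.3]]  # predictions (probabilities)
log_q = [[math.log(v) for v in row] for row in q]
terms = kl_terms(log_q, p)

# 'batchmean' equals the average of the per-sample KL(p_i || q_i) ...
per_sample = [sum(row) for row in terms]
print(reduce_kl(terms, "batchmean"), sum(per_sample) / len(per_sample))
# ... while 'mean' (under the assumption above) is smaller by a factor of C = 3.
print(reduce_kl(terms, "mean") * 3)
```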

Parameters

reduction : str = 'mean'
Reduction applied to the pointwise loss: 'none' | 'mean' (default) | 'sum' | 'batchmean'.
log_target : bool = False
If True, target is interpreted as log-probabilities. Default False.

Attributes

reduction : str
The reduction mode.
log_target : bool
Whether the target is in log-space.

Notes

  • Input x: shape (*), log-probabilities.
  • Target y: shape (*), probabilities (or log-probabilities when log_target=True).
  • Output: scalar for 'mean' / 'sum' / 'batchmean'; shape (*), same as the input, for 'none'.
  • KL divergence is asymmetric: in general $\text{KL}(P \| Q) \neq \text{KL}(Q \| P)$.
  • Common applications include variational autoencoders (VAE), knowledge distillation, and training language models.
  • Passing raw probabilities (non-log) as x is a common mistake and will produce incorrect and potentially negative values.
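The last two notes can be checked numerically. The sketch below is plain Python, not Lucid code; the helper kl is hypothetical and mirrors how KLDivLoss consumes its arguments (log Q as the input x, P as the target y), assuming every p_i > 0. It shows that the two KL directions differ and that feeding raw probabilities as x yields a value that is not KL(P || Q) and, in this case, is negative.

```python
import math
from typing import List


def kl(log_q: List[float], p: List[float]) -> float:
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i); assumes every p_i > 0.
    return sum(pi * (math.log(pi) - xi) for xi, pi in zip(log_q, p))


P = [0.9, 0.05, 0.05]
Q = [0.4, 0.3, 0.3]

kl_pq = kl([math.log(v) for v in Q], P)  # KL(P || Q)
kl_qp = kl([math.log(v) for v in P], Q)  # KL(Q || P)
print(kl_pq, kl_qp)  # the two directions differ

# Common mistake: passing raw probabilities Q instead of log Q as the input.
wrong = kl(Q, P)
print(wrong)  # not KL(P || Q); here it is even negative
```

With raw probabilities as x, the sum becomes sum_i p_i log p_i minus sum_i p_i q_i, and both parts are non-positive, so the "loss" can never be positive and carries no information about how far Q is from P.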

Examples

Comparing two discrete distributions ('batchmean' follows the mathematical KL
convention):
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.KLDivLoss(reduction="batchmean")
>>> log_pred = F.log_softmax(lucid.tensor([[0.5, 1.0, 0.2]]), dim=1)
>>> target   = F.softmax(lucid.tensor([[0.3, 0.9, 0.4]]),   dim=1)
>>> loss = criterion(log_pred, target)

With log-space target (knowledge distillation style):
>>> import lucid
>>> import lucid.nn as nn
>>> import lucid.nn.functional as F
>>> criterion = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_p = F.log_softmax(lucid.tensor([[1.0, 2.0, 0.5]]), dim=1)
>>> log_q = F.log_softmax(lucid.tensor([[0.8, 1.5, 0.9]]), dim=1)
>>> loss = criterion(log_p, log_q)
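For sanity-checking the first example without Lucid, the same value can be recomputed in plain Python. The helpers log_softmax, softmax and kl_batchmean below are illustrative stand-ins written for this sketch, not Lucid functions, and the sketch assumes 'batchmean' means "sum over all elements, divide by N" as described above. It also confirms that passing the target in log space with the e^y formula gives the same number.

```python
import math
from typing import List


def log_softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def softmax(row: List[float]) -> List[float]:
    return [math.exp(v) for v in log_softmax(row)]


def kl_batchmean(log_pred: List[List[float]], target: List[List[float]]) -> float:
    # Sum y * (log y - x) over every element, then divide by the batch size N.
    total = sum(y * (math.log(y) - x)
                for xr, yr in zip(log_pred, target)
                for x, y in zip(xr, yr))
    return total / len(log_pred)


log_pred = [log_softmax([0.5, 1.0, 0.2])]
target = [softmax([0.3, 0.9, 0.4])]
loss = kl_batchmean(log_pred, target)

# The same target passed in log space (log_target=True style) must agree.
log_target = [log_softmax([0.3, 0.9, 0.4])]
loss_log = sum(math.exp(y) * (y - x)
               for xr, yr in zip(log_pred, log_target)
               for x, y in zip(xr, yr)) / len(log_pred)
print(loss, loss_log)
```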

Methods (3)

__init__

__init__(reduction: str = 'mean', log_target: bool = False) -> None

Initialise the KLDivLoss module. See the class docstring for parameter semantics.

forward

forward(x: Tensor, target: Tensor) -> Tensor

Compute the KL divergence loss between the log-probability input and the target.

Parameters

x : Tensor
Log-probabilities, e.g. the output of log_softmax.
target : Tensor
Probabilities, or log-probabilities when log_target=True.

Returns

Tensor

Scalar loss, or an unreduced tensor with the same shape as x when reduction='none'.

extra_repr

extra_repr() -> str

Return a string representation of the layer's configuration.