fn
kl_divergence
kl_divergence(p: Distribution, q: Distribution) → Tensor
Compute the Kullback–Leibler divergence $\mathrm{KL}(p \,\|\, q)$.
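The KL divergence is the expected log-ratio of two probability measures $p$ and $q$ defined on the same sample space:
$$\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q(x)}\right] = \int p(x) \, \log \frac{p(x)}{q(x)} \, dx$$
It is non-negative ($\mathrm{KL}(p \,\|\, q) \ge 0$, with equality iff $p = q$ almost everywhere), is not symmetric in general ($\mathrm{KL}(p \,\|\, q) \ne \mathrm{KL}(q \,\|\, p)$), and does not satisfy the triangle inequality, so it is a divergence, not a metric.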
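The first two properties can be checked numerically on a small example. The sketch below uses the discrete analogue of the definition (a sum instead of an integral) and only the standard library; `discrete_kl` is a hypothetical helper written for illustration and is not part of the library.

```python
import math


def discrete_kl(p, q):
    """KL(p || q) in nats for two probability vectors on the same finite support."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]

forward = discrete_kl(p, q)  # KL(p || q)
reverse = discrete_kl(q, p)  # KL(q || p)

print(forward, reverse)      # both positive, and not equal to each other
print(discrete_kl(p, p))     # 0.0, since the two arguments coincide
```

Swapping the arguments changes the value, which is why the order of `p` and `q` matters when calling a KL routine.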
Dispatch walks the registry built by register_kl: first an
exact class match, then the MRO of type(p) × type(q) for the
most-derived registered ancestor pair. When no analytical formula is
registered, it falls back to a single-sample Monte Carlo estimate,
provided p supports reparameterised sampling.
Parameters
p (Distribution): Left-hand distribution.
q (Distribution): Right-hand distribution (must share the support / event shape of p).
Returns
Tensor: Non-negative tensor of shape batch_shape giving the per-batch
KL divergence in nats.
Raises
NotImplementedError: When no closed-form pair is registered and p does not
support reparameterised sampling, so the MC fallback is unavailable.
Notes
KL divergence appears throughout machine learning:
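- Variational inference: the ELBO equals $\log p(x) - \mathrm{KL}(q(z) \,\|\, p(z \mid x))$.
- VAE training: the encoder regulariser is $\mathrm{KL}(q(z \mid x) \,\|\, p(z))$, available analytically for Normal-vs-Normal pairs.
- Maximum likelihood: minimising negative log-likelihood is equivalent to minimising $\mathrm{KL}(p_{\text{data}} \,\|\, p_\theta)$.
- Mode-seeking vs mode-covering: $\mathrm{KL}(p \,\|\, q)$ is mode-covering in $q$; $\mathrm{KL}(q \,\|\, p)$ is mode-seeking. The distinction is important for variational approximations.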
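The maximum-likelihood item follows from a one-line expansion. Here $p_{\text{data}}$ denotes the data distribution and $p_\theta$ the model; both symbols are chosen for this illustration.

$$
\mathrm{KL}(p_{\text{data}} \,\|\, p_\theta)
= \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]}_{\text{independent of } \theta}
\; + \; \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}\!\left[-\log p_\theta(x)\right]}_{\text{expected negative log-likelihood}}
$$

Because the first term does not depend on $\theta$, the $\theta$ that minimises the expected negative log-likelihood also minimises the divergence.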
Examples
>>> import lucid
>>> from lucid.distributions import Normal
>>> from lucid.distributions.kl import kl_divergence
>>> p = Normal(loc=0.0, scale=1.0)
>>> q = Normal(loc=1.0, scale=2.0)
>>> kl_divergence(p, q)
Tensor(...)
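For the two Normals above, the well-known closed form gives $\ln 2 - \tfrac{1}{4} \approx 0.443$ nats. The sketch below reproduces that value with the standard library and compares it with a Monte Carlo average of $\log p(x) - \log q(x)$ over samples drawn from $p$. All function names are hypothetical helpers for this illustration. Many samples are used here only to make the agreement visible; a single-sample estimate is unbiased but noisy.

```python
import math
import random


def normal_log_prob(x, loc, scale):
    """Log-density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def normal_kl_closed_form(loc_p, scale_p, loc_q, scale_q):
    """KL(Normal(loc_p, scale_p) || Normal(loc_q, scale_q)) in nats."""
    var_ratio = (scale_p / scale_q) ** 2
    mean_term = ((loc_p - loc_q) / scale_q) ** 2
    return 0.5 * (var_ratio + mean_term - 1.0 - math.log(var_ratio))


def normal_kl_monte_carlo(loc_p, scale_p, loc_q, scale_q, n_samples, rng):
    """Average of log p(x) - log q(x) over n_samples draws x ~ p."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(loc_p, scale_p)
        total += normal_log_prob(x, loc_p, scale_p) - normal_log_prob(x, loc_q, scale_q)
    return total / n_samples


exact = normal_kl_closed_form(0.0, 1.0, 1.0, 2.0)
estimate = normal_kl_monte_carlo(0.0, 1.0, 1.0, 2.0, 100_000, random.Random(0))

print(exact)     # closed-form value, ln 2 - 1/4
print(estimate)  # sample average, close to the closed-form value
```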