fn

gaussian_nll_loss

Tensor
gaussian_nll_loss(x: Tensor, target: Tensor, var: Tensor, full: bool = False, eps: float = 1e-06, reduction: str = 'mean')

Gaussian negative log-likelihood for heteroscedastic regression.

Maximum-likelihood objective when the prediction is a distribution $\mathcal{N}(\mu, \sigma^2)$ over the target rather than a point estimate. Training a network with two heads (one for $\mu$, one for $\sigma^2$) against this loss lets it learn a predictive uncertainty alongside the mean, which is useful for active learning, decision-aware regression, and Bayesian deep ensembles.

The variance var is clamped below by eps to prevent division by zero and runaway log-terms when the network initially predicts near-zero variance.
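To illustrate the effect of the clamp, here is a minimal plain-Python sketch on scalar floats. It is not lucid's implementation: the helper `clamped_nll` is hypothetical, and it assumes the clamp is a simple `max(var, eps)` applied before both the logarithm and the division.

```python
import math


def clamped_nll(mu: float, y: float, var: float, eps: float = 1e-6) -> float:
    """Per-element Gaussian NLL (constant term dropped), with var clamped at eps."""
    # Assumed clamp: keeps log(var) and 1/var finite for tiny or zero variances.
    var = max(var, eps)
    return 0.5 * (math.log(var) + (y - mu) ** 2 / var)


# Without the clamp, var = 0.0 would fail on log(0) and on the division.
loss_at_zero = clamped_nll(0.0, 0.5, 0.0)
# With the clamp, a zero variance behaves exactly like var = eps.
loss_at_eps = clamped_nll(0.0, 0.5, 1e-6)
```

The loss stays finite but becomes very large when the residual is non-zero, so a smaller `eps` trades stability for a steeper penalty on confident mistakes.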

Parameters

x: Tensor
Predicted means $\mu$, any shape.
target: Tensor
Observed values $y$, broadcast-compatible with x.
var: Tensor
Predicted variances $\sigma^2 > 0$, broadcast-compatible with x.
full: bool = False
If True, the constant $\tfrac{1}{2}\log(2\pi)$ term is meant to be included in the loss value. The constant has no effect on gradients and is useful only for reporting log-likelihoods. The term is not currently added.
eps: float = 1e-06
Lower bound applied to var for numerical stability.
reduction: str = 'mean'
How the per-element losses are combined: "mean" (default), "sum", or "none".

Returns

Tensor

A scalar when reduction is "mean" or "sum"; a full-shape tensor of per-element losses when reduction is "none".
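The three reduction modes can be sketched in plain Python as follows. This is a reference sketch under stated assumptions, not lucid's code: the helper `nll_reference` is hypothetical, it works on equal-length lists of floats (no broadcasting), and raising `ValueError` for an unknown mode is an assumption about error handling.

```python
import math
from typing import List, Sequence, Union


def nll_reference(
    mu: Sequence[float],
    y: Sequence[float],
    var: Sequence[float],
    eps: float = 1e-6,
    reduction: str = "mean",
) -> Union[float, List[float]]:
    """Gaussian NLL (constant term dropped) over equal-length sequences."""
    per_element = [
        0.5 * (math.log(max(v, eps)) + (t - m) ** 2 / max(v, eps))
        for m, t, v in zip(mu, y, var)
    ]
    if reduction == "none":
        return per_element  # one loss per input element
    if reduction == "sum":
        return sum(per_element)
    if reduction == "mean":
        return sum(per_element) / len(per_element)
    raise ValueError(f"unknown reduction: {reduction!r}")  # assumed behaviour


mu, y, var = [0.0, 2.0], [1.0, 2.0], [1.0, 4.0]
per_element = nll_reference(mu, y, var, reduction="none")  # [0.5, 0.5 * log(4)]
total = nll_reference(mu, y, var, reduction="sum")
average = nll_reference(mu, y, var, reduction="mean")
```

Here "none" keeps one value per element, while "sum" and "mean" collapse them to a single number.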

Notes

Per-element loss (constant terms dropped):

$$L_i = \tfrac{1}{2}\left(\log \sigma_i^2 + \frac{(y_i - \mu_i)^2}{\sigma_i^2}\right)$$

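The first term penalises large variances, so the model cannot lower the loss simply by inflating its uncertainty. The second term penalises squared error weighted by the precision $1/\sigma^2$, so confident mistakes (small variance, large error) are costly. Together they give the model a clean trade-off: when it cannot reduce $(y - \mu)^2$, increasing $\sigma^2$ decreases the loss until $\sigma^2 = (y - \mu)^2$, where the loss is minimised for that residual. This is what pushes the predicted variances toward calibrated uncertainty estimates.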
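The trade-off between the two terms can be checked numerically. The sketch below is plain Python, independent of lucid, and the helper `nll` is hypothetical. It fixes a residual $y - \mu = 2$ and scans a few candidate variances. Setting the derivative of $\log \sigma^2 + r^2/\sigma^2$ with respect to $\sigma^2$ to zero gives $\sigma^2 = r^2$, so the smallest loss should occur at $\sigma^2 = 4$.

```python
import math


def nll(residual: float, var: float) -> float:
    """Per-element Gaussian NLL (constant term dropped) for a given residual."""
    return 0.5 * (math.log(var) + residual ** 2 / var)


residual = 2.0  # fixed error y - mu that the model cannot reduce
candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

# Loss for each candidate variance: too small is punished by the second
# term, too large by the log term.
losses = {v: nll(residual, v) for v in candidates}
best_var = min(losses, key=losses.get)  # variance with the lowest loss
```

Among these candidates, `best_var` equals `residual ** 2`: the loss falls as the variance grows toward the squared residual and rises again beyond it.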

Examples

>>> import lucid
>>> from lucid.nn.functional import gaussian_nll_loss
>>> mu = lucid.tensor([0.0, 1.0])
>>> y = lucid.tensor([0.5, 1.0])
>>> var = lucid.tensor([1.0, 0.25])
>>> gaussian_nll_loss(mu, y, var)
Tensor(-0.2840...)