class

Categorical

extends Distribution
Categorical(probs: Tensor | None = None, logits: Tensor | None = None, validate_args: bool | None = None)

Categorical distribution — a discrete distribution over K labelled outcomes.

Categorical(probs=p) or Categorical(logits=l) defines a distribution over the integer set $\{0, 1, \ldots, K-1\}$, where $K$ is the number of categories. Exactly one of probs or logits must be given.

Parameters

probs: Tensor | None = None
Non-negative probability vector (or batch of vectors) of shape (..., K). Rows are automatically normalised to sum to 1. Mutually exclusive with logits.
logits: Tensor | None = None
Unnormalised log-probabilities of shape (..., K). The distribution applies softmax internally to convert them to normalised probabilities. Mutually exclusive with probs.
validate_args: bool | None = None
If True, validate parameter constraints at construction time.
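
The automatic normalisation of probs can be pictured with a minimal sketch in plain Python. This is not lucid's implementation; the helper name `normalise` is hypothetical and it handles a single vector rather than a batch.

```python
def normalise(weights):
    """Rescale a non-negative vector so that it sums to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

# Unnormalised weights 1 : 1 : 2 become probabilities 0.25, 0.25, 0.5.
p = normalise([1.0, 1.0, 2.0])
```

For a batch of shape (..., K), the same rescaling would be applied independently along the last axis.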

Attributes

probs: Tensor
Normalised probability vector (shape (..., K); present when constructed with probs).
logits: Tensor
Unnormalised log-probability vector (shape (..., K); present when constructed with logits).

Notes

PMF:

$$P(X = k) = p_k, \quad k \in \{0, 1, \ldots, K-1\}, \quad \sum_{k} p_k = 1$$

Parameterisations are related by:

$$p_k = \frac{e^{l_k}}{\sum_j e^{l_j}}, \qquad l_k = \log p_k \;(\text{up to an additive constant})$$
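
The relation between the two parameterisations can be checked numerically. The sketch below uses only the Python standard library; `softmax` is an illustrative helper, not a lucid function.

```python
import math

def softmax(logits):
    """Convert unnormalised log-probabilities to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
p = softmax(logits)

# Adding a constant to every logit leaves the probabilities unchanged.
p_shifted = softmax([l + 5.0 for l in logits])

# log p_k recovers l_k up to one shared additive constant,
# so l_k - log p_k is the same number for every k.
diffs = [l - math.log(pk) for l, pk in zip(logits, p)]
```

Because of this shift invariance, many different logits tensors describe the same distribution.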

Entropy:

$$H[X] = -\sum_{k=0}^{K-1} p_k \log p_k$$
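
As a worked instance of the entropy formula, here is a small standard-library sketch. It assumes the usual convention that terms with $p_k = 0$ contribute zero; whether lucid handles zeros the same way is not stated here.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p_k = 0 are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # maximal for K = 4: log(4)
h_point = entropy([1.0, 0.0, 0.0, 0.0])        # a certain outcome: 0
```

The uniform distribution attains the maximum $\log K$, and a distribution concentrated on one category has zero entropy.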

Mean is not well-defined for a general Categorical (the labels have no canonical metric), so mean returns a NaN tensor of the batch shape.

Sampling uses the Gumbel-max trick: add i.i.d. $\operatorname{Gumbel}(0, 1)$ noise to the log-probabilities and take the argmax. The resulting index has exactly the categorical distribution, so this is equivalent in distribution to direct (inverse-CDF) sampling while avoiding the cumulative sum and binary search.
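
The Gumbel-max procedure described above can be sketched in plain Python. This illustrates the technique only and is not lucid's code; `gumbel_noise` and `gumbel_max_sample` are hypothetical names, and a single unbatched distribution is assumed.

```python
import math
import random

def gumbel_noise(rng):
    """One Gumbel(0, 1) draw: -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # rng.random() may return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_max_sample(log_probs, rng):
    """Return argmax_k (log_probs[k] + g_k) with i.i.d. Gumbel noise g_k."""
    perturbed = [lp + gumbel_noise(rng) for lp in log_probs]
    return max(range(len(perturbed)), key=perturbed.__getitem__)

rng = random.Random(0)
probs = [0.7, 0.2, 0.1]
log_probs = [math.log(p) for p in probs]

n = 20000
counts = [0] * len(probs)
for _ in range(n):
    counts[gumbel_max_sample(log_probs, rng)] += 1
freqs = [c / n for c in counts]  # should be close to probs
```

Unnormalised logits work equally well in place of `log_probs`, since adding a constant to every entry does not change the argmax.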

The batch dimensions of the input correspond to independent distributions. For example, probs of shape (B, K) yields a batch of $B$ Categorical distributions.

Examples

>>> import lucid
>>> from lucid.distributions import Categorical
>>> # Uniform over 4 categories
>>> dist = Categorical(probs=lucid.tensor([0.25, 0.25, 0.25, 0.25]))
>>> samples = dist.sample((10,))
>>> # Batch of 2 distributions
>>> dist_b = Categorical(logits=lucid.zeros(2, 5))
>>> dist_b.batch_shape, dist_b.event_shape
((2,), ())

Methods (6)

dunder

__init__

None
__init__(probs: Tensor | None = None, logits: Tensor | None = None, validate_args: bool | None = None)

Initialise a Categorical distribution.

Parameters

probs: Tensor | None = None
Non-negative probability vector of shape (..., K). Rows are automatically normalised to sum to 1. Mutually exclusive with logits.
logits: Tensor | None = None
Unnormalised log-probabilities of shape (..., K). Converted to probabilities via softmax internally. Mutually exclusive with probs.
validate_args: bool | None = None
If True, validate parameter constraints at construction time.

Raises

ValueError
If both or neither of probs and logits are provided.
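
The "exactly one of probs or logits" rule amounts to a simple check. The sketch below is a hypothetical stand-alone helper (`check_exactly_one` is not part of lucid) and the error message is illustrative.

```python
def check_exactly_one(probs=None, logits=None):
    """Raise ValueError unless exactly one of probs / logits is given."""
    # Both None, or both not None: the two comparisons agree.
    if (probs is None) == (logits is None):
        raise ValueError("Exactly one of `probs` or `logits` must be specified.")
    return "probs" if probs is not None else "logits"

which = check_exactly_one(logits=[0.0, 0.0])  # selects the logits branch
```
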
prop

support

Constraint
support: Constraint

Support of the distribution: integer interval $\{0, \ldots, K-1\}$.

Returns

Constraint

An integer_interval constraint from 0 to K - 1.

prop

mean

Tensor
mean: Tensor

Mean of the Categorical distribution (undefined — returns NaN).

The Categorical distribution assigns labels with no inherent ordering or metric, so the mean is not well-defined. This property therefore returns a NaN tensor of shape batch_shape.

Returns

Tensor

Tensor of float('nan') values with shape batch_shape.

fn

sample

Tensor
sample(sample_shape: tuple[int, ...] = ())

Draw samples from the Categorical distribution.

Uses the Gumbel-max trick: add i.i.d. $\operatorname{Gumbel}(0, 1)$ noise to the log-probabilities and take the argmax. The result is equivalent in distribution to direct (inverse-CDF) sampling but avoids the cumulative sum and binary search.

Parameters

sample_shape: tuple[int, ...] = ()
Leading shape of the output sample batch.

Returns

Tensor

Integer tensor of shape (*sample_shape, *batch_shape) with values in $\{0, 1, \ldots, K-1\}$. The result is detached (no gradients flow through discrete samples).
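
How the output shape is assembled can be shown with plain tuples. This is a sketch of the shape rule only; `sample_output_shape` is a hypothetical helper, not a lucid function.

```python
def sample_output_shape(sample_shape, batch_shape):
    """Shape of a sample tensor: sample dims first, then batch dims."""
    # event_shape is () for a Categorical, so nothing is appended for it.
    return tuple(sample_shape) + tuple(batch_shape)

shape_a = sample_output_shape((10,), ())    # 10 draws from one distribution
shape_b = sample_output_shape((10,), (2,))  # 10 draws from each of 2 distributions
shape_c = sample_output_shape((), (2,))     # default: one draw per distribution
```
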

fn

log_prob

Tensor
log_prob(value: Tensor)

Log-probability of the given category indices.

For an integer category index $k$ (not a one-hot vector), the log-probability is:

$$\log P(X = k) = \log p_k$$

Parameters

value: Tensor
Integer tensor of category indices with shape compatible with batch_shape. Values must be in $\{0, \ldots, K-1\}$.

Returns

Tensor

Log-probabilities of shape batch_shape.
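
When the distribution is parameterised by logits, $\log p_k$ can be computed without forming the probabilities first, using the log-sum-exp identity $\log p_k = l_k - \log \sum_j e^{l_j}$. The following is a minimal standard-library sketch for a single unbatched distribution; `categorical_log_prob` is a hypothetical name and this is not necessarily how lucid computes it.

```python
import math

def categorical_log_prob(logits, k):
    """log P(X = k) from unnormalised logits via a stable log-sum-exp."""
    if not 0 <= k < len(logits):
        raise ValueError("category index out of range")
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[k] - log_z

# Uniform over 4 categories: every index has log-probability -log(4).
lp_uniform = categorical_log_prob([0.0, 0.0, 0.0, 0.0], 2)

# Exponentiated log-probabilities over all indices sum to 1.
logits = [2.0, 1.0, 0.0]
total = sum(math.exp(categorical_log_prob(logits, k)) for k in range(3))
```
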

fn

entropy

Tensor
entropy()

Shannon entropy of the Categorical distribution.

$$H[X] = -\sum_{k=0}^{K-1} p_k \log p_k$$

Returns

Tensor

Entropy values of shape batch_shape (nats).