class

Categorical

extends Distribution

Categorical(probs: Tensor | None = None, logits: Tensor | None = None, validate_args: bool | None = None)

Categorical distribution — a discrete distribution over K labelled outcomes.

Categorical(probs=p) or Categorical(logits=l) defines a distribution over the integer set $\{0, 1, \ldots, K-1\}$ where $K$ is the number of categories. Exactly one of probs or logits must be given.

Parameters

probs: Tensor | None = None

Non-negative probability vector (or batch of vectors) of shape (..., K). Rows are automatically normalised to sum to 1. Mutually exclusive with logits.

logits: Tensor | None = None

Unnormalised log-probabilities of shape (..., K). The distribution applies $\operatorname{softmax}$ internally to convert them to normalised probabilities. Mutually exclusive with probs.

validate_args: bool | None = None

If True, validate parameter constraints at construction time.

Attributes

probs: Tensor

Normalised probability vector (shape (..., K); present when constructed with probs).

logits: Tensor

Unnormalised log-probability vector (shape (..., K); present when constructed with logits).

Notes

PMF:

$$P(X = k) = p_k, \quad k \in \{0, 1, \ldots, K-1\}, \quad \sum_{k} p_k = 1$$

Parameterisations are related by:

$$p_k = \frac{e^{l_k}}{\sum_j e^{l_j}}, \qquad l_k = \log p_k + c \quad (\text{for any constant } c)$$
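The relation between the two parameterisations can be checked numerically. The following is a minimal pure-Python sketch, independent of lucid; the helper name `softmax` and the example values are illustrative assumptions, not the library's code. It shows that logits shifted by an additive constant map back to the same probabilities.

```python
import math


def softmax(logits):
    """Convert unnormalised log-probabilities into probabilities (hypothetical helper)."""
    m = max(logits)  # subtracting the max does not change the result, only stabilises exp()
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.1, 0.2, 0.7]
logits = [math.log(p) for p in probs]   # l_k = log p_k
shifted = [l + 3.0 for l in logits]     # same logits plus an additive constant

recovered = softmax(logits)
recovered_shifted = softmax(shifted)
```

Both `recovered` and `recovered_shifted` agree with `probs` up to floating-point rounding, which is why logits are only defined up to an additive constant.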

Entropy:

$$H[X] = -\sum_{k=0}^{K-1} p_k \log p_k$$

Mean is not well-defined for a general Categorical (the labels have no canonical metric), so mean returns a NaN tensor of the batch shape.

Sampling uses the Gumbel-max trick: add i.i.d. $\operatorname{Gumbel}(0, 1)$ noise to the log-probabilities and take the argmax. The resulting samples follow the same distribution as ancestral (inverse-CDF) sampling, without the cumulative sum and binary search that approach requires.
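The Gumbel-max trick described above can be sketched in pure Python. This is an illustration under stated assumptions, not lucid's implementation: the function name `gumbel_max_sample`, the seed, and the example probabilities are all hypothetical, and the code works on a single logit vector rather than a batched tensor.

```python
import math
import random


def gumbel_max_sample(logits, rng):
    """Draw one category index by adding Gumbel(0, 1) noise and taking the argmax."""
    best_index, best_score = 0, -math.inf
    for k, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:          # random() may return exactly 0.0; log(0) is undefined
            u = rng.random()
        gumbel = -math.log(-math.log(u))   # Gumbel(0, 1) via the inverse CDF
        score = logit + gumbel
        if score > best_score:
            best_index, best_score = k, score
    return best_index


probs = [0.1, 0.2, 0.7]
logits = [math.log(p) for p in probs]
rng = random.Random(0)           # fixed seed so the sketch is reproducible

num_draws = 20000
counts = [0] * len(probs)
for _ in range(num_draws):
    counts[gumbel_max_sample(logits, rng)] += 1
freqs = [c / num_draws for c in counts]
```

With enough draws, `freqs` should approach `probs`, which is the sense in which the trick matches direct sampling from the probability vector.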

The batch dimensions of the input correspond to independent distributions. For example, probs of shape (B, K) yields a batch of $B$ Categorical distributions.
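To make the batch semantics concrete, here is a small pure-Python sketch that treats a (B, K) nested list as B independent distributions and normalises each row separately. The helper `normalise_rows` and its validation behaviour are assumptions made for illustration; lucid's own normalisation may differ in detail.

```python
def normalise_rows(weights):
    """Normalise each row of a (B, K) nested list so that it sums to 1 (hypothetical helper)."""
    normalised = []
    for row in weights:
        total = sum(row)
        if total <= 0 or any(w < 0 for w in row):
            raise ValueError("each row needs non-negative entries with a positive sum")
        normalised.append([w / total for w in row])
    return normalised


weights = [[1.0, 1.0, 2.0], [3.0, 0.0, 1.0]]   # B = 2 rows, K = 3 categories
probs = normalise_rows(weights)

batch_shape = (len(probs),)      # one distribution per row
num_categories = len(probs[0])   # K
```

Each row of `probs` is a separate probability vector, so the batch shape here is `(2,)` and every distribution has `K = 3` categories.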

Examples

>>> import lucid
>>> from lucid.distributions import Categorical
>>> # Uniform over 4 categories
>>> dist = Categorical(probs=lucid.tensor([0.25, 0.25, 0.25, 0.25]))
>>> samples = dist.sample((10,))
>>> # Batch of 2 distributions
>>> dist_b = Categorical(logits=lucid.zeros(2, 5))
>>> dist_b.batch_shape, dist_b.event_shape
((2,), ())

Constructors

dunder

init

→None

__init__(probs: Tensor | None = None, logits: Tensor | None = None, validate_args: bool | None = None)

Initialise a Categorical distribution.

Parameters

probs: Tensor | None = None

Non-negative probability vector of shape (..., K). Rows are automatically normalised to sum to 1. Mutually exclusive with logits.

logits: Tensor | None = None

Unnormalised log-probabilities of shape (..., K). Converted to probabilities via softmax internally. Mutually exclusive with probs.

validate_args: bool | None = None

If True, validate parameter constraints at construction time.

Raises

ValueError

If both or neither of probs and logits are provided.
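The "exactly one of" rule can be expressed as a single comparison. The sketch below is a hypothetical stand-alone check, not lucid's constructor; the function name `check_exactly_one` and the error message are assumptions.

```python
def check_exactly_one(probs=None, logits=None):
    """Return which parameterisation was supplied, or raise if both or neither were given."""
    # (probs is None) == (logits is None) is True when both are None or both are set.
    if (probs is None) == (logits is None):
        raise ValueError("Exactly one of `probs` or `logits` must be given.")
    return "probs" if probs is not None else "logits"


chosen = check_exactly_one(logits=[0.0, 0.0, 0.0])
```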

Properties

prop

mean

→Tensor

mean: Tensor

Mean of the Categorical distribution (undefined — returns NaN).

The Categorical distribution assigns labels with no inherent ordering or metric, so the mean is not well-defined. This property returns a NaN tensor of the batch shape to match expected behaviour.

Returns

Tensor

Tensor of float('nan') values with shape batch_shape.

prop

support

→Constraint

support: Constraint

Support of the distribution: integer interval $\{0, \ldots, K-1\}$.

Returns

Constraint

An integer_interval constraint from 0 to K - 1.

Instance methods

entropy

→Tensor

entropy()

Shannon entropy of the Categorical distribution.

$$H[X] = -\sum_{k=0}^{K-1} p_k \log p_k$$

Returns

Tensor

Entropy values of shape batch_shape (nats).
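As a quick check of the formula, the entropy can be computed by hand for a single probability vector. This pure-Python sketch is illustrative only: `entropy_nats` is a hypothetical helper, and skipping zero-probability terms (the usual convention that $0 \log 0 = 0$) is an assumption about how such terms are handled.

```python
import math


def entropy_nats(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing (assumed convention)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform_entropy = entropy_nats([0.25, 0.25, 0.25, 0.25])   # equals log(4) for K = 4
certain_entropy = entropy_nats([1.0, 0.0, 0.0])            # a deterministic outcome
```

A uniform distribution over $K$ categories attains the maximum entropy $\log K$, while a distribution concentrated on one category has entropy zero.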

log_prob

→Tensor

log_prob(value: Tensor)

Log-probability of the given category indices.

For an integer category index $k$, the log-probability is:

$$\log P(X = k) = \log p_k$$

Parameters

value: Tensor

Integer tensor of category indices with shape compatible with batch_shape. Values must be in $\{0, \ldots, K-1\}$.

Returns

Tensor

Log-probabilities of shape batch_shape.
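When the distribution is parameterised by logits, $\log p_k$ is the logit minus the log of the normalising constant. The pure-Python sketch below illustrates this for a single logit vector; the function `log_prob` here is a hypothetical stand-alone helper (including its range check), not the lucid method.

```python
import math


def log_prob(logits, k):
    """Return log p_k = l_k - logsumexp(l) for an integer category index k."""
    if not 0 <= k < len(logits):
        raise ValueError("category index out of range")
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[k] - log_norm


uniform_lp = log_prob([0.0, 0.0, 0.0, 0.0], 2)            # uniform over 4 categories
skewed_lp = log_prob([math.log(p) for p in (0.1, 0.2, 0.7)], 1)
```

Here `uniform_lp` is $\log(1/4)$ and `skewed_lp` is $\log 0.2$, up to floating-point rounding.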

sample

→Tensor

sample(sample_shape: tuple[int, ...] = ())

Draw samples from the Categorical distribution.

Uses the Gumbel-max trick: add i.i.d. $\operatorname{Gumbel}(0, 1)$ noise to the log-probabilities and take the argmax. The resulting samples follow the same distribution as ancestral (inverse-CDF) sampling, without the cumulative sum and binary search that approach requires.

Parameters

sample_shape: tuple[int, ...] = ()

Leading shape of the output sample batch.

Returns

Tensor

Integer tensor of shape (*sample_shape, *batch_shape) with values in $\{0, 1, \ldots, K-1\}$. The result is detached (no gradients flow through discrete samples).
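The output shape rule, sample dimensions first and batch dimensions after, can be illustrated with nested lists. This sketch is not lucid code: `sample_batch` is a hypothetical helper restricted to a one-dimensional `sample_shape`, and it draws with the standard library's `random.choices` (cumulative weights plus bisection) rather than the Gumbel-max trick.

```python
import random


def sample_batch(probs, sample_shape, rng):
    """Draw from a (B, K) batch of probability rows; returns an S x B nested list."""
    (num_draws,) = sample_shape              # this sketch supports sample_shape == (S,) only
    categories = range(len(probs[0]))        # the support {0, ..., K-1}
    return [
        [rng.choices(categories, weights=row)[0] for row in probs]
        for _ in range(num_draws)
    ]


probs = [
    [0.25, 0.25, 0.25, 0.25],   # uniform over 4 categories
    [0.0, 0.0, 1.0, 0.0],       # always category 2
]
samples = sample_batch(probs, (10,), random.Random(0))
result_shape = (len(samples), len(samples[0]))   # (*sample_shape, *batch_shape)
```

With `sample_shape = (10,)` and a batch of two distributions, `result_shape` is `(10, 2)`, and every entry is an integer index in the support.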
