GPTLMHeadModel¶
The GPTLMHeadModel class applies a linear language modeling head to the GPT backbone for causal (autoregressive) next-token prediction. The output projection weight is tied to the input token embedding.
Class Signature¶
class GPTLMHeadModel(config: GPTConfig)
Parameters¶
config (GPTConfig): GPT configuration object.
Methods¶
- GPTLMHeadModel.forward(input_ids: Tensor, attention_mask: Tensor | None = None, position_ids: Tensor | None = None, past_key_values: list[KVCache] | None = None, labels: Tensor | None = None, use_cache: bool = False) → tuple[Tensor | None, Tensor, list[KVCache] | None]
Compute per-token logits over the vocabulary. When labels is provided, the shifted cross-entropy loss for next-token prediction is returned as the first element; otherwise the first element is None. When use_cache is True, the updated key/value caches are returned as the third element.
- GPTLMHeadModel.tie_weights() → None
Tie the lm_head projection weight to the input token embedding weight.
- GPTLMHeadModel.get_input_embeddings() → Embedding
Return the token embedding layer.
- GPTLMHeadModel.get_output_embeddings() → Linear | None
Return the lm_head linear projection.
Examples¶
>>> import lucid.models as models
>>> config = models.GPTConfig.base()
>>> model = models.GPTLMHeadModel(config)
>>> print(model)
GPTLMHeadModel(...)
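The embedding accessors expose the tied parameters, and tie_weights() can be called to re-tie them manually (a sketch; whether tying shares a single tensor object or synchronizes values is implementation-dependent):

>>> emb = model.get_input_embeddings()    # token Embedding
>>> head = model.get_output_embeddings()  # lm_head Linear
>>> model.tie_weights()                   # re-tie, e.g. after swapping the embedding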
>>> import lucid
>>> input_ids = lucid.randint(0, config.vocab_size, (2, 32))
>>> loss, logits, _ = model(input_ids, labels=input_ids)
>>> logits.shape
(2, 32, 40478)
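Without labels, the loss slot is empty (the return type marks it Tensor | None), so pure inference calls can ignore it:

>>> loss, logits, _ = model(input_ids)
>>> loss is None
True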
>>> # Greedy autoregressive generation
>>> generated = input_ids
>>> for _ in range(20):
... _, logits, _ = model(generated)
... next_token = logits[:, -1, :].argmax(axis=-1, keepdims=True)
... generated = lucid.cat([generated, next_token], axis=-1)
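The loop above re-encodes the full prefix at every step. With use_cache=True the per-layer key/value states can be reused, so each step feeds only the newest token (a sketch based on the forward signature above; assumes the returned caches can be passed back via past_key_values):

>>> generated = input_ids
>>> cache = None
>>> step_input = generated
>>> for _ in range(20):
...     _, logits, cache = model(step_input, past_key_values=cache, use_cache=True)
...     step_input = logits[:, -1, :].argmax(axis=-1, keepdims=True)
...     generated = lucid.cat([generated, step_input], axis=-1)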