BERTConfig¶
- class lucid.models.BERTConfig(vocab_size: int, hidden_size: int, num_attention_heads: int, num_hidden_layers: int, intermediate_size: int, hidden_act: Callable[[lucid._tensor.tensor.Tensor], lucid._tensor.tensor.Tensor] | str, hidden_dropout_prob: float, attention_probs_dropout_prob: float, max_position_embeddings: int, tie_word_embedding: bool, type_vocab_size: int, initializer_range: float, layer_norm_eps: float, use_cache: bool, is_decoder: bool, add_cross_attention: bool, chunk_size_feed_forward: int, pad_token_id: int = 0, bos_token_id: int | None = None, eos_token_id: int | None = None, classifier_dropout: float | None = None, add_pooling_layer: bool = True)¶
The BERTConfig dataclass stores model hyperparameters used to build BERT backbones and task-specific wrappers.
Class Signature¶
@dataclass
class BERTConfig:
    vocab_size: int
    hidden_size: int
    num_attention_heads: int
    num_hidden_layers: int
    intermediate_size: int
    hidden_act: Callable[[Tensor], Tensor] | str
    hidden_dropout_prob: float
    attention_probs_dropout_prob: float
    max_position_embeddings: int
    tie_word_embedding: bool
    type_vocab_size: int
    initializer_range: float
    layer_norm_eps: float
    use_cache: bool
    is_decoder: bool
    add_cross_attention: bool
    chunk_size_feed_forward: int
    pad_token_id: int = 0
    bos_token_id: int | None = None
    eos_token_id: int | None = None
    classifier_dropout: float | None = None
    add_pooling_layer: bool = True
Parameters¶
vocab_size (int): Vocabulary size.
hidden_size (int): Hidden dimension of token states.
num_attention_heads (int): Number of attention heads.
num_hidden_layers (int): Number of Transformer blocks.
intermediate_size (int): Feed-forward inner dimension.
hidden_act (Callable | str): Activation used in feed-forward layers, given either as a callable or as an activation name string.
hidden_dropout_prob (float): Dropout probability in hidden layers.
attention_probs_dropout_prob (float): Dropout for attention probabilities.
max_position_embeddings (int): Maximum supported sequence length.
tie_word_embedding (bool): Whether to tie input/output token embeddings.
type_vocab_size (int): Number of token-type (segment) ids supported by the token-type embedding (2 for standard BERT).
initializer_range (float): Standard deviation used for weight initialization.
layer_norm_eps (float): Epsilon for layer normalization.
use_cache (bool): Whether key/value caching is enabled when running as a decoder.
is_decoder (bool): Whether to run the model as a decoder.
add_cross_attention (bool): Whether to enable cross-attention blocks.
chunk_size_feed_forward (int): Chunk size for feed-forward computation.
pad_token_id (int, optional): Padding token id. Default is 0.
bos_token_id (int | None, optional): Beginning-of-sequence token id. Default is None.
eos_token_id (int | None, optional): End-of-sequence token id. Default is None.
classifier_dropout (float | None, optional): Dropout probability for classification heads. Default is None.
add_pooling_layer (bool, optional): Whether to add BERT pooler. Default is True.
Preset Constructors¶
BERTConfig provides class methods for common presets:
BERTConfig.base(…): Returns a BERT-Base style config (hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072).
BERTConfig.large(…): Returns a BERT-Large style config (hidden_size=1024, num_hidden_layers=24, num_attention_heads=16, intermediate_size=4096).
Both methods support overrides via keyword arguments.
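The override mechanism can be sketched with a simplified stand-in dataclass (a reduced illustration of the pattern, not the library's actual implementation; `MiniBERTConfig` and its field subset are hypothetical):

```python
from dataclasses import dataclass, replace

@dataclass
class MiniBERTConfig:
    # Reduced stand-in for illustration; the real BERTConfig has many more fields.
    vocab_size: int = 30522
    hidden_size: int = 768
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    intermediate_size: int = 3072

    @classmethod
    def base(cls, **overrides):
        # BERT-Base defaults, with keyword overrides applied on top.
        return replace(cls(), **overrides)

    @classmethod
    def large(cls, **overrides):
        # BERT-Large scales hidden size, depth, heads, and FFN width.
        return replace(
            cls(hidden_size=1024, num_hidden_layers=24,
                num_attention_heads=16, intermediate_size=4096),
            **overrides,
        )

cfg = MiniBERTConfig.base(vocab_size=32000)
print(cfg.vocab_size, cfg.hidden_size)  # → 32000 768
```

Using `dataclasses.replace` keeps preset defaults and user overrides cleanly separated: each preset builds its baseline instance first, then applies only the keywords the caller supplied.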
Basic Usage¶
from lucid.models import BERTConfig
base_cfg = BERTConfig.base()
large_cfg = BERTConfig.large()
custom_base = BERTConfig.base(vocab_size=32000, use_cache=True)
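One invariant worth checking when customizing a config: hidden_size must be divisible by num_attention_heads, since each head operates on a hidden_size // num_attention_heads slice. A small standalone sanity check (the check_heads helper is illustrative, not part of the lucid API):

```python
def check_heads(hidden_size: int, num_attention_heads: int) -> int:
    # Return the per-head dimension, or raise if the split is uneven.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size {hidden_size} is not divisible by "
            f"num_attention_heads {num_attention_heads}"
        )
    return hidden_size // num_attention_heads

print(check_heads(768, 12))   # BERT-Base → 64-dim heads
print(check_heads(1024, 16))  # BERT-Large → 64-dim heads
```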