RoFormerConfig¶
class lucid.models.RoFormerConfig(vocab_size: int, hidden_size: int, num_attention_heads: int, num_hidden_layers: int, intermediate_size: int, hidden_act: Callable[[lucid._tensor.tensor.Tensor], lucid._tensor.tensor.Tensor] | str, hidden_dropout_prob: float, attention_probs_dropout_prob: float, max_position_embeddings: int, tie_word_embedding: bool, type_vocab_size: int, initializer_range: float, layer_norm_eps: float, use_cache: bool, is_decoder: bool, add_cross_attention: bool, chunk_size_feed_forward: int, pad_token_id: int = 0, bos_token_id: int | None = None, eos_token_id: int | None = None, classifier_dropout: float | None = None, add_pooling_layer: bool = True, rotary_value: bool = False, rope_interleaved: bool = True)¶
The RoFormerConfig dataclass extends BERTConfig with the rotary position embedding (RoPE) controls used by RoFormer self-attention.
Class Signature¶
@dataclass
class RoFormerConfig(BERTConfig):
    rotary_value: bool = False
    rope_interleaved: bool = True
Additional Parameters¶
rotary_value (bool, optional): Whether RoPE is applied to the value projections in addition to the query and key projections. Default is False.
rope_interleaved (bool, optional): Whether rotary dimensions are paired in interleaved (adjacent even/odd) order rather than split into two contiguous halves; see the sketch below. Default is True.
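The two pairing conventions can be pictured with a short NumPy sketch. The apply_rope helper below is hypothetical and for illustration only; whether lucid's implementation matches these exact conventions is an assumption, not confirmed by this page.

import numpy as np

def apply_rope(x: np.ndarray, cos: np.ndarray, sin: np.ndarray,
               interleaved: bool = True) -> np.ndarray:
    """Rotate pairs of features in the last dimension of x.

    Illustrative sketch (assumed conventions, not lucid's API).
    Shapes: x is (..., head_dim), cos/sin are (..., head_dim // 2).
    """
    half = x.shape[-1] // 2
    if interleaved:
        # rope_interleaved=True: pair adjacent dims (x0, x1), (x2, x3), ...
        x1, x2 = x[..., 0::2], x[..., 1::2]
    else:
        # rope_interleaved=False: pair across contiguous halves (x0, x_half), ...
        x1, x2 = x[..., :half], x[..., half:]

    r1 = x1 * cos - x2 * sin  # standard 2-D rotation applied to each pair
    r2 = x1 * sin + x2 * cos

    out = np.empty_like(x)
    if interleaved:
        out[..., 0::2], out[..., 1::2] = r1, r2
    else:
        out[..., :half], out[..., half:] = r1, r2
    return out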
Inherited Parameters¶
RoFormerConfig inherits all BERTConfig fields, including vocabulary size, hidden dimensions, layer count, dropout settings, and decoder/cache options.
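For reference, a fully explicit construction spelling out the inherited fields might look like the following. The field names are taken from the signature above; the values are illustrative BERT-Base-like settings, not lucid's canonical defaults.

from lucid.models import RoFormerConfig

cfg = RoFormerConfig(
    vocab_size=30522,
    hidden_size=768,
    num_attention_heads=12,
    num_hidden_layers=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    tie_word_embedding=True,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    use_cache=True,
    is_decoder=False,
    add_cross_attention=False,
    chunk_size_feed_forward=0,
    rotary_value=False,     # RoFormer-specific
    rope_interleaved=True,  # RoFormer-specific
)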
Preset Constructors¶
RoFormerConfig inherits BERTConfig's preset constructors:
RoFormerConfig.base(…): BERT-Base-like defaults with RoFormer fields.
RoFormerConfig.large(…): BERT-Large-like defaults with RoFormer fields.
Both methods support overrides via keyword arguments.
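For instance, an inherited field and a RoFormer-specific field can both be overridden (the values here are illustrative):

base_cfg = RoFormerConfig.base(hidden_dropout_prob=0.2)  # inherited BERTConfig field
large_cfg = RoFormerConfig.large(rotary_value=True)      # RoFormer-specific field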
Basic Usage¶
from lucid.models import RoFormerConfig

cfg = RoFormerConfig.base(
    vocab_size=50000,
    rotary_value=False,
    rope_interleaved=True,
)
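Because RoFormerConfig is a dataclass, a configured instance can be inspected or copied with field replacements using the standard dataclasses utilities:

import dataclasses

print(cfg.hidden_size, cfg.rotary_value)  # inherited and RoFormer-specific fields

# Derive a decoder-style variant without mutating the original config.
decoder_cfg = dataclasses.replace(cfg, is_decoder=True, use_cache=True)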