Transformer Sequence-to-Sequence

The Transformer is a deep learning architecture introduced by Vaswani et al. in 2017 for modeling sequential data with self-attention. It replaces the recurrent layers of earlier sequence-to-sequence models with attention mechanisms, enabling highly parallelized training and effective capture of long-range dependencies.

Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems, 2017, pp. 5998-6008.
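
To make the self-attention computation concrete, here is a minimal PyTorch sketch of scaled dot-product attention, the core operation of the architecture. The function name and the batch-first tensor layout are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (N, L_q, d_k); k, v: (N, L_kv, d_k). Batch-first layout is an assumption."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5       # (N, L_q, L_kv)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # attention distribution per query
    return weights @ v                                  # (N, L_q, d_k)

# Self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(2, 10, 64)                              # (N, L, d_k)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                        # torch.Size([2, 10, 64])
```

The softmax weights form an \(L_q \times L_{kv}\) score matrix per batch element; in the encoder-decoder cross-attention this matrix is \(L_{tgt} \times L_{src}\), which is where the \(O(N \cdot d_{m} \cdot L_{src} \cdot L_{tgt})\) term in the table below comes from.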

| Name | Model | Input Shape | Parameter Count | FLOPs |
| --- | --- | --- | --- | --- |
| Transformer-Base | transformer_base | \((N, L_{src})\), \((N, L_{tgt})\) | 62,584,544 | \(O(N \cdot d_{m} \cdot L_{src} \cdot L_{tgt})\) |
| Transformer-Big | transformer_big | \((N, L_{src})\), \((N, L_{tgt})\) | 213,237,472 | \(O(N \cdot d_{m} \cdot L_{src} \cdot L_{tgt})\) |

Here \(N\) is the batch size, \(L_{src}\) and \(L_{tgt}\) are the source and target sequence lengths, and \(d_{m}\) is the model dimension.
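
As a rough sanity check on the base configuration, the sketch below instantiates the hyperparameters from Vaswani et al. (\(d_{m} = 512\), 8 heads, 6 encoder and 6 decoder layers, feed-forward width 2048) with `torch.nn.Transformer` and counts parameters. The vocabulary size is an assumption, and differences in embedding and output-projection handling mean the printed count will not match the table's 62,584,544 exactly.

```python
import torch
import torch.nn as nn

# Base hyperparameters from Vaswani et al. (2017). The vocabulary size is an
# assumption; the table's exact count depends on the reference implementation's
# embedding and weight-tying choices, so this figure will differ from 62,584,544.
VOCAB, D_MODEL = 32000, 512

embed = nn.Embedding(VOCAB, D_MODEL)
model = nn.Transformer(
    d_model=D_MODEL, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    dim_feedforward=2048, batch_first=True,
)

n_params = sum(p.numel() for m in (embed, model) for p in m.parameters())
print(f"parameters: {n_params:,}")

# Inputs follow the table's shapes: token ids of (N, L_src) and (N, L_tgt).
N, L_SRC, L_TGT = 2, 10, 7
src = torch.randint(0, VOCAB, (N, L_SRC))
tgt = torch.randint(0, VOCAB, (N, L_TGT))
out = model(embed(src), embed(tgt))                     # (N, L_tgt, d_model)
print(out.shape)
```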

To be implemented…🔮