Sequence-to-Sequence
Transformer
The Transformer is a deep learning architecture introduced by Vaswani et al. in 2017 for sequence-to-sequence modeling. It replaces the recurrent layers of earlier encoder-decoder models with self-attention, which enables highly parallelized training and effective modeling of long-range dependencies.
Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems, 2017, pp. 5998-6008.
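Since the model on this page is not yet implemented (see below), here is a minimal sketch of a sequence-to-sequence Transformer built on PyTorch's `torch.nn.Transformer`. The class name `Seq2SeqTransformer`, the shared source/target embedding, and the defaults `vocab_size=32000` and `max_len=512` are illustrative assumptions; the remaining hyperparameters follow the "base" configuration of Vaswani et al.

```python
# A minimal sketch, not the page's eventual implementation.
import math

import torch
import torch.nn as nn


class Seq2SeqTransformer(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_layers=6, dim_feedforward=2048, dropout=0.1,
                 max_len=512):
        super().__init__()
        self.d_model = d_model
        # A single embedding table shared between source and target
        # (an assumption; the original paper also ties these weights).
        self.embed = nn.Embedding(vocab_size, d_model)
        # Fixed sinusoidal positional encodings, as in the original paper.
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)  # inputs are (N, L, d_model)
        self.generator = nn.Linear(d_model, vocab_size)

    def _embed(self, tokens):
        # Scale embeddings by sqrt(d_model) and add positional encodings.
        x = self.embed(tokens) * math.sqrt(self.d_model)
        return x + self.pe[: tokens.size(1)]

    def forward(self, src, tgt):
        # src: (N, L_src) token ids; tgt: (N, L_tgt) token ids.
        # Causal mask so each target position attends only to earlier ones.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        out = self.transformer(self._embed(src), self._embed(tgt),
                               tgt_mask=tgt_mask)
        return self.generator(out)  # (N, L_tgt, vocab_size)
```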
| Name | Model | Input Shape | Parameter Count | FLOPs |
| --- | --- | --- | --- | --- |
| Transformer-Base | | \((N, L_{src})\), \((N, L_{tgt})\) | 62,584,544 | \(O(N \cdot d_{m} \cdot L_{src} \cdot L_{tgt})\) |
| Transformer-Big | | \((N, L_{src})\), \((N, L_{tgt})\) | 213,237,472 | \(O(N \cdot d_{m} \cdot L_{src} \cdot L_{tgt})\) |
To be implemented…🔮
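A usage sketch against the class above; the input shapes correspond to the table's \((N, L_{src})\) and \((N, L_{tgt})\). Note that the printed parameter count depends on the assumed `vocab_size`, so it will only match the table's figures for whatever vocabulary those counts were computed with.

```python
# Hypothetical usage of the Seq2SeqTransformer sketch above.
model = Seq2SeqTransformer()               # "base" configuration
src = torch.randint(0, 32000, (2, 10))     # (N, L_src)
tgt = torch.randint(0, 32000, (2, 7))      # (N, L_tgt)
logits = model(src, tgt)                   # (2, 7, 32000)
print(sum(p.numel() for p in model.parameters()))  # vocab-dependent count
```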