Demystifying Efficient Self-Attention
A practical overview of efficient attention mechanisms that tackle the quadratic scaling problem.
Using extractive summarization to train Transformers on long documents efficiently.