Demystifying Efficient Self-Attention

A practical overview of efficient attention mechanisms that tackle the quadratic scaling problem.

November 7, 2022 · Thomas van Dongen

Overcoming Input Length Constraints of Transformers

Using extractive summarization to efficiently train Transformers on long documents.

December 14, 2021 · Thomas van Dongen