Technical writeups and notes.
Model2Vec: Distill a Small Fast Model from any Sentence Transformer
Distill small, fast static models from any Sentence Transformer without needing a dataset.