Technical writeups and notes.
Model2Vec: Distill a Small Fast Model from any Sentence Transformer
Distill small, fast static models from any Sentence Transformer without needing a dataset.