| Semble | Code search for agents that uses ~98% fewer tokens than grep+read. |
| Model2Vec | Distil any sentence transformer into a tiny, fast static embedding model. |
| SemHash | Semantic deduplication and dataset filtering across text, images, and audio. |
| Potion | Tiny state-of-the-art static embedding models for English, multilingual, and retrieval tasks. |
| Vicinity | Fast, lightweight nearest-neighbor search with pluggable backends. |
| Pyversity | Diversify search and retrieval results to reduce redundancy and improve coverage. |
| Agentcheck | Scans your shell and reports what an AI agent could access, by severity. |
| Tokenlearn | Pre-train static embedding models for distillation pipelines. |
| Model2Vec-rs | A Rust port of Model2Vec. |