Projects

Project	Description
Semble	Code search for agents that uses ~98% fewer tokens than grep+read.
Model2Vec	Distil any sentence transformer into a tiny, fast static embedding model.
SemHash	Semantic deduplication and dataset filtering across text, images, and audio.
Potion	Tiny state-of-the-art static embedding models for English, multilingual, and retrieval tasks.
Vicinity	Fast, lightweight nearest-neighbor search with pluggable backends.
Pyversity	Diversify search and retrieval results to reduce redundancy and improve coverage.
Agentcheck	Scans your shell and reports, by severity, what an AI agent could access.
Tokenlearn	Pre-train static embedding models for distillation pipelines.
Model2Vec-rs	A Rust port of Model2Vec.