Cosine similarity
2022-05-17 → 2026-03-04
A common Similarity measure, especially for Continuous embedding vectors. $$\cos(\vec u , \vec v) = \frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}$$
Ranges from $-1$ (opposite directions) through $0$ (orthogonal) to $1$ (same direction). It compares direction only and ignores magnitude, which is the main reason it's preferred over Euclidean distance in high-dimensional spaces, where distances concentrate.
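The formula above is a dot product divided by the two norms. A minimal pure-Python sketch (the function name is illustrative, not from any particular library):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (sequences of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Direction matters, magnitude does not:
cosine_similarity([1, 0], [3, 0])   # 1.0  (same direction, longer vector)
cosine_similarity([1, 0], [0, 1])   # 0.0  (orthogonal)
cosine_similarity([1, 0], [-2, 0])  # -1.0 (opposite direction)
```

Note the scale invariance: multiplying either vector by a positive constant leaves the result unchanged.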
Problems#
- Not a true metric (violates triangle inequality).
- Unreliable for embeddings of high-frequency words: their representations cluster in a narrow cone, so cosine distances between them are nearly meaningless (Zhou et al., 2022).
- The cosine function’s derivative vanishes at $\theta = 0$, so cosine similarity has poor resolution near $1$, which is exactly the region (near-identical vectors) that usually matters most. See Michael Trosset’s note (Sec. 1.3 and 2.7).
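The resolution problem is easy to see numerically: for small angles $\theta$, $\cos\theta \approx 1 - \theta^2/2$, so doubling the angle barely moves the similarity value.

```python
import math

# Cosine similarity is "compressed" near 1: the angle doubles each step,
# but the similarity value changes only in the third or fourth decimal.
for theta_deg in (1, 2, 4, 8):
    theta = math.radians(theta_deg)
    print(f"angle {theta_deg:>2} deg -> cosine similarity {math.cos(theta):.5f}")
```

Going from a 1° to a 2° angle changes the similarity by roughly $5 \times 10^{-4}$, even though the angular separation doubled.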
Cosine distance#
Cosine similarity is often converted into a “distance”: $$d(\vec u, \vec v) = 1 - \cos(\vec u, \vec v)$$ mapping $[-1, 1] \to [0, 2]$. But this inherits the resolution problem above.
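Cosine distance also inherits the metric violation from the list above. A concrete counterexample to the triangle inequality, using three coplanar unit vectors at 0°, 45°, and 90° (a sketch; `cos_dist` is an illustrative helper):

```python
import math

def cos_dist(theta_deg):
    """Cosine distance between unit vectors separated by theta_deg degrees."""
    return 1 - math.cos(math.radians(theta_deg))

# Vectors a, b, c at 0, 45, and 90 degrees in the plane:
d_ac = cos_dist(90)              # 1.0
d_ab_plus_bc = 2 * cos_dist(45)  # 2 - sqrt(2), about 0.586
print(d_ac > d_ab_plus_bc)       # True: direct path "costs" more than the detour
```

Angular distance, by contrast, satisfies the triangle inequality (here $90° = 45° + 45°$, with equality).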
Angular distance#
An alternative that avoids the resolution issue: $$d_\text{ang}(\vec u, \vec v) = \arccos\left(\frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}\right)$$
Google’s Universal Sentence Encoder normalizes this into a similarity: $$\text{sim}(\vec u, \vec v) = 1 - \frac{1}{\pi}\arccos\left(\frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}\right)$$
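The two formulas above compose into a similarity on $[0, 1]$: $1$ for identical direction, $0.5$ for orthogonal, $0$ for opposite. A minimal sketch (the clamp guards against floating-point values just outside $[-1, 1]$ reaching `acos`; the function name is illustrative, not USE's API):

```python
import math

def angular_similarity(u, v):
    """1 - arccos(cos_sim)/pi: maps angles [0, pi] onto similarities [1, 0]."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.sqrt(sum(a * a for a in u)) *
                 math.sqrt(sum(b * b for b in v)))
    cos = max(-1.0, min(1.0, cos))  # clamp floating-point drift before arccos
    return 1 - math.acos(cos) / math.pi

angular_similarity([1, 0], [1, 0])   # 1.0 (identical direction)
angular_similarity([1, 0], [0, 1])   # 0.5 (orthogonal)
angular_similarity([1, 0], [-1, 0])  # 0.0 (opposite)
```

Because $\arccos$ is approximately linear away from its endpoints, this form keeps resolution for near-identical vectors where plain cosine similarity flattens out.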