Cosine similarity

2022-05-17 → 2026-03-04

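A common similarity measure, especially for continuous embedding vectors. It is the cosine of the angle between two vectors, computed as their dot product divided by the product of their norms: $$\cos(\vec u , \vec v) = \frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}$$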

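Ranges from $-1$ (opposite direction) through $0$ (orthogonal) to $1$ (identical direction), and is undefined if either vector is zero. It ignores magnitude and compares only direction, which is the main reason it is often preferred over Euclidean distance for high-dimensional embeddings, where Euclidean distances tend to concentrate. The advantage comes from discarding magnitude rather than from the measure itself: for unit-normalized vectors the two rank neighbors identically, since $\lVert \vec u - \vec v \rVert^2 = 2\,(1 - \cos(\vec u, \vec v))$.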
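As a minimal sketch of the formula above, using only the Python standard library. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not part of the original note.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clamp to [-1, 1] to absorb floating-point drift just outside the range.
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))

u = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in u]     # same direction, different magnitude
flipped = [-x for x in u]          # opposite direction

same_direction = cosine_similarity(u, scaled)
opposite = cosine_similarity(u, flipped)
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Scaling a vector leaves the result at $1$ (up to rounding), which is the magnitude invariance described above; the flipped vector gives $-1$ and the perpendicular pair gives $0$.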

Problems

Cosine distance

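Cosine similarity is often converted into a “distance”: $$d(\vec u, \vec v) = 1 - \cos(\vec u, \vec v)$$ This maps the similarity range $[-1, 1]$ onto $[0, 2]$, with $0$ for identical direction and $2$ for opposite direction. It is not a true metric, since it can violate the triangle inequality. It also inherits cosine similarity's resolution problem: for a small angle $\theta$ between the vectors, $\cos\theta \approx 1 - \theta^2/2$, so nearly parallel vectors are all squeezed into a narrow band near $0$.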
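The resolution issue can be made concrete with a small sketch. It assumes 2D unit vectors at a known angle from a reference direction; `cosine_distance` and `unit_vector` are hypothetical helper names chosen for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cos(u, v) for equal-length, non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - cos

def unit_vector(theta):
    """2D unit vector at angle theta (radians) from the x-axis."""
    return [math.cos(theta), math.sin(theta)]

reference = unit_vector(0.0)

# For small angles the distance grows roughly like theta**2 / 2,
# so doubling a small angle roughly quadruples a value that is already tiny.
for theta in (0.01, 0.02, 0.5, 1.0):
    d = cosine_distance(reference, unit_vector(theta))
    print(f"angle = {theta:.2f} rad   cosine distance = {d:.6f}   theta^2/2 = {theta**2 / 2:.6f}")
```

The two smallest angles produce distances far below $0.001$, while the larger angles spread out over a much wider range, so differences among near neighbors are hard to see on this scale.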

Angular distance

An alternative that avoids the resolution issue is the angle itself, which ranges over $[0, \pi]$ and grows linearly as two vectors rotate apart: $$d_\text{ang}(\vec u, \vec v) = \arccos\left(\frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}\right)$$

Google’s Universal Sentence Encoder normalizes this into a similarity: $$\text{sim}(\vec u, \vec v) = 1 - \frac{1}{\pi}\arccos\left(\frac{\vec u \cdot \vec v}{\lVert \vec u \rVert \lVert \vec v \rVert}\right)$$
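Both angular formulas can be sketched in a few lines of standard-library Python. The names `angular_distance` and `angular_similarity` are assumptions for illustration and are not taken from any library.

```python
import math

def angular_distance(u, v):
    """Angle between u and v in radians, in [0, pi] (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp first: rounding can push the ratio slightly past +/-1,
    # and math.acos raises ValueError outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)

def angular_similarity(u, v):
    """1 - angle / pi, so 1 means same direction and 0 means opposite."""
    return 1.0 - angular_distance(u, v) / math.pi

x_axis = [1.0, 0.0]
small_angle = angular_distance(x_axis, [math.cos(0.01), math.sin(0.01)])
right_angle_sim = angular_similarity(x_axis, [0.0, 1.0])
opposite_sim = angular_similarity(x_axis, [-1.0, 0.0])
```

Here `small_angle` comes back as approximately $0.01$, the angle that was put in, rather than a value on the order of $\theta^2/2$, which is why the angular form keeps its resolution for near neighbors. A perpendicular pair scores $0.5$ and an opposite pair scores $0$.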
