Shall we take a random walk
into latent space?

On bumping into the same fundamental ideas, again and again

Yong-Yeol (YY) Ahn
School of Data Science, University of Virginia

Imagine one of your coauthors who's here at NetSci...

Do they have more coauthors and more citations than you?

Friendship paradox

Your friends have more friends than you.

Feld, American Journal of Sociology (1991)

Your friends are more cited, more prolific, more influential than you.

Eom & Jo, Scientific Reports (2014): generalized friendship paradox

Your friends have more friends than you.

Feld, American Journal of Sociology (1991)

Your friends are more cited, more prolific, more influential than you.

Eom & Jo, Scientific Reports (2014): generalized friendship paradox

And a random walk is basically repeated sampling through edges.

A network with a degree distribution  pkp_k.

Sampling through edges samples from:

kpk\sim k\,p_k

Degree bias from edge sampling

Heavy-tailed degree distributions that amplify it

a surprising amount of network science

This is NetSci 101!

The friendship paradox, random walks, and degree bias.

And yet, we still keep bumping into it.

So here are some of my random walks.

🐾 Stroll #1

A virus is a random walk that branches.

Epidemic spreading is sampling through edges

Each infection follows an edge, so it lands on a node in proportion to its degree.

Hubs light up first

Average degree of infected nodes is highest early in the outbreak

Early on, we're sampling from the edge-sampled degree distribution.

→ Hubs tend to be infected first.

Barthélemy et al., Phys. Rev. Lett. 92, 178701 (2004)

Hubs then hit everyone

Hubs are the easiest node to reach, and the one that infects the most.

So an epidemic is very hard to stop.

A simple, intuitive approximation

Mark each edge with probability TT: the chance it transmits.

And this becomes a bond-percolation problem.

One infection makes how many more?

The outbreak grows only if each new case makes, on average, more than one further case.

That's R0R_0.

Excess degree: how many possible further infections can a node make?

k ways out for a node with degree k+1

Edge-based sampling + "Remaining edges"

→ the excess-degree distribution

qk=(k+1)pk+1kkpk=(k+1)pk+1zq_k=\frac{(k+1)\,p_{k+1}}{\sum_k k\,p_k}=\frac{(k+1)\,p_{k+1}}{z}

Each remaining edge transmits with TT

R0  =  Tkkqk  =  Tk2kkR_0 \;=\; T\sum_k k\,q_k \;=\; T\,\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}

To have R0>1R_0>1, TT must clear the threshold

Tc  =  kk2kT_c \;=\; \frac{\langle k\rangle}{\langle k^2\rangle - \langle k\rangle}

Heavy tails blow up k2\langle k^2\rangle, so Tc0T_c\to 0 in a scale-free network.

Newman, Phys. Rev. Lett. 95, 108701 (2005)  ·  Pastor-Satorras & Vespignani, Phys. Rev. Lett. 86, 3200 (2001)

We all know these classic results.

But how about contact tracing?

Korea and New Zealand kept deaths low with no real lockdown.

COVID-19 deaths per million

How could they do this "dance"?

Contact tracing

Digital tracking

Contact tracing is often described as

"going one step ahead"

Going one step ahead of the outbreak Going one step ahead of the outbreak

Does tracing just ride the friendship paradox?

Forward, "to-whom" tracing samples edges.

a random case is already: kpk\sim k\,p_k

forward trace through an edge: kpk\sim k\,p_k again

One event, half of all cases, revealed through investigation of the "source"

The Shincheonji cluster grew to 5,080 cases Timeline: Shincheonji cluster vs total confirmed cases in South Korea

The Shincheonji cluster led to 5,080 cases, more than half of South Korea's total after a month.

Can tracing backward to the source go beyond the friendship paradox?

"From whom" backward contact tracing is the friendship paradox, squared!

A random case → kpk\sim k\,p_k

A traced offspring ("to whom") → kpk\sim k\,p_k

The traced parent ("from whom") → k2pk\sim \textcolor{#EF4444}{k^2 p_k}

Three degree distributions

CCDF of degree: backward-sampled 'from whom' parents have a much heavier tail than the forward-traced 'to whom' infected

Kojaku, Hébert-Dufresne, Mones, Lehmann & Ahn, Nat. Phys. 17, 652 (2021)

First Stroll, recap

  • Spreading is a branching random walk: sampling through edges reaches nodes kpk\sim k\,p_k.
  • Contact tracing inherits this forward (kpk\sim k\,p_k); tracing backward ("from whom") adds an interesting twist k2pk\sim k^2 p_k.

🐾 Stroll #2

A walk to find missing links.

Idea: If you understand a network's organizing principles, you should be able to tell its real edges from random non-edges.

Community detection

Graph embedding

Graph neural networks

Wait, did you just say we are sampling positive edges?

And how about the negative edges?

Wait, did you just say we are sampling positive edges?

And how about the negative edges?

Nodes in the positive set: sampled from kpk\sim k\,p_k

Nodes in the negative set: sampled from pk\sim p_k

The positive and negative sets are sampled from different distributions!

Nodes in the positive set: sampled from kpk\sim k\,p_k

Nodes in the negative set: sampled from pk\sim p_k

The positive and negative sets are sampled from different distributions!

Can we just use degree?

sij    kikjs_{ij} \;\propto\; k_i \, k_j

... and this shortcut works

AUC-ROC of each method on each network versus degree heterogeneity; the degree-only PA score climbs toward the top as networks grow more heterogeneous

Each circle = AUC-ROC of a method on a network.

The degree-only (PA, orange) can outperform other fancy methods.

Aiyappa, Wang, Kim, Seckin, Ahn & Kojaku, "Implicit Degree Bias in the Link Prediction Task," ICML 2025

Degree is all you need

AUC-ROC versus degree heterogeneity sigma: empirical PA scores (orange dots) track the expected Phi(sigma) curve (dashed line)

(In log-normal), larger heterogeneity σ\sigma → easier to predict with degree alone.

AUCPA=Φ(σ)\mathrm{AUC}_{\text{PA}} = \Phi(\sigma)

Clever Hans

sij    kikjs_{ij} \;\propto\; k_i \, k_j

Clever Hans for link prediction.

How do we fix this?

A simple solution: a degree-corrected benchmark

Degree-corrected benchmark: positives are sampled uniformly from the edge set; negatives are drawn from a node list where a degree-k node appears k times, so the negative set follows the same degree-biased distribution as the positives

Aiyappa, Wang, Kim, Seckin, Ahn & Kojaku, "Implicit Degree Bias in the Link Prediction Task," ICML 2025

Degree is all you need, until you fix the test

AUC-ROC versus degree heterogeneity: degree-only matches SOTA on the standard benchmark, collapses to chance on the degree-corrected one

Aiyappa, Wang, Kim, Seckin, Ahn & Kojaku, "Implicit Degree Bias in the Link Prediction Task," ICML 2025

Second Stroll, recap

  • The link-prediction benchmark samples edges differently: positives come kpk\sim k\,p_k, negatives pk\sim p_k, so degree alone (sijkikjs_{ij}\propto k_i k_j) can top the leaderboard.
  • We can address this by sampling from the same distribution: draw the negatives the same degree-biased way.
  • A blind spot can exist even in such a universally-used benchmark!

🐾 Stroll #3

A walk into latent space.

What is an embedding?

An autoencoder: an encoder compresses a cat image into a short embedding vector [3.28, 1.64, 7.81, 5.72, …], and a decoder reads it back to 'Cat!'

Squeeze complex data through a bottleneck while keeping it useful.

That vector is embedding.

Embedding often has one job: predict its context

Compress a word or a node into a vector that can predict its context:

  • Autoencoder: reconstruct itself
  • word2vec: predict nearby words
  • LLMs: predict missing/next tokens

To predict the context well, the vector has to pack a lot of meaning.

Word2vec: geometry can encode semantics

Word embeddings: king→queen parallels man→woman; verb tenses line up; each country points to its capital

Mikolov et al., "Distributed Representations of Words and Phrases," NeurIPS 2013

word2vec: two vectors + skip-gram

i j query key

Slide a window along the text; from a word, predict its context.

Two vectors: a query ui\mathbf u_i and a key vj\mathbf v_j.

vjui\mathbf v_j^{\top}\mathbf u_i estimates how likely jj sits in ii's context.

Negative sampling

Distinguish real neighbors from random samples.

logp(ji)    logσ(vjui)  +  k=1KEnPn[logσ(vnui)]\log p(j\mid i)\;\approx\;\log\sigma(\mathbf v_j^{\top}\mathbf u_i)\;+\;\sum_{k=1}^{K}\mathbb{E}_{\,n\sim P_n}\big[\log\sigma(-\,\mathbf v_n^{\top}\mathbf u_i)\big]

Can a network live in such a space?

DeepWalk & node2vec: a walk is a sentence

1 2 3 4 5 6 7

Nodes are words; the sequence is a sentence. Feed it to word2vec, and that is node2vec.

Perozzi, Al-Rfou & Skiena, "DeepWalk: Online Learning of Social Representations," KDD 2014 · Grover & Leskovec, "node2vec: Scalable Feature Learning for Networks," KDD 2016

Graph embedding

Did you just say we're sampling through random walks?

A random walk mixes up two signals

me

Proximity

me

Degree

Then, are graph embedding methods affected by the degree bias?

🐾 Stroll #4

A walk that finds communities.

Community detection

Zachary's karate club network, nodes colored by the two real-world factions

You know the community detection problem...

Detectability limit: where does community structure become invisible?

Performance Mixing (noise) → Algorithmic limit Detectability limit Perfect mixing

Under the block model paradigm, there is a detectability limit; beyond it, we can't distinguish the community signal from noise.

Graph embedding methods ~ spectral embedding

Rijn2v=log1Tτ=1TP(x(t+τ)=jx(t)=i)P(x(t)=j)R^{\text{n2v}}_{ij}=\log\frac{1}{T}\sum_{\tau=1}^{T}\frac{P(x^{(t+\tau)}=j\mid x^{(t)}=i)}{P(x^{(t)}=j)}

μn2v=μ=11k\mu^{*}_{\text{n2v}}=\mu^{*}=1-\frac{1}{\sqrt{\langle k\rangle}}

The detectability limit for random walk-based graph embedding is the same as the detectability threshold μ\mu^{*}!

Kojaku, Radicchi, Ahn & Fortunato, "Network community detection via neural embeddings," Nat. Commun. 15 (2024)

Node2vec is near optimal

Element-centric similarity vs mixing parameter across benchmarks: node2vec tracks the detectability limit

Kojaku, Radicchi, Ahn & Fortunato, "Network community detection via neural embeddings," Nat. Commun. 15 (2024)

At the same time, DeepWalk often fails in practice

The same benchmark grid with DeepWalk's flat-at-zero curves circled in panels A and D: DeepWalk collapses while node2vec (red) tracks the detectability limit

Same walks, same theoretical limit. Why doesn't DeepWalk work?

Kojaku, Radicchi, Ahn & Fortunato, "Network community detection via neural embeddings," Nat. Commun. 15 (2024)

However, it turns out that negative sampling accidentally removes the degree bias.

P(ji)=1ZP0(j)degree baseline  exp ⁣(vjui)residualP(j\mid i)=\frac{1}{Z}\,\underbrace{\textcolor{#EF4444}{P_0(j)}}_{\textcolor{#EF4444}{\text{degree baseline}}}\;\underbrace{\exp\!\big(\mathbf v_j^{\top}\mathbf u_i\big)}_{\text{residual}}

Kojaku, Yoon, Constantino & Ahn, "Residual2Vec: Debiasing graph embedding with random graphs," NeurIPS 2021  ·  Murray, Yoon, Kojaku, Costas, Jung, Milojević & Ahn, "Unsupervised embedding of trajectories captures the latent structure of scientific migration," PNAS 2023

Same objective, but the difference boils down to degree bias

DeepWalk node2vec
Estimator hierarchical softmax negative sampling
The estimate unbiased biased (cancels the degree bias)

Degree obscures other structure in DeepWalk

DeepWalk embedding colored by degree: high-degree nodes pulled to the core, low-degree to the rim, forming a degree gradient

All due to the unintended degree bias cancellation (or lack thereof)!

node2vec works better in many tasks because it accidentally removed degree bias from random walks.

Unintended bias cancelling degree bias

  • node2vec (negative sampling) inadvertently removes the degree bias, letting the embedding capture structure that's independent of degree more clearly (vs. DeepWalk).
  • The two were treated as almost interchangeable options. Subtle details can make a critical difference because degree dominates.

🐾 Stroll #5

Using bias to de-bias

biased data = biased baseline × unbiased residual

P(ji)=1ZP0(j)baseline  exp ⁣(vjui)residualP(j\mid i)=\frac{1}{Z}\,\underbrace{P_0(j)}_{\text{baseline}}\;\underbrace{\exp\!\big(\mathbf v_j^{\top}\mathbf u_i\big)}_{\text{residual}}

Can we exploit this?

Kojaku, Yoon, Constantino & Ahn, "Residual2Vec: Debiasing graph embedding with random graphs," NeurIPS 2021

What if we alter the null distribution?

We can plug in the bias we want gone to the baseline P0P_0.

residual2vec: a designed baseline

Pr2v(ji)=1ZiP0(ji)designed null model  exp ⁣(uivj)debiased residualP_{\text{r2v}}(j\mid i)=\frac{1}{Z'_i}\,\underbrace{P_0(j\mid i)}_{\text{designed null model}}\;\underbrace{\exp\!\big(\mathbf u_i^{\top}\mathbf v_j\big)}_{\text{debiased residual}}

Model the bias we want gone into P0P_0; the residual comes out blind to it.

Describe the bias as a null model

residual2vec pipeline: choose a null model such as a degree-corrected SBM to set the baseline P0, then factor it out of the random-walk co-occurrences to get the residual embedding

Choose a null model (say, a degree-preserving dcSBM); it fixes the baseline P0P_0. Factor it out of the walk's co-occurrences, and what's left is the debiased residual.

A concrete bias: recency in citations

Citations flow from recent journal-year nodes down to older ones, a strong temporal bias

Papers cite recent work far more than old work. Can we remove this temporal information from embedding?

→ Create a null model that preserves temporal structure

Make the embedding blind to time

Embeddings colored by year: GloVe and node2vec show a strong year gradient; residual2vec (r2v-dcSBM) washes the year out

Color by year: GloVe and node2vec line up by date. residual2vec, given a time-encoding baseline, washes the year out.

So it can see the disciplinary structure more clearly

The same embeddings colored by subject category: residual2vec separates disciplines far more cleanly than GloVe, node2vec, or DeepWalk Prediction based on embedding: r2v-dcSBM gives the best prediction of journal impact factor (R-squared) and subject category (F1) among the embedding methods

Color by field (left): the debiased space separates disciplines, and it predicts impact factor and subject category best (right).

Kojaku, Yoon, Constantino & Ahn, NeurIPS 2021

residual2vec allows us to remove bias, once we can concretely model it.

🐾 Stroll #6

Mobility as a walk, pulled by gravity

How far apart are two places?

Closeness a map can't show

World map with an arc from Korea to the United States marked with a question mark

Social and cultural proximity counts.

What if we embed how scientists move?

  • Nodes: institutions
  • Careers: empirical walks

A career is a sentence

A trajectory of institutions over time t0 to t4, fed into word2vec
Embedding of institutions colored by continent; countries form clusters laid out like a world map

Murray, Yoon, Kojaku, Costas, Jung, Milojević & Ahn, "Unsupervised embedding of trajectories captures the latent structure of scientific migration," PNAS 2023

It encodes culture and language

Portuguese-speaking Brazil sits beside Portugal; Spanish-speaking Latin America clusters with Spain

It even encodes prestige

An embedding axis aligns with institutional prestige; Spearman rho = 0.78 against an independent ranking

One axis lines up with institutional prestige: Spearman ρ=0.78\rho = 0.78 against an independent ranking.

The gravity law of mobility

m₁ m₂ r

Tij=Cmimjf(rij)T_{ij} = C\,m_i\,m_j\,f(r_{ij})

Flux between ii and jj grows with their masses (population) and decays with distance rijr_{ij}.

"You are less likely to go somewhere far away than somewhere close."

The gravity law of mobility

m₁ m₂ r

Tij=Cmimjf(rij)T_{ij} = C\,m_i\,m_j\,f(r_{ij})

Flux between ii and jj grows with their masses (population) and decays with distance rijr_{ij}.

"You are less likely to go somewhere far away than somewhere close."

The gravity law of mobility

m₁ m₂ r

Tij=Cmimjf(rij)T_{ij} = C\,m_i\,m_j\,f(r_{ij})

Flux between ii and jj grows with their masses (population) and decays with distance rijr_{ij}.

"You are less likely to go somewhere far away than somewhere close."

The embedding fits it much better than physical distance

Gravity-law fits: embedding proximity (R2 = 0.48) predicts flux far better than geographic distance (R2 = 0.22), across global, domestic, and international moves

It even beats fitting the gravity law directly

Flux explained and RMSE by method: word2vec cosine distance outperforms MDS, spectral, PPR, and gravity-law-fit distances

How? Why?

Equivalence to the gravity law of mobility

Tijflux=1ZP0(i)P0(j)mass×mass  exp ⁣(vjui)closeness\underbrace{T_{ij}}_{\text{flux}}=\frac{1}{Z}\,\underbrace{P_0(i)\,P_0(j)}_{\text{mass}\times\text{mass}}\;\underbrace{\exp\!\big(\mathbf v_j^{\top}\mathbf u_i\big)}_{\text{closeness}}

The degree bias in negative sampling becomes the mass term, making it equivalent to the gravity law of mobility!

Distance here has an interpretable meaning through the gravity law!

The institution embedding, reread as a map of functional distance

Understanding embedding models' degree bias

Interpretable embedding space as instruments

An interpretable space of knowledge, career, beliefs, ...

Embedding space of beliefs

... where distance predicts adoption

Lee, Aiyappa, Ahn, Kwak & An, Nature Human Behaviour (2025)

Visible and Invisible barriers in cities

Weng, Kim, Ahn & Moro, "Beyond Distance: Mobility Neural Embeddings Reveal Visible and Invisible Barriers in Urban Space," (2025)

Embedding space as a measurement platform

County map, Boston embedding, and the cosine-vs-geographic scatter: the embedding as a ruler for barriers

Comparing effective distance vs. physical distance reveals barriers.

Estimating importance of each barrier

Stacked bars of barrier importance: POI distances dominate, but admin boundary and demographic distances act as invisible barriers, 2019-2021

logitP ⁣(ij)dPOI+dPhy+dDemo+dCounty\operatorname{logit} P\!\left(i \leftrightarrow j\right)\sim d_{\text{POI}}+d_{\text{Phy}}+d_{\text{Demo}}+d_{\text{County}}

Disruption: when future and past split

... and reveal simultaneous breakthroughs

Kim, Kojaku & Ahn, "Uncovering simultaneous breakthroughs with a robust measure of disruptiveness," Science Advances 12(14) (2026)

Starling murmuration: Karen Roe / CC BY 2.0

It all started from

"Hey, how does degree bias work here?"

We keep learning the most basic lessons again and again

That's because they are what makes networks networks!

Thank you!

Backward contact tracing · Nature Physics 2021

Sadamori Kojaku
Laurent Hébert-Dufresne
Enys Mones
Sune Lehmann

Implicit degree bias in link prediction · ICML 2025

Rachith Aiyappa
Xin Wang
Munjung Kim
Ozgur Can Seckin
Sadamori Kojaku

Embedding scientific migration · PNAS 2023

Dakota Murray
Jisung Yoon
Sadamori Kojaku
Rodrigo Costas
Woosung Jung
Staša Milojević

Mobility & invisible borders · preprint 2025

Guangyuan Weng
Minsuk Kim
Esteban Moro

Debiasing graph embedding · NeurIPS 2021

Sadamori Kojaku
Jisung Yoon
Isabel Constantino

Embedding community structure · Nature Communications 2024

Sadamori Kojaku
Filippo Radicchi
Santo Fortunato

Belief embedding · Nature Human Behaviour 2025

Byunghwee Lee
Rachith Aiyappa
Haewoon Kwak
Jisun An

Robust disruptiveness · Science Advances 2026

Munjung Kim
Sadamori Kojaku

Support

Air Force Office of Scientific Research Minerva Research Initiative National Science Foundation IU Luddy School University of Virginia UVA School of Data Science NVIDIA