Module 11: Visualizing high-dimensional data
In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
sns.set_style('white')
Scatterplot matrix for low-high dimensional data
In many cases, the number of dimensions is not too large. For instance, the "Iris" dataset contains four measurements (dimensions) for flowers from three iris species. That is more than two dimensions, yet still manageable.
In [2]:
iris = sns.load_dataset('iris')
iris.head(2)
Out[2]:
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
The dataset has four numeric dimensions (sepal_length, sepal_width, petal_length, petal_width). One direct way to visualize them is to draw a scatter plot for each pair of dimensions. We can use the pairplot() function in seaborn to do this.
In [3]:
sns.pairplot(iris)
Out[3]:
<seaborn.axisgrid.PairGrid at 0x15811d250>
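To see why this approach only scales to a modest number of dimensions, it helps to count the pairs. The sketch below uses only the standard library and the four column names shown earlier; it lists each distinct pair of dimensions and compares that count with the number of panels in a full square grid, which also holds the mirrored pairs and the diagonal.

```python
from itertools import combinations

# The four numeric columns of the iris dataset, as shown in the table above.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Every unordered pair of distinct dimensions corresponds to one scatter plot.
pairs = list(combinations(columns, 2))
for x, y in pairs:
    print(f"{x} vs {y}")

# A full grid has one panel per (row, column) combination, so d * d panels.
print(len(pairs), "distinct pairs;", len(columns) ** 2, "panels in the full grid")
```

The number of distinct pairs grows as d(d-1)/2, so ten dimensions would already need 45 scatter plots.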
Coloring the points by species with the hue parameter gives a much more useful plot.
In [4]:
sns.pairplot(iris, hue='species')
Out[4]:
<seaborn.axisgrid.PairGrid at 0x16a10c890>