1 Histogram
1.0 Dataset
import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins
1.1 Univariate Histogram
1.1.1 displot
sns.displot(penguins, x="flipper_length_mm")
1.1.2 histplot
sns.histplot(penguins, x="flipper_length_mm")
1.1.3 binwidth: Set the Bin Width
sns.displot(penguins, x="flipper_length_mm", binwidth=1)
sns.histplot(penguins, x="flipper_length_mm", binwidth=10)
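To see what binwidth changes, here is a minimal pure-Python sketch of equal-width binning. It is not seaborn's implementation: the helper name histogram_counts and the flipper values are made up for illustration, and the exact placement of bin edges in seaborn may differ.

```python
import math


def histogram_counts(values, binwidth):
    """Count observations in equal-width bins of size `binwidth`."""
    start = min(values)
    n_bins = max(1, math.ceil((max(values) - start) / binwidth))
    counts = [0] * n_bins
    for v in values:
        # The maximum can land on the last edge, so clamp it into the last bin.
        idx = min(int((v - start) // binwidth), n_bins - 1)
        counts[idx] += 1
    edges = [start + i * binwidth for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical flipper lengths in millimetres (not taken from the dataset).
flippers = [172, 181, 186, 190, 193, 195, 198, 203, 210, 215, 221, 230]

edges_narrow, counts_narrow = histogram_counts(flippers, binwidth=1)
edges_wide, counts_wide = histogram_counts(flippers, binwidth=10)

print(len(counts_narrow), "bins with binwidth=1")
print(len(counts_wide), "bins with binwidth=10:", counts_wide)
```

A narrow bin width gives many sparse bars, while a wide one merges neighbouring observations into a few tall bars.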
1.1.4 Cases with Few Distinct Values
tips = sns.load_dataset("tips")
tips
The size column takes only a few integer values, so with the default bins the bars are separated by awkward gaps:
sns.displot(tips, x="size")
There are several solutions:
1.1.4.1 Setting bins Manually
sns.displot(tips, x="size", bins=[1, 2, 3, 4, 5, 6, 7])
1.1.4.2 Setting discrete
sns.displot(tips, x="size", discrete=True)
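With discrete=True the bins have unit width and each bar is centered on its integer value. A minimal sketch of that idea follows; the helper discrete_bins and the party sizes are hypothetical and only illustrate the binning, not seaborn's internals.

```python
from collections import Counter


def discrete_bins(values):
    """Unit-width bins centered on every integer between min and max."""
    lo, hi = min(values), max(values)
    counts = Counter(values)
    # Each bin spans [k - 0.5, k + 0.5), so its bar is centered on k.
    return [((k - 0.5, k + 0.5), counts.get(k, 0)) for k in range(lo, hi + 1)]


# Hypothetical party sizes (not taken from the tips dataset).
party_sizes = [2, 3, 2, 4, 2, 1, 6, 2, 3, 5, 2, 4]
bins = discrete_bins(party_sizes)

for (left, right), count in bins:
    print(f"[{left}, {right}): {count}")
```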
1.1.4.3 Set shrink
sns.displot(tips, x="size", shrink=2)
1.5 Histogram+hue
sns.displot(penguins, x="flipper_length_mm", hue="species")
1.5.1 element='step'
By default, the histograms for the different groups are layered on top of each other, and in some cases they may be difficult to distinguish.
One option is to change the visual representation of the histogram from a bar plot to a step plot:
sns.displot(penguins, x="flipper_length_mm", hue="species", element='step')
1.5.2 multiple="stack" (vertical stacking)
Instead of layering the bars over each other, stack them vertically:
sns.displot(penguins, x="flipper_length_mm", hue="species", multiple='stack')
1.5.3 multiple="dodge" (horizontal, side by side)
sns.displot(penguins, x="flipper_length_mm", hue="species", multiple='dodge')
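The difference between "stack" and "dodge" is only in where each group's bar is placed. The sketch below works on per-group counts in shared bins; the counts and the helper names stacked_bars and dodged_bars are made up for illustration and do not reproduce seaborn's drawing code.

```python
def stacked_bars(group_counts):
    """Return (bottom, height) per bin so each group sits on the previous ones."""
    n_bins = len(next(iter(group_counts.values())))
    running = [0] * n_bins
    bars = {}
    for name, counts in group_counts.items():
        bars[name] = [(running[i], counts[i]) for i in range(n_bins)]
        running = [running[i] + counts[i] for i in range(n_bins)]
    return bars


def dodged_bars(group_counts, bin_left, bin_width):
    """Return (left, width) per group so the bars share one bin side by side."""
    width = bin_width / len(group_counts)
    return {name: (bin_left + k * width, width)
            for k, name in enumerate(group_counts)}


# Hypothetical counts per species in three shared bins.
counts = {"Adelie": [3, 5, 1], "Chinstrap": [0, 2, 2], "Gentoo": [0, 1, 6]}

stacked = stacked_bars(counts)
dodged = dodged_bars(counts, bin_left=190.0, bin_width=9.0)

print(stacked["Gentoo"])  # bottoms are the summed heights of the groups below
print(dodged)             # each group gets a third of the bin width
```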
1.6 Histogram+col
Draw each group in its own subplot, one column per species:
sns.displot(penguins, x="flipper_length_mm", col="species")
2 Kernel Density Estimation
- Histograms aim to approximate the underlying probability density function that generated the data by binning and counting observations. Kernel density estimation (KDE) offers a different solution to the same problem.
- Rather than using discrete bins, a KDE plot smooths the observations with a Gaussian kernel, producing a continuous density estimate:
sns.displot(penguins, x="flipper_length_mm", kind="kde")
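The Gaussian smoothing described above can be sketched in a few lines of plain Python: one Gaussian bump is placed on every observation and the bumps are averaged. The sample values, the fixed bandwidth of 5, and the helper name gaussian_kde_1d are assumptions for illustration; seaborn chooses its bandwidth automatically.

```python
import math


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, each centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical flipper lengths in millimetres.
samples = [181.0, 186.0, 190.0, 193.0, 195.0, 198.0, 210.0, 215.0, 221.0]
density = gaussian_kde_1d(samples, bandwidth=5.0)

# Riemann sum over a wide grid: a density should integrate to about 1.
step = 0.5
grid = [140.0 + i * step for i in range(241)]  # 140 to 260
area = sum(density(x) for x in grid) * step
print(round(area, 4))
```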
2.1 Select Smooth Bandwidth
- Much like the bin size in a histogram, the ability of the KDE to accurately represent the data depends on the choice of smoothing bandwidth.
- An over-smoothed estimate might erase meaningful features, while an under-smoothed estimate can obscure the true shape within random noise.
- The easiest way to check the robustness of the estimate is to adjust the default bandwidth with bw_adjust:
sns.displot(penguins, x="flipper_length_mm", kind="kde", bw_adjust=.05)
sns.displot(penguins, x="flipper_length_mm", kind="kde", bw_adjust=.85)
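To make the effect of the bandwidth concrete, the sketch below counts the peaks of a Gaussian KDE for three scaling factors. It assumes the reference bandwidth is Scott's rule of thumb (sample standard deviation times n to the power -1/5) multiplied by the adjustment factor; the data and helper names are hypothetical, and the third factor (3.0) is added only to show over-smoothing.

```python
import math
import statistics


def kde_on_grid(samples, bw_adjust, grid):
    """Evaluate a Gaussian KDE on `grid` with a scaled rule-of-thumb bandwidth."""
    n = len(samples)
    # Assumption of this sketch: Scott's rule as the reference bandwidth.
    bandwidth = bw_adjust * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def count_peaks(ys):
    """Number of strict local maxima in a sequence of density values."""
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical data with two clusters, roughly 185-195 and 210-222.
samples = [185, 187, 188, 190, 191, 193, 195, 210, 212, 215, 216, 218, 220, 222]
grid = [170 + 0.25 * i for i in range(261)]  # 170 to 235

peaks_rough = count_peaks(kde_on_grid(samples, 0.05, grid))
peaks_medium = count_peaks(kde_on_grid(samples, 0.85, grid))
peaks_smooth = count_peaks(kde_on_grid(samples, 3.0, grid))

print(peaks_rough, peaks_medium, peaks_smooth)
```

The very small factor produces a spiky curve with a peak near almost every observation, while the large factor merges the two clusters into a single bump.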
2.2 hue
sns.displot(penguins, x="flipper_length_mm", kind="kde", hue="species")
2.3 multiple
sns.displot(penguins, x="flipper_length_mm", kind="kde", hue="species", multiple='stack')
3 Empirical Cumulative Distribution Function (ECDF)
sns.displot(penguins, x="flipper_length_mm", kind="ecdf")
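An ECDF needs no bins and no bandwidth: at each value x it reports the proportion of observations that are less than or equal to x. A minimal sketch, with made-up flipper lengths and a hypothetical helper name ecdf:

```python
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(x) is the proportion of observations <= x."""
    xs = sorted(values)
    n = len(xs)

    def proportion_at_or_below(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(xs, x) / n

    return proportion_at_or_below


# Hypothetical flipper lengths in millimetres (190 appears twice).
flippers = [181, 186, 190, 190, 195, 198, 210, 215, 221, 230]
F = ecdf(flippers)

for x in (180, 190, 200, 230):
    print(x, F(x))
```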
3.1 hue
sns.displot(penguins, x="flipper_length_mm", kind="ecdf", hue='species')
4 Bivariate distribution
4.1 Bivariate Histogram
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm")
4.1.1 cbar: Colorbar Mapping Colors to Counts
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", cbar=True)
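A bivariate histogram bins both variables at once and colors each rectangle by its count, which is the number the colorbar annotates. The sketch below computes such a grid of counts in plain Python; the bill measurements, the bin widths, and the helper name histogram_2d are assumptions for illustration.

```python
import math


def histogram_2d(xs, ys, x_binwidth, y_binwidth):
    """Count observations in rectangular bins; counts[row][col] indexes (y, x)."""
    x0, y0 = min(xs), min(ys)
    nx = max(1, math.ceil((max(xs) - x0) / x_binwidth))
    ny = max(1, math.ceil((max(ys) - y0) / y_binwidth))
    counts = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # Clamp so the maximum value falls into the last bin.
        col = min(int((x - x0) // x_binwidth), nx - 1)
        row = min(int((y - y0) // y_binwidth), ny - 1)
        counts[row][col] += 1
    return counts


# Hypothetical bill measurements in millimetres, rounded to integers.
bill_length = [37, 39, 39, 40, 45, 46, 46, 47, 49, 50, 50, 51]
bill_depth = [19, 17, 19, 18, 19, 13, 18, 15, 14, 16, 20, 19]

counts = histogram_2d(bill_length, bill_depth, x_binwidth=5, y_binwidth=2)
for row in counts:
    print(row)
```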
4.1.2 rug
rug=True adds small ticks along the axes that mark the individual observations:
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", rug=True)
4.2 Bivariate Contours
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde')
4.2.1 thresh: Lowest Contour Level
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde', thresh=0.5)
4.2.2 levels: Number of Contour Levels or Explicit Level Values
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde', thresh=0.5, levels=3)
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde', thresh=0.5, levels=[0.01, 0.1, 1])
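The seaborn documentation describes thresh and levels as iso-proportions of the density: roughly, a contour drawn for 0.2 leaves about 20% of the probability mass below it. The sketch below shows one way such a proportion could be turned into a density height on a grid. It is an illustrative assumption, not seaborn's code, and the helper name level_for_proportion and the density values are made up.

```python
def level_for_proportion(density_values, proportion):
    """Density height such that roughly `proportion` of the mass lies below it."""
    ordered = sorted(density_values, reverse=True)
    total = sum(ordered)
    running = 0.0
    for value in ordered:
        running += value
        # Stop once the mass at or above this height reaches 1 - proportion.
        if running / total >= 1.0 - proportion:
            return value
    return ordered[-1]


# Hypothetical density values on eight grid cells (they sum to 100).
density_values = [40, 25, 15, 10, 5, 3, 1, 1]

level_50 = level_for_proportion(density_values, 0.5)
level_25 = level_for_proportion(density_values, 0.25)
level_03 = level_for_proportion(density_values, 0.03)
print(level_50, level_25, level_03)
```

A higher proportion maps to a higher density height, so a large thresh keeps only the innermost, densest contours.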
4.2.3 rug
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde', rug=True)
4.3 hue
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", hue='species')
sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde', hue='species')
5 Joint Plots
By default, jointplot draws a scatterplot of the two variables in the center and a histogram of each variable along the margins:
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
5.1 KDE
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind='kde')
5.2 Customizing with JointGrid
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.histplot)
g.plot_marginals(sns.boxplot)
plot_joint - draws the central (joint) plot
plot_marginals - draws the plots along the two margins
5.3 pairplot
Let's review this dataset again:
import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins
The dataset has four numeric columns. With pairplot, the diagonal cells show a histogram of each variable and the off-diagonal cells show scatterplots of each pair of variables:
sns.pairplot(penguins)
5.3.1 Customizing with PairGrid
g = sns.PairGrid(penguins)
g.map_upper(sns.histplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
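Which cells map_upper, map_lower and map_diag touch can be sketched without any plotting. The helper pair_grid_layout below is hypothetical and assumes the usual layout in which row i uses the i-th variable on the y axis and column j uses the j-th variable on the x axis.

```python
def pair_grid_layout(columns):
    """Map each (row, col) cell to its x variable, y variable and grid region."""
    layout = {}
    for row, y_var in enumerate(columns):
        for col, x_var in enumerate(columns):
            if row == col:
                region = "diag"    # cells drawn by map_diag
            elif col > row:
                region = "upper"   # cells drawn by map_upper
            else:
                region = "lower"   # cells drawn by map_lower
            layout[(row, col)] = (x_var, y_var, region)
    return layout


# The four numeric columns of the penguins dataset.
columns = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
layout = pair_grid_layout(columns)

regions = [region for _, _, region in layout.values()]
print({name: regions.count(name) for name in ("diag", "upper", "lower")})
```

With four variables the grid has 16 cells: 4 on the diagonal and 6 in each triangle.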
Reference: Visualizing distributions of data, seaborn 0.11.2 documentation (pydata.org)