seaborn notes: plotting data distributions

1 Histogram

1.0 Dataset

import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins

1.1 Univariate Histogram

1.1.1 displot

sns.displot(penguins, x="flipper_length_mm")

1.1.2 histplot

sns.histplot(penguins, x="flipper_length_mm")

1.1.3 binwidth: Set the Bin Width

sns.displot(penguins, x="flipper_length_mm",
           binwidth=1)

 

sns.histplot(penguins, x="flipper_length_mm",
            binwidth=10)
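
To see what binwidth controls, here is a minimal pure-Python sketch that counts values into fixed-width bins starting at the smallest value. It is an illustration only, not seaborn's implementation; the helper name fixed_width_counts and the sample values are made up.

```python
import math


def fixed_width_counts(values, binwidth):
    """Count values into consecutive equal-width bins starting at min(values).

    Illustrative helper, not part of seaborn.
    """
    start = min(values)
    # Number of bins needed to cover the full data range (at least one).
    n_bins = max(1, math.ceil((max(values) - start) / binwidth))
    counts = [0] * n_bins
    for v in values:
        # The largest value can land on the final edge, so clamp it into the last bin.
        idx = min(int((v - start) // binwidth), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical flipper lengths in mm (not taken from the penguins dataset).
lengths = [172, 181, 181, 186, 190, 193, 195, 198, 210, 215, 221, 231]

narrow = fixed_width_counts(lengths, binwidth=1)
wide = fixed_width_counts(lengths, binwidth=10)
print(len(narrow), "bins with binwidth=1")
print(len(wide), "bins with binwidth=10:", wide)
```

A small binwidth gives many sparse bars, while a large one merges neighbouring values into a few tall bars.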

1.1.4 Data with Few Distinct Values

tips = sns.load_dataset("tips")
tips

The size column takes only a few distinct integer values, so the default bins leave gaps between the bars:

sns.displot(tips, x="size")

 

There are several solutions:

1.1.4.1 Set bins Manually

sns.displot(tips, x="size",bins=[1,2,3,4,5,6,7])

 

1.1.4.2 Setting discrete

sns.displot(tips, x="size",discrete=True)
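
Conceptually, discrete=True uses bins of width 1 centered on each integer value, so every distinct value gets its own bar. A rough sketch of that idea, with a hypothetical helper and made-up party sizes (not the real tips data):

```python
from collections import Counter


def discrete_bins(values):
    """Return (left_edge, right_edge, count) for unit-width bins centered on integers.

    Illustrative helper, not part of seaborn.
    """
    counts = Counter(values)
    lo, hi = min(values), max(values)
    # One bin per integer in the range, including integers that never occur.
    return [(k - 0.5, k + 0.5, counts.get(k, 0)) for k in range(lo, hi + 1)]


# Hypothetical party sizes.
sizes = [2, 2, 3, 2, 4, 2, 1, 6, 3, 2]

bins = discrete_bins(sizes)
for left, right, count in bins:
    print(f"[{left}, {right}): {count}")
```

Because the bin edges fall halfway between integers, adjacent bars touch and the gaps disappear.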

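1.1.4.3 Set shrink

shrink scales the width of each bar relative to the bin width, so a value above 1 widens the bars and a value below 1 narrows them: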

sns.displot(tips, x="size",shrink=2)

1.5 Histogram+hue

sns.displot(penguins, x="flipper_length_mm", 
            hue="species")

1.5.1 element='step'

By default, the histograms for the different groups are layered on top of one another, which can make them hard to tell apart.

One option is to change the visual representation from bars to a step plot:

sns.displot(penguins, x="flipper_length_mm", hue="species",
           element='step')

1.5.2 multiple="stack" (vertical stacking)

Instead of overlapping, the bars for each group are stacked on top of one another:

sns.displot(penguins, x="flipper_length_mm", hue="species",
           multiple='stack')

1.5.3 multiple="dodge" (horizontal, side by side)

sns.displot(penguins, x="flipper_length_mm", hue="species",
           multiple='dodge')

1.6 Histogram+col

Draw each group in its own subplot:

sns.displot(penguins, x="flipper_length_mm", 
            col="species")

 

2 Kernel Density Estimation

  • A histogram approximates the underlying probability density function that generated the data by binning and counting observations. Kernel density estimation (KDE) offers a different solution to the same problem.
  • Rather than using discrete bins, a KDE plot smooths the observations with a Gaussian kernel, producing a continuous density estimate:

sns.displot(penguins, x="flipper_length_mm",
            kind="kde")
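
The idea behind a Gaussian KDE can be sketched in a few lines of plain Python: place a Gaussian bump on every observation and average them. This is a simplified illustration, not seaborn's implementation; the helper name, the sample values, and the fixed bandwidth of 5 are assumptions made for the example.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel.

    Illustrative helper, not part of seaborn.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, then normalize.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical flipper lengths in mm (not taken from the penguins dataset).
lengths = [172, 181, 181, 186, 190, 193, 195, 198, 210, 215, 221, 231]
density = gaussian_kde(lengths, bandwidth=5.0)

# A density should integrate to roughly 1; check with a simple Riemann sum.
step = 0.5
grid = [150 + step * i for i in range(241)]  # 150 .. 270
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

Unlike a histogram, the estimate is defined at every x, not only at bin centers.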

2.1 Choosing the Smoothing Bandwidth

  • Much like the bin size in a histogram, how accurately a KDE represents the data depends on the choice of smoothing bandwidth.
  • An over-smoothed estimate can erase meaningful features, while an under-smoothed estimate can obscure the true shape in random noise.
  • The easiest way to check the robustness of an estimate is to scale the default bandwidth with bw_adjust:

sns.displot(penguins, x="flipper_length_mm",kind="kde",
           bw_adjust=.05)

sns.displot(penguins, x="flipper_length_mm",kind="kde",
           bw_adjust=.85)
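
The effect of the bandwidth can be seen by counting the peaks of a hand-rolled Gaussian KDE at several bandwidths. Note the assumptions: this sketch uses absolute bandwidths, whereas bw_adjust multiplies seaborn's default bandwidth; the helper names and the two-cluster sample are made up.

```python
import math


def kde_on_grid(data, bandwidth, grid):
    """Evaluate a Gaussian KDE at every grid point (illustrative helper)."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in grid
    ]


def count_peaks(ys):
    """Count interior points that are strictly higher than both neighbours."""
    return sum(1 for a, b, c in zip(ys, ys[1:], ys[2:]) if b > a and b > c)


# Hypothetical data with two clusters (not the penguins dataset).
data = [184, 187, 190, 193, 196, 199, 210, 213, 216, 219, 222, 225]
grid = [170 + 0.25 * i for i in range(281)]  # 170 .. 240

# Small, moderate, and large bandwidths.
peaks = {bw: count_peaks(kde_on_grid(data, bw, grid)) for bw in (0.5, 5.0, 30.0)}
for bw, n in peaks.items():
    print(f"bandwidth={bw}: {n} peak(s)")
```

With the smallest bandwidth every observation becomes its own spike, the moderate one recovers the two clusters, and the largest one blurs them into a single bump.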

 

2.2 hue

sns.displot(penguins, x="flipper_length_mm",kind="kde",
           hue="species")

 

2.3 multiple

sns.displot(penguins, x="flipper_length_mm",kind="kde",hue="species",
            multiple='stack')

3 Empirical Cumulative Distribution Function (ECDF)

sns.displot(penguins, x="flipper_length_mm",
            kind="ecdf")
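
An ECDF gives, for each x, the proportion of observations less than or equal to x. A minimal standard-library sketch of that definition, with an illustrative helper name and made-up values:

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function giving the proportion of observations <= x.

    Illustrative helper, not part of seaborn.
    """
    ordered = sorted(data)
    n = len(ordered)

    def proportion(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return proportion


# Hypothetical flipper lengths in mm (not taken from the penguins dataset).
lengths = [172, 181, 181, 186, 190, 193, 195, 198, 210, 215]
F = ecdf(lengths)

for x in (171, 181, 200, 215):
    print(x, F(x))
```

The curve needs no bin size or bandwidth, which is why every observation is represented directly.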

3.1 hue

sns.displot(penguins, x="flipper_length_mm",kind="ecdf",
           hue='species')

4 Bivariate distribution

4.1 Bivariate Histogram

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm")

4.1.1 cbar: Color Bar Mapping Color to Count

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",
            cbar=True)

 

4.1.2 rug

Draw a small tick along each axis for every individual observation:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",
            rug=True)

 

4.2 Bivariate Contours

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",
           kind='kde')

4.2.1 thresh: Lowest Contour Level

thresh is the lowest iso-proportion level at which a contour line is drawn:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",kind='kde',
           thresh=0.5)

 

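4.2.2 levels: Number of Contours or Explicit Level Values

levels accepts either the number of contour levels or a list of increasing values in [0, 1] at which to draw contours. When a list is given, thresh is ignored.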

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",kind='kde',
           thresh=0.5, levels=3)

 

 

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",kind='kde',
           thresh=0.5, levels=[0.01,0.1,1])
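
One way to read the level values: each is the proportion of total probability mass that lies below the corresponding contour. Here is a simplified sketch of turning such a proportion into a density cutoff, assuming the density has already been evaluated on a grid of equal-sized cells. The helper name and the numbers are made up, and this is not seaborn's code.

```python
def density_cutoff(density_values, proportion):
    """Smallest density value d such that the cells with density <= d
    together hold at least `proportion` of the total mass.

    Illustrative helper, not part of seaborn.
    """
    ordered = sorted(density_values)
    total = sum(ordered)
    running = 0.0
    for d in ordered:
        running += d
        if running / total >= proportion:
            return d
    return ordered[-1]


# Hypothetical density values on six grid cells (flattened from a 2D grid).
cells = [1, 1, 2, 2, 4, 10]

low = density_cutoff(cells, 0.15)
mid = density_cutoff(cells, 0.45)
high = density_cutoff(cells, 0.9)
print(low, mid, high)
```

A low proportion gives a cutoff that encloses almost all of the data, while a high proportion keeps only the densest core.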

 

4.2.3 rug

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",kind='kde',
            rug=True)

  

4.3 hue

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",
           hue='species')

 

 

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm",
           kind='kde',hue='species')

5 Joint Drawing

By default, jointplot draws a scatterplot of the two variables, with a histogram of each variable along the margins:

sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")

5.1 KDE

sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm",
             kind='kde')

5.2 Customizing with JointGrid

g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.histplot)
g.plot_marginals(sns.boxplot)

plot_joint - draws the central (joint) plot

plot_marginals - draws the plots along the two margins

5.3 pairplot

Let's look at the dataset again:

import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins

The dataset has four numeric columns. pairplot draws a histogram of each one on the diagonal and a scatterplot of each pair of columns off the diagonal:

sns.pairplot(penguins)

 

5.3.1 Customizing with PairGrid

g = sns.PairGrid(penguins)
g.map_upper(sns.histplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)

Reference: Visualizing distributions of data, seaborn 0.11.2 documentation (pydata.org)


Posted by cyprus on Wed, 17 Aug 2022 00:19:44 +0300