Bye, visualization! Hello, Pandas!

Source: Python Data Science
Author: Dongge take off

Data analysis in Python is inseparable from pandas. Pandas is used mostly for processing and transforming data. It also has built-in plotting, but the results are fairly crude.

So the usual workflow is to process the data with pandas first, and then visualize the resulting DataFrame or Series with Matplotlib, Seaborn, Plotly, Bokeh, or a similar package.

But to be honest, each visualization package has its own methods and functions, and I often forget them, which has always been a headache for me.

Good news! Since pandas 0.25 (0.25.3 was the latest release when this was written), that detour is no longer required. Data processing and visualization can both be done from pandas.

pandas can now use Plotly or Bokeh as its plotting backend, so you get interactive charts directly from a DataFrame without calling the visualization package separately.

Let's see how to use it.

1. Activate backend

After importing pandas, activate a backend with a single line of code. For example, for Plotly:

pd.options.plotting.backend = 'plotly'

Currently, pandas's backend supports the following visualization packages.

  • Plotly
  • Holoviews
  • Matplotlib
  • Pandas_bokeh
  • hvPlot
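
A backend can only be activated if its package is installed. Here is a minimal sketch of a guarded switch. The helper name use_backend is my own invention, not part of pandas, and it assumes that pandas raises a ValueError when it cannot find the requested backend, which is the behavior I have seen.

```python
import pandas as pd


def use_backend(name, fallback="matplotlib"):
    """Try to activate a plotting backend; use the fallback if that fails.

    Hypothetical helper for illustration, not a pandas function.
    """
    try:
        # pandas validates the backend at assignment time.
        pd.options.plotting.backend = name
    except (ValueError, ImportError):
        # The requested package is missing, so switch to the fallback.
        pd.options.plotting.backend = fallback
    return pd.options.plotting.backend


active = use_backend("plotly")
print("Active plotting backend:", active)
```

If Plotly is installed, this prints plotly. Otherwise it falls back to matplotlib, the pandas default.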

2. Plotly backend

The advantage of Plotly is that it is built on the JavaScript library plotly.js, so the charts it generates can be saved as HTML files or embedded in Python-based web applications.

Let's see how to use Plotly as the pandas backend for visualization.

If Plotly is not already installed, install it with pip install plotly. If you are using Plotly in JupyterLab, a few additional installation steps are needed for the charts to display.

First, install ipywidgets.

pip install jupyterlab "ipywidgets>=7.5"

Then run this command to install the Plotly extension.

jupyter labextension install jupyterlab-plotly@4.8.1

The example dataset comes from OpenML.org; the link is as follows:

Data link: https://www.openml.org/d/187

scikit-learn can also fetch this dataset from OpenML, so you can load it directly with the following code.

import pandas as pd
import numpy as np

from sklearn.datasets import fetch_openml

pd.options.plotting.backend = 'plotly'

X,y = fetch_openml("wine", version=1, as_frame=True, return_X_y=True)
data = pd.concat([X,y], axis=1)
data.head()

The dataset is about wine and contains a number of features for each wine along with the corresponding class label. data.head() shows the first few rows.

Let's explore the dataset using the Plotly backend.

Plotting works almost exactly as it does with the built-in pandas plotting methods, except that the result is now a rich, interactive Plotly chart.

The following code plots the relationship between two features in the dataset.

fig = data[['Alcohol', 'Proline']].plot.scatter(y='Alcohol', x='Proline')
fig.show()

If you hover over the chart, a toolbar appears with an option to download it as a high-quality image file.

We can combine pandas' groupby with a bar chart to compare the mean Hue of each class.

data[['Hue','class']].groupby(['class']).mean().plot.bar()
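
To make the groupby step concrete, here is a small sketch of the table the bar chart is drawn from. The numbers are made up for illustration and are not rows from the wine dataset. Only the column names Hue and class match the article.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the wine dataset.
toy = pd.DataFrame({
    "class": ["1", "1", "2", "2", "3", "3"],
    "Hue": [1.0, 1.2, 1.0, 0.8, 0.6, 0.8],
})

# One row per class, holding the mean Hue of that class.
mean_hue = toy[["Hue", "class"]].groupby(["class"]).mean()
print(mean_hue)
```

Calling .plot.bar() on mean_hue then draws one bar per row, with the class label on the x-axis and the mean Hue as the bar height.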

Now add class to the scatter plot we just created. With Plotly, you can easily give each class its own color so that the classes are visually distinguishable.

fig = data[['Hue', 'Proline', 'class']].plot.scatter(x='Hue', y='Proline', color='class', title='Proline and Hue by wine class')
fig.show()

3. Bokeh backend

Bokeh is another Python visualization package that also provides rich interactive visualization. Bokeh also has a streaming API, which can create real-time visualization for streaming data such as financial markets.

The GitHub link of pandas-bokeh is as follows:

https://github.com/PatrikHlob...

As usual, you can install it with pip: pip install pandas-bokeh.

To display Bokeh charts in JupyterLab, two more extensions need to be installed.

jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @bokeh/jupyter_bokeh

Next, we use the Bokeh backend to recreate the scatter plot we just made with Plotly.

pd.options.plotting.backend = 'pandas_bokeh'

import pandas_bokeh
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()
p1 = data.plot_bokeh.scatter(x='Hue', 
                              y='Proline', 
                              category='class', 
                              title='Proline and Hue by wine class',
                              show_figure=False)
show(p1)

The key statement is a single line of code, so it is quick to write. The result is an interactive chart.

pandas-bokeh also has a plot_grid function that arranges multiple charts in a dashboard-like layout. The code below creates four charts in a grid.

output_notebook()

p1 = data.plot_bokeh.scatter(x='Hue', 
                              y='Proline', 
                              category='class', 
                              title='Proline and Hue by wine class',
                              show_figure=False)

p2 = data[['Hue','class']].groupby(['class']).mean().plot.bar(title='Mean Hue per Class')

df_hue = pd.DataFrame({
    'class_1': data[data['class'] == '1']['Hue'],
    'class_2': data[data['class'] == '2']['Hue'],
    'class_3': data[data['class'] == '3']['Hue']},
    columns=['class_1', 'class_2', 'class_3'])

p3 = df_hue.plot_bokeh.hist(title='Distribution per Class: Hue')

df_proline = pd.DataFrame({
    'class_1': data[data['class'] == '1']['Proline'],
    'class_2': data[data['class'] == '2']['Proline'],
    'class_3': data[data['class'] == '3']['Proline']},
    columns=['class_1', 'class_2', 'class_3'])

p4 = df_proline.plot_bokeh.hist(title='Distribution per Class: Proline')

pandas_bokeh.plot_grid([[p1, p2], 
                        [p3, p4]], plot_width=450)

As you can see, each chart is produced by a single line of code on a pandas DataFrame, and plot_grid then arranges them in the final layout.
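
The per-class frames used for the histograms (df_hue and df_proline) are built by filtering the data three times. Here is a sketch of a more compact alternative with pivot, which is my own suggestion and not the approach used above. It assumes class is stored as plain strings, and the values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the wine dataset.
toy = pd.DataFrame({
    "class": ["1", "2", "1", "2"],
    "Hue": [1.0, 0.8, 1.2, 0.9],
})

# Spread Hue into one column per class. Rows keep their original index,
# and a cell is NaN where the row belongs to a different class.
df_hue = toy.pivot(columns="class", values="Hue").add_prefix("class_")
print(df_hue)
```

The result has the same shape as the dictionary-based construction: one column per class, with NaN wherever a row belongs to another class.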

4. Summary

Plugging third-party visualization backends into pandas' built-in plotting interface greatly enhances what pandas can do for data visualization. In the future, there may be no need to learn each visualization package's API separately; pandas alone may be enough.

This article was first published on my official account, Python Data Science. Feel free to follow.
Personal website: http://www.datadeepin.com/

Tags: Python Data Analysis Visualization pandas

Posted by mikkex on Wed, 11 May 2022 10:45:58 +0300