April 19, 2020

Touching up a Data Plot

By Tobias C. Kaechele

In this tutorial I'm going to show you how to touch up a generic scatter plot made with the Matplotlib package in Python. You should be fairly familiar with Python and have a basic knowledge about how to plot something with Matplotlib.

Looking at the Data

The data to be visualized are the predictions of a neural net trying to guess the happiness score of a country by looking at some data points. As for the workbench, I set up Jupyter Lab.

Using the Pandas package, we load a DataFrame (basically a data table) from the pickle file provided in the source materials. For our convenience, we save references to the columns of interest. In most cases, you can treat these references like lists of numbers, so don't worry about the underlying magic.

import pandas as pd

data = pd.read_pickle( 'predicted.pickle' )

scores = data["Score"]
predictions = data["Score prediction"]
errors = data["Score error"]

scores are the actual happiness scores of the countries from the original dataset, ranging from 0 to 10. predictions are the happiness scores predicted by my neural net. errors represent how much the actual values differ from the predicted ones. In this case they are the squared errors.
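
To make the relationship between the three columns concrete, here is a minimal sketch with a made-up table. The values and the name toy are assumptions for illustration and do not come from predicted.pickle; only the column names are taken from the code above.

```python
import pandas as pd

# Made-up stand-in for the real table (values are assumptions, not real data)
toy = pd.DataFrame({
    "Score": [7.5, 5.0, 3.5],
    "Score prediction": [7.0, 5.5, 3.5],
})

# Squared error per row: (actual - predicted) ** 2, computed element-wise
toy["Score error"] = (toy["Score"] - toy["Score prediction"]) ** 2

print(toy)
```

Because the error is squared, over- and under-estimates of the same size end up with the same value, so the colour in the plot only tells you how far off a prediction is, not in which direction.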

Styling the Axis and Frame

Let's start by plotting the dataset using Matplotlib. We also indicate the error (the deviation between the actual and the predicted happiness score) by colour and show a corresponding colorbar on the right side.

...

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(scores, predictions, c=errors)
fig.colorbar(scatter, ax=ax)

fig.show()

Because the predicted and the actual scores share the same range, we opt for equal scales. Thus, we set the limits of the y-axis (ylim) to the limits of the x-axis (xlim). To visually emphasize this equality, we force the diagram to be square.

...

# make diagram square
ax.axis('square')

# make axis use same limits
ax.set_ylim(ax.get_xlim())

fig.show()
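
If you want to convince yourself that both axes really end up with the same limits, a small check like the following should do. It is only a sketch: the three points are made up, and the Agg backend line is there so that it runs without a display (you would not need it inside Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt

# made-up points standing in for scores and predictions
x = [4.0, 5.5, 7.0]
y = [4.5, 5.0, 7.5]

fig, ax = plt.subplots(nrows=1, ncols=1)
ax.scatter(x, y)

# same two calls as in the tutorial
ax.axis('square')
ax.set_ylim(ax.get_xlim())

# both axes should now report the same interval
print(ax.get_xlim(), ax.get_ylim())
```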

Changing the background is done by calling ax.set_facecolor(...). We can actually pass a hex value to it. Easy as pie! To get rid of the frame, we need to get the top, right, bottom, and left spines and make them invisible, e.g. ax.spines['top'].set_visible(False).

...

# set the background color
ax.set_facecolor('#efeff1')

# remove frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

fig.show()
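
The four calls above can also be written as a loop, since ax.spines can be iterated like a dictionary. The following is a minimal sketch on an empty plot, not a replacement for the code above; the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig, ax = plt.subplots(nrows=1, ncols=1)

# set the background color
ax.set_facecolor('#efeff1')

# hide all four spines (top, right, bottom, left) in one loop
for spine in ax.spines.values():
    spine.set_visible(False)

# read the settings back to check them
print(to_hex(ax.get_facecolor()))
print([spine.get_visible() for spine in ax.spines.values()])
```

The loop is handy when you want to treat all spines alike. If you only want to drop, say, the top and right ones, the explicit calls are easier to read.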

Next, we are going to plot a white grid via ax.grid(...). Unfortunately, the grid is drawn on top of the data points by default. Thus, we need to call ax.set_axisbelow(True) to send the axes and the grid to the background.

...

# draw grid
ax.grid(which="major", color="w", linestyle='-', linewidth=1)

# send axes and grid to the back
ax.set_axisbelow(True)

fig.show()

As you can see above, the labelled tick marks at the axes do not match the rest of the design. Thus, we are going to adjust their colour and width by using ax.tick_params().

...

# set color and width of tick marks (and of the labels)
ax.tick_params(axis='x', colors='#777777', width=0)
ax.tick_params(axis='y', colors='#777777', width=0)

fig.show()
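
Since both axes get the same settings here, the two calls can be merged by passing axis='both'. Below is a minimal sketch on an empty plot that also reads the settings back from the first tick of each axis. The variable names first_x and first_y are my own, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig, ax = plt.subplots(nrows=1, ncols=1)

# 'colors' affects tick marks and tick labels alike; width=0 hides the marks
ax.tick_params(axis='both', colors='#777777', width=0)

# inspect the first major tick of each axis
first_x = ax.xaxis.get_major_ticks()[0]
first_y = ax.yaxis.get_major_ticks()[0]

print(to_hex(first_x.label1.get_color()), first_x.tick1line.get_markeredgewidth())
print(to_hex(first_y.label1.get_color()), first_y.tick1line.get_markeredgewidth())
```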

Improving the Colorbar

To adjust the colorbar, we have to dig deeper. We have to retrieve a reference to it by saving the output of fig.colorbar(...) as colorbar. Now we can remove its frame by accessing its outline and setting the visibility via colorbar.outline.set_visible(...).

The biggest challenge is to change the appearance of the tick marks and labels. I found this to be poorly documented. The key is that the object colorbar has a member ax, which in turn has a member axes, allowing you to make changes to the axes of the colorbar itself. Thus, we can configure the tick marks like those of any other plot by calling colorbar.ax.axes.tick_params(...).

However, to change the colour of the tick labels we must first get a handle for the yticklabels via colorbar.ax.axes.get_yticklabels(). Second, we use the function plt.setp(...) to set a specific attribute. In this case we are changing the colour of our tick labels, as you can see below.

...

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(scores, predictions, c=errors)

# get reference of colorbar
colorbar = fig.colorbar(scatter, ax=ax)

# remove frame
colorbar.outline.set_visible(False)

# set color of ticks
colorbar.ax.axes.tick_params(axis='y', color='#777777')

# get reference of tick labels
yticklabels = colorbar.ax.axes.get_yticklabels()

# set color attribute of tick labels
plt.setp(yticklabels, color='#777777')

...

fig.show()
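
A possibly shorter route: colorbar.ax is itself an Axes object, so the colors argument of tick_params (with an s, as used for the main plot) should colour tick marks and tick labels in one go. The following is a sketch under that assumption, using made-up data; the Agg backend line is there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# made-up stand-ins for scores, predictions and errors
x = [4.0, 5.5, 7.0]
y = [4.5, 5.0, 7.5]
err = [0.25, 0.10, 0.40]

fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(x, y, c=err)
colorbar = fig.colorbar(scatter, ax=ax)

# remove frame
colorbar.outline.set_visible(False)

# 'colors' sets the colour of tick marks and tick labels together
colorbar.ax.tick_params(axis='y', colors='#777777')

# collect the distinct label colours to check the result
label_colors = {to_hex(label.get_color()) for label in colorbar.ax.get_yticklabels()}
print(label_colors)
```

The plt.setp(...) approach remains useful when the labels need attributes that tick_params does not cover.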

Adjusting the Colormap

Doesn't this look way prettier than Matplotlib's default settings? However, the blue-to-yellow colour gradient used to indicate the errors might not work for you. Of course, there is a wide range of colour gradients (so-called colormaps) to choose from that already come with the Matplotlib package. But sometimes this isn't enough, for example if your plot has to comply with the style guide of your company.

I'm going to show you an easy way to brew your own colormap, like you would do in PowerPoint or Adobe Illustrator.

Basically, you want to have a linear colour gradient between two or more colours. In this case we can use LinearSegmentedColormap.from_list(...) from the package matplotlib.colors.

LinearSegmentedColormap.from_list(name='greyToRed', colors=['#555566', '#F00A41'], N=256)

As you can see, the function expects a name and a list of colours, passed as colors, describing your gradient. In this case, our gradient starts at #555566 and stops at #F00A41. The parameter N specifies how many colour steps the gradient is divided into. The more steps, the smoother the gradient. Below you can see a small discrete sample of our colormap.
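
To get a feeling for what the colormap contains, you can sample it yourself. A colormap object is callable: a float between 0 and 1 picks a position along the gradient, an integer picks one of the N steps directly. The sketch below uses the tutorial's two colours; the second colormap with N=5 and the names coarse and steps are my own additions for illustration.

```python
from matplotlib.colors import LinearSegmentedColormap, to_hex

cmap = LinearSegmentedColormap.from_list('greyToRed', ['#555566', '#F00A41'], N=256)

# 0.0 maps to the first colour of the gradient, 1.0 to the last
print(to_hex(cmap(0.0)))
print(to_hex(cmap(1.0)))

# with a small N the gradient collapses into a handful of discrete steps
coarse = LinearSegmentedColormap.from_list('greyToRedCoarse', ['#555566', '#F00A41'], N=5)
steps = [to_hex(coarse(i)) for i in range(coarse.N)]
print(steps)
```

The first and last entries of steps are the two colours you passed in; the entries in between are the interpolated shades.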

To use the gradient, we will go back in our source code, insert the missing import statement, create a colormap called cmap, and modify the call to the ax.scatter function by passing the colormap as cmap.

...

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# create linear colour gradient
cmap = LinearSegmentedColormap.from_list('greyToRed', ['#555566', '#F00A41'], N=256)

fig, ax = plt.subplots(nrows=1, ncols=1)
scatter = ax.scatter(scores, predictions, c=errors, cmap=cmap)

...

fig.show()

Above you can see the final result of our efforts.¹ I hope you enjoyed this tutorial on how to touch up plots in Python. Feel free to download the source code and try it out yourself!

¹ Note that I tweaked the limits of the axes by setting ax.set_xlim([3.5, 8.5]).