import requests
Why
Tufte’s ‘The Visual Display of Quantitative Information’ (1983) is one of the most beautiful books I’ve ever read. Tufte argues that data visualisation should be a highly regarded craft and that it’s the synthesis of statistics, design, and truth telling. I was moved by how emotionally compelling the book was. I won’t write a review here, but I think anyone who spends any time expressing data visually, however little, will get great value from this book.
I was particularly inspired by his scatter plot recommendations, which fold Tukey box-and-whisker plots into the axes of the scatter and aim for a large reduction in ink with clearly marked data points. I thought they were gorgeous and, as he shows, they communicate far more than the standard plots you usually come across. I wanted to know how to produce them in Python, so I thought I’d share my solution and exploration in this short notebook.
Another Great Book
Scientific Visualization: Python + Matplotlib by Nicolas Rougier is another fantastic book, which I’m using to understand how matplotlib actually works. Most of the time I copy the tutorial or gallery code from altair, seaborn, etc. and move on with my life without worrying too much about what ‘fig’ and ‘ax’ mean. But curiosity has got the better of me, and this book clearly explains these terms and the fundamentals of the library. Another recommendation.
Let’s Grab Iris Data
Here’s a quick web request to grab it from UCI. Altair’s tutorials also use the vega datasets, which make this dead easy. Pick your poison…
res = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
from io import StringIO
import pandas as pd
data = pd.read_csv(StringIO(res.text), header=None, names=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"])
# Looks good, thanks UCI!
data.head()
| | sepal_len | sepal_wid | petal_len | petal_wid | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
Simple Plots with Altair
Altair is a great viz library with heaps of sensible default behaviour. The simple example below shows the interactivity and exploration you get out of the box with only a few lines of readable code. It’s a great starting point for scatter plots, and whilst it’s superior to the ones Tufte takes issue with, it can still be improved.
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
alt.themes
ThemeRegistry(active='default', registered=['dark', 'default', 'fivethirtyeight', 'ggplot2', 'latimes', 'none', 'opaque', 'quartz', 'urbaninstitute', 'vox'])
# Defining a colorblind-friendly theme for my friends with accessibility needs. Here's a nice short read on colorblind-friendly palettes: https://davidmathlogic.com/colorblind/
# I think this theme is compatible, but please scream at me if I've made a mistake here. I always want to be inclusive, and I think data viz underdiscusses colorblind-friendly visualisations in the name of prettiness.
alt.themes.enable('urbaninstitute')
ThemeRegistry.enable('urbaninstitute')
alt.Chart(data).mark_circle(size=60).encode(
x='sepal_len:Q',
y='sepal_wid:Q',
color='class:N',
tooltip=['sepal_len', 'sepal_wid', 'petal_wid', 'class']
)
# An interactive version! Altair is cool...
alt.Chart(data).mark_circle(size=60).encode(
x='sepal_len:Q',
y='sepal_wid:Q',
color='class:N',
tooltip=['sepal_len', 'sepal_wid', 'petal_wid', 'class']
).interactive()
Improvements from Tufte
On page 130 of ‘The Visual Display of Quantitative Information’, in the ‘Theory of Data Graphics’ section under ‘Redesign of the Scatterplot’, Tufte outlines a few recommendations that I think are interesting. They mainly involve stripping things away to improve the ‘data-ink ratio’, his simple and clever argument that graphics can be measurably improved and compared by removing non-data ink. I won’t re-explain it here; read the dang book, it’s so dang good.
1. Change the Frame Lines / Axes to Represent a Quartile Plot
2. Show Distributions and Ticks in Axes
3. Remove Unused Axes Space
4. Turn Down or Off the Background Grid
Altair Mostly Beat Me to It
Altair has a fantastic tutorial, also inspired by the book, here: https://altair-viz.github.io/gallery/dot_dash_plot.html. We will use it as a starting point.
This crosses off #2 of our goals, as we can already see the data distributions.
However, I’d still like to improve this by simplifying the frame lines.
# Here's Altair's starting plot, taken directly from https://altair-viz.github.io/gallery/dot_dash_plot.html but with our sepal data popped in
# Columns available: 'sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class'
# Configure the options common to all layers
brush = alt.selection(type='interval')
base = alt.Chart(data).add_selection(brush)
# Configure the points
points = base.mark_point().encode(
x=alt.X('sepal_len', title=''),
y=alt.Y('sepal_wid', title=''),
color=alt.condition(brush, 'class', alt.value('grey'))
)
# Configure the ticks
tick_axis = alt.Axis(labels=False, domain=False, ticks=False)
x_ticks = base.mark_tick().encode(
alt.X('sepal_len', axis=tick_axis),
alt.Y('class', title='', axis=tick_axis),
color=alt.condition(brush, 'class', alt.value('lightgrey'))
)
y_ticks = base.mark_tick().encode(
alt.X('class', title='', axis=tick_axis),
alt.Y('sepal_wid', axis=tick_axis),
color=alt.condition(brush, 'class', alt.value('lightgrey'))
)
# Build the chart
y_ticks | (points & x_ticks)
Modifications
Let’s first fix the axes so that most of the space is used properly. We can do this either by clearing out the frame space that isn’t used or by not starting from zero. I think the latter is a reasonable move if you want to show relationships, but it can misrepresent the scale of differences when the overall deviations are tiny.
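To put a rough number on that caveat, here is a small sketch in plain Python (no plotting library needed). The helper `visual_share` is a hypothetical name of mine, not part of Altair, and the two values and axis limits are made up for illustration. It reports what fraction of an axis's length the gap between two values takes up.

```python
def visual_share(a, b, axis_min, axis_max):
    """Fraction of the axis length taken up by the gap between a and b."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(a - b) / (axis_max - axis_min)


# Two made-up sepal lengths that differ by 0.1 cm.
a, b = 5.0, 5.1

# Axis anchored at zero: the gap is a sliver of the axis.
share_from_zero = visual_share(a, b, axis_min=0.0, axis_max=8.0)

# Axis cropped tightly around the data: the same gap looks far larger.
share_cropped = visual_share(a, b, axis_min=4.8, axis_max=5.3)

print(f"zero-based axis: {share_from_zero:.1%}")
print(f"cropped axis:    {share_cropped:.1%}")
```

The data hasn't changed between the two calls, only the frame around it, which is why a cropped axis is worth flagging to readers when the differences are small.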
Altair has a great guide on customising axes here: https://altair-viz.github.io/user_guide/customization.html?highlight=axis#adjusting-axis-limits, which we’ll be using extensively.
Modify the Scale to Fill | Line 11-12
Let’s first change the scale so it doesn’t start at 0 by setting the scale keyword zero to False. Alternatively, we could take the min and max of our X variable (sepal length) and our Y variable (sepal width) and pass those values as the scale’s domain keyword.
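As a rough sketch of the min/max option, the helper below (a hypothetical `padded_domain`, not an Altair function) takes a column's values and returns a `[min, max]` pair with a little breathing room on each side. The 5% padding is an arbitrary choice of mine, and the sample values are the first five sepal lengths from the `data.head()` output above.

```python
def padded_domain(values, pad_fraction=0.05):
    """Return [low, high] covering the values, padded by a fraction of their range."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    return [lo - pad, hi + pad]


# First five sepal lengths from the table above.
sepal_len_sample = [5.1, 4.9, 4.7, 4.6, 5.0]
domain = padded_domain(sepal_len_sample)
print(domain)
```

With the full DataFrame, `padded_domain(data['sepal_len'])` should give a list that can be passed as `alt.Scale(domain=...)` in place of `zero=False`.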
Turn Down the Grid | Line 6-7
Let’s also switch off the grid and try to remove more ink for free; see line 6, where we define plot_axis as an alt.Axis object with grid=False.
Add in Quartiles | Line 31-39
Let’s also add box plots alongside the scatter to see the distributions next to the axes. In Tufte’s book he actually converts the axis itself into the distribution. I’m not sure how to implement that, and I actually think this works really nicely as it’s written below!
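For reference, the numbers a min-max box plot encodes are just the five-number summary of each variable. Here's a minimal sketch using Python's standard `statistics` module. `five_number_summary` is a name I made up, the sample is the first five sepal widths from the table earlier, and the quartile interpolation used here may differ slightly from whatever Altair/Vega-Lite computes internally.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    values = sorted(values)
    if len(values) < 2:
        raise ValueError("need at least two values")
    # n=4 splits the data into quartiles; "inclusive" treats the
    # sample min and max as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]


# First five sepal widths from the table earlier in the post.
sepal_wid_sample = [3.5, 3.0, 3.2, 3.1, 3.6]
summary = five_number_summary(sepal_wid_sample)
print(summary)
```

The box spans Q1 to Q3 with a mark at the median, and with `extent='min-max'` the whiskers run out to the minimum and maximum.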
Altair is Cool, Draw a Grid!
Also try dragging a small selection box over the viz and then moving the box around; the distributions and values light up to show which points are selected.
# Configure the options common to all layers
brush = alt.selection(type='interval')
base = alt.Chart(data).add_selection(brush)
plot_axis = alt.Axis(labels=True, domain=False, ticks=False,grid=False)
tick_axis = alt.Axis(labels=False, domain=False, ticks=False,grid=False)
# Configure the points
points = base.mark_point().encode(
x=alt.X('sepal_len', title='',axis=plot_axis,scale=alt.Scale(zero=False)),
y=alt.Y('sepal_wid', title='',axis=plot_axis,scale=alt.Scale(zero=False)),
color=alt.condition(brush, 'class', alt.value('grey'))
)
# Configure the ticks
x_ticks = base.mark_tick().encode(
alt.X('sepal_len', axis=tick_axis,scale=alt.Scale(zero=False)),
alt.Y('class', title='', axis=tick_axis),
color=alt.condition(brush, 'class', alt.value('lightgrey')),
)
y_ticks = base.mark_tick().encode(
alt.X('class', title='', axis=tick_axis),
alt.Y('sepal_wid', axis=tick_axis,scale=alt.Scale(zero=False)),
color=alt.condition(brush, 'class', alt.value('lightgrey'))
)
x = base.mark_boxplot(extent='min-max').encode(
alt.X('class', title='',axis=tick_axis),
alt.Y('sepal_wid',title='',axis=tick_axis,scale=alt.Scale(zero=False)),
color='class')
y = base.mark_boxplot(extent='min-max').encode(
alt.Y('class', title='',axis=tick_axis),
alt.X('sepal_len',title='',axis=tick_axis,scale=alt.Scale(zero=False)),
color='class')
# Build the charts
y_ticks | x | (points & y & x_ticks)
Looking Good
That was super straightforward! With some simple modifications to the basic altair scatterplot, we can create Tufte-inspired scatters which I think are really beautiful and informative. A high data-ink ratio! He actually laments graphical software in the book, as it led people to produce total fluff and filler instead of taking the care in the craft that hand-drawing plots would force on them. I wonder how he feels now, considering how far we have come and how prevalent and easy to use graphing libraries like altair are.