The Yhat Blog

machine learning, data science, engineering

ggplot is a graphics package for Python that aims to approximate R's ggplot2 package in both usage and aesthetics.

This is a post summarizing the latest fixes and enhancements in the ggplot-0.4 release.

Tidying up our mess

The positive reaction to ggplot from the outset was, candidly, a bit overwhelming! We'll be the first to admit that the first cut was a little...err...rough. But we've been hard at work incorporating new features and fixes, and we're extremely enthusiastic about the progress of and interest in the project.

A big thank you to everyone submitting pull requests! We are deeply appreciative, and we're doing our best to keep up!


One of the most obvious deficiencies of the initial version of ggplot was the faceting implementation. In the original blog post, a few of the facet_wrap and facet_grid plots were flat-out missing certain variables (sorry about that!).

ggplot-0.4 makes facets a little prettier, fixes issues with assigning the wrong colors, shapes, and other aesthetics, and provides the ability to do some basic scaling when using facet_grid.

from ggplot import *
p = ggplot(aes(x='price'), data=diamonds)
p + geom_histogram() + facet_wrap("cut")

p = ggplot(aes(x='wt', y='mpg'), data=mtcars)
p + geom_point() + facet_grid("cyl", "gear", scales="free_y")

ggplot(aes(x='carat', y='price', colour='cut'), data=diamonds) + \
    geom_point() + facet_wrap("clarity")

We're still working on legends for facets, but that's coming soon!
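
For readers wondering what faceting does to the data before anything is drawn, here is a minimal pure-Python sketch: it splits rows into one group per facet value and picks a grid large enough to hold one panel per group. The helper names (facet_groups, wrap_layout), the sample rows, and the near-square layout rule are all assumptions made for illustration; this is not ggplot's actual implementation.

```python
import math
from collections import defaultdict


def facet_groups(rows, by):
    """Partition a list of row dicts into one list per distinct value of `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return dict(groups)


def wrap_layout(n_panels):
    """Pick a near-square (n_rows, n_cols) grid with room for n_panels panels."""
    if n_panels < 1:
        raise ValueError("need at least one panel")
    n_cols = int(math.ceil(math.sqrt(n_panels)))
    n_rows = int(math.ceil(n_panels / float(n_cols)))
    return n_rows, n_cols


# Made-up rows for illustration only (not values from the diamonds dataset).
rows = [
    {"cut": "Ideal", "price": 500},
    {"cut": "Good", "price": 450},
    {"cut": "Ideal", "price": 900},
    {"cut": "Fair", "price": 300},
]

groups = facet_groups(rows, "cut")
n_rows, n_cols = wrap_layout(len(groups))
print("%d panels on a %d x %d grid" % (len(groups), n_rows, n_cols))
```

Each group would then be drawn in its own panel, which is why every panel needs access to the same aesthetics (colors, shapes) as the full plot.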


Another big improvement is the overall look and feel of the graphics. Originally we were using a .matplotlibrc file that didn't give the user the ability to customize the style of their plots.

With version 0.4, however, ggplot supports proper themes! Hats off to Jan Schulz for tackling this one and building out the key plumbing involved.

ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_line() + \

And of course no theme implementation would be complete without an xkcd style!

ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_line() + \
    theme_xkcd()

New geoms

We've been able to port over geoms pretty quickly.

Thank you Eric Chiang & Justin Haynes for adding:

  • geom_step
  • geom_text
  • geom_tile

import numpy as np
import pandas as pd

random_walk = pd.DataFrame({
    "x": np.arange(100),
    "y": np.cumsum(np.random.choice([-1, 1], 100))
})
ggplot(aes(x='x', y='y'), data=random_walk) + \
    geom_step()

ggplot(aes(x='wt', y='mpg', label='name'), data=mtcars) + \
    geom_text()

df = pd.DataFrame({
    'x': ['a', 'b', 'c', 'a'],
    'y': [3, 2, 1, 2],
    'fill': np.random.random(4)
})
print(ggplot(aes(x='x', y='y', fill='fill'), data=df) +
      geom_tile() +
      xlab('X Label') +
      ylab('Y Label') +
      ggtitle('This is geom_tile!\n'))

Multiple ggplot objects in 1 plot

For more advanced plots, you inevitably need to be able to work with multiple data frames at the same time. Just add in a geom that contains a reference to your data and customize the aesthetics! We're still working out the legends so we'll thank you for your patience ahead of time.

random_walk1 = pd.DataFrame({
  "x": np.arange(100),
  "y": np.cumsum(np.random.choice([-1, 1], 100))
})
random_walk2 = pd.DataFrame({
  "x": np.arange(100),
  "y": np.cumsum(np.random.choice([-1, 1], 100))
})
ggplot(aes(x='x', y='y'), data=random_walk1) + \
    geom_step() + \
    geom_step(aes(x='x', y='y'), data=random_walk2)
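
To make the random_walk frames above a little less magical, here is a rough standard-library sketch of the same idea: a walk is the running sum of -1/+1 steps, and a step plot joins consecutive points with a horizontal segment followed by a vertical one. The function names and the "horizontal first" convention are assumptions for illustration; geom_step itself may draw its steps differently.

```python
import random
from itertools import accumulate


def random_walk(n_steps, seed=None):
    """Return the running sum of n_steps random moves of -1 or +1."""
    rng = random.Random(seed)
    return list(accumulate(rng.choice([-1, 1]) for _ in range(n_steps)))


def step_vertices(xs, ys):
    """Turn points into staircase vertices: hold y until the next x, then jump."""
    if not xs:
        return []
    vertices = [(xs[0], ys[0])]
    for i in range(1, len(xs)):
        vertices.append((xs[i], ys[i - 1]))  # horizontal move at the old y
        vertices.append((xs[i], ys[i]))      # vertical jump to the new y
    return vertices


walk = random_walk(100, seed=1)
path = step_vertices(list(range(len(walk))), walk)
print(len(walk), "points become", len(path), "staircase vertices")
```

Two walks built this way share nothing but their x range, which is exactly the situation where passing a second data frame to an extra geom comes in handy.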

Color Scales

One of my favorite parts of ggplot2 for R is the color scaling. We've added a basic implementation of scale_color_gradient and scale_color_manual.

ggplot(aes(x='wt', y='mpg', color='mpg'), data=mtcars) + \
    geom_point() + \
    scale_colour_gradient2(low="coral", high="steelblue")

ggplot(aes(x='drat', y='mpg', color='wt'), data=mtcars) + \
    geom_point() + \
    scale_colour_gradient(low="white", mid="blue", high="black")
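
If you're curious what a two-color gradient scale boils down to, the sketch below rescales each value to the range [0, 1] and blends linearly between a low and a high color in RGB space. The helper names are hypothetical and the linear RGB blend is an assumption made for illustration; it is not a description of how ggplot computes its colors internally. The hex codes used are the standard CSS values for coral and steelblue.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints back to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def blend(low, high, t):
    """Linearly blend from `low` (t=0) to `high` (t=1), channel by channel."""
    lo, hi = hex_to_rgb(low), hex_to_rgb(high)
    return rgb_to_hex(tuple(int(round(a + (b - a) * t)) for a, b in zip(lo, hi)))


def map_to_gradient(values, low, high):
    """Rescale values to [0, 1] and map each one onto the low-to-high gradient."""
    vmin, vmax = min(values), max(values)
    span = float(vmax - vmin) or 1.0  # avoid dividing by zero for constant data
    return [blend(low, high, (v - vmin) / span) for v in values]


# coral = #ff7f50, steelblue = #4682b4; the input values are made up.
colors = map_to_gradient([10, 20, 30], "#ff7f50", "#4682b4")
print(colors)
```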

We're working on scale_color_brewer. Look for it in the next release.

color_list = [
lng = pd.melt(meat, ['date'])
ggplot(lng, aes('date', 'value', color='variable')) + \
    geom_point(size=3) + \


Another really helpful utility we've added is the ggsave command. It lets you save a plot to a .png file. We're going to be adding PDF support soon as well!

p = ggplot(aes(x='price'), data=diamonds) + geom_histogram() + ggtitle('My Diamond Histogram')
ggsave(p, "my_diamond_histogram.png")

And now we can open the .png file we just created in our current working directory:

In [5]: ! open ./my_diamond_histogram.png
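
The open command above is specific to macOS. If you're on another platform, a small helper along these lines may do the trick. This is only a sketch: opener_command and open_file are hypothetical names, and the choice of xdg-open on Linux and cmd's start on Windows is an assumption about a typical desktop setup.

```python
import os
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the command list that opens `path` with the default application."""
    if platform == "darwin":
        return ["open", path]
    if platform.startswith("win"):
        # The empty string is the window title that `start` expects first.
        return ["cmd", "/c", "start", "", path]
    return ["xdg-open", path]


def open_file(path):
    """Open an existing file in the platform's default viewer."""
    if not os.path.exists(path):
        raise IOError("no such file: %s" % path)
    subprocess.check_call(opener_command(path))


print(opener_command("my_diamond_histogram.png"))
```

Calling open_file("my_diamond_histogram.png") after ggsave would then open the saved plot, provided the file exists in the current working directory.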

What's in the pipeline

  • scale_color_brewer
  • additional methods for stat_smooth
  • facet legends
  • full IPython Notebook support (currently legends get chopped off :( )
  • saving multiple plots to a PDF
  • geom_errorbar

