The Yhat Blog


machine learning, data science, engineering


What's new in ggplot 0.11

by Greg


Thanks for all the great feedback on our recent blog post announcing ggplot 0.10. We've taken your feedback into account and I think that a lot of you will really like this new release. We've got more customizations, advanced plotting features (see Multiple data frames in the same plot), and lots of contributions from the community!

So sit back, relax, and get ready to dive into the wonderful world of plotting in Python with ggplot!

Custom themes and text: theme and element_text

One of the most requested features for ggplot has been the ability to customize the look, feel, and text/fonts of your plots. Well, 0.11 delivers on this! Introducing custom themes!

[Screenshot: taking a look at the argument options for theme using our very own Rodeo.]

And it wouldn't really be super helpful if we didn't also include the ability to customize text; enter element_text. If you've used ggplot for R then this should be old hat for you. But if not, no worries.

Pro Tip (and shameless plug): You can use Rodeo to inspect function docstrings by using ? followed by the function you're curious about (e.g. ?element_text or ?pd.cut).

element_text is a function that allows you to customize the text that appears on your plots (titles, axis labels, legends, etc.). It is frequently used in conjunction with the theme function (think: "theme my x axis with the following font"). For example...

from ggplot import *

# diamonds is one of the datasets that ships with ggplot
ggplot(diamonds, aes(x='price')) + \
    geom_histogram(bins=50) + \
    theme(axis_title_x=element_text("Bins", vjust=-0.1, size=28),
          axis_text_x=element_text(angle=45, size=20))

Or maybe add a splash of color!

ggplot(diamonds, aes(x='price')) + \
    geom_histogram(bins=50) + \
    theme(plot_margin=dict(bottom=0.2, left=0.2),
          title=element_text("Super Awesome Plot!", size=34, color='CornflowerBlue'),
          axis_title_x=element_text("Bins\n(Group Groupings)", size=28, color='MediumSpringGreen'),
          axis_title_y=element_text("Count\n(Unit Unitables)", size=28, color='MediumSpringGreen'),
          axis_text_x=element_text(angle=45, size=10, color='MediumSlateBlue'),
          axis_text_y=element_text(size=10, color='MediumSlateBlue')
         )

Docstrings

Another big improvement we've made in 0.11 is docstrings. Docstrings were largely absent in the previous releases, and in the places where they did exist they weren't particularly helpful.

If you're a Rodeo user you've probably already noticed them, but in case you aren't (well you should be 😊 but I'll let it slide) here's what you're missing out on:

Make your life easier, use the docstrings!
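Not on Rodeo? The same docstrings are reachable from any Python session through the built-in help() or the standard library's inspect.getdoc. Here's a minimal sketch. It uses textwrap.shorten as a stand-in target so it runs without ggplot installed, and the helper name first_doc_line is made up for illustration.

```python
import inspect
import textwrap


def first_doc_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)  # strips indentation; returns None if no docstring
    return doc.splitlines()[0] if doc else ""


# textwrap.shorten is only a stand-in here; with ggplot installed you could
# pass element_text or theme instead.
print(first_doc_line(textwrap.shorten))

# In a plain interpreter, help(obj) prints the full docstring.
```

Swap in any ggplot function once it's imported and you get the summary line of its docstring.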

Feature Request: specifying the number of bins in a histogram

Thanks to Philip Hazelden for requesting this. ggplot does a decent job of providing reasonable defaults, but sometimes you need to do something super specific, such as setting the number of bins in a histogram. We've introduced support for this in 0.11. It's pretty straightforward: just include a parameter called bins (go figure) when you call geom_histogram to specify how many buckets your histogram will have.

Love that long tail 😊
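If you're curious what "number of bins" means in practice, here's a rough sketch of equal-width binning in plain Python. This is not ggplot's implementation, just an illustration of the idea; the function name equal_width_counts and the sample prices are made up.

```python
def equal_width_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width buckets."""
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    counts = [0] * bins
    if not values:
        return counts
    lo, hi = min(values), max(values)
    if lo == hi:
        # Every value is identical, so they all land in the first bucket.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / float(bins)
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Made-up, long-tailed sample standing in for diamond prices.
prices = [326, 400, 550, 700, 950, 1200, 2500, 4800, 9000, 18823]
print(equal_width_counts(prices, 5))  # [7, 1, 1, 0, 1]
```

More bins means narrower buckets, which is what lets you see the shape of a long tail instead of one giant bar.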

Web-Friendly: Base64 Support

Also new in this release is the ability to save images as base64 "embeddable" strings. If you happen to be using ggplot on the web, this will probably be helpful for you. It gives you an output that you can throw directly into a webpage. Handy for things like...blogging!

p = ggplot(pigeons, aes(x='pos', y='speed')) + geom_step()
print(p.save_as_base64(as_tag=True))  # parentheses keep this working on Python 2 and 3
# <img src = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAxgAAAJACAYAAAAO18BKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1%2B/AAAIABJ...

This image is base64 encoded! Take a look at the source to see what I mean.
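For the curious, here's a sketch of what goes into a tag like that, using only the standard library. It is not ggplot's code. The helper name bytes_to_img_tag is hypothetical, and the percent-encoding step is an assumption based on the %2B visible in the output above.

```python
import base64
from urllib.parse import quote


def bytes_to_img_tag(image_bytes, mime="image/png"):
    """Wrap raw image bytes in an <img> tag that uses a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Percent-encode characters such as '+' and '='. The '%2B' in ggplot's
    # output suggests it does something similar, but that is an assumption.
    return '<img src="data:{};base64,{}">'.format(mime, quote(encoded))


# The 8-byte PNG signature stands in for a real image file here.
fake_png = b"\x89PNG\r\n\x1a\n"
print(bytes_to_img_tag(fake_png))
```

With a real plot you would pass in the bytes of the rendered PNG, then paste the resulting tag into your HTML.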

Multiple data frames in the same plot

A great part about 0.11.0 is that there were a number of contributions from the community. Big thanks to Stephan Schmeing who added support for using multiple data frames in the same ggplot. That might seem like a strange feature, but it's actually super helpful. For example, let's take our built-in dataset mpg, which contains information about cars and their miles per gallon performance. If you wanted to compare each car's city and highway mpg, you might use a scatterplot. Easy enough...

from ggplot import *
ggplot(mpg, aes(x='cty', y='hwy')) + geom_point()

But what if you wanted to overlay a smoothed curve of how Toyotas perform? Previously this would have been a huge pain to do in ggplot, difficult enough that you'd probably give up before figuring out how to do it (at least I would). Now you can pass your data directly into a geom or stat layer. Here's how it looks:

toyota = mpg[mpg.manufacturer=="toyota"]

ggplot(mpg, aes(x='cty', y='hwy')) + geom_point() + \
    stat_smooth(aes(x='cty', y='hwy'), color='blue', se=False, data=toyota)
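One way to think about what's going on: each layer uses its own data if you gave it some, and otherwise falls back to the data passed to ggplot. Here's a toy model of that rule in plain Python. The Plot and Layer classes are made up for illustration and are not ggplot's internals.

```python
class Layer:
    """A toy stand-in for a geom or stat layer."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data  # None means "use the plot's data"


class Plot:
    """A toy stand-in for a ggplot object that collects layers."""

    def __init__(self, data):
        self.data = data
        self.layers = []

    def __add__(self, layer):
        self.layers.append(layer)
        return self

    def data_for_layers(self):
        # Layer-level data wins; otherwise fall back to the plot-level data.
        return [
            (layer.name, self.data if layer.data is None else layer.data)
            for layer in self.layers
        ]


# Lists of manufacturer names stand in for real data frames.
all_cars = ["audi", "ford", "toyota"]
toyota_only = ["toyota"]

plot = Plot(all_cars) + Layer("geom_point") + Layer("stat_smooth", data=toyota_only)
print(plot.data_for_layers())
```

In the real example above, geom_point plays the first role (it draws every car in mpg) and stat_smooth plays the second (it only sees the toyota rows).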

Bug-fixes

There were a number of individuals who contributed bug fixes to this release. Thanks so much for your help!

  • save() height typo bug => the mysterious microly (sorry, couldn't find your real name 😊)
  • edgecolors on geom_point (small fix but geom_point looks 10x better) => Christopher Roach
  • python3 fixes; wheel for pypi distributions => Nick Timkovich (the community once again bails me out of a Python3 problem)
  • README and docs fixes => Jeremy Kahn, Sebastian Nagel (ggplot is once again victimized by my atrocious spelling)

That's all!

Well, we packed a lot into 0.11, but that's all for now. Stay tuned for more updates by subscribing to our blog, or follow along with ggplot development on GitHub. Feel free to post feature requests (or bugs) in the issues. I do my best to get to as many as I can (and I'd like to think that this release is an indication that we get to most of them!).


