The Yhat Blog


machine learning, data science, engineering


db.py 0.4: Handlebars Meets SQL

by Greg |


Do you ever feel like your SQL scripts are a bit repetitive? Now that Yhat has been around for a while, we actually have a decent amount of data to report on. Recently, I've been finding my SQL scripts for reporting our key metrics getting more and more ridiculous (I had one query that had 16 UNION ALL statements in it).

So to combat my ugly reports, we've added templating features to db.py!--in particular, Handlebars.

Don't know what templating or Handlebars is? No need to worry. In this post I'll outline why building templates is a great way to help eliminate SQL headaches and show you the basics of Handlebars.

Handlebars comes to town

Handlebars is a templating framework. It let's you write one "base template" (for instance, the header and footer of a web page) that can be re-used by other pages. So if you need to make a change to your site's footer, you only have to change 1 template instead of making an edit to every single page.

But templating isn't just something that should be relegated to Rails apps or MVC frameworks. How many SQL scripts have you written that contain those same 20-30 lines in them? Why can't take the same principles from building web pages and apply them to SQL scripts.

Rollie Fingers, former Oakland A and handlebar mustache aficionado.

Why you will love this

If you have ever written a SQL script that looks like the one below, then you need to stop what you're doing and install db.py.

We can turn that into this:

Hopefully the line count on those files alone is enough to sell you. Let's dig into the details and really get cranking.

The Basics

If you haven't already, upgrade to the latest db.py (pip install -U db.py).

Ok, now you're ready to get started. Handlebars + SQL (hereafter called HBSQL) let's you write scripts that look a lot like regular SQL scripts, but have special properties that allow you to embed expressions into them.

Take the following script for instance. The {{ <variable name> }} tags are placeholders for variables you'd like to include in your script.

When we pass in variables into the data argument, the variables are automatically inserted into our script.

Ok variables are pretty trivial (not much different form string formatting in Python). But what if we had variables that were more complex than letters or numbers.

For example, let's write a query that counts the number of rows in a list of tables. For each table name in the list, we're going to represent it as {{ . }} in our HBSQL query. the {{ . }} serves as a placeholder for each item in the list.

What happens next is the good part. Pass in your query along with the list to db.py's query function. db.py will generate 3 SQL queries (one for each item in my data list) and execute them independently. Once they've completed, it will combine them into 1 data frame.

Ending the UNION ALL blues

Let's try a more complex example. In this query, I'm calculating the amount spent on a per country basis for my fictional company. In addition, I have some custom filters setup for each country to exclude the bottom 20% of transactions.

I can actually store my variables in a CSV file and then merge them into my HBSQL template using pandas. This will make it super easy to add countries and modify the query even when exchange rates change. It also let's a non-technical person update my report without interrupting my work flow.

These are both pretty reasonable asks by a superior when it comes to building a report. Unfortunately if your data infrastructure isn't setup properly (which is typically the case), this report is a huge pain to produce.

Using HBSQL takes a SQL script that would be 160 lines, and turns it into 16 lines of code. Not too shabby.

My threshold is dependent upon the country (for instance, maybe it's based on exchange rate). I could either have 10 different SQL select statements doing a UNION ALL, have a giant, nasty WHERE clause/CASE statement combo, or could use a template. I think you can guess what I'm going to do.

Full code snippets here

Final Thoughts

For more on Handlebars, check out the docs. It's language agnostic, so javascript documentation works just as well.

There's also an interesting paid product that actually implements helper functions using SQL/Handlebars. Hopefully we'll have helper functions in the 0.5 release ;).



Our Products


Rodeo: a native Python editor built for doing data science on your desktop.

Download it now!

ScienceOps: deploy predictive models in production applications without IT.

Learn More

Yhat (pronounced Y-hat) provides data science solutions that let data scientists deploy and integrate predictive models into applications without IT or custom coding.