The Yhat Blog


machine learning, data science, engineering


Introducing db.r

by Greg |


If you follow our blog and you've read 10 R packages I wish I knew about earlier, you might remember package #5, the database driver of your choice. I think one of the reasons why many people avoid the R database packages and instead steadily collect piles upon piles of CSVs is that the R database packages aren't really all that user friendly. They can feel a little overwhelming--especially if you're the kind of person who's more interested in R for stats than for managing a database.

So we've written a package to help make querying databases in R more fun!

Think db.py ... but in R

We've gotten some great reception to db.py. There have been some really helpful contributions to the project as well. Special thanks to Seth Mason, debjan, and Gianluca Emireni for working on the MS SQL and Oracle integrations!

Warning: We've been informed that there's a bug when using db.r with certain versions of RStudio. We're working on fixing it! In the meantime, you can check for updates here.

Connecting to databases

But as useful as db.py is for Python, there isn't really a corollary for R. So today we're happy to announce the release of db.r!

We used the same sort of API and thought concepts from db.py in building db.r. Connecting to your databases is pretty easy.

> library(db.r)
> db <- DB(username="kermit", password="rainbowconnection", hostname="dw.themuppets.org",
         dbname="muppetsdb", dbtype="postgres")
> db
# DB[postgres][dw.themuppets.org]:5432 > kermit@muppetsdb

Pretty simple. db.r knows what "good" default values are. So if you don't know what your database's port is, there's a good chance db.r can figure it out.

In addition to connecting to your database, you can also save sets of credentials so you only have to type in all that connection info once!

# from previous session
> db
# DB[postgres][dw.themuppets.org]:5432 > kermit@muppetsdb
> db$save_credentials(profile="muppetdb")
> db$load_credentials(profile="muppetdb")
> db
# DB[postgres][dw.themuppets.org]:5432 > kermit@muppetsdb

Inspecting Schema

Like db.py, db.r makes it easy to explore your databases.

NOTE: For the remainder of this post, we'll be using DemoDB which comes with db.r as an example database. The data comes from the Chinook Database which is commonly used as an example databse.

> db <- DemoDB()
> db$tables
# $Album
#   table_name column_name     data_type    foreign.keys ref.keys
#   1      Album     AlbumId       INTEGER
#   2      Album       Title NVARCHAR(160)
#   3      Album    ArtistId       INTEGER Artist.ArtistId

# $Artist
#   table_name column_name     data_type foreign.keys ref.keys
#   1     Artist    ArtistId       INTEGER
#   2     Artist        Name NVARCHAR(120)
> db$find_column("M.*Id")
#   table_name column_name data_type
#   50  MediaType MediaTypeId   INTEGER
#   59      Track MediaTypeId   INTEGER

> db$find_table("Pl*")
#   $Playlist
#   table_name column_name     data_type foreign.keys ref.keys
#   1   Playlist  PlaylistId       INTEGER
#   2   Playlist        Name NVARCHAR(120)
#
#   $PlaylistTrack
#    table_name column_name data_type foreign.keys ref.keys
#    1 PlaylistTrack  PlaylistId   INTEGER
#    2 PlaylistTrack     TrackId   INTEGER

Querying Data

db.r uses DBI and the accompanying database specific libraries to execute queries. You don't really need to worry about this because it's all been abstracted into the query and query_from_file functions.

> db$query("select * from Artist limit 10;")
#    ArtistId                 Name
#    1         1                AC/DC
#    2         2               Accept
#    3         3            Aerosmith
#    4         4    Alanis Morissette
#    5         5      Alice In Chains
#    6         6 Antônio Carlos Jobim
#    7         7         Apocalyptica
#    8         8           Audioslave
#    9         9             BackBeat
#    10       10         Billy Cobham
> df <- db$query_from_file("myscript.sql")

Wrap Up

We're going to keep improving db.r, so stay tuned for more updates (MS SQL support is coming!).



Our Products


Rodeo: a native Python editor built for doing data science on your desktop.

Download it now!

ScienceOps: deploy predictive models in production applications without IT.

Learn More

Yhat (pronounced Y-hat) provides data science solutions that let data scientists deploy and integrate predictive models into applications without IT or custom coding.