There's a lot of talk about ggplot these days (we even wrote a Python version of it)
and for good reason: it's a great plotting package that's easy to use. Despite this, I sometimes
find myself wanting something even quicker than ggplot. When that's the case, I
turn to base R plots. They're not as pretty and the syntax is a little unpleasant
but they're very fast, work on just about anything, and are often used by the pros.
In those regards, base R plots are actually really similar to classic UNIX tools.
So sit back, relax, and get ready to have some fun with R base plots!
We're using the
iris dataset. It's a tried and true classic and while it's not
the most exciting data in the world, it's built into R (so you don't need to
download anything) and easy to understand.
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
The other dataset we'll be using is the
USAccDeaths dataset, which contains
monthly totals of accidental deaths in the U.S. from 1973 to 1978. It's also built
into R and is a good example of a time series dataset. This will let us show
off some of R's handy built-in features for working with time series data.
USAccDeaths
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938  9161  8927
1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129  8710  8680
1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8466  8160  8034
1976  7717  7461  7767  7925  8623  8945 10078  9179  8037  8488  7874  8647
1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
1978  7836  6892  7791  8192  9115  9434 10484  9827  9110  9070  8633  9240
Ok first things first: the command to make plots is, you guessed it, plot.
More good news: just about every data structure in R is plottable. That's not to
say it'll look pretty or even make sense, but you can always try and find out.
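As a quick sketch of that "plot anything" idea, here are a few calls on different kinds of objects from the built-in iris data. What plot draws depends on the class of the object you hand it; the particular columns chosen here are just for illustration.

```r
# A numeric vector: values are plotted against their index (1, 2, 3, ...)
plot(iris$Sepal.Length)

# Two numeric vectors: a scatter plot
plot(x = iris$Sepal.Width, y = iris$Sepal.Length)

# A factor: a bar chart of counts per level
plot(iris$Species)

# A whole data frame: a matrix of pairwise scatter plots
plot(iris)
```

The last one is a handy first look at a new data frame, though it gets crowded quickly as the number of columns grows.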
You can add colors to your points by passing a value to the col argument.
If you get tired of prefixing every column of the
iris data frame with
iris$, you can
"attach" the data. From then on, R looks up bare column names in
the dataset you
attached. Just don't forget to
detach when you're done.
So as an example, let's say we want to plot specific columns on the x and y
axes. Instead of having to prefix our variables with
iris$, we'll use attach.
attach(iris)
plot(x=Sepal.Width, y=Sepal.Length)
detach(iris)
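If you're worried about forgetting the detach step, here's a small sketch of the same idea next to an alternative. The with function is not part of the example above; it's shown here only as another base R option that scopes the column names to a single expression. The variable name mean_width is made up for the illustration.

```r
# attach() puts the columns of iris on R's search path
attach(iris)
mean_width <- mean(Sepal.Width)   # no iris$ prefix needed
detach(iris)                      # remove iris from the search path again

# with() evaluates one expression inside the data frame,
# so there is nothing to clean up afterwards
with(iris, plot(x = Sepal.Width, y = Sepal.Length))
```

Both approaches draw the same plot; with just keeps the shortcut limited to the one call that needs it.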
Time series plotting is really easy with R. Since R natively has a time series
type, plots work right out of the box. In the example below, I'm going to pass
the plot function the USAccDeaths dataset.
You can see that we can also assign labels to our x and y axes by using the xlab and ylab arguments.
plot(USAccDeaths, xlab="Year", ylab="Accident Deaths in U.S.")
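Because USAccDeaths is a ts object, it carries its own dates around with it. Here's a rough sketch of a few built-in helpers that take advantage of that. Picking 1975 as the year to zoom in on is arbitrary, and deaths_1975 is just a name chosen for the example.

```r
# Basic metadata stored with the time series
start(USAccDeaths)       # first observation as c(year, period)
end(USAccDeaths)         # last observation as c(year, period)
frequency(USAccDeaths)   # observations per year

# window() extracts a sub-range by date rather than by index
deaths_1975 <- window(USAccDeaths, start = c(1975, 1), end = c(1975, 12))
plot(deaths_1975, xlab = "Year", ylab = "Accident Deaths in U.S.")

# decompose() splits the series into trend, seasonal, and random parts,
# and plot() knows how to draw the result as stacked panels
plot(decompose(USAccDeaths))
```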
Adding points is also super easy. There are functions called points and
lines which, you guessed it again, layer points and lines on your existing plot.
plot(USAccDeaths, xlab="Year", ylab="Accident Deaths in U.S.", main="Traffic Accident Deaths")
points(USAccDeaths, pch=10)
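The lines function works the same way as points. As a sketch, here's one way to layer a yearly average over the monthly series. Using aggregate to compute the average is my choice for the illustration, and yearly_mean is a made-up name.

```r
plot(USAccDeaths, xlab = "Year", ylab = "Accident Deaths in U.S.")

# Average the 12 monthly values in each year (one value per year)
yearly_mean <- aggregate(USAccDeaths, FUN = mean)

# Layer the yearly averages on top of the monthly series.
# Each yearly value is drawn at the start of its year on the x axis.
lines(yearly_mean, col = "red", lwd = 2)
points(yearly_mean, col = "red", pch = 19)
```

Both calls draw on whatever plot is currently open, so the order matters: plot first, then add layers.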
You might have noticed that each point on our graph is drawn as a really weird
circle with a cross in the middle of it. That's the pch=10 at work: you can assign different styles of points using the
pch argument. Point styles can even be assigned to different categories (or
"levels" in R) of a variable.
plot(x=iris$Petal.Width, y=iris$Petal.Length, pch=as.numeric(iris$Species), col=as.numeric(iris$Species))
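The trick in that call is that as.numeric on a factor returns the index of each level, so every species gets its own point style and color number. Here's a sketch that uses the same mapping to label the plot. The legend position ("topleft") and the species variable are choices made for the example.

```r
species <- iris$Species

plot(x = iris$Petal.Width, y = iris$Petal.Length,
     pch = as.numeric(species), col = as.numeric(species))

# as.numeric() on a factor returns the level index (1, 2, 3, ...),
# so the i-th level was drawn with pch = i and col = i
legend("topleft", legend = levels(species),
       pch = seq_along(levels(species)),
       col = seq_along(levels(species)))
```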
One of my very favorite things about R: histograms! When I made the switch from Excel to R, I had heard tales of mad sorcery where I could replace catalogs of frequency tables with one line of R code.
Histograms are great. They're a super easy way to get a quick feel for what your dataset looks like. So while it's one of the first things I learned in R, it's also one of the things I use the most.
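For reference, here's a minimal sketch of that one-liner using the hist function on the iris data. The column, the number of breaks, and the color are arbitrary picks for the example.

```r
# One line: bin the values and draw the frequency chart
hist(iris$Sepal.Width)

# 'breaks' is a suggestion for the number of bins; R picks "pretty" cut points
hist(iris$Sepal.Width, breaks = 20,
     main = "Sepal Width", xlab = "Sepal.Width", col = "gray")

# plot = FALSE skips the drawing and returns the frequency table itself
h <- hist(iris$Sepal.Width, plot = FALSE)
h$breaks   # bin edges
h$counts   # observations per bin
```

That last form is the frequency table from the Excel days, just without the manual work.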
Density Plots and Legends
To display distributions of different variables on the same plot, I recommend
using density. The density function creates an estimate of the PDF (probability density
function) of your variable. This basically gives you a nice, continuous line
representing the distribution of your data. We'll use the
lines function to
add individual distributions with different colors to our plot.
virginica <- subset(iris, Species=="virginica")
versicolor <- subset(iris, Species=="versicolor")
setosa <- subset(iris, Species=="setosa")
# plot distributions for each species
plot(density(virginica$Sepal.Width), col="blue")
lines(density(versicolor$Sepal.Width), col="red")
lines(density(setosa$Sepal.Width), col="green")
legend(2, 1.2, c("virginica", "versicolor", "setosa"), fill=c("blue", "red", "green"))
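One thing to watch for: plot sets the axis limits from the first density only, so curves added later with lines can get clipped if they're taller or wider. Here's a sketch of one way around that, computing all the densities first and sizing the axes to fit them. The helper names (dens, x_range, y_range, and so on) and the axis labels are my own for this example.

```r
# Estimate each species' density once, up front
species_names <- c("virginica", "versicolor", "setosa")
species_cols  <- c("blue", "red", "green")
dens <- lapply(species_names, function(s) {
  density(iris$Sepal.Width[iris$Species == s])
})

# Axis limits wide enough for all three curves
x_range <- range(sapply(dens, function(d) range(d$x)))
y_range <- range(sapply(dens, function(d) range(d$y)))

# Empty plot (type = "n") with the shared limits, then one line per species
plot(x_range, y_range, type = "n",
     xlab = "Sepal.Width", ylab = "Density", main = "Sepal width by species")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = species_cols[i])
}
legend("topright", legend = species_names, fill = species_cols)
```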
So there you have it: the basics about base plots in R. That's all I'll cover today, but if you're interested in learning more here are some other resources: