The Yhat Blog

machine learning, data science, engineering

Predicting SMS spam

by yhat

Let's face it: getting SMS spam stinks. Worst of all, it can be hard to tell whether a text is spam or actual correspondence you might care about.

Here's one from AT&T.
+1 (1) 113271909: Hello! We hope you enjoy all the possibilities with your new phone. Thanks for choosing AT&T.

Anyone else confused by this message? If so, we built an app you can use when you receive these cryptic SMS messages.

Text Spam to 832-495-4517

Classifying Text Messages

We built a text classifier that identifies spam texts and deployed it to Yhat. You can read about the text classifier in our tutorial and about how to deploy it to Yhat in our docs. In this post, we'll show you how we brought everything together using the Twilio SMS API.

Hooking it up to Twilio

The Twilio SMS API has two main functions: sending and receiving messages. We need to be able to reply to text messages, so we're going to build a simple API that responds with Twilio Markup Language (TwiML). Our API will parse the incoming request from Twilio, extract the SMS message, use our model to estimate the probability that it's spam, and then respond using TwiML. Twilio will then read our response and send it back to the sender.
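To make the TwiML step concrete, here's a minimal sketch of building the reply document. It's an illustration rather than the code behind our app: the helper name twiml_message is made up for this example, and we're assuming the Message verb for the reply.

```ruby
# Wrap a reply in a TwiML envelope for an SMS response.
# Assumption: the <Message> verb is used for the reply text.
# twiml_message is a hypothetical helper name for this sketch.
def twiml_message(text)
  # XML-escape the text so characters such as & and < keep the document well-formed.
  escaped = text.to_s.encode(xml: :text)
  %(<?xml version="1.0" encoding="UTF-8"?>) +
    "<Response><Message>#{escaped}</Message></Response>"
end

puts twiml_message("Thanks for choosing AT&T")
```

Escaping matters here because the reply may echo user text, and an unescaped ampersand would make the XML invalid.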

Twilio's docs have some great examples of how to build a simple response. We went with Ruby for this demo.

We used HTTParty to make the REST calls to the Yhat API. To make a prediction using Yhat, you pass your username, apikey, and the name of your model as query parameters, and send any data required to make the prediction in the request body as a parameter named data.
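The request shape described above can be sketched roughly as follows. To keep the example runnable without extra gems, it uses Ruby's standard Net::HTTP instead of HTTParty. The endpoint URL is a placeholder, the function names are hypothetical, and JSON-encoding the data parameter is our assumption for this sketch.

```ruby
require "json"
require "net/http"
require "uri"

# Placeholder endpoint: the real Yhat URL is not given in this post.
YHAT_ENDPOINT = "https://yhat.example.invalid/predict"

# Build (but do not send) a prediction request: credentials and model name
# go in the query string, the input goes in a form field named "data".
# Assumption: the data field is JSON-encoded.
def build_yhat_request(username:, apikey:, model:, data:, endpoint: YHAT_ENDPOINT)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(username: username, apikey: apikey, model: model)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data("data" => JSON.generate(data))
  request
end

# Send the request and parse the JSON reply.
# This needs network access and a real endpoint, so it is not called here.
def yhat_predict(**args)
  request = build_yhat_request(**args)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end
```

Splitting request construction from sending makes it easy to check the query string and body without touching the network.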

The Sinatra app is almost identical to the one found in Twilio's docs. The only difference is that we use the Yhat class we defined to call the Yhat server with the SMS Body. After that, there's a little JSON parsing and formatting of our SMS response.
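The parsing and formatting step might look something like the sketch below. It leaves out the Sinatra wiring so it runs on its own. The function names, the wording of the reply, and the "prediction" field in the model's JSON are all assumptions made for illustration; the real response format isn't shown in this post.

```ruby
require "json"

# Turn the model's JSON reply into the text of the SMS we send back.
# Assumption: the reply looks like {"prediction": 0.93}.
def format_spam_reply(json_body)
  probability = JSON.parse(json_body).fetch("prediction").to_f
  percent = (probability * 100).round
  "We think there's a #{percent}% chance that text was spam."
end

# Glue for one incoming Twilio request: read the "Body" parameter, score it
# with any callable that returns the model's JSON, and build the reply text.
def handle_sms(params, classifier)
  format_spam_reply(classifier.call(params.fetch("Body")))
end

# Stand-in classifier so the sketch runs without a model server.
fake_classifier = ->(_text) { JSON.generate("prediction" => 0.93) }
puts handle_sms({ "Body" => "WIN a FREE cruise!!!" }, fake_classifier)
```

Inside a Sinatra route, handle_sms would receive the route's params hash and a classifier that calls the model server, and the returned text would be wrapped in TwiML before being sent back to Twilio.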

After deploying the app on Heroku, we just need to configure our Twilio number and we're ready to go. Go to your Twilio Account page and pick out a number. Once you have a number, click it and enter your Heroku hostname as the SMS Request URL.

Send us an SMS!

That's it, we're ready to go! Feel free to fire off a text message to us at 832-495-4517 to see how spammy it is. If you've got any spammy texts rotting away in your inbox, send them our way and see how our classifier does. If for some reason you don't have a phone on you, visit our online version.

Putting it all together...

Here's the code in full.

