A quick bio in his own words:
“Much of my past work was in artificial intelligence, personalization, recommendations, search, and advertising. Over the years, I have worked at Amazon, Google, and Microsoft, founded and run my own startups, and advised several other startups, some of which were acquired. I invented the now widely used item-to-item collaborative filtering algorithm, contributed to many patents and academic publications, and have been quoted often in books and in the press. I have an MS in Computer Science from University of Washington and an MBA from Stanford.”
1. What project that you have worked on do you wish you could go back to and do better?
All of them! There’s always more to do, more improvements to make, another thing to try. Every time you build anything, you learn what you could do to make it better next time.
2. What advice do you have for younger analytics professionals, and in particular PhD students in the sciences?
Learn to code. Computers are a tool, and coding is the way to get the most out of that tool. If you can code, you can do things in your field that others cannot. Coding is a major force multiplier. It makes you more powerful.
3. What do you wish you knew earlier about being a data scientist?
I was doing what is now called data science at Amazon.com in 1997. The term wasn’t even coined until 2008 (by Jeff Hammerbacher and DJ Patil). It’s hard to be much earlier. As for what I wish, I mostly wish I had the powerful tools we have now back then; today is a wonderland of data, tools, and computation. It’s a great time to be a data scientist.
4. How do you respond when you hear the phrase ‘big data’?
I usually think of Peter Norvig talking about the unreasonable effectiveness of data and Michele Banko and Eric Brill finding that more data beat better algorithms in their 2001 paper. Big data is why Amazon’s recommendations work so well. Big data is what tunes search and helps us find what we need. Big data is what makes web and mobile intelligent.
5. What is the most exciting thing about your field?
I very much enjoy looking at huge amounts of data that no one has looked at yet. Being one of only a few to explore a previously unmined new source of information is very fun. Low hanging fruit galore! It’s also fraught with peril, as you’re the first to find all the problems in the data as well.
6. How do you go about framing a data problem? In particular, how do you avoid spending too long, and how do you manage expectations? How do you know what is good enough?
Data problems should be iterative. Start simple. Solve a small problem. Explore the data. Then solve a harder problem. Then a harder one. Each time you take a step, you’ll get ideas on where to go next, and you also get something out at each step. Too many people start trying to solve the entire problem at the beginning, flailing for a long time, usually to discover that it was the wrong problem to solve when they finally struggle to completion. Start with easier problems, learn where to go, and you might be surprised by all the goodies you find along the way.
Huge thanks to Greg for his time and insight, and to Peadar for allowing us to share his interview with Greg on the Yhat blog.