The Yhat Blog


machine learning, data science, engineering


Scikit-Learn Cheat Sheet: Python Machine Learning

by Karlijn Willems


This post originally appeared on the DataCamp blog. Big thanks to Karlijn and all the fine folks at DataCamp for letting us share with the Yhat audience!

Scikit-Learn library

If you are learning data science with Python, you have almost certainly heard of scikit-learn, the open source Python library that implements a wide variety of machine learning, preprocessing, cross-validation and visualization algorithms behind a unified interface.

If you're still quite new to the field, be aware that machine learning, and with it this Python library, is a must-know for every aspiring data scientist.

That's why DataCamp has created a scikit-learn cheat sheet for those of you who have already started learning about the Python package but still want a handy reference sheet. Or, if you still have no idea how scikit-learn works, this machine learning cheat sheet can give you a quick first idea of the basics that you need to know to get started.

Either way, we're sure that you're going to find it useful when you're tackling machine learning problems!

This scikit-learn cheat sheet will introduce you to the basic steps that you need to go through to implement machine learning algorithms successfully: you'll see how to load in your data, how to preprocess it, how to create your own model to which you can fit your data and predict target labels, how to validate your model and how to tune it further to improve its performance.
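To make those steps concrete, here is a minimal sketch of the whole workflow. It is not taken from the cheat sheet itself: the iris dataset, the k-nearest-neighbors classifier, the 75/25 split and the grid of `n_neighbors` values are all assumptions chosen for illustration, and the imports assume a reasonably recent scikit-learn release that provides `sklearn.model_selection`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load the data (iris is an illustrative choice, not the cheat sheet's).
X, y = load_iris(return_X_y=True)

# 2. Hold out a test set so the model is validated on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Preprocess and model in one pipeline: standardize, then classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# 4. Fit the model on the training data and predict target labels.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

# 5. Validate: fraction of test labels predicted correctly.
baseline_accuracy = accuracy_score(y_test, y_pred)
print("baseline accuracy:", baseline_accuracy)

# 6. Tune: cross-validated search over the number of neighbors.
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
tuned_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("tuned accuracy:", tuned_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold during tuning, so no information from the validation folds leaks into preprocessing.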

In short, this cheat sheet will kickstart your data science projects: with the help of code examples, you'll have created, validated and tuned your machine learning models in no time.

What are you waiting for?

Time to get started!

You might begin with DataCamp's scikit-learn tutorial for beginners, in which you'll learn in an easy, step-by-step way how to explore handwritten digits data, how to create a model for it, how to fit your data to your model and how to predict target values. In addition, you'll make use of Python's data visualization library matplotlib to visualize your results.
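As a rough idea of what working with the handwritten digits data looks like, the sketch below loads the copy of the dataset bundled with scikit-learn, fits a classifier and predicts target values. It is not the tutorial's own code: the support vector classifier, its `gamma` value and the 75/25 split are assumptions made for illustration, and the matplotlib plotting step is left out to keep the example short.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the bundled digits dataset: 8x8 grayscale images of the digits 0-9.
digits = load_digits()
print(digits.images.shape)  # (1797, 8, 8): one 8x8 image per sample
print(digits.data.shape)    # (1797, 64): the same images, flattened

# Split into training and test sets, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0,
    stratify=digits.target,
)

# Create a model and fit it to the training data.
# SVC with gamma=0.001 is an illustrative choice, not the tutorial's.
model = SVC(gamma=0.001)
model.fit(X_train, y_train)

# Predict target values for the held-out images and check the results.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", accuracy)
print(cm)  # rows are true digits, columns are predicted digits
```

The confusion matrix is a natural candidate for visualization with matplotlib, since off-diagonal cells show which digits the model mixes up.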

You can also jump right into running the code examples provided on the cheat sheet. If you do, be sure to check out Yhat's data science IDE, Rodeo. If you've ever worked in RStudio, it's a very similar setup. Rodeo is available for Windows, Mac and Linux. Fun fact: as of v2.5.2, the Windows version comes with Python built in (since installing Python on Windows can really be a pain). Specifically, Rodeo ships with Continuum's Miniconda.

Rodeo is a convenient environment for data exploration and analysis with packages like Scikit-Learn


