This post originally appeared on the DataCamp blog. Big thanks to Karlijn and all the fine folks at DataCamp for letting us share with the Yhat audience!
Most of you who are learning data science with Python will definitely have heard of
scikit-learn, the open source Python library that implements a
wide variety of machine learning, preprocessing, cross-validation and visualization
algorithms behind a unified interface.
If you're still quite new to the field, you should be aware that machine learning, and thus also this Python library, belong to the must-knows for every aspiring data scientist.
That's why DataCamp has created a
scikit-learn cheat sheet for those of you who
have already started learning about the Python package, but who still want a handy
reference sheet. Or, if you have no idea yet how
scikit-learn works, this machine
learning cheat sheet might come in handy to get a quick first idea of the basics
that you need to know to get started.
Either way, we're sure that you're going to find it useful when you're tackling machine learning problems!
This scikit-learn cheat sheet will introduce you to the basic steps that you
need to go through to implement machine learning algorithms successfully:
you'll see how to load in your data, how to preprocess it, how to create your
own model to which you can fit your data and predict target labels, how to validate
your model and how to tune it further to improve its performance.
In short, this cheat sheet will kickstart your data science projects: with the help of code examples, you'll have created, validated and tuned your machine learning models in no time.
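To give a feel for those steps before you open the cheat sheet, here is a minimal sketch of the load, preprocess, fit, predict, validate and tune workflow. The dataset (iris), the estimator (k-nearest neighbors), the scaler and the parameter grid are our own illustrative choices, not taken from the cheat sheet itself.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load the data (iris is an assumption chosen for illustration).
X, y = load_iris(return_X_y=True)

# 2. Hold out a test set so the model is validated on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Preprocess and model in one pipeline: standardize, then classify.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 4. Fit the model to the training data and predict target labels.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 5. Validate: fraction of test labels predicted correctly.
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)

# 6. Tune: cross-validated grid search over the number of neighbors.
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)
best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("Best n_neighbors:", best_k)
print("Tuned test accuracy:", search.score(X_test, y_test))
```

Wrapping the scaler and the classifier in a pipeline means the scaler is fitted only on the training folds during the grid search, which keeps the validation scores honest.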
What are you waiting for?
Time to get started!
You might begin with DataCamp's scikit-learn tutorial for beginners, in which you'll learn in an easy, step-by-step way how to explore handwritten digits data, how to create a model for it, how to fit your data to your model and how to predict target values. In addition, you'll make use of Python's data visualization library matplotlib to visualize your results.
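If you'd like a taste of that workflow first, the sketch below loads scikit-learn's bundled handwritten digits data, fits a classifier, predicts target values and plots a few test images with matplotlib. It is not the tutorial's own code: the support vector classifier, its `gamma` value and the train/test split are assumptions we picked for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Explore the data: 8x8 grayscale images, flattened to 64 features each.
digits = load_digits()
print("Data shape:", digits.data.shape)
print("Target classes:", sorted(set(digits.target)))

# Split features, labels and the original images in the same way.
X_train, X_test, y_train, y_test, img_train, img_test = train_test_split(
    digits.data, digits.target, digits.images, test_size=0.25, random_state=0
)

# Create a model (an SVC is our assumption) and fit it to the training data.
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

# Predict target values for the held-out images and score the result.
predicted = clf.predict(X_test)
test_accuracy = clf.score(X_test, y_test)
print("Test accuracy:", test_accuracy)

# Visualize the first four test images with their predicted labels.
fig, axes = plt.subplots(1, 4, figsize=(8, 3))
for ax, image, label in zip(axes, img_test, predicted):
    ax.imshow(image, cmap="gray_r")
    ax.set_title(f"Predicted: {label}")
    ax.set_axis_off()
# Call plt.show() to display the figure when running interactively.
```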
You can also just jump right into running the code examples provided on the cheat sheet. If you do, be sure to also check out Yhat's data science IDE, Rodeo. If you've ever worked in RStudio, it's a very similar setup. You can download Rodeo for Windows, Mac or Linux here. Fun fact: as of v2.5.2, the Windows version comes with Python built in (since installing Python on Windows can really be a pain). Specifically, Rodeo ships with Continuum's Miniconda. You can read more about that here.