The Yhat Blog



Setting Up Scientific Python

by yhat


I've found that one of the most difficult parts of using the Scientific Python libraries is getting them installed and set up on my computer. pandas, scipy, numpy, and sklearn make heavy use of C/C++ extensions, which can be difficult to compile and configure on whatever flavor of OS you use. In this post I'll go over the easiest way to install the libraries you need to get up and running with Scientific Python.

Getting Python

Step 1 is getting Python, of course! First, double-check whether you already have Python 2.7 installed; many UNIX distributions ship with it. If you don't, select one of these distributions from python.org.
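
If you aren't sure which Python you have, a quick check like the one below can tell you. This is only a minimal sketch: the helper names describe_python and is_python27 are made up for this example, and it relies on nothing beyond the standard sys module.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return the interpreter version as a 'major.minor.micro' string."""
    return "%d.%d.%d" % (version_info[0], version_info[1], version_info[2])


def is_python27(version_info=sys.version_info):
    """Return True if the interpreter is a Python 2.7 release."""
    return (version_info[0], version_info[1]) == (2, 7)


if __name__ == "__main__":
    print("Running Python " + describe_python())
    print("Is Python 2.7: %s" % is_python27())
```

Save it to a file and run it with the same python command you plan to use for everything else in this post.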

Installing pip

Next you need to install pip, Python's package manager.

OSX

$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py
$ easy_install pip

Windows

Download Christoph Gohlke's installer

Linux

Debian, Ubuntu
$ sudo apt-get install python-pip
CentOS, Fedora
$ sudo yum -y install python-pip

Make sure pip is on your PATH. If it isn't, add your Python installation's Scripts directory (for example, C:\Python27\Scripts on Windows) to your PATH.
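
One way to see whether pip is on your PATH is to walk the PATH entries yourself. The sketch below assumes nothing beyond the standard library; find_on_path is a hypothetical helper written for this post, not something that ships with pip or Python.

```python
import os


def find_on_path(program, path=None):
    """Return the first full path to `program` found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables carry extensions such as .exe
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("pip")
    if location is None:
        print("pip was not found on your PATH")
    else:
        print("pip found at " + location)
```

If it reports that pip was not found, that's your cue to add the Scripts directory and open a new terminal.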

Enthought Free Distribution

Enthought, which provides commercial support for Scientific Python, is nice enough to publish an installer that works on Windows, OSX, and Linux. This eliminates a lot of the headaches of compiling libraries yourself and ensures you get the most stable versions. There are different tiers of installers, including paid versions, but for most people the free version is all you'll need. Their website is a little tricky to navigate (they sort of funnel you to the non-free versions), but here's the page you want. Select the distribution for your OS and it'll start the download. This part could take a while. Packed into the installer are the following libraries:

  • scipy
  • numpy
  • ipython
  • matplotlib
  • pandas
  • sympy
  • nose
  • traits
  • chaco

Once the download finishes, double-clicking the installer will get you set up with everything, including adding all of the libraries to your Python path.
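
To confirm the installer did its job, you can try importing each library and printing its version. Treat this as a rough sketch: check_modules is a name invented for this post, and it assumes each library exposes a __version__ attribute (it reports "unknown" when one doesn't). Note that ipython is imported as IPython.

```python
import importlib

# Import names for the libraries listed above (ipython is imported as "IPython")
BUNDLED = ["scipy", "numpy", "IPython", "matplotlib", "pandas",
           "sympy", "nose", "traits", "chaco"]


def check_modules(names):
    """Map each module name to its version string, or None if it fails to import."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Import failed: the library is missing or broken
            results[name] = None
        else:
            results[name] = str(getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    results = check_modules(BUNDLED)
    for name in BUNDLED:
        version = results[name]
        if version is None:
            print("%-12s NOT INSTALLED" % name)
        else:
            print("%-12s %s" % (name, version))
```

Anything marked NOT INSTALLED either didn't come with your installer or isn't visible to the Python you ran the script with.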

Installing sklearn, statsmodels, and patsy

Now that we've got core libraries installed, it's time to add some fun stats packages. The Enthought distribution took care of the compiled dependencies. pip makes installing these libraries a breeze:

$ pip install --upgrade scikit-learn
$ pip install --upgrade statsmodels
$ pip install --upgrade patsy

These libraries are going to start spitting out a lot of garbage into the terminal during the install. Don't worry, this is normal! You might want to take this time to have someone non-technical come by your computer. They'll most likely assume that you're in The Matrix.
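
One thing that trips people up: the name you give pip isn't always the name you import. scikit-learn is imported as sklearn. Once the installs finish, a small check along these lines can tell you which packages still need attention. It's a sketch only; missing_packages and the IMPORT_NAMES table are made up for this example.

```python
import importlib

# pip package name -> name used in "import" statements
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "patsy": "patsy",
}


def missing_packages(import_names):
    """Return the sorted pip names whose modules cannot be imported."""
    missing = []
    for pip_name, module_name in import_names.items():
        try:
            importlib.import_module(module_name)
        except Exception:
            missing.append(pip_name)
    return sorted(missing)


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Still missing: " + ", ".join(missing))
    else:
        print("All three packages import cleanly")
```

The function returns pip names rather than import names, so whatever it prints can be passed straight back to pip install.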

Conclusion

That's it! You should be ready to go. If you run into any problems (this typically happens when you have previous versions of the libraries installed), check Stack Overflow or the numpy/pandas/sklearn docs.


