I've found that one of the most difficult parts of using the Scientific Python libraries is getting them installed and set up on my computer. pandas, scipy, numpy, and sklearn make heavy use of C/C++ extensions, which can be difficult to compile and configure on whatever flavor of OS you use. In this post I'll go over the easiest way to install the libraries you need to get up and running with Scientific Python.
Step 1 is getting Python, of course! Before you download anything, double-check that you don't already have Python 2.7 installed, since many UNIX distributions ship with it. If you don't have it, select one of these distributions from python.org.
- Python 2.7.3 Windows Installer
- Python 2.7.3 Windows X86-64 Installer
- Python 2.7.3 Mac OS X 64-bit/32-bit x86-64/i386 Installer
- Python 2.7.3 Mac OS X 32-bit i386/PPC Installer
- Python 2.7.3 compressed source tarball (for Linux, Unix or Mac OS X)
- Python 2.7.3 bzipped source tarball (for Linux, Unix or Mac OS X, more compressed)
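If you're not sure which Python you already have, a quick check like the one below can tell you. It's only a sketch: the helper name `is_python_27` is my own invention for this post, not something that ships with Python.

```python
import sys


def is_python_27(version_info=None):
    """Return True if the given version tuple is a Python 2.7.x release."""
    if version_info is None:
        version_info = sys.version_info
    # Only major and minor matter here; 2.7.3 and 2.7.5 both count.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Is this 2.7? %s" % is_python_27())
```

Save it to a file and run it with whichever `python` is first on your PATH. If it reports 2.7, you can skip the download.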
Next you need to install pip, the Python package manager.
$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py
On Windows: Download Christoph Gohlke's installer
On Debian or Ubuntu:
$ apt-get install python-pip
On Red Hat, CentOS, or Fedora:
$ yum -y install python-pip
Make sure pip is on your PATH. If it isn't, add the python/scripts directory to your PATH.
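If you'd rather not eyeball your PATH by hand, here's a rough way to check from Python itself. Treat it as a sketch: `find_on_path` is a made-up helper name, and the fallback import is there because `shutil.which` only exists on Python 3.3 and later, while Python 2.7 has `distutils.spawn.find_executable` instead.

```python
import os

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def find_on_path(name="pip"):
    """Return the full path of an executable found on PATH, or None."""
    return which(name)


if __name__ == "__main__":
    location = find_on_path("pip")
    if location:
        print("Found pip at %s" % location)
    else:
        # Show what was searched so it's obvious which directory is missing.
        print("pip is not on your PATH. Directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print("  " + directory)
```

If pip doesn't turn up, the printed list shows you exactly which directories your shell is searching, so you can tell whether the python/scripts directory is in there.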
Enthought Free Distribution
Enthought, which provides commercial support for Scientific Python, is nice enough to publish an installer that works on Windows, OSX, and Linux. This eliminates a lot of the headaches of compiling libraries yourself and ensures you get the most stable versions. There are different tiers of installers, including paid versions, but for most people the free version is all you'll need. Their website is a little tricky to navigate (they sort of funnel you to the non-free versions), but here's the page you want. Select the distribution for your OS and it'll start the download. This part could take a while. Packed into the installer are the following libraries:
Installing sklearn, statsmodels, and patsy
Now that we've got core libraries installed, it's time to add some fun stats packages. The Enthought distribution took care of the compiled dependencies.
pip makes installing these libraries a breeze:
$ pip install --upgrade scikit-learn
$ pip install --upgrade statsmodels
$ pip install --upgrade patsy
These libraries are going to start spitting out a lot of garbage into the terminal during the install. Don't worry, this is normal! You might want to take this time to have someone non-technical come by your computer. They'll most likely assume that you're in The Matrix.
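Once the terminal calms down, it's worth confirming that everything actually imports. Here's a minimal sketch for doing that. The function name `installed_versions` is mine, and the default list of modules is just the set of libraries mentioned in this post (note that scikit-learn is imported as `sklearn`).

```python
import importlib


def installed_versions(module_names=("numpy", "scipy", "pandas",
                                     "sklearn", "statsmodels", "patsy")):
    """Map each module name to its version string, or None if it can't be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that don't expose __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

Anything that shows up as NOT INSTALLED is a good candidate for another round of `pip install --upgrade`.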