To write machine learning algorithms or perform any data mining task, you need to be equipped with the right set of computational tools. These range from languages with user-friendly syntax to powerful, robust, and highly optimized libraries. A variety of toolkits are available, and people choose among them according to their comfort.
The first step is selecting a working environment. From past experience, I prefer Linux, since most software packages are much easier to install on Linux than on Windows.
I use Ubuntu 12.04 (Precise), which of course is free to download and use. I faced some issues configuring my GPU, but a simple Google search was enough to sort out the problem.
Next comes the choice of programming language. There are quite a few options available. I considered a few of them:
- Java: Has many machine learning libraries and tons of other packages, plus robust support on Stack Overflow.
- MATLAB / Octave: Powerful rapid prototyping tools, but they lack deployment capabilities.
- Python: A very powerful, user-friendly interpreted language with well-written documentation and several well-known packages that make it a data scientist's heaven. Python has a gentler learning curve than the other languages and makes implementation easier. The syntax is intuitive and support is readily available.
- R: No doubt there is no more powerful statistical tool than R. It provides libraries that are easy to use yet very powerful, and it is a popular choice among data scientists. Its syntax treats everything as vectors and arrays, making its very foundation mathematical.
Hence my final work environment consisted of the following pieces:
- Python and RPy (R's Python interface)
- SciPy / NumPy: These packages need no introduction.
- Pandas for Python: data analysis and statistics package
- scikit-learn (sklearn): Machine learning package for Python
- IPython and NBViewer: An extraordinary Python environment; allows piecewise code execution, along with inline Markdown and magic commands.
- Theano: A very useful Python package built around symbolic variables; used for implementing deep learning algorithms.
- Plotly: Easy-to-use Python plotting library. Saves graphs to the cloud and allows lots of GUI editing later.
- ObsPy: A standard Python library for seismic analysis.
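To see which of the pieces listed above are already present on a machine, a small check like the following may help. It is only a sketch: the import names in `ENVIRONMENT` are my assumptions about how each package is imported (for example, `rpy2` stands in for RPy and `sklearn` for scikit-learn), and the helper names `is_installed` and `check_environment` are made up for illustration.

```python
from importlib.util import find_spec

# Display name -> import name. The import names are assumptions;
# adjust them if your installation uses different module names.
ENVIRONMENT = {
    "RPy": "rpy2",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "IPython": "IPython",
    "Theano": "theano",
    "Plotly": "plotly",
    "ObsPy": "obspy",
}


def is_installed(module_name):
    """Return True if the module can be located, without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Some broken installs raise instead of returning None.
        return False


def check_environment(packages=ENVIRONMENT):
    """Map each display name to True/False depending on availability."""
    return {label: is_installed(module) for label, module in packages.items()}


if __name__ == "__main__":
    for label, found in check_environment().items():
        print(f"{label:<14}{'found' if found else 'missing'}")
```

Anything reported as missing can usually be installed through the system package manager or pip, though the exact package names may differ from the import names used here.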