Introduction to Python


Innumerable technologies and programming languages are available for developers to learn and master. A lifetime may not suffice to learn them all! But techies still have the inner urge to learn and master ever newer technologies, for the sheer fun of it.

Python is one such language: easy to learn and fun to code. Python has inherited the best practices from its counterparts and has corrected many of the issues they contain. Python is fun, Python is easy and Python is rich! It is backed by a huge open source community that ensures extensive support. It is ideal for rapid prototyping and fast development.

Core Python


This series of blogs on Python was compiled as I was trying to learn the language. I present them here for someone who might want a quick introduction to the language without digging through all the manuals. This is neither a 'Complete Reference' nor a 'Python for Dummies'. It is meant for someone who understands development and wants to peek into the world of Python.

Python is vast. You can go on learning different aspects of the language - and I can assure you that the language will grow faster than you can learn!

NumPy


NumPy is the most important library for any computational task in Python. It combines the ease and flexibility of scripting with very high performance. It is the most basic library used in any analytics or machine learning task in Python. Anyone interested in doing real work in Machine Learning (not just talking about it) must be conversant with this library.
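To make that "ease with performance" point concrete, here is a minimal sketch of the vectorized style NumPy encourages (the array names are mine, purely for illustration):

```python
import numpy as np

# A million numbers in one contiguous, typed array
a = np.arange(1_000_000)

# One vectorized expression - the loop runs in optimized C,
# not in the Python interpreter
b = a * 2 + 1

print(b[:5])  # -> [1 3 5 7 9]
```

The same computation with a plain Python list and a for loop would be an order of magnitude slower, which is exactly why NumPy underpins nearly every analytics library.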

This is just a top-level overview of the library. A developer should make every attempt to master it. Unfortunately, the documentation for this library is not great, and tutorials and videos can only take you so far. I would recommend NumPy Cookbook by Ivan Idris for someone who is genuinely interested in mastering the code.


Scikit-Learn


The Scikit-Learn library offers ready implementations of the most common Machine Learning algorithms. Most Machine Learning libraries are based on the principle of "concept-heavy and code-lite". Once we understand the concepts well, the syntax of implementation is quite simple. Scikit-Learn offers ready, configurable classes for most of the algorithms. We just instantiate a model, "fit" it to the training data, and then verify it with the test data. All this can be achieved in just a couple of lines of code.
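As a sketch of that instantiate-fit-verify pattern, here is one possible couple of lines (the dataset and model are my choices for illustration; any estimator follows the same shape):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a toy dataset and hold out part of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Instantiate, fit to the training data, then verify on the test data
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```

Swapping in a different algorithm is usually just a matter of changing which class gets instantiated; the fit/score calls stay the same.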

Pandas


Machine learning requires huge amounts of data, and with that comes the need for an efficient way of processing it. Pandas helps us with the latter. It provides efficient data structures like Series and DataFrame for processing one- and two-dimensional data (the three-dimensional Panel has since been deprecated and removed). It provides a good chunk of methods for manipulating the data in these structures.

It has good functionality for statistical processing of this data. It provides for indexing, selecting, grouping and filtering the data by columns and values - virtually everything one would want to do while processing data. Pandas is also extensible, with built-in capabilities that allow us to add more functionality. It works very well with its cousins, NumPy and TensorFlow. All this makes it the library of choice for data handling.
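A minimal sketch of that selecting, filtering and grouping workflow (the column names and values below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "team":  ["A", "A", "B", "B"],
    "score": [60, 40, 80, 70],
})

# Filter rows by value
high = df[df["score"] > 50]

# Group by a column and aggregate
means = df.groupby("team")["score"].mean()
print(means)  # team A -> 50.0, team B -> 75.0
```

The boolean-mask filter and the groupby-aggregate pair shown here cover a surprising share of day-to-day data wrangling.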

TensorFlow


TensorFlow is an open-source software library from Google, meant for dataflow programming across a range of tasks. It is a symbolic math library, and is largely used for machine learning applications such as neural networks. It was originally developed by the Google Brain team for internal Google use. As the AI research community got more and more collaborative, TensorFlow was released under the Apache 2.0 open source license.

TensorFlow and its high-level API Keras are widely used for implementing Deep Learning algorithms. Like most machine learning libraries, TensorFlow is "concept-heavy and code-lite". The syntax is not very difficult to learn, but the concepts are very important. By design, TensorFlow is based on lazy (graph) execution, though eager execution can be forced - and is in fact the default in TensorFlow 2.x. That means it does not actually process the data until it has to: it just gathers all the information we feed into it, and processes it only when we finally ask it to.
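A small sketch of that lazy behaviour using `tf.function` in TensorFlow 2.x (the function name is mine): the decorated function is traced into a graph up front, and the actual computation runs only when it is called with data.

```python
import tensorflow as tf

@tf.function  # trace this Python function into a TensorFlow graph
def scale_and_shift(x):
    return x * 2.0 + 1.0

# Defining the function above computed nothing; the graph is built
# on the first call and then executed with the data we pass in
result = scale_and_shift(tf.constant([1.0, 2.0, 3.0]))
print(result.numpy())  # -> [3. 5. 7.]
```

Gathering the whole computation into a graph before running it is what lets TensorFlow optimize and distribute the work, which is the payoff of the lazy design described above.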