TensorFlow is an open-source software library from Google, designed for dataflow programming across a range of tasks. It is a symbolic math library, and is largely used for machine learning applications such as neural networks. It was originally developed by the Google Brain team for internal use at Google. As the AI research community grew more and more collaborative, TensorFlow was released under the Apache 2.0 open-source license.

TensorFlow and its component Keras are widely used for implementing Deep Learning algorithms. Like most machine learning libraries, TensorFlow is "concept-heavy and code-lite": the syntax is not very difficult to learn, but the concepts behind it are important. By design, TensorFlow is based on lazy execution (though we can force eager execution). That means it does not actually process the available data until it has to. It just gathers all the information that we feed into it, and processes it only when we finally ask it to.


To share a brief introduction to the basic ideas, let us look into an implementation of a simple problem. MNIST (Modified National Institute of Standards and Technology) provides a good dataset of handwritten digits from 0 to 9. We can use this to train a neural network and build a model that can read and decode handwritten digits.

This problem is often called the "Hello World" of Deep Learning. Of course, we need a lot more for developing "real" applications, but this is good enough to give you an introduction to the topic. There is a lot more to TensorFlow; if you are interested in a detailed study, you can take up an online course.


Like any Python script, we start by importing the required libraries: TensorFlow and NumPy in this case.

import tensorflow as tf
import numpy as np

The next step is to load the available data. TensorFlow provides a good set of test datasets that we can use for learning and experimenting, and the MNIST dataset is among them. So the job of fetching the training and test data is quite simple in this case. In real-life problems, accumulating, cleaning and loading such data is a major part of the work. Here we do it in just one line.

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

This loads the data into 4 arrays: the train images, train labels, test images and test labels. As the names suggest, the first two are used to train the network and the other two are used to test it. The images are 28x28 pixel images and the labels give us a number from 0 to 9. Of course, real data would not have all its input images fitting such a template; each image would have its own dimensions and labeling pattern. Thus, a major step after loading the data is to alter the datasets to match the requirements of our network.

train_images.shape    # (60000, 28, 28)
train_labels.shape    # (60000,)
test_images.shape     # (10000, 28, 28)
test_labels.shape     # (10000,)

A typical neural network expects the training input as an array (indexed by training sample) of one-dimensional arrays (indexed by input field), and the training labels as an array (indexed by training sample) of one-dimensional arrays (a binary value for each output possibility). In this case, we have an input array of 60000x28x28 and labels of 60000x1. Both need to be corrected. NumPy provides methods that can help us do that.

train_images = train_images.reshape(-1,784)
test_images = test_images.reshape(-1,784)
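To see what `reshape(-1, 784)` does, here is a small sketch on a toy batch that mirrors the MNIST shapes (the data is made up for illustration):

```python
import numpy as np

# Toy batch of 4 "images", each 28x28, mirroring the MNIST layout.
batch = np.arange(4 * 28 * 28).reshape(4, 28, 28)

# reshape(-1, 784) flattens each 28x28 image into a 784-long row;
# -1 asks NumPy to infer the batch dimension (here, 4).
flat = batch.reshape(-1, 784)

print(batch.shape)  # (4, 28, 28)
print(flat.shape)   # (4, 784)

# The first row of the flat array is the first image, row by row.
assert (flat[0] == batch[0].ravel()).all()
```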

This changes the 28x28 array of each image into a single one-dimensional array of 784 values. Next we change the label data. The current labels carry a number (0-9) for each input image; we need an array of binary fields, one for each possible digit. TensorFlow provides a simple method for doing that.

test_labels = tf.keras.utils.to_categorical(test_labels)
train_labels = tf.keras.utils.to_categorical(train_labels)
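What to_categorical produces is, in effect, one-hot encoding. A plain NumPy sketch of the same idea, on a hypothetical label vector:

```python
import numpy as np

# Hypothetical labels; each entry is a digit 0-9.
labels = np.array([3, 0, 9, 1])

# Indexing an identity matrix by the labels yields one-hot rows,
# the same result that tf.keras.utils.to_categorical gives here.
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```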

Data Augmentation

Machine learning algorithms are greedy! No amount of data is enough. In order to feed this greed, we need to generate more data from the data available. This process is called data augmentation. There is no miracle here; we just use our knowledge about the data to generate more training information. For example, the number 2 in a bitmap is still a 2 if the image is shifted a few pixels left, right, up or down. For us, this is obvious; but for the machine, each of these shifts forms new input data. We can also twist or rotate the images to generate more data, knowing that none of these changes alters the label of the image. Here we demonstrate the four simple changes of shifting the image in the four directions. A real problem would call for more processing, but this should suffice for an example. We collect this altered data into new arrays called augmented_images and augmented_labels.

Shift up:

A = np.delete(train_images, np.s_[:28:], 1)       # drop the first row (28 values)
A = np.insert(A, [A.shape[1]] * 28, [0] * 28, 1)  # pad a zero row at the bottom
augmented_images = np.append(train_images, A, axis=0)
augmented_labels = np.append(train_labels, train_labels, axis=0)
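To see what the delete/insert pair actually does, here is the same pattern on a toy 3x3 "image" (the values are made up for illustration):

```python
import numpy as np

# One toy 3x3 "image", flattened to 9 values, in a batch of 1.
img = np.array([[1, 2, 3,
                 4, 5, 6,
                 7, 8, 9]])

side = 3
# Drop the first row (first `side` values) along axis 1 ...
A = np.delete(img, np.s_[:side:], 1)
# ... and pad a row of zeros at the bottom, same as the MNIST code.
A = np.insert(A, [A.shape[1]] * side, [0] * side, 1)

print(A.reshape(side, side))
# [[4 5 6]
#  [7 8 9]
#  [0 0 0]]
```

Each row of the flattened image moves up by one position, and a blank row fills in at the bottom, which is exactly a one-row upward shift.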

Shift down:

A = np.delete(train_images, np.s_[-28:], 1)
A = np.insert(A, [0] * 28, [0] * 28, 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

Shift right:

A = np.delete(train_images, np.s_[-2:], 1)
A = np.insert(A, [0, 0], [0, 0], 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

Shift left:

A = np.delete(train_images, np.s_[:2:], 1)
A = np.insert(A, [A.shape[1],A.shape[1]], [0, 0], 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

Thus, we generate more data by shifting the image: one full row (28 values) up or down, and 2 pixels left or right. Of course the labels remain the same, and are just appended to the array over and over, four times, generating the augmented_labels array. For an image 28 pixels a side, shifts of this size are quite reasonable. Let us now peep into what we have got:

augmented_images.shape    # (300000, 784)
augmented_labels.shape    # (300000, 10)

test_images.shape         # (10000, 784)
test_labels.shape         # (10000, 10)

Thus, we have augmented an input set of 60000 records into 300000 records. Note that the images array and the labels array must always have the same length.

But we have a problem. We just appended the augmented data to the training data. That means the input data consists of 5 chunks, each with a directional bias: each chunk has all its images pushed to one side or kept at the center. That is not a "random distribution". We need to fix this by shuffling the data. But how do we shuffle the training images along with the training labels, without breaking the correspondence between them?

Don't worry. NumPy helps us do that.

train_data = np.c_[augmented_images.reshape(len(augmented_images), -1), augmented_labels.reshape(len(augmented_labels), -1)]
np.random.shuffle(train_data)
augmented_images = train_data[:, :augmented_images.size//len(augmented_images)].reshape(augmented_images.shape)
augmented_labels = train_data[:, augmented_images.size//len(augmented_images):].reshape(augmented_labels.shape)

Essentially, we append each label to its corresponding image (both are just arrays of numbers), shuffle the combined rows, and then extract the images and labels again. Don't worry if you find the code above too complex. It works!
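The same concatenate-shuffle-split trick can be checked on toy data (the values below are made up purely to make the pairing visible):

```python
import numpy as np

# Toy data: 5 "images" of 4 pixels each, with matching labels.
images = np.arange(20).reshape(5, 4)          # row i is [4i, 4i+1, 4i+2, 4i+3]
labels = np.arange(5).reshape(5, 1) * 100     # row i is [100*i]

# Glue each label onto its image row, shuffle rows, then split again.
data = np.c_[images, labels]
np.random.shuffle(data)                       # shuffles rows in place
images2 = data[:, :4]
labels2 = data[:, 4:]

# The pairing survives: each image row still carries its own label.
for img, lab in zip(images2, labels2):
    assert lab[0] == (img[0] // 4) * 100
```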


After doing all this, we can now start building the network and training it. This code is a lot simpler than one might imagine. Keras is the part of TensorFlow that helps us build network models and train them. We can simply instantiate a model and add layers to it. Let us start by instantiating a simple sequential model.

model = tf.keras.Sequential()

The first layer in the network has to take 784 inputs, corresponding to the 784 input values in each training image. The activation function is typically ReLU for the intermediate layers and softmax for the final layer of a classifier. We will do the same, without questioning the basics at this point.

model.add(tf.keras.layers.Dense(18, activation=tf.nn.relu, input_shape=(784,)))

Now, we add the following layers. The size and the number of layers depend upon judgment and experience, and are a very important component of the success of any application. For real-life problems, researchers have suggested various architectures for different kinds of problems. Since this problem is not so big, we can in fact play around with these aspects of the network and check how it performs. I liked this set of layers. You can (and should) play around with them and see for yourself how they affect the output.

model.add(tf.keras.layers.Dense(11, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(5, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
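These layer sizes imply a parameter count we can work out by hand: a Dense layer with n_in inputs and n_out units carries n_in*n_out weights plus n_out biases. A quick sketch in plain Python (no TensorFlow needed):

```python
# Parameter count of a Dense layer: weights plus biases.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# (inputs, units) for each layer of the model above.
layers = [(784, 18), (18, 11), (11, 5), (5, 10)]

for n_in, n_out in layers:
    print(n_in, "->", n_out, ":", dense_params(n_in, n_out))

total = sum(dense_params(i, o) for i, o in layers)
print("total:", total)  # 14459
```

This is the figure that model.summary() would report as trainable parameters.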

Note that the intermediate layers have the ReLU activation function and the last one uses softmax. Now that we have built the network, we can go ahead and "compile" the model. The loss function and the optimizer, again, are chosen by experts based on experience. We can play around with some of these to see how they affect the output.

model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
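The categorical cross-entropy loss chosen here has a simple definition: for a one-hot target, it is minus the log of the probability the model assigned to the true class. A NumPy sketch on made-up numbers (purely illustrative):

```python
import numpy as np

# One-hot target: the true class is index 2.
y_true = np.array([0., 0., 1., 0.])

# Made-up raw scores, turned into probabilities with a stable softmax.
logits = np.array([1.0, 2.0, 3.0, 0.5])
exp = np.exp(logits - logits.max())
y_pred = exp / exp.sum()

# Categorical cross-entropy: -sum(y_true * log(y_pred)).
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # about 0.46
```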

Fit the Model

Finally, we do the "real" job of training the model.

model.fit(augmented_images, augmented_labels, epochs=20)

This takes some time. As it processes the data, it prints out the status of the processing:

300000/300000 [==============================] - 16s 52us/step - loss: 0.4008
Epoch 19/20
300000/300000 [==============================] - 16s 52us/step - loss: 0.3915
Epoch 20/20
300000/300000 [==============================] - 16s 52us/step - loss: 0.3875

And we are done! If everything we did above was good, we should have a model that does a good job. With data of this volume, it is difficult to get a training loss of 0. In fact, if the loss is 0, quite likely we are overfitting.

Validate the Model

We can check how we have done by checking the final loss and accuracy, using model.evaluate(). On the training data:

model.evaluate(augmented_images, augmented_labels)

300000/300000 [==============================] - 9s 29us/step

On the test data:

model.evaluate(test_images, test_labels)

10000/10000 [==============================] - 1s 135us/step

That is quite good. But that is how TensorFlow sees the loss. To get a feel of what the model has learnt, let us check in raw code how it performs. The code below takes the model's output for the test data and compares it with the real values provided by the test labels.

predictions = model.predict(test_images)
errors = 0

# Count a sample as an error if the true label's position does not
# carry the maximum predicted probability.
for i in range(predictions.shape[0]):
  m = max(predictions[i])
  for j in range(10):
    if ((predictions[i][j] != m) and (test_labels[i][j] == 1)):
      errors = errors + 1

print("Error rate: ", 100 * errors / predictions.shape[0])

This gives us the output:

Error rate:  6.88
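The loop above boils down to comparing argmax positions: a sample is an error when the predicted class and the labeled class differ. A NumPy sketch on made-up predictions and labels shows the same computation:

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 3 classes.
predictions = np.array([[0.1, 0.8, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.2, 0.3, 0.5]])
# Matching one-hot labels; the second sample is misclassified.
test_labels = np.array([[0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 1]])

# An error is any row where the argmaxes disagree.
errors = np.sum(np.argmax(predictions, 1) != np.argmax(test_labels, 1))
print("Error rate: ", 100 * errors / predictions.shape[0])
```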

Thus, about 93% of the test images were classified correctly. You can play around with the code above: try more augmentation, or change the network model, to obtain better accuracy.