Keras is a high-level, powerful, and easy-to-use library for deep learning in Python. It runs on top of other libraries, such as TensorFlow or Theano, and uses them as a backend. I will be using the TensorFlow backend, but the code should run the same regardless of which backend is used.
Models (or networks) in Keras are defined as a sequence of layers. A Sequential model is created first, and layers are added one at a time until you are happy with the network topology.
X = test_samples #shape is 100 rows by 8 columns
Y = test_labels #shape is 100 by 3 (one-hot encoding)
X_val = validation_samples
Y_val = validation_labels
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
scores = model.evaluate(X_val, Y_val)
Let's say we have 100 samples, 8 variables per sample, and 3 possible categories to which the samples belong. We also have a small set of samples set aside for validation of the model. Let's build a simple fully connected network. I will go through the above code step by step.
X = test_samples #shape is 100 rows by 8 columns
Y = test_labels #shape is 100 by 3 (one-hot encoding)
X_val = validation_samples
Y_val = validation_labels
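These variables would come from your own dataset. As a stand-in, here is a minimal sketch that generates random placeholder data with NumPy matching the shapes described above (the validation-set size of 20 is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 samples with 8 features each, as described in the text
test_samples = rng.random((100, 8)).astype('float32')

# integer class labels in {0, 1, 2}, one-hot encoded to shape 100 x 3
classes = rng.integers(0, 3, size=100)
test_labels = np.eye(3, dtype='float32')[classes]

# a small held-out validation set (size 20 is an arbitrary choice here)
validation_samples = rng.random((20, 8)).astype('float32')
validation_labels = np.eye(3, dtype='float32')[rng.integers(0, 3, size=20)]
```

With these in place, the code above runs end to end.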
model = Sequential()
Add the input and first hidden layer. Here we use a fully connected layer, which is defined by the 'Dense' class in Keras. Setting the input_dim variable to 8 tells the model to create an input layer with 8 nodes. The 12 tells it to create a Dense hidden layer with 12 nodes, and 'relu' tells the layer to use the 'relu' activation function (more on that later).
model.add(Dense(12, input_dim=8, activation='relu'))
Add another fully connected hidden layer. This time with 8 nodes, still with the relu activation function. Notice that we did not have to set the input dimension. You only set the input dimension for the first layer added.
model.add(Dense(8, activation='relu'))
Add the output layer. It has 3 nodes, one for each possible category. It uses the 'softmax' activation function (recommended for multi-class classification).
model.add(Dense(3, activation='softmax'))
Compile the model. Now that the network is defined, we have to compile it. This translates the model from Keras into a form the specific backend being used (TensorFlow in my case) can run.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Fit the model. This step is where the model is actually trained on the data.
model.fit(X, Y, epochs=150, batch_size=10)
Test the accuracy. Here we are testing the accuracy with the validation samples we kept out of the training dataset.
scores = model.evaluate(X_val, Y_val)
There are many choices to be made when adding layers, starting with the type of layer to add. You can find out more about the available layers in the Keras documentation, and you will learn more about the different layers and their options later. The main layer we are interested in here is the fully connected Dense layer.
The activation function. The activation function is what decides whether or not a neuron/node should be activated. The Keras documentation lists the available activation functions; the most popular ones used here are 'relu' and 'softmax'.
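To make the two functions concrete, here is a plain-NumPy sketch of what 'relu' and 'softmax' compute (an illustration, not Keras' own implementation):

```python
import numpy as np

def relu(x):
    # 'relu' passes positive values through unchanged and zeroes out negatives
    return np.maximum(0, x)

def softmax(x):
    # 'softmax' turns raw scores into probabilities that sum to 1;
    # subtracting the max first is a standard numerical-stability trick
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))    # [0. 0. 3.]
print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities summing to 1
```

This is why 'softmax' suits the output layer of a classifier: its outputs can be read as per-category probabilities.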
The compiling step prepares the model to be run by the backend. For this step we need to select a few options; see the Keras documentation on compile.
categorical_crossentropy - for multi-category (one-hot encoded) label predictions
The fitting step trains the model on the input data. For this step we need to select a few options; see the Keras documentation on fit.
epochs - the number of times the model is exposed to the full training set. At each iteration the optimizer adjusts the weights so that the loss function is minimized.
batch_size - the number of training instances observed before the optimizer performs a weight update.
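As a worked example of how these two settings interact, using the numbers from the code above (100 samples, batch_size=10, epochs=150):

```python
import math

n_samples, batch_size, epochs = 100, 10, 150  # values from the example above

# one epoch = one full pass over the data, split into batches
updates_per_epoch = math.ceil(n_samples / batch_size)  # 10 batches per pass
total_updates = updates_per_epoch * epochs             # 1500 weight updates
print(updates_per_epoch, total_updates)  # 10 1500
```

So a smaller batch_size means more (noisier) weight updates per epoch, while more epochs means more passes over the same data.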
The input data (both the samples and the labels) needs to be in the 'float32' datatype. This can be set using the .astype() function from NumPy.
train_images = train_images.astype('float32')
The input labels can be converted from a simple list to one-hot encoding using the to_categorical function from Keras. The labels should be in the format of a numpy array.
from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
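For illustration, here is what one-hot encoding produces, sketched in plain NumPy with a hypothetical set of labels for a 3-class problem (to_categorical yields the same layout):

```python
import numpy as np

# hypothetical integer labels for a 3-class problem
train_labels = np.array([0, 2, 1, 2])

# NumPy equivalent of one-hot encoding: row i is all zeros except a 1
# in the column matching label i
one_hot = np.eye(3, dtype='float32')[train_labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```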
When setting up the shape of your input data, the first axis is generally the 'samples' axis. It should be equal to the number of samples in your data. The other values represent the shape of the samples. Example of a data set with 60,000 samples and each sample is a 28 by 28 matrix:
print(train_images.shape)
(60000, 28, 28)
You may need to rearrange your input data into a new shape. This can be done with the reshape() function in NumPy. This applies, for example, when using fully connected layers with image data: you may need to convert each image from a matrix to a vector.
train_images = train_images.reshape((60000, 28 * 28))
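As a self-contained illustration with placeholder data of the shape described above (zeros standing in for real images):

```python
import numpy as np

# dummy stand-in for 60,000 images, each a 28 x 28 matrix
train_images = np.zeros((60000, 28, 28), dtype='float32')

# flatten each 28 x 28 matrix into a 784-element vector for Dense layers
train_images = train_images.reshape((60000, 28 * 28))
print(train_images.shape)  # (60000, 784)
```

The samples axis (60000) is preserved; only the per-sample shape changes.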
Please continue on to Multi-layer Perceptrons in Keras.