Deep Learning (DL) has been on the rise for quite a while now and has proven immensely helpful across a multitude of domains. From Face Recognition to Automatic Speech Recognition (ASR) to Autonomous Driving, DL engineers have succeeded in augmenting engineering systems so that they can make intelligent decisions. These decisions can, in turn, aid subject-matter experts such as doctors in carrying out patient diagnosis, assisted surgeries, or medicinal recommendations based on a patient’s medical history. Much of the purpose of Artificial Intelligence (AI) thus boils down to reducing the amount of laborious manual work that people in many domains have to perform on a routine basis.

One such application of AI is in medicine and healthcare, where the task of the AI system is to look at chest X rays of patients and diagnose whether they have Pneumonia or not. This can save doctors a lot of time, as they won’t have to manually screen several X rays and classify them one by one. Sounds interesting? Let’s see how this feat can be accomplished! 😀

What is Pneumonia?

Pneumonia is a medical condition in which the alveoli (air sacs) in the lungs become inflamed and fill with pus or other fluids, due to either a bacterial or a viral infection. This can make breathing difficult, since not enough oxygen reaches the bloodstream. A doctor would then suggest getting a chest X ray done to verify whether you have Pneumonia and how far it might have spread. The figure below shows the difference between a normal chest X ray and that of a patient suffering from Pneumonia.

Left: Normal X ray, Right: Pneumonia X ray

Exploring the Dataset

The dataset used for this project is taken from a Kaggle repository titled Chest X-Ray Images (Pneumonia). It contains train, test and val directories, each of which has two classes, namely NORMAL and PNEUMONIA, as shown below:

Dataset directory structure
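
The later snippets refer to a dataset dictionary with keys such as train_NORMAL, whose construction isn’t shown in this article. The following is only a minimal sketch of how such an index could be built, assuming the train/test/val layout above with NORMAL and PNEUMONIA subfolders containing .jpeg files. The function name index_dataset and the root folder name are hypothetical and not taken from the original notebook.

```python
import os
from glob import glob

def index_dataset(root: str) -> dict:
    """Map keys like 'train_NORMAL' to the sorted list of .jpeg paths in that folder."""
    index = {}
    for split in ("train", "test", "val"):
        for label in ("NORMAL", "PNEUMONIA"):
            pattern = os.path.join(root, split, label, "*.jpeg")
            index[f"{split}_{label}"] = sorted(glob(pattern))
    return index

# Hypothetical usage: the root folder name is an assumption
# dataset = index_dataset("chest_xray")
# print({key: len(paths) for key, paths in dataset.items()})
```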


I used a Google Colab (GPU) backend environment to run the project and hence used the Kaggle API to load the dataset directly to the environment instead of downloading it to disk. The steps to reproduce the same along with the entire source code used in this project can be found here. With that out of the way, let’s get started!

First, let’s import the necessary libraries.

# Importing the necessary libraries
import os
import cv2
import keras
import itertools

import numpy as np 
import tensorflow as tf

from PIL import Image
from time import time
from glob import glob
from tqdm import tqdm
from google.colab import files
from keras.preprocessing import image
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Dense, Activation, Flatten

from matplotlib import pyplot as plt
plt.style.use('ggplot')

Visualizing the Dataset

Let’s start by visualizing the images in the dataset as well as the dataset distribution per class. The necessary helper functions have already been implemented, so all we need to do is call them 😀

# Visualizing the first num_pics image in the training and test sets 
show_image(dataset["train_NORMAL"], num_pics=1, dataset="train", label="NORMAL")
show_image(dataset["train_PNEUMONIA"], num_pics=1, dataset="train", label="PNEUMONIA")

show_image(dataset["test_NORMAL"], num_pics=1, dataset="test", label="NORMAL")
show_image(dataset["test_PNEUMONIA"], num_pics=1, dataset="test", label="PNEUMONIA")

This plots the first image in the training and test sets as shown:

# Visualizing the datasets
x_label = "Images"
y_label = "Number of images"
title = "Distribution of images in the dataset"
visualize_dataset_distribution(dataset=dataset, x_label=x_label, y_label=y_label, title=title)

This displays the per-class distribution of the dataset as a bar graph:

Preprocessing the dataset

With image datasets in applications like Computer Vision, it is almost always the norm to preprocess the dataset by resizing and reshaping the images and/or normalizing them so that the pixel values lie between 0 and 1. Here we resize the stock images to a shape of (224,224,3) so that they have a height of 224 pixels, a width of 224 pixels and 3 color channels, i.e. RGB. These dimensions need to match the input shape of our Convolutional Neural Network, as we’ll see later. We also divide the individual pixel values by 255 so that they lie in the interval [0,1].

def transform_dataset(normal_path:str, pneumonia_path:str):
  # Routine to convert and return the X and y datasets as np.ndarrays after some preprocessing
  X, y = [], []
  # Transform each stock image into a 224x224 RGB image and then into an array of the same size but normalized between 0 and 1
  for img in tqdm(glob(os.path.join(normal_path, "*.jpeg"))):
    img = cv2.imread(str(img))
    img = cv2.resize(img, (224,224))
    if img.shape[2] ==1:
      img = np.dstack([img, img, img])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32)/255.
    # Collect the preprocessed NORMAL image
    X.append(img)
  # Target labels -> 1 = Pneumonia, 0 = Normal
  initial = len(X)
  y_a = np.zeros(initial)

  for img in tqdm(glob(os.path.join(pneumonia_path, "*.jpeg"))):
    img = cv2.imread(str(img))
    img = cv2.resize(img, (224,224))
    if img.shape[2] ==1:
      img = np.dstack([img, img, img])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32)/255.
    # Collect the preprocessed PNEUMONIA image
    X.append(img)

  final = len(X)
  y = np.concatenate((y_a, np.ones(final-initial)))
  y = np.reshape(y, (y.shape[0],1))
  X = np.array(X)
  return X, y
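
To see what the channel-stacking and scaling steps do in isolation, here is a small NumPy-only sketch. It is not part of the original notebook: the helper name to_unit_rgb is hypothetical, and it skips the cv2 resize and colour conversion, working instead on a toy uint8 array that is already in memory.

```python
import numpy as np

def to_unit_rgb(img: np.ndarray) -> np.ndarray:
    """Return an (H, W, 3) float32 array with values in [0, 1] from a uint8 image."""
    if img.ndim == 2:                      # grayscale without a channel axis
        img = img[:, :, np.newaxis]
    if img.shape[2] == 1:                  # single channel -> replicate to 3 channels
        img = np.dstack([img, img, img])
    return img.astype(np.float32) / 255.0

gray = np.array([[0, 128], [255, 64]], dtype=np.uint8)   # toy 2x2 "X ray"
rgb = to_unit_rgb(gray)
print(rgb.shape, rgb.min(), rgb.max())
```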

Let’s call the helper functions to generate the required datasets:

# Generate the training sets
X_train, y_train = transform_dataset(normal_path=train_NORMAL, pneumonia_path=train_PNEUMONIA)
X_test, y_test = transform_dataset(normal_path=test_NORMAL, pneumonia_path=test_PNEUMONIA)
X_val, y_val = transform_dataset(normal_path=val_NORMAL, pneumonia_path=val_PNEUMONIA)

X_train = tf.convert_to_tensor(X_train)
y_train = tf.convert_to_tensor(y_train)
X_test = tf.convert_to_tensor(X_test)
y_test = tf.convert_to_tensor(y_test)
X_val = tf.convert_to_tensor(X_val)
y_val = tf.convert_to_tensor(y_val)

These commands take a while to execute since we are processing around 6000 images! The tf.convert_to_tensor() method converts the NumPy arrays into tensors, which will then be input to our model. After this, the shapes of our datasets look like this:

X_train has shape  (5216, 224, 224, 3)
y_train has shape  (5216, 1)
X_test has shape  (624, 224, 224, 3)
y_test has shape  (624, 1)
X_val has shape  (16, 224, 224, 3)
y_val has shape  (16, 1)
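
Holding every image in memory as float32 is not free. As a rough back-of-the-envelope estimate, assuming 4 bytes per float32 value and ignoring any framework overhead, the shapes above translate into the following array sizes:

```python
import math

BYTES_PER_FLOAT32 = 4

# Shapes copied from the output above
shapes = {
    "X_train": (5216, 224, 224, 3),
    "X_test": (624, 224, 224, 3),
    "X_val": (16, 224, 224, 3),
}

def size_in_gib(shape: tuple) -> float:
    """Size of a dense float32 array with the given shape, in GiB."""
    return math.prod(shape) * BYTES_PER_FLOAT32 / 2**30

for name, shape in shapes.items():
    print(f"{name}: {size_in_gib(shape):.2f} GiB")
```

The training set alone comes to roughly 2.9 GiB, and converting it to a tensor may create another copy, which is worth keeping in mind on a memory-constrained runtime.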

Building the Model Architecture

Now’s the time to define our model architecture and create some magic with it!

The architecture used in this project and the associated parameter values were taken from the paper titled “XCOVNet: Chest X-ray Image Classification for COVID-19 Early Detection Using Convolutional Neural Networks” by Madaan, Vishu et al. The original research paper is cited at the end of this article.

CNN Model Architecture

We will be using the Keras API for TensorFlow to construct our model with the following parameters:

  • A fixed kernel size of 3×3 for all the Conv2D layers in the model.
  • Conv2D blocks consisting of 'same'-padded convolutions followed by Dropout with a dropout rate of 0.2 and MaxPool2D with a pool size of 2×2.
  • The Xavier (a.k.a. Glorot) initializer is used to initialize the weights of all the kernels.
  • The ReLU activation function is used everywhere except in the output layer, which uses the sigmoid activation function since this is a binary classification problem.

def model(input_shape:tuple=(224,224,3), classes=1, initializer=glorot_uniform):
  # Routine to define the model architecture here

  # Define the input to be a tensor of shape input_shape
  input = Input(input_shape)

  # First hidden layer
  X = Conv2D(filters=32, kernel_size=(3,3), padding='same', activation = 'relu', kernel_initializer = initializer(seed=0))(input)
  X = Dropout(rate=0.2)(X)
  X = MaxPooling2D((2,2))(X)

  # Second hidden layer
  X = Conv2D(filters=64, kernel_size=(3,3), padding='same', activation = 'relu', kernel_initializer = initializer(seed=0))(X)
  X = Dropout(rate=0.2)(X)
  X = MaxPooling2D((2,2))(X)

  # Third hidden layer
  X = Conv2D(filters=64, kernel_size=(3,3), padding='same', activation = 'relu', kernel_initializer = initializer(seed=0))(X)
  X = Dropout(rate=0.2)(X)
  X = MaxPooling2D((2,2))(X)

  # Fourth hidden layer (no pooling), followed by the fully connected layers
  X = Conv2D(filters=128, kernel_size=(3,3), padding='same', activation = 'relu', kernel_initializer = initializer(seed=0))(X)
  X = Dropout(rate=0.2)(X)
  X = Flatten()(X)
  X = Dense(units=2048, activation='relu')(X)
  X = Dropout(rate=0.2)(X)
  X = Dense(units=1024, activation='relu')(X)
  X = Dropout(rate=0.2)(X)
  X = Dense(units=512, activation='relu')(X)
  X = Dropout(rate=0.2)(X)

  # Output layer
  output = Dense(classes, activation='sigmoid', kernel_initializer = glorot_uniform(seed=0))(X)

  # Create model
  model = Model(inputs = input, outputs = output)
  return model

Let’s build the model and print out the model summary using model.summary()

# Let's build the model and look at its summary
XNet = model(input_shape=(224,224,3), initializer=glorot_uniform)
XNet.summary()

Model summary
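
If the summary isn’t at hand, the layer sizes can be reproduced by hand. The sketch below is plain arithmetic, assuming 'same' padding (each convolution leaves the spatial size unchanged), 2×2 pooling that halves the height and width, and a bias term on every layer, which is the Keras default. The helper names are hypothetical.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

side, channels, total = 224, 3, 0
# (filters, followed_by_pool) for the four convolutional blocks in model()
for filters, pooled in [(32, True), (64, True), (64, True), (128, False)]:
    total += conv_params(3, channels, filters)
    channels = filters
    if pooled:
        side //= 2          # 2x2 max pooling halves height and width

flat = side * side * channels   # length of the Flatten() output
n_in = flat
for units in (2048, 1024, 512, 1):
    total += dense_params(n_in, units)
    n_in = units

print("feature map before Flatten:", (side, side, channels))
print("flattened length:", flat)
print("trainable parameters:", total)
```

Almost all of the roughly 208 million parameters sit in the first Dense layer (100352 × 2048 weights), so that layer dominates both the memory use and the training time.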

Now that we have built our model, let’s define a few other hyperparameters and begin compiling our model!

# Define the optimizer and other hyperparameters and metrics and compile the model
checkpoint_filepath = "best_model"
optimizer = Adam(learning_rate=0.0001, decay=1e-5)
early_stopping = EarlyStopping(patience=5)
checkpoint = ModelCheckpoint(filepath=checkpoint_filepath, monitor = 'val_accuracy', mode = "max", save_best_only=True, save_weights_only=True)
XNet.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=optimizer)

Let’s decipher what these messy hyperparameters mean:

  • BATCH_SIZE defines the number of training examples processed in one iteration (step) of the training process.
  • EPOCHS defines the total number of complete passes through the entire training set.
  • The Adam optimizer updates the weights of the network using the gradients computed by backpropagation, with a learning rate of 0.0001 and a decay rate of 1e-5.
  • EarlyStopping stops the training process early once the monitored quantity (the validation loss by default) has not improved for a set number of consecutive epochs, here patience=5. A validation loss that stops improving while the training loss keeps falling means the generalisation gap is widening, which is an indication of overfitting, so EarlyStopping helps prevent it.
  • ModelCheckpoint monitors the validation accuracy during the training process and saves the weights of the best performing model.
  • The model XNet is compiled with a Binary Cross Entropy loss function since this is a binary classification task.
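
The patience rule behind EarlyStopping is easy to mimic. The sketch below is a simplified stand-in, not the Keras implementation: it assumes the monitored value is a validation loss where lower is better, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3
losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.56, 0.60, 0.61]
print(early_stopping_epoch(losses, patience=5))
```

With these numbers the loop stops five epochs after the best loss was seen, so the weights at the stopping point are not the best ones. That is why it helps to pair early stopping with a checkpoint of the best weights.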

Pheww! That’s a LOT of gibberish. Let’s get on with fitting the model to our data where the actual magic happens!

Before we begin fitting our model, ensure you are hooked up to a beefy compute environment i.e. a GPU since training CNNs can take up a lot of time and resources. With that, let’s begin:

# Fit the model and save the history
history = XNet.fit(
    x = X_train,
    y = y_train,
    epochs = EPOCHS,
    batch_size = BATCH_SIZE,
    steps_per_epoch = num_training_steps,
    validation_data = (X_val, y_val),
    callbacks = [early_stopping, checkpoint]
)

Once the model finishes training, we can load the best performing model according to the criteria we set earlier and see how it performs on unseen data:

# Load the weights for the best performing model based on the set requirements
XNet.load_weights(checkpoint_filepath)

# Evaluation on the test set
test_loss, test_score = XNet.evaluate(X_test, y_test)
print("Loss on test set: ", test_loss)
print("Accuracy on test set: ", test_score)

Evaluation on Test Set

As we can see, the model has an accuracy of 77% on the test set, which is decent given that we had a highly imbalanced dataset with only a few thousand images. Computer Vision models generally benefit from datasets with tens or hundreds of thousands of images for accurate predictions. Since the dataset is highly imbalanced, Precision and Recall are more informative evaluation metrics than accuracy alone.

  • Precision determines what proportion of positive inferences were actually correct.
  • Recall determines what proportion of actual positives were correctly identified as positive.

Let’s plot out the model’s history, as well as the confusion matrix in order to calculate the precision and recall:

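# Let's plot the model's history including the confusion matrix
# Summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

# Summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

y_pred = XNet.predict(X_test)
y_pred = (y_pred > 0.5)
cnf_matrix = confusion_matrix(y_test, y_pred, labels = [0. , 1.])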
plot_confusion_matrix_self(cnf_matrix, classes = ["Normal" , "Pneumonia"], title = "Confusion Matrix for CNN Model")

# Calculate Precision and Recall
tn, fp, fn, tp = cnf_matrix.ravel()
precision = tp/(tp+fp)
recall = tp/(tp+fn)

print("Recall of the model is {:.2f}".format(recall))
print("Precision of the model is {:.2f}".format(precision))

Train and Validation Accuracy
Train and Validation Loss
Confusion Matrix
Precision and Recall

The model achieved a Recall of 99% and a Precision of 73%.
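
A single number that balances the two is the F1 score, the harmonic mean of precision and recall. It isn’t reported above; the snippet below simply derives it from the rounded figures quoted in this article, so treat the result as approximate.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.73, 0.99     # rounded values reported above
f1 = f1_score_from(precision, recall)
print("F1 score: {:.2f}".format(f1))
```

The high recall means very few Pneumonia cases are missed, while the lower precision means a fair share of normal X rays are flagged as Pneumonia.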

Testing on-the-fly

One last bonus round before we wrap this article up. You can also choose to pick a photo manually and upload it to the notebook to run inference on the image and see how the model actually performs on real-world unseen images. To do this, I have written a helper routine which prompts the user to upload an image and runs the inference on it.

def infer_on_the_fly(model):
  # Routine to allow a user to upload images for inference

  uploaded = files.upload()
  # Transform each stock image into a 224x224 RGB image and 
  # then into a vector of the same size but normalized between 0 and 1
  for file_name in uploaded.keys():
    path = '/content/' + file_name
    img = cv2.imread(path)
    img = cv2.resize(img, (224,224))
    if img.shape[2] ==1:
      img = np.dstack([img, img, img])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32)/255.
    X = np.expand_dims(img, axis=0)
    X = np.array(X)
    X = tf.convert_to_tensor(X)
    y = model.predict(X, batch_size = 32)
    # Set a threshold of 0.5 for classification into Pneumonia (1) or Not Pneumonia (0)
    prob = float(y[0][0])
    if prob > 0.5:
      print(file_name + ' is pneumonia with {} probability'.format(prob))
    else:
      print(file_name + ' is normal with {} probability'.format(1 - prob))

Here, I uploaded a picture from the test/PNEUMONIA directory of the dataset. With the threshold probability set to 0.5, the model correctly classified the image as Pneumonia with a predicted probability of 0.998!

Model predicts Pneumonia with 0.99 probability


With data augmentation techniques such as random zoom and vertical flips, we can synthesize more data to reduce the skew of our dataset. The architecture that we implemented here was originally proposed to detect COVID-19 from patients’ chest X rays. Other architectures such as VGG-19 or Xception could be tried to see whether they perform better on this dataset. Transfer Learning is another interesting concept that can be applied here, for example by using models pre-trained on the ImageNet database, such as AlexNet. The shallower layers in these networks have already learned to extract low-level features from images such as edges, lines and shapes. Their weights can be used to initialize our custom architecture for better results, which can also mitigate the problem of having a relatively small dataset. The possibilities are endless!
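
As a rough illustration of the augmentation idea, here is a NumPy-only sketch of a vertical flip and a centre zoom. It is a toy under stated assumptions, not what a training pipeline would normally use: the function names are hypothetical, the zoom uses nearest-neighbour resampling, the input is random noise standing in for a preprocessed image, and in practice a library routine would usually do this work.

```python
import numpy as np

def vertical_flip(img: np.ndarray) -> np.ndarray:
    """Flip an (H, W, C) image upside down."""
    return img[::-1, :, :]

def center_zoom(img: np.ndarray, zoom: float) -> np.ndarray:
    """Zoom into the centre by `zoom` (>= 1) and resample back to the original size."""
    h, w = img.shape[:2]
    crop_h, crop_w = max(1, int(h / zoom)), max(1, int(w / zoom))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    crop = img[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour resampling back to (h, w)
    rows = np.arange(h) * crop_h // h
    cols = np.arange(w) * crop_w // w
    return crop[rows][:, cols]

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3), dtype=np.float32)     # stand-in for a preprocessed X ray
augmented = center_zoom(vertical_flip(img), zoom=rng.uniform(1.0, 1.2))
print(augmented.shape)
```

Each augmented copy keeps the (224, 224, 3) shape, so it can be appended to the minority class without changing the model input.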

If you find any of these techniques useful or have others in mind, please do mention them down in the comments.

Paper Reference: Madaan, V., Roy, A., Gupta, C., Agrawal, P., Sharma, A., Bologa, C., & Prodan, R. (2021). XCOVNet: Chest X-ray Image Classification for COVID-19 Early Detection Using Convolutional Neural Networks. New Generation Computing, 1–15. Advance online publication.
