Deep Learning (DL) has been on the rise for quite a while now, and it has proven to be of immense help to people across a multitude of domains. From Face Recognition to Automatic Speech Recognition (ASR) to Autonomous Driving, DL engineers have enjoyed success in augmenting the capabilities of engineering systems to make intelligent decisions. These decisions can, in turn, aid subject-matter experts like doctors in carrying out patient diagnosis, assisted surgeries, or medicinal recommendations based on a patient's medical history. The purpose of Artificial Intelligence (AI) thus boils down to reducing the laborious manual tasks that people in many domains have to perform on a routine basis.
One such application of AI is in medicine and healthcare, where the task of the AI system is to look at chest X-rays of patients and diagnose whether they have Pneumonia or not. This can save doctors tons of time, as they won't have to manually screen through X-rays and classify them one by one. Sounds interesting, so let's see how this feat can be accomplished! :D
What is Pneumonia?
Pneumonia is a medical condition in which the alveoli (air sacs) in the lungs become inflamed and fill with pus or other fluids, due to either a bacterial or a viral infection. This can lead to difficulty breathing, since not enough oxygen reaches the bloodstream. A doctor would then suggest getting a chest X-ray done in order to verify whether you have Pneumonia and how far it might have spread. The figure below shows the difference between a normal chest X-ray and that of a patient suffering from Pneumonia.
Exploring the Dataset
The dataset used for this project has been taken from a Kaggle repository titled Chest X-Ray Images (Pneumonia). The dataset consists of two classes, namely NORMAL and PNEUMONIA, for each of the train, test and val directories as shown below:
I used a Google Colab (GPU) backend to run the project, and hence used the Kaggle API to load the dataset directly into the environment instead of downloading it to disk. The steps to reproduce this, along with the entire source code used in this project, can be found here. With that out of the way, let's get started!
First, let's import the necessary libraries.
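The article doesn't list the imports explicitly, so here is a hedged sketch of a typical import block for this project, inferred from the APIs used later (`tf.convert_to_tensor`, `Adam`, `EarlyStopping`, `ModelCheckpoint`, `glorot_uniform`); your notebook may need more or fewer:

```python
# Core libraries for array handling, plotting, and deep learning
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Keras building blocks used later in this article
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.initializers import glorot_uniform
```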
Visualizing the Dataset
Let's begin by visualizing the images in the dataset as well as the dataset distribution per class. All the necessary helper functions have been implemented for this, so let's call them :D
# Visualizing the first num_pics images in the training and test sets
show_image(dataset["train_NORMAL"], num_pics=1, dataset="train", label="NORMAL")
show_image(dataset["train_PNEUMONIA"], num_pics=1, dataset="train", label="PNEUMONIA")
show_image(dataset["test_NORMAL"], num_pics=1, dataset="test", label="NORMAL")
show_image(dataset["test_PNEUMONIA"], num_pics=1, dataset="test", label="PNEUMONIA")
This plots the first image in the training and test sets as shown:
# Visualizing the datasets
x_label = "Images"
y_label = "Number of images"
title = "Distribution of images in the dataset"
visualize_dataset_distribution(dataset=dataset, x_label=x_label, y_label=y_label, title=title)
This displays the per-class distribution of the dataset as a bar graph:
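The `visualize_dataset_distribution` helper itself isn't shown in the article; a minimal sketch, assuming `dataset` is a dict mapping split/class keys (e.g. "train_NORMAL") to lists of images, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def visualize_dataset_distribution(dataset, x_label, y_label, title):
    """Plot one bar per split/class key in `dataset` (hypothetical helper)."""
    labels = list(dataset.keys())
    counts = [len(dataset[k]) for k in labels]
    plt.figure(figsize=(8, 4))
    plt.bar(labels, counts)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(title)
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()
    return dict(zip(labels, counts))  # counts returned for convenience
```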
Preprocessing the Dataset
With image datasets in applications like Computer Vision, it is almost always the norm to preprocess the dataset by resizing, reshaping and/or normalizing the images so that the pixel values lie between 0 and 1. Here we resize the stock images to a shape of (224, 224, 3), so that they have a height of 224 pixels, a width of 224 pixels and 3 color channels, i.e. RGB. These dimensions need to be consistent with the input of our Convolutional Neural Network, as we'll see later. We also divide the individual pixel values by 255 so that they lie in the interval [0, 1].
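The per-image part of this preprocessing can be sketched as follows. This is a hedged, hypothetical `preprocess_image` helper, not the article's actual `transform_dataset` implementation (which also walks the directories and builds the labels):

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = (224, 224)

def preprocess_image(pixels):
    """Resize one raw image array to 224x224x3 and scale pixels to [0, 1]."""
    img = tf.convert_to_tensor(pixels, dtype=tf.float32)
    if img.shape.rank == 2:            # grayscale X-ray: add a channel axis
        img = img[..., tf.newaxis]
    if img.shape[-1] == 1:             # replicate channel to get 3 (RGB)
        img = tf.image.grayscale_to_rgb(img)
    img = tf.image.resize(img, IMG_SIZE)
    return img / 255.0                 # normalize to [0, 1]
```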
Let's call the helper functions to generate the required datasets:
# Generate the training, test and validation sets
X_train, y_train = transform_dataset(normal_path=train_NORMAL, pneumonia_path=train_PNEUMONIA)
X_test, y_test = transform_dataset(normal_path=test_NORMAL, pneumonia_path=test_PNEUMONIA)
X_val, y_val = transform_dataset(normal_path=val_NORMAL, pneumonia_path=val_PNEUMONIA)

X_train = tf.convert_to_tensor(X_train)
y_train = tf.convert_to_tensor(y_train)
X_test = tf.convert_to_tensor(X_test)
y_test = tf.convert_to_tensor(y_test)
X_val = tf.convert_to_tensor(X_val)
y_val = tf.convert_to_tensor(y_val)
These commands take a while to execute, since we are doing a lot of processing on around 6,000 images! The tf.convert_to_tensor() method converts the NumPy arrays into tensors, which will then be input to our model. After this, the shapes of our datasets look like this:
X_train has shape (5216, 224, 224, 3)
y_train has shape (5216, 1)
X_test has shape (624, 224, 224, 3)
y_test has shape (624, 1)
X_val has shape (16, 224, 224, 3)
y_val has shape (16, 1)
Building the Model Architecture
Now's the time to define our model architecture and create some magic with it!
The architecture used in this project, along with the associated parameter values, was taken from the paper titled "XCOVNet: Chest X-ray Image Classification for COVID-19 Early Detection Using Convolutional Neural Networks" by Madaan, Vishu et al. The original research paper is cited at the end of this article.
We will be using the Keras API for TensorFlow to construct and build our model with the following parameters:
- A fixed kernel size of 3x3 for all the Conv2D layers in the model.
- Conv2D blocks consisting of same convolutions followed by Dropout with a dropout rate of 0.2 and MaxPool2D with a pool size of 2x2.
- The Xavier (a.k.a. Glorot) initializer is used to initialize the weights of all the kernels.
- The ReLU activation function is used throughout, except for the output layer, which uses the sigmoid activation function since this is a binary classification problem.
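The `model()` helper isn't reproduced in the article, so here is a minimal sketch consistent with the bullets above. The number of Conv2D blocks, the filter counts, and the dense layer width are assumptions for illustration; consult the XCOVNet paper for the exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def model(input_shape=(224, 224, 3), initializer="glorot_uniform"):
    """Sketch of the CNN described above; block/filter counts are assumed."""
    net = models.Sequential()
    net.add(tf.keras.Input(shape=input_shape))
    for filters in (16, 32, 64):                      # assumed progression
        # Same convolution -> Dropout(0.2) -> 2x2 max pooling, per the bullets
        net.add(layers.Conv2D(filters, (3, 3), padding="same",
                              activation="relu",
                              kernel_initializer=initializer))
        net.add(layers.Dropout(0.2))
        net.add(layers.MaxPool2D(pool_size=(2, 2)))
    net.add(layers.Flatten())
    net.add(layers.Dense(64, activation="relu",
                         kernel_initializer=initializer))
    net.add(layers.Dense(1, activation="sigmoid"))    # binary output
    return net
```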
Let's build the model and print out the model summary using model.summary()
# Let's build the model and look at its summary
XNet = model(input_shape=(224, 224, 3), initializer=glorot_uniform)
XNet.summary()
Now that we have built our model, let's define a few other hyperparameters and begin compiling our model!
# Define the optimizer, other hyperparameters and metrics, then compile the model
BATCH_SIZE = 32
EPOCHS = 10
checkpoint_filepath = "best_model"

optimizer = Adam(learning_rate=0.0001, decay=1e-5)
early_stopping = EarlyStopping(patience=5)
checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                             monitor="val_accuracy",
                             mode="max",
                             save_best_only=True,
                             save_weights_only=True)

XNet.compile(loss="binary_crossentropy", metrics=["accuracy"], optimizer=optimizer)
Let's decipher what these messy hyperparameters mean:
- BATCH_SIZE defines the number of training examples processed in one iteration, or step, of the training process.
- EPOCHS defines the total number of complete passes through the entire training set.
- The Adam optimizer is used for backpropagation, adjusting the weights of all the neurons in the network, with a learning rate of 0.0001 and a decay rate of 1e-5.
- EarlyStopping stops the training process early if the generalisation gap, i.e. the gap between the training loss and the validation loss, begins to increase. This trend is an indication of overfitting, so EarlyStopping helps prevent it.
- ModelCheckpoint monitors the validation accuracy during training and is responsible for saving the best performing model's weights.
- The model XNet is compiled with a Binary Cross Entropy loss function, since this is a binary classification task.
Pheww! That's a LOT of gibberish. Let's get on with fitting the model to our data where the actual magic happens!
Before we begin fitting our model, ensure you are hooked up to a beefy compute environment, i.e. a GPU, since training CNNs can take a lot of time and resources. With that, let's begin:
# Fit the model and save the history
history = XNet.fit(
    x=X_train,
    y=y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    steps_per_epoch=num_training_steps,
    validation_data=(X_val, y_val),
    callbacks=[early_stopping, checkpoint],
)
Once the model finishes training, we can load the best performing model according to the criteria we set earlier and see how it performs on unseen data:
# Load the weights for the best performing model based on the set criteria
XNet.load_weights("best_model")

# Evaluate on the test set
test_loss, test_score = XNet.evaluate(X_test, y_test, batch_size=BATCH_SIZE)
print("Loss on test set: ", test_loss)
print("Accuracy on test set: ", test_score)
As we can see, the model has an accuracy of 77% on the test set, which is decent given that we had a highly imbalanced dataset with only a few thousand images; Computer Vision tasks typically require datasets with tens or hundreds of thousands of images for accurate predictions. Since the dataset has a highly imbalanced distribution, better evaluation metrics here are precision and recall.
- Precision determines what proportion of positive predictions were actually correct.
- Recall determines what proportion of actual positives were correctly identified.
Let's plot out the model's history, as well as the confusion matrix in order to calculate the precision and recall:
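Precision and recall fall directly out of the confusion matrix counts. A minimal NumPy sketch (a generic illustration, not the article's plotting helper) of how they are computed from binary labels and thresholded predictions:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary labels and predictions."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)        # true positives
    fp = np.sum(~y_true & y_pred)       # false positives
    fn = np.sum(y_true & ~y_pred)       # false negatives
    precision = tp / (tp + fp)          # correct among predicted positives
    recall = tp / (tp + fn)             # found among actual positives
    return precision, recall
```

In our case the predictions would come from thresholding the sigmoid outputs, e.g. `y_pred = XNet.predict(X_test) > 0.5`.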
The model achieved a Recall of 99% and a Precision of 73%.
One last bonus round before we wrap up this article. You can also pick a photo manually and upload it to the notebook to run inference on it and see how the model performs on real-world unseen images. To do this, I have written a helper routine which prompts the user to upload an image and runs inference on it.
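The upload prompt itself is Colab-specific (`google.colab.files.upload()`), but the inference part of such a helper can be sketched as below. This is a hypothetical `predict_single_image` function, mirroring the training-time preprocessing (resize to 224x224, RGB, scale to [0, 1]):

```python
import numpy as np
from PIL import Image

def predict_single_image(model, image_path, threshold=0.5):
    """Load one image, preprocess it like the training data, and classify."""
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0   # scale to [0, 1]
    x = x[np.newaxis, ...]                          # add batch dimension
    prob = float(model.predict(x)[0][0])            # sigmoid output
    label = "PNEUMONIA" if prob > threshold else "NORMAL"
    return label, prob
```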
Here, I uploaded a picture from the test/PNEUMONIA directory of the dataset and set the threshold probability to be 0.5, above which the model accurately predicted the image as Pneumonia with an inference probability of 0.998!
With data augmentation techniques such as random zooms and flips, we can synthesize more data to fix the skew of our dataset. The model we implemented here was originally used to detect COVID-19 in patients from their chest X-rays. Other architectures such as VGG-19 or Xception could be tried to see whether they perform better on this dataset. Transfer Learning is another interesting concept that can be applied here, e.g. using models trained on the ImageNet database, such as AlexNet. The shallower layers of these networks have already learned to extract low-level features from images such as edges, lines and shapes. These can be used to initialize the weights of our custom architecture for better results, which also mitigates the problem of having a relatively small dataset. The possibilities are endless!
If you find any of these techniques useful or have others in mind, please do mention them down in the comments.
Paper Reference: Madaan, V., Roy, A., Gupta, C., Agrawal, P., Sharma, A., Bologa, C., & Prodan, R. (2021). XCOVNet: Chest X-ray Image Classification for COVID-19 Early Detection Using Convolutional Neural Networks. New generation computing, 1–15. Advance online publication. https://doi.org/10.1007/s00354-021-00121-7