A question people often raise is "Data Science vs Data Analytics". We have discussed data science in a lot of other articles, but today we talk about data analytics. In fact, we talk about the first step of data analytics, which is data visualization.

In an earlier article, we saw how you can visualize your data and perform analysis in about 2 lines of code using the SweetViz library (a data visualization tool). If you missed that article, you can read it here.


Two lines of code may sound like all you need, but most of the time your data visualization jobs will require much more complex analysis. For that, you need to create visualizations of your own that cater to your requirements.

Matplotlib is the most popular Python library for plotting all sorts of charts and graphs. It allows a great deal of flexibility and customization down to the smallest level.

In this article, we will explore different kinds of plots, look into the customization features that Matplotlib offers, and work through several data visualization examples.


Getting started with Matplotlib

Of course, the first step is always to install the desired library. Matplotlib can simply be installed by running the following command in your terminal/command prompt.

pip install matplotlib

In this article, we will use the Covid-19 Vaccinations dataset that can be found here.

For loading the dataset we will use the pandas library. If you do not have pandas installed, you can install it by running the following command in your terminal.

pip install pandas

The first steps are to load and explore the dataset to see what values can be plotted for a useful visualization.

#importing the libraries
import matplotlib.pyplot as plt
import pandas as pd
covid = pd.read_csv("country_vaccinations.csv") #load the dataset
covid.sample(5) #sample 5 rows from the dataset
Loaded Dataset

So what we have here is information from all the countries regarding the no. of vaccinations administered. We can definitely do some fun plotting here. It is also important to note that we have some missing values in the dataset (denoted by NaN above).
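Before plotting, it can help to count those gaps. The following is only a minimal sketch on a tiny hand-made CSV (the rows are made-up placeholders, not values from the real dataset). It parses the date column while loading and counts the missing values in every column.

```python
import io
import pandas as pd

# Hypothetical stand-in for country_vaccinations.csv; the numbers are made up.
csv_text = """country,date,total_vaccinations
Pakistan,2021-02-03,0
Pakistan,2021-02-04,
Pakistan,2021-02-05,1200
India,2021-01-15,0
India,2021-01-16,
"""

# parse_dates turns the "date" strings into real datetime values
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# isna().sum() counts the NaN entries in every column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

The same two calls should work on the full dataset by passing the file name instead of the `io.StringIO` object.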

Alright, enough waiting. Let’s plot.

covid_pak = covid[covid["country"] == "Pakistan"] #single out data for Pakistan
covid_pak = covid_pak.sort_values(by = "date") #sort data by date


plt.plot(covid_pak["date"] ,covid_pak["total_vaccinations"]) #Plot the relevant data

Here we have singled out data for the country “Pakistan” and the last line uses matplotlib to plot the total vaccine doses administered against the date on the x-axis.

Plot for covid vaccinations in Pakistan

That single line created our plot for us. The line is broken because, as I mentioned before, the data has missing values. We can see that the number of vaccinations is increasing with time.
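To see why missing values break the line, here is a small sketch with made-up numbers. It draws the same series twice: once with the NaN values left in, and once after filling them with pandas' linear interpolation. Whether interpolating is appropriate for your own data is a judgment call, so treat this as an illustration rather than a recommendation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up cumulative counts with two missing days
doses = pd.Series([100.0, 250.0, np.nan, np.nan, 900.0, 1200.0])

fig, (ax_gap, ax_filled) = plt.subplots(1, 2, figsize=(10, 4))

# NaN values are skipped, so the line is drawn in two separate pieces
ax_gap.plot(doses.index, doses)
ax_gap.set_title("With gaps")

# Linear interpolation estimates the missing days and joins the line
filled = doses.interpolate()
ax_filled.plot(filled.index, filled)
ax_filled.set_title("Interpolated")
```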

The plot is there but it is quite – UGLY – for lack of a better word. Let’s add some more elements and make the graph a little larger for better viewing.

plt.figure(figsize = (12,10))#define the figure along with its size
plt.plot( covid_pak["date"] ,covid_pak["total_vaccinations"]) #Plot the relevant data
plt.title("Total Vaccinations administered in Pakistan")#Set title for the plot
plt.xlabel("Date") #Set X-axis label
plt.ylabel("No. of vaccines") #Set y-axis label
Graph with some necessary elements

In the code above we first set the figure size to (12,10), corresponding to the width and height of the figure in inches. Next, we plot the data, give the plot a title, and finally set the labels of the x and y axes.
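The same plot can also be written with Matplotlib's object-oriented interface, which returns the figure and the axes as objects you can keep modifying. This is just a sketch with made-up values standing in for the date and total_vaccinations columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values standing in for the real columns
dates = ["2021-02-03", "2021-02-04", "2021-02-05"]
totals = [0, 500, 1200]

fig, ax = plt.subplots(figsize=(12, 10))  # width and height in inches
ax.plot(dates, totals)
ax.set_title("Total Vaccinations administered in Pakistan")
ax.set_xlabel("Date")
ax.set_ylabel("No. of vaccines")
```

Both styles produce the same figure. The `ax` object becomes handy once a figure holds more than one plot.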

Another important concept to learn is the Matplotlib subplot. In the above figure we have a single line graph, but what if we want to look at multiple graphs? In such a case we create subplots.

The simple command to initiate a matplotlib subplot is:

plt.subplot(2,2,1) #create a 2 by 2 grid to create 4 different plots

The above line creates an empty 2 x 2 grid with space for 4 different plots. The ‘1’ in the last position in the parentheses means the plot created immediately after this line will fill up cell no. 1. Let’s create 4 such plots.

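india = covid[covid["country"] == "India"] #isolate records for India
america = covid[covid["country"] == "United States"] #isolate records for the USA
uk = covid[covid["country"] == "United Kingdom"] #isolate records for the UK

plt.figure(figsize = (12,10))#define the figure along with its size

plt.subplot(2,2,1) #fill up cell no. 1
plt.plot(india["people_vaccinated"])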
plt.title("No. of Vaccinated people in India")
plt.xlabel("Date") #Set X-axis label
plt.ylabel("No. of vaccines") #Set y-axis label

#----------------------------------------------------

plt.subplot(2,2,2) #fill up cell no. 2
plt.plot(covid_pak["people_vaccinated"])
plt.title("No. of Vaccinated people in Pakistan")
plt.xlabel("Date") #Set X-axis label
plt.ylabel("No. of vaccines") #Set y-axis label

#----------------------------------------------------

plt.subplot(2,2,3) #fill up cell no. 3
plt.plot(america["people_vaccinated"])
plt.title("No. of Vaccinated people in America")
plt.xlabel("Date") #Set X-axis label
plt.ylabel("No. of vaccines") #Set y-axis label

#----------------------------------------------------

plt.subplot(2,2,4) #fill up cell no. 4
plt.plot(uk["people_vaccinated"])
plt.title("No. of Vaccinated people in the United Kingdom")
plt.xlabel("Date") #Set X-axis label
plt.ylabel("No. of vaccines") #Set y-axis label
2×2 grid with 4 subplots

A grid like this makes it much easier to compare graphs of different data points.

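** Another important point to note is that in the subplots above, we haven’t specified any values for the x-axis. In such a case Matplotlib uses the index of the pandas Series (the row labels of the DataFrame) as the x values; for a plain list of N values it generates the numbers 0 to N-1. So the x-axis shows row numbers rather than actual dates, even though it is labelled "Date". **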
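Here is a small sketch of how Matplotlib picks the x values when `plt.plot` receives only one argument. The numbers and the index labels 40, 41, 42 are made up; they mimic a filtered DataFrame that kept its original row labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A filtered DataFrame keeps its original row labels, mimicked here with 40, 41, 42
vaccinated = pd.Series([10, 20, 35], index=[40, 41, 42])

fig, (ax_series, ax_list) = plt.subplots(1, 2)
line_series, = ax_series.plot(vaccinated)       # x comes from the Series index
line_list, = ax_list.plot(vaccinated.tolist())  # x defaults to the positions

print(list(line_series.get_xdata()))
print(list(line_list.get_xdata()))
```

If you want dates on the x-axis, pass the date column explicitly as the first argument, as we did in the very first plot.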


Different kinds of plots

What we have created above is a simple line graph. MatPlotLib allows you to create several different types of plots according to your need. Some of these plots are:

  • Scatter Plot
  • Bar Graph
  • Pie Chart

Let’s see how we can create these

  • MatPlotLib Scatter Plot
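#assumption: use the largest cumulative total recorded for each country
max_vaccinations = covid.groupby("country")["total_vaccinations"].max()
country_name = max_vaccinations.index #country names, in the same order as the values

plt.figure(figsize = (12,10))#define the figure along with its size

plt.scatter(country_name, max_vaccinations)# create a scatter plot for the total vaccinations in every country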

plt.title("Total vaccinations in a country")
plt.xlabel("Country Name")
plt.ylabel("No. of Vaccinations")
Scatter Plot for total vaccinations in every country

The graph has a very odd pattern: almost every country seems to have a value close to zero, with the exception of a handful. This does not mean that those countries did not receive any vaccines. If you look at the top of the y-axis you can read ‘1e9‘. This defines the scale of the graph and means that the numerals on the y-axis are to be multiplied by 10^9. So many of the countries might have values around 0.1 x 10^9, which is 100 million. This is still a large number, but a few very large values stretch the axis and make the smaller ones hard to read. We can fix this by changing the y-axis scale to logarithmic, simply by adding the following line to the code.

plt.yscale("log")
Scatter plot with a log scale on the y-axis

This makes a lot more sense. This data visualization example shows how important it is to adjust graphs to make appropriate sense of our data.
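To compare the two scales side by side, here is a minimal sketch with made-up totals that span several orders of magnitude. The country labels A to D are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up totals spanning several orders of magnitude
countries = ["A", "B", "C", "D"]
totals = [2.0e5, 3.5e6, 8.0e7, 2.1e9]

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_linear.scatter(countries, totals)
ax_linear.set_title("Linear y-axis")

ax_log.scatter(countries, totals)
ax_log.set_yscale("log")  # each major tick is now a power of ten
ax_log.set_title("Logarithmic y-axis")
```

On the linear axis the three smaller values sit almost on top of each other near zero, while the log axis spreads them out.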

There is one more disturbing element in all the plots we have created so far. The labels on the x-axis are not visible at all. This is because there are so many labels to display that they are all jumbled up. Fear not, we will address this issue in the advanced portion of this tutorial where we look to beautify our plots more and make the most out of them.  

  • MatPlotLib Bar Plot

A barplot is usually used to display the magnitudes of a certain quantity against different independent variables. In MatPlotLib you can simply create a bar plot by the following function.

plt.bar()

We need to create some additional data to display a useful bar plot.

canada = covid[covid["country"] == "Canada"] #isolate records for canada

#extract records for date: 2021-11-04
uk_onedate = uk[uk["date"] == "2021-11-04"]
america_onedate = america[america["date"] == "2021-11-04"]
canada_onedate = canada[canada["date"] == "2021-11-04"]
pak_onedate = covid_pak[covid_pak["date"] == "2021-11-04"]
labels = ["UK", "USA", "Canada", "Pakistan"]
plt.figure(figsize = (12,10))#define the figure along with its size
plt.bar(labels, [uk_onedate["people_vaccinated"].iloc[0], america_onedate["people_vaccinated"].iloc[0], 
         canada_onedate["people_vaccinated"].iloc[0], pak_onedate["people_vaccinated"].iloc[0]]) #argument 1 is the list of labels to be displayed on the x-axis and argument 2 is the list of values against the labels
plt.title("Total people vaccinated till 04-11-2021")
plt.xlabel("Countries")
plt.ylabel("No. of people vaccinated")
Bar plot to display total vaccinated people till 04-11-2021
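If you have many countries, building one DataFrame per country gets tedious. As a possible alternative, the sketch below filters once on the date and indexes the result by country. The rows are made up and only reuse the column names of our dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with the same column names as the vaccination dataset
toy = pd.DataFrame({
    "country": ["Canada", "Canada", "Pakistan", "Pakistan"],
    "date": ["2021-11-03", "2021-11-04", "2021-11-03", "2021-11-04"],
    "people_vaccinated": [30, 31, 70, 72],
})

# One filter on the date, then index by country to get label -> value pairs
on_date = toy[toy["date"] == "2021-11-04"].set_index("country")["people_vaccinated"]

fig, ax = plt.subplots()
bars = ax.bar(on_date.index, on_date.values)
ax.set_xlabel("Countries")
ax.set_ylabel("No. of people vaccinated")
```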
  • MatPlotLib Pie Chart

A pie chart is usually used to visualize the percentages of different elements inside a single entity.

MatplotLib has the following function to draw a pie chart.

plt.pie()

Let’s just use dummy data for this example. We’ll use our covid data in the next section

plt.figure(figsize = (12,10))#define the figure along with its size
_ = plt.pie([24,40,12])
Pie Chart for dummy values

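We passed the function the values 24, 40 and 12. In the above pie chart, the orange wedge represents 40: it is the second value in the list, so it gets the second color of Matplotlib's default color cycle, and it is also the largest wedge. The remaining values follow the same pattern.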
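If you want to check which wedge belongs to which value, one option is to add labels and compare the wedge colors with the active color cycle. This is a sketch under the assumption that no custom colors are passed; the label names are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig, ax = plt.subplots()
values = [24, 40, 12]
ax.pie(values, labels=["first", "second", "third"])

# "C0", "C1", "C2" are aliases for the first three colors of the active color cycle
cycle_colors = [to_rgba(f"C{i}") for i in range(len(values))]

# The wedges are stored on the axes as patches, in the order the values were passed
wedge_colors = [wedge.get_facecolor() for wedge in ax.patches]
```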


Beautifying the graphs and charts (Advanced options)


All the plots we have created above do the job well but none of them look professional. Luckily, what we’ve seen above is just the tip of the iceberg. Matplotlib has a plethora of customization options.

Let’s create some cool-looking graphs from the same covid dataset that we loaded earlier. We will need some additional information for the plots.

population_uk = 67.2e6 #population of the UK
population_usa = 329.8e6 #population of USA

Above we have defined the total population of the two mentioned countries.

Now we write our code.

fig = plt.figure(figsize = (14,10)) #create figure and set figure size

n = min(len(uk), len(america)) #assumption: n was not defined earlier, so use the no. of rows both countries have

_ = plt.plot(uk["date"][:n],  uk["people_vaccinated"][:n], color = 'red', label = "UK")#plot data for vaccinations in UK
#the UK dates are reused for the USA, so row i of both countries is assumed to fall on the same date
_ = plt.plot(uk["date"][:n],  america["people_vaccinated"][:n], color = 'blue', label = "USA")#plot data for vaccinations in USA

_ = plt.fill_between(uk["date"][:n], uk["people_vaccinated"][:n],america["people_vaccinated"][:n],
                     facecolor = "#0435d4", alpha = 0.5)#color in between the two plots

_ = plt.axhline(population_uk , linestyle = '--', color = 'red', label = "Total population (UK)")#draw line for population of UK
_ = plt.axhline(population_usa , linestyle = '--', color = 'blue', label = "Total population (USA)")#draw line for population of USA

plt.ylabel("No. of people vaccinated", fontsize=18)#set y-axis label
plt.xlabel("TimeStamp", fontsize=18)#set x-axis label
plt.title("Vaccinations in UK and USA (2021)")#set title for the chart

plt.legend(loc = 2)#place legend in the upper left; the entries come from the label arguments above

ax = plt.gca()#get the object representing the current axes to modify it

ax.patch.set_facecolor('blue')#set background color
ax.patch.set_alpha(0.1)#set opacity of background

ax.grid(axis = 'y',linewidth = 1, alpha = 0.5) #enable horizontal grid lines

ax.xaxis.set_major_locator(plt.MaxNLocator(20)) #limit the no. of ticks shown on the x-axis
plt.xticks(rotation = 45) #rotate x-axis ticks

#hide chart edges
for side in ["right", "top", "left", "bottom"]:
    ax.spines[side].set_visible(False)

That is a lot of code for one plot but the result is totally worth it.

Comparing UK and USA in terms of total vaccinations

Now isn’t that just gorgeous!! 😍 Let’s first talk about what the graph represents.

We have the number of Covid vaccinations administered in the UK and USA. Even though the US seems to be quite far ahead in the no. of vaccinations, the dashed lines representing the total populations tell a different story. The UK has vaccinated almost 80% of its population whereas the USA is at only about 60%.

Now let’s see what we have changed here.

  • Added some gridlines.
  • Created multiple plots on the same graph.
  • Shaded the region between the lines to highlight the differences.
  • Reduced the number of labels displayed on the x-axis to avoid jumbling.
  • Increased font sizes for the x and y labels
  • Added a MatPlotLib legend

The code above is well-commented so you can see which line does what.
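Another way to keep the x-axis readable is to convert the dates to real datetime values and let Matplotlib's date locators choose the ticks. This is only a sketch on a made-up daily series, not the code used for the figure above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily series covering roughly three months
dates = pd.date_range("2021-01-01", periods=90, freq="D")
values = list(range(90))

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dates, values)

ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))  # abbreviated month name and year
ax.tick_params(axis="x", rotation=45)                        # rotate the tick labels
```

For our dataset, the date column could be converted first with `pd.to_datetime(covid["date"])`.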

Now let’s see a few more examples.

#creating some additional data
single_date = covid[(covid["date"] == "2021-11-05") & (covid["country"].isin(["United States", "United Kingdom", "India", "Pakistan", "China"]))]

#adding population for more countries
population_china = 1.402e9
population_pak = 220.9e6
population_india = 1.38e9
#match each country to its population so that the order always follows single_date
populations = {"China": population_china, "India": population_india, "Pakistan": population_pak,
               "United Kingdom": population_uk, "United States": population_usa}
country_population = single_date["country"].map(populations)

fig = plt.figure(figsize = (10,7)) #create figure and set figure size
plt.scatter(single_date["country"], country_population, s = 40,color = 'red',zorder = 4, label = "Total population of country")
bars = plt.bar(single_date["country"], single_date["total_vaccinations"],  edgecolor='blue',zorder = 3, label = "Total vaccinations done")
#show grid
plt.grid(linewidth = 0.5,zorder = -1)
#show legend
plt.legend(loc=1)

#color a bar green if the doses administered exceed the country's population
for bar, doses, population in zip(bars, single_date["total_vaccinations"], country_population):
    if doses > population:
        bar.set_color('green')
A customized bar chart

The bar chart shows the total vaccinations in each country, with a bar colored green if the no. of vaccinations exceeds the country's total population (this is possible because vaccination involves multiple doses). This gives a gist of how well the vaccinations are going.
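Instead of recoloring bars after they are drawn, you can also decide the colors first and pass them to the bar function in one go. Here is a minimal sketch with made-up numbers; the country labels A to C are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Made-up figures: doses given and population for three placeholder countries
countries = ["A", "B", "C"]
doses = [150, 80, 300]
population = [100, 120, 250]

# Decide each bar's color before drawing it
bar_colors = ["green" if d > p else "tab:blue" for d, p in zip(doses, population)]

fig, ax = plt.subplots()
bars = ax.bar(countries, doses, color=bar_colors)
```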

Now finally a good-looking pie chart.

vacc_demo_pak = covid_pak[covid_pak["date"] == "2021-11-05"]

partially_vaccinated = vacc_demo_pak["people_vaccinated"].iloc[0] - vacc_demo_pak["people_fully_vaccinated"].iloc[0]
unvaccinated = population_pak - vacc_demo_pak["people_vaccinated"].iloc[0]

#organising data for the pie chart
data = [partially_vaccinated, unvaccinated, vacc_demo_pak["people_fully_vaccinated"].iloc[0]]
mylabels = ["partially Vaccinated","Unvaccinated", "Fully Vaccinated"]
mycolors = ["Blue", "Red", "Green"]

fig = plt.figure(figsize = (10,7))
_ = plt.pie(data, labels = mylabels, explode = [0,0,0.2], shadow = True,autopct='%1.1f%%', colors=mycolors)
_ = plt.title("Vaccination demographics in Pakistan")
Pie chart representing the Covid vaccinations in Pakistan 

Now that looks so much better than the earlier one. It even highlights the no. of people that are fully vaccinated by exploding that piece.


Conclusion

This was a very long article but it still does not cover half of the capabilities of the Matplotlib library. The data visualization examples shown above are just to get you started; there is so much more to it. Only with practice will you be able to avoid bad data visualization practices and carry out some amazing data visualization projects.
