Web Applications in Production using Flask, Gunicorn WSGI and Nginx (Part 2/3)

Web Applications with Gunicorn WSGI and Nginx

In our previous tutorial of this series, we looked at how we could deploy deep learning based web applications by strategically structuring our Flask project, serving dynamic content using the Jinja engine and building an aesthetically pleasing frontend interface using the powerful Bootstrap library. Flask is a widely used web framework that can be used to build these web applications in just a few tens or hundreds of lines of code. However, its built-in development server is not considered production-ready, since it doesn’t support scalability, security hardening or concurrency, servicing only one request at a time.

Thus, in this tutorial, we will look at the architecture of a basic production-ready web application-server stack and see how each individual component communicates with each other. We’ll also dive into the intimidating concept of a WSGI server, its history & need and its actual implementation using the Gunicorn WSGI.

This is the second article in a 3-part series on developing web applications using Flask and Python. Stay tuned for the next part!

A Basic Architecture Stack

When working with web applications in production scenarios, concepts like a high concurrency rate or a secure application protocol almost always become an inherent need of the application. But how do we go about choosing such a stack, and why is it necessary to understand what each component is and how the components communicate with each other? Let’s explore some of these questions in more detail.

There are three main players to a production-ready web application:

  1. A web server, like Nginx or Apache, responsible for intercepting and rerouting the incoming HTTP requests, and serving any static HTML content.
  2. A WSGI-compliant application server, like Gunicorn or uWSGI, responsible for serving the HTTP requests by distributing the load among multiple spawned instances of the web application.
  3. A web application written under some web framework like Django or Flask.

The basic architecture of such a WSGI-compliant stack is shown below for reference.

basic architecture of a production-grade web app
Basic Architecture of a WSGI-powered web application

We have looked at the web application side of this stack in our previous tutorial. Let’s now dive into what makes this stack production-ready by understanding what is meant by WSGI-compliancy and its implications on a web app.

What is a WSGI?

A Web Server Gateway Interface, or WSGI for short (pronounced “whiz-gee“), is a communication standard used to connect any Python web application framework that supports WSGI, like Django or Flask, with any web server like Apache, Nginx etc. It is the de facto universal standard protocol, introduced in PEP 333 back in 2003 (and updated for Python 3 in PEP 3333 in 2010), to promote web application portability across a variety of web servers.

The History of WSGI Servers

In the early days of the internet, web servers of the time used the HTTP/0.9 (1991) protocol to parse incoming HTTP requests, fetch the requested static web page(s) stored on the server’s hard drive and send them back to the browser in the form of a response. However, the need of the hour was the capability to serve dynamic content over the internet, since it could unlock the potential to serve forms, display banners or promos, deploy site searches and much more. This could be made possible by an external script that would take care of rendering the dynamic media and processing the data. The web server’s responsibility would then be to invoke this script every time a request demanding dynamic content was encountered. Problem solved, right?

Except that now, whenever a request hit the web server, it had to fire up that external script we talked about in order to load or save dynamic data. It would do so by forking: starting a new child process that inherited from the parent process all the variables needed to service that request. At the time, there was no particular standard for naming these variables, so web server developers came up with their own variations. This hindered portability, so to speak, since each different web server required a different script that read from the variables specific to that web server only. And thus, the world got introduced to the Common Gateway Interface, or CGI for short.


CGI, introduced in 1993, was the first universal standard for naming these special variables and defining their purpose. This meant that no matter what web server you chose or what external script you wanted to call, be it Python or Perl, your external script would read the request data from variables named according to a standard convention and then service the request accordingly. This allowed for portability. The Python community took this a step further and, in 2003, also standardized the way any web server was supposed to call this external .py script. This became known as the Python Web Server Gateway Interface, or Python WSGI, or simply WSGI for convenience.

The WSGI Standard

The WSGI standard has two sides:

  • The web server side
  • The web application framework side

Any script (web application) that is to be executed by the web server is expected to expose a callable object (a callable is any object with a defined __call__ method, be it a function or a class) which is invoked by the web server. This callable is sometimes also referred to as the WSGI app in the web development fraternity. It accepts two positional arguments:

  • The first parameter is a Python dictionary (conventionally named environ) holding the CGI-style key-value pairs that describe the request
  • The second parameter (conventionally named start_response) is itself a callable; it must be called exactly once within the WSGI app and passed the HTTP response status and the list of HTTP headers
  • The response body is the return value of the WSGI app, which must be an iterable of bytestrings

These simple rules not only ensure portability to any web server but also allow the flexibility required to run any web framework whatsoever. This is the beauty of the Python WSGI standard.
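Putting these rules together, a minimal WSGI app can be sketched in just a few lines (the name application is a common convention, not a requirement of the standard):

```python
# A minimal WSGI app: a callable taking the CGI-style environ dict
# and the start_response callable
def application(environ, start_response):
    method = environ.get("REQUEST_METHOD", "GET")   # a CGI-style variable
    body = f"Hello from a {method} request!".encode("utf-8")

    status = "200 OK"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)  # called exactly once with status + headers

    return [body]  # the response body: an iterable of bytestrings
```

For quick local testing, the standard library’s wsgiref.simple_server can serve such a callable without any third-party server.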

So now it would seem that everything is going according to plan right? Well.. 😅

WSGI be like

Now, whenever a request hits the web server, it has to fire up the Python interpreter (in the case of a Python script) every single time to execute that external script we’ve been talking about throughout this article, forking a new process in UNIX terminology. This means the user probably has to wait another few hundred milliseconds for each request, and while that might not sound like much, it becomes a pain to bear at scale. There must be some other workaround for this.

If only we could fork these scripts before any HTTP requests hit the web server. That way, we could avoid the overhead of loading a new Python instance into memory on every request. This is exactly what WSGI servers like Gunicorn and uWSGI are for. They are based on the concept of a pre-fork workers model, which means exactly what the name suggests: forking the script before an HTTP request comes in.

Gunicorn WSGI

Green Unicorn, or Gunicorn, is a powerful WSGI application server, based on the pre-fork worker model, used extensively when deploying web applications in production scenarios. Gunicorn sits between the web server and the web framework, as described in the previous section, effectively acting as a bridge between the two. We mentioned that Gunicorn is based on a pre-fork worker model. Let’s first see what this means.


Gunicorn Architecture

Gunicorn works on a pre-fork workers model, sure, but before we define what is meant by that, let’s first see what is meant by forking and workers in computing terms. Forking a (parent) process means creating a duplicate (child) process that inherits from that parent process and continues execution in parallel to the parent process. A worker is simply an alias used for a process. Thus a fork workers model would mean employing a single parent/master process which would then be responsible for spawning/forking and managing multiple child workers/processes and their health.

Building upon that, a pre-fork workers model then simply means that Gunicorn not only forks the master process into multiple child processes to distribute the application load among them, but does so before it receives any HTTP request from the client(s). This is shown graphically in the previous section.
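As a rough illustration of the pre-fork idea (a UNIX-only sketch, not Gunicorn’s actual implementation), a master process can fork its workers up front and then simply supervise them:

```python
import os

def serve():
    # Placeholder for a worker's request-handling loop
    print(f"worker {os.getpid()} ready")

# Pre-fork: the master forks its workers *before* any request arrives
worker_pids = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:                 # child process: act as a worker, then exit
        serve()
        os._exit(0)
    worker_pids.append(pid)      # parent (master): keep track of each child

# The master's remaining job is supervision: waiting on its workers
# (Gunicorn additionally restarts any worker that dies)
for pid in worker_pids:
    os.waitpid(pid, 0)
```

In Gunicorn, each forked worker already holds a loaded copy of the web application, ready to service requests immediately.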

The number of instances of the web application that a Gunicorn server forks depends upon the configuration parameters passed to Gunicorn at startup time. Have a look at the usage section down below for more discussion on these parameters.


Installation
There are several ways to install Gunicorn on your system depending on your use-case. On Ubuntu/Linux systems you can install Gunicorn by:

sudo apt-get update
sudo apt-get install gunicorn

Alternatively, the same can be done using pip:

pip install gunicorn


Usage
If you have followed along the previous tutorial of this series, you’ll be familiar with the directory structure of our Flask web application:

  • webapp/
    • static/
    • templates/
    • uploads/
    • XNet/
      • XNet.json
      • XNet.h5
    • __init__.py
    • params.py
    • views.py
    • xnet.py
  • README.md
  • requirements.txt
  • run.py

The parent webapp/ directory is the root directory of our project while the run.py file serves as the entry point for our application. Since we have talked about how Gunicorn sits between the web application side and the web server side, effectively acting as a bridge between the two, we have to set it up so that it intercepts the incoming HTTP requests prior to their usual entry point into the application.

Gunicorn runs via a CLI by specifying the WSGI application callable that it needs to connect with in order to set up communication between the web server and the web application. We can run Gunicorn from the root directory of our project by running the following command:

gunicorn [OPTIONS] [WSGI_APP]

where [WSGI_APP] is of the form $(MODULE_NAME):$(VARIABLE_NAME). Here, $(VARIABLE_NAME) refers to the name of our WSGI callable application which, if we recall, we previously named webapp in our webapp/__init__.py file like so:

webapp = Flask(__name__)

Thus, we can run Gunicorn simply by:

gunicorn webapp:webapp

This will run our web application at the default address with port 8000. We will later see how to bind our application to a different port or sock file. The [OPTIONS] argument provided in the CLI specifies a list of multiple configuration arguments that can be passed to tell Gunicorn how to serve the application. Here is a non-exhaustive list of some useful arguments to choose from:

  • -n STR or --name=STR: Sets a recognizable alias for the web service which can later be used for monitoring and debugging purposes via the process system tables.
  • -w INT or --workers=INT: The number of worker processes to fork, each running an instance of the web application. The official documentation suggests that a good number for this can be between 2-4 workers * number of CPU cores.
  • -t INT or --threads=INT: The number of threads, running on each of the worker processes, to handle the web requests. Each worker specified by the -w option will run this many threads. The official documentation suggests that a good number for this can be between 2-4 threads * number of CPU cores. However, the optimal combination of threads and workers for your particular use-case can vary and needs to be tuned manually with a few hits-and-trials. Look at the next section for a brief discussion regarding threads & processes and how to smartly tune this option to squeeze the most out of your application in terms of scalability and concurrency.
  • -b ADDRESS or --bind=ADDRESS: Binds the application to a user-defined address and port rather than the default port 8000. It can also be used to bind the application to a UNIX socket file, as we will see in a minute.
  • -D or --daemon: Runs the Gunicorn process in the background as a daemon process.
  • --timeout=INT: The maximum amount of time in seconds after which a silent worker will be killed and restarted.
  • --log-level=STR: The granularity of the output logs emitted by Gunicorn for debugging and monitoring purposes.

A complete, exhaustive list of all these settings and configuration parameters is given here. These settings can also be provided in a config file rather than passed as arguments via the CLI. The config file can then be specified by the -c PATH or --config=PATH argument.
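As an illustration, a hypothetical gunicorn.conf.py mirroring the flags above might look like this (the values here are assumptions for demonstration, not recommendations for any particular app):

```python
# gunicorn.conf.py -- each setting mirrors one of the CLI flags above
import multiprocessing

bind = "127.0.0.1:8000"                        # same as -b/--bind
workers = multiprocessing.cpu_count() * 2 + 1  # same as -w/--workers
threads = 2                                    # same as -t/--threads
timeout = 120                                  # same as --timeout
loglevel = "info"                              # same as --log-level
```

It would then be loaded with gunicorn -c gunicorn.conf.py webapp:webapp.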

Tuning the Workers and Threads Configuration

Gunicorn lets us configure both the number of worker processes it must fork, each serving the web application, and the number of threads running on each worker to handle the incoming requests, as seen in the previous section. However, choosing an optimal number of workers and threads can become challenging when their underlying purpose is not clear.

Some workers

Workers or processes are generally used wherever high computational power is required with little or no regard for the memory footprint of the application. Each worker fires up a separate instance of the web application, so in theory, applications that are CPU-bound in nature (bottlenecked by the CPU) benefit from increasing the number of workers. Memory, on the other hand, usually becomes the key constraint when configuring a high number of worker processes, since each one loads its own copy of the application. Gunicorn also lets us configure the type of worker via its configuration parameters (the -k or --worker-class option), from the default synchronous workers to asynchronous ones such as gevent.

While each process runs a separate instance of the web application, threads run within a worker process and thus share the application memory and resources among them. Running multiple threads is generally beneficial for I/O-bound applications (bottlenecked by the I/O) where each thread concurrently services individual requests. Of course, one has to be very cautious about possible race conditions, resource blocking and thread-safety in general when working in multi-threaded scenarios.
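To see why thread-safety matters when threads share state, here is a minimal sketch (unrelated to our app’s code) where a lock guards a counter that several threads update concurrently:

```python
import threading

counter = 0
lock = threading.Lock()

def handle_requests(n):
    # Simulates a thread servicing n requests that touch shared state
    global counter
    for _ in range(n):
        with lock:          # guard the shared counter against races
            counter += 1

threads = [threading.Thread(target=handle_requests, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: deterministic only because of the lock
```

Without the lock, the read-modify-write on counter could interleave between threads and silently lose updates.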

The optimum combination of workers (w) and threads (t) is thus dependent on the nature of the application and the underlying hardware resources available. There is no fixed rule of thumb to follow when tuning these parameters apart from the general advice provided in the official documentation. However, by understanding the constraints above, one can try out multiple combinations of the two parameters to find a sweet-spot trade-off that optimizes the scalability and concurrency of the application.
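One way to frame the search is that the total request-handling capacity is roughly workers x threads, so a given concurrency target can be met by several different splits; a small sketch:

```python
def combos(total):
    # All (workers, threads) pairs whose product equals `total`
    return [(w, total // w) for w in range(1, total + 1) if total % w == 0]

# A target of 8 concurrent requests can be met several ways: lean towards
# more workers for CPU-bound apps, more threads for I/O-bound ones
print(combos(8))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
```

Each pair trades memory (more workers means more application copies) against thread-safety risk (more threads means more shared state).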

Now that we have demystified two players in the production-ready architecture of our web app, let’s now stack the final piece together.

Nginx: An Elegant Web Server

Nginx (pronounced “engine-ex“) has historically been almost the default choice of HTTP web server for serving static and media content over the internet. Over time, it has also proven to be an effective reverse-proxying, HTTP-caching and load-balancing tool for web apps. For this tutorial, let’s see how we can set it up as a reverse proxy for our application.


You can install Nginx on Ubuntu-based systems by running:

sudo apt update
sudo apt install nginx

Once that’s done, simply open up your browser, type localhost in the address bar and press enter. If everything is configured correctly, you should see a screen similar to the one shown below:

Nginx is up

Congratulations, you’ve just installed a web server and served your first page with it! 🙌 It was as simple as that. Now that you’ve seen a glimpse of what Nginx can do for you, let’s see what is cooking under the hood for us to get a gist of all the flavors.


The way Nginx serves static files or reverse-proxies an incoming request, rerouting it depending on the requested URI, is all elegantly managed by the configuration files in the /etc/nginx/ directory. This directory serves as the default root folder where Nginx looks for all of its instructions on how to behave when a certain URL is hit in the browser. There are two well-known practices for maintaining configuration files for all your sites under the Nginx schema:

  1. Create the config files (we’ll talk about them in a minute) for all the sites you want your server to serve (even if not for the time-being) in the /etc/nginx/sites-available/ directory and create symlinks between these files and the files in the directory /etc/nginx/sites-enabled/ for just those sites which you want currently enabled.
  2. Rather than keeping track of symlinks, you could also just create the config files in the /etc/nginx/conf.d/ directory with a .conf suffix, for just those sites you want currently enabled. Any sites, being served by these config files, you want to be disabled in the future can either just be taken out of this directory or renamed to not having the .conf suffix.

Most developers have been working with the first option for some years, and even though both methods will configure Nginx properly, the second is generally considered best practice: it is easier to work with one directory than to manage multiple directories and the symlinks between them. The first method is also cited as deprecated in the Nginx Cookbook:

The /etc/nginx/conf.d/ directory contains the default HTTP server configuration file. Files in this directory ending in .conf are included in the top-level http block from within the /etc/nginx/nginx.conf file. It’s best practice to utilize include statements and organize your configuration in this way to keep your configuration files concise. In some package repositories, this folder is named sites-enabled, and configuration files are linked from a folder named site-available; this convention is deprecated.

Finally, to tell Nginx where to look for the configuration files you’ve just set up, open up the /etc/nginx/nginx.conf file and it will look something like this:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

There are a lot of things in this file, but you only need to worry about the include statements at the end, which tell Nginx to match the wildcard pattern in the specified directories and load all of the matched files the next time the web server reloads. So depending on which method you chose to follow above (1 or 2), these statements will load up your configurations on the server’s next reload.

But how do we write these config files ourselves? Remember the architecture diagram from above? The web server is essentially the entry point of all incoming web requests, so any request that previously went straight to Gunicorn now needs to be intercepted by Nginx and rerouted to its appropriate destination based on the config file(s) we define.

Let’s create a config file for our application inside the directory /etc/nginx/conf.d/ using the second method described above and name it webapp.conf:

server {
    listen 6060;
    server_name _;

    access_log /home/haris/nginx/access_webapp.log;
    error_log /home/haris/nginx/error_webapp.log;

    location / {
        include proxy_params;
        proxy_pass http://unix:/home/haris/gunicorn/gunicorn_webapp.sock;
    }
}

Let’s go through this file line-by-line:

  • server {}: Defines a new server block for Nginx to serve.
  • listen 6060: Tells Nginx to listen at port 6060 for any incoming web requests.
  • server_name: Defines the hostname or names whose requests should be directed to this server block (the underscore _ is commonly used as a catch-all placeholder when no domain is configured). This could also be something like www.pneuxnet.com, if we have purchased that domain, or one of its subdomains like predict.pneuxnet.com, if we have also correctly configured their DNS records (more on this later).
  • location <path> {}: Matches the <path> to the request’s URI. The portion of the request’s URL after the domain, is referred to as its URI.
  • include proxy_params: Tells Nginx to include some predefined parameters for proxying and reverse-proxying purposes.
  • proxy_pass <redirect path>: Tells Nginx to redirect the incoming request whose URI matches <path>, to a UNIX socket file as specified above.
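For reference, on Ubuntu the proxy_params file included above typically contains the following forwarding headers (the exact contents may vary between distributions):

```nginx
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```

These headers let the proxied application see the original client’s address and protocol rather than those of the Nginx proxy.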

Run the following command to test your configuration files for syntax errors:

sudo nginx -t

Now restart Nginx by entering:

sudo systemctl restart nginx

or by:

sudo service nginx restart

Now all that is left is to run our Gunicorn application and bind it to the aforementioned socket file to tie in our application stack, allowing requests to be reverse-proxied from Nginx via this socket file to Gunicorn and through the WSGI interface into our Flask web app (visualize the architecture diagram from before). Run the following commands in the webapp/ directory:

sudo ufw allow 'Nginx Full'
gunicorn webapp:webapp --bind=unix:/home/haris/gunicorn/gunicorn_webapp.sock --log-file /home/haris/gunicorn/gunicorn_webapp.log --timeout 500

Go to the browser and type localhost:6060, and you should see the app running perfectly fine. The logs can be viewed in the file specified by the --log-file flag.

Bonus tip: There is also a way to avoid downtime of our application in production upon unexpected crashing or server reboots by setting up automatic restarts of the Gunicorn application in such unforeseen scenarios. This ensures that our application remains available to the users at all times with minimal downtime of course. This can be done by creating a unit file with the suffix .service in the /etc/systemd/system/ directory:

[Unit]
Description=PneuXNet Web Application Gunicorn Service
After=network.target

[Service]
User=haris
Group=www-data
Restart=always
RestartSec=1
WorkingDirectory=/home/haris/webapp
Environment="PATH=/home/haris/webapp/env/bin"
ExecStart=/home/haris/webapp/env/bin/gunicorn webapp:webapp --bind=unix:/home/haris/gunicorn/gunicorn_webapp.sock --log-file /home/haris/gunicorn/gunicorn_webapp.log --timeout 500

[Install]
WantedBy=multi-user.target

I will not go into the details of how to write this file or what each statement in this file means. If you want to go into such details, I recommend you check out this article. Now, let’s enable our Gunicorn service:

sudo systemctl start pneuxnet_webapp
sudo systemctl enable pneuxnet_webapp

The deployment configurations for Gunicorn and Nginx can be found in this repository.

Enabling the SSL/TLS Certificate using Let’s Encrypt

There is one last thing we need to talk about before we can wrap this up. If you go back to the browser and enter localhost:6060, you will see a warning like so:

Insecure HTTP Connection

This is because our connection is not secured and can be intercepted, since it sits on the unencrypted HTTP protocol. The standard practice for communication over the internet is to incorporate the Transport Layer Security (TLS) protocol (the successor to SSL) to secure our data and application. This is done by implementing the HTTPS protocol, which builds upon conventional HTTP by encrypting the communication between web browsers and web servers. Let’s first see how TLS/SSL encryption works.

Time to secure our app

In order to implement TLS for a web app, there must be a TLS/SSL Certificate signed by a valid Certificate Authority (CA), like Let’s Encrypt, installed on the origin server of the app. This certificate contains important information about who owns the website domain, along with the server’s public key, both of which are important for validating the server’s identity for security reasons. When a user connects to a website from their browser, their device and the server effectively perform a TLS/SSL Handshake, after which the web server then sends encrypted data over the HTTPS protocol to the client’s device where it is decrypted and shown to the user.

Unfortunately, in order to implement this we need to obtain a public domain for our website from a domain registrar like GoDaddy. Once you have purchased a domain and configured its DNS records to point to the IP address of your server (there are plenty of good tutorials on the internet for this), head on over to Step 6 of this article to obtain a TLS/SSL certificate for your website from Let’s Encrypt. Follow the guide from there to enable the HTTPS protocol for your website.. and that’s that! We just deployed a web application to a production-ready environment, taking care of important metrics like scalability, availability and security. Pat yourself on the back for sticking around for this one! 🥇


In this tutorial, we have dealt extensively with deploying web applications to production, keeping in mind crucial aspects like scalability, security and availability to drive our architecture choices at each step. We have also looked at some of the best and most widely adopted practices that developers tend to stick to in the industry, and their reasons for doing so.

However, up till now we have only talked about deployments on a single machine. What happens if we somehow have to move all of our application stack onto some other machine down the road? Do we need to repeat all of our steps again on this new system, from creating environments and installing packages to configuring the web servers and the load balancers of our application? If only we could wrap our entire application in some sort of container and take it with us wherever we would go.. 😏

Until then! ✌

