In part 4 of Building a Blog I will talk about Nginx and how it is configured to serve static files efficiently and forward API requests to my Scotty server. The goal is to have Nginx as a secure and performant entry point to my site.
Why Nginx
The web is wild. There are all kinds of web clients and specifications aren’t followed 100% of the time. There are also malicious folks trying to bring your site down. In a production environment, it’s better to have your application behind a robust and performant HTTP server like Nginx. It has every feature that you could possibly need. The ones I’m most interested in are:
- Efficient static file serving
- Reverse proxy with caching
- Rate limiting
These things are essential for a performant site, but the thing I like the most about Nginx is how easy it is to set up Let’s Encrypt. I think it’s very important to use HTTPS everywhere because:
- Your interactions with a website should be private.
- No third party should be able to inject ads or any extra content into a site you are visiting.
Let’s Encrypt’s Certbot makes it really easy to use HTTPS everywhere on your server. It can even generate the necessary Nginx configuration and let you focus on other things. You can run any number of apps and services on your server, and they can all be accessed over HTTPS via Nginx without having to patch them to support HTTPS.
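To give an idea of what it generates, here is roughly what Certbot adds to a `server` block when run with its Nginx plugin (the domain is a placeholder, and the exact lines can vary with the Certbot version):

```nginx
listen 443 ssl; # managed by Certbot
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
```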
Dealing with Configuration files
Nginx is another one of those programs that has so many good features that you have to edit some files under `/etc` to get it to work for your use case. This is a problem because ideally I want all the source files needed to deploy my blog under a single repository. This way, if I ever need to deploy to a new server, all I have to do is run `git clone` and run some scripts. Writing configuration files is an extra step that I want to avoid.
A simple solution would be to create the configuration files inside the repo and copy them to `/etc` during deploy. I think this is viable for most web apps, where the Nginx instance is used exclusively for the web app. It isn’t viable for me because I’m hosting this on a Virtual Private Server (VPS) that I also want to use to run other web apps and services. Those services should not be exposed to the internet directly; they should be behind Nginx (HTTPS everywhere!), and my blog’s source code should not know anything about them.
The solution here is to separate the global Nginx configuration, which knows about everything going on in the system, from the blog configuration. Lucky for me, the Nginx configuration language supports an `include` directive to load extra config files from arbitrary paths. My global Nginx configuration looks something like this:
```nginx
http {
    include /path/to/repo/nginx/http.conf;
    # More configuration
    server {
        include /path/to/repo/nginx/server.conf;
        # More configuration
    }
}
```
The next step would be to add the configuration files, but there is a little problem: those files contain some data that should be private or is only relevant to the server where the blog is currently deployed. This data should not be part of the repository. Instead, the repository should contain configuration templates in which the server admin fills in the fields with the private data.
In my case, I use `m4` (a macro language) as my template engine. It doesn’t seem to be very popular among web developers, but it’s available on pretty much every Linux distro and it’s good enough for me. I found this site to be very helpful for learning how to use it. I’m curious whether there are better tools for this problem.
If you take a look at my blog’s config templates, you’ll see that there are only 3 variables:

- `API_PORT`: The port on which the blog’s backend listens.
- `SITE_PATH`: The location of the blog’s static files (HTML, CSS & JS).
- `CI_SECRET`: A secret string used by CircleCI to update the blog after a `git push`.
To generate the config files, all I have to do is define those variables when invoking `m4` in the command line:

```sh
m4 -DAPI_PORT=9999 -DSITE_PATH=/path/to/_site/ -DCI_SECRET=6yhfdri-ed45-24on-5342-0b24q85m4452 ./server.m4 > ./server.conf
m4 ./http.m4 > ./http.conf
```
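Note that the Nginx snippets in the rest of this post are shown in template form, so they still contain these macros. After the invocation above, a template line such as `proxy_pass http://localhost:API_PORT;` lands in the generated `server.conf` as:

```nginx
# Fragment of the generated server.conf, using the example values above
proxy_pass http://localhost:9999;
```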
Serving static files
The blog is mostly HTML and CSS files. I won’t cache these files because they are small, and I might need to correct them in case I publish a typo or something like that. Serving these files is very simple.
```nginx
root SITE_PATH;
location / {
    try_files $uri $uri/ =404;
}
```
First, set the directory containing the static files as the “server root”. Then the `location /` block uses the path of any request received by the server and looks for a file that matches the path. If it can’t find anything, it retries appending a `/`. If it still can’t find a file to serve, it returns 404.
Some static files that are too big, or just don’t ever need to be updated, should be cached by clients to improve the site’s performance. To inform clients that these files should be cached, I put them under `/assets` and use this configuration:
```nginx
location /assets {
    add_header Cache-Control "public, max-age=31536000, immutable";
    try_files $uri $uri/ =404;
}
```
This block adds a `Cache-Control` header to the response so that browsers can avoid re-downloading the file if it’s already cached. I use `immutable` to declare that the file never changes, so the browser should never download it more than once. If the browser does not support `immutable`, it can fall back to `max-age=31536000`, which tells the browser to cache the file for a ridiculously long time (that’s one year, in seconds).
Another important thing when serving static files is GZIP compression. Text files like HTML, CSS, and JS can be reduced to around a third of their original size with GZIP. This is a great way to save bandwidth. I enable this in my global configuration because it is so good.
```nginx
http {
    gzip on;
    gzip_min_length 1000;
    # text/html responses are always compressed when gzip is on,
    # so they don't need to be listed here
    gzip_types application/json application/xml image/svg+xml text/css text/plain;
}
```
Nginx has a lot more optimizations for static files. My configuration was originally created by `certbot` when I set up Let’s Encrypt, so it already had some nice things like `tcp_nodelay`, `tcp_nopush`, and `sendfile` activated. You can read more about those in this article.
Reverse Proxy
My blog exposes an HTTP API for liking blog posts, backed by a Scotty server that records every like button click into an SQLite database. SQLite is not really designed for web applications, but it’s still pretty performant and good enough for small sites like this one. I love the simplicity of an embedded database and not having to deal with another huge system like MySQL or Postgres. I don’t expect to receive enough traffic to crash the system any time soon, but I still think that it would be irresponsible to expose this server directly to the internet.
This is why I’m using Nginx as a reverse proxy. Nginx communicates with clients using SSL (provided by Let’s Encrypt) and, optionally, HTTP/2. If a client sends an API request, it is forwarded to the Scotty server using simpler protocols like unencrypted HTTP/1.1. Here’s the configuration for `/blogapi/like`, which records a liked post when it receives a `POST` request:
```nginx
location /blogapi/like {
    limit_req zone=post_like_limit burst=5 nodelay;
    rewrite ^/blogapi/like/(.*)$ /like/$1 break;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://localhost:API_PORT;
}
```
The requests are rate limited and rewritten before being forwarded to the Scotty server listening on a different port on the same VPS. Rate limiting is super important to prevent DoS attacks; Nginx has a great blog post about rate limiting that I used as a reference (I sketch the zone definitions after the list below). The request is modified in two ways before being forwarded:
- The `rewrite` directive simply removes the `/blogapi/` prefix from the path, keeping any other path parameters intact.
- The `proxy_set_header` directive adds an HTTP header to the proxied request with the original request’s IP address so that the Scotty server can insert it into the database.
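Both zones referenced in this post (`post_like_limit` here and `api_limit` below) have to be defined with `limit_req_zone` in the `http` context. A minimal sketch, with made-up rates:

```nginx
# Hypothetical zone definitions; each tracks clients by IP address in
# 10 MB of shared memory. The real rates are whatever my global config says.
limit_req_zone $binary_remote_addr zone=post_like_limit:10m rate=1r/s;
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
```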
The configuration for `/blogapi/likes` is a bit different because it returns the number of likes that a post has when it receives a `GET` request:
```nginx
location /blogapi/likes {
    limit_req zone=api_limit burst=5 nodelay;
    rewrite ^/blogapi/likes/(.*)$ /likes/$1 break;
    proxy_pass http://localhost:API_PORT;
    proxy_cache blog;
    proxy_cache_valid 200 10s;
}
```
Just like the previous endpoint, the requests here are also rate limited and rewritten. There’s no need to pass the IP address as a header since there is nothing to insert. Since this is a `GET` request, the response is set to be cached for a few seconds if the status code is 200. Caching is very useful because it avoids repeated database queries in short intervals. Because of caching, this endpoint uses a more lenient rate limit “zone”.
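One detail not shown in these snippets: the `blog` cache referenced by `proxy_cache` must be declared with `proxy_cache_path` in the `http` context. A minimal sketch, with a made-up path and sizes:

```nginx
# Hypothetical declaration of the "blog" cache zone; the path, key zone
# size, total size, and inactivity window are made-up examples.
proxy_cache_path /var/cache/nginx/blog keys_zone=blog:10m max_size=50m inactive=1m;
```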
That’s it for today. I’m pretty happy with Nginx although maybe I could do just fine with a simpler tool. The next post will be about continuous integration and how I use CircleCI to test my code and deploy the blog automatically after pushing to the master branch of my GitHub repository.