In part 5 of Building a Blog I'll talk about how I automate deployments of new blog posts using CircleCI. Every time I push to the master branch of my GitHub repository, a webhook is triggered and CircleCI checks out the latest code, runs a few tests, and finally deploys the site. This makes it really easy for me to add new posts. However, the process isn't that simple under the hood, and in this post I want to explain how it works.
Why CircleCI
If you want to build high-quality software you need a continuous integration system. This is especially true if at some point you want to have multiple collaborators. Almost every successful project or library hosted on GitHub uses either Travis CI or CircleCI to verify that new contributions don't break the code. For web applications, the CI system can also deploy the app after successfully building it. If you want to automate your deployments and make your life easier, you need a CI system.
There are quite a few CI options on the internet, and a lot of them are even free (as in free beer). Pretty much all of them integrate nicely with GitHub, requiring little or no configuration. I have used both Travis CI and CircleCI in the past, and they are both great services. I prefer CircleCI for mostly subjective reasons. What I like the most is that every build runs inside a container, and you can pick any image you want for it. CircleCI has a huge variety of pre-built images that should cover basic needs. If you need something more complex you can also choose any image uploaded to DockerHub, including your own. This is essential for projects like mine that require multiple unrelated compilers or runtimes for building.
Preparing the build environment
Since this blog uses mostly Haskell and a bit of Node.js to generate the HTML, CSS and JS files, I need an image that has both Stack and Node.js installed in order to build the blog in a CircleCI job. It’s not a common combination so there is no pre-built image that I can use. The good thing is that I can create my own.
Since I need an image with multiple tools, I should pick a Linux distro that makes it easy to install them. I have used Arch Linux on all my personal computers for years. I think it's a great minimalistic distro, so I went with that and installed all dependencies via Pacman. This way I can build the blog in the cloud with the same environment as my development machine.
# Set base image
FROM base/archlinux:latest
# Install CircleCI dependencies
RUN pacman -S --noconfirm git openssh tar gzip ca-certificates
# Install blog dependencies
# (Stack needs make & gcc during setup)
RUN pacman -S --noconfirm make gcc nodejs yarn stack
# Add UTF-8 support so that Hakyll can compile files with Unicode chars
RUN echo 'en_US.UTF-8 UTF-8' > /etc/locale.gen \
&& locale-gen \
&& echo 'LANG=en_US.UTF-8' > /etc/locale.conf
ENV LANG en_US.UTF-8
Since a base Arch Linux install is very bare-bones, containing only what is absolutely indispensable for the OS to function, there were a few gotchas, like having to explicitly install gcc and make so that Stack can compile and install GHC, and configuring UTF-8 support so that Hakyll can compile files with Unicode characters. Once the image is uploaded to DockerHub I can reference it in my CircleCI config file, and all future builds will run in that container.
version: 2
jobs:
build:
docker:
- image: gaumala/blog-arch:0.4
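Getting the image onto DockerHub in the first place is just the standard Docker workflow. Roughly something like this, assuming you are already logged in with docker login:

# Build the image from the Dockerfile above and tag it
docker build -t gaumala/blog-arch:0.4 .
# Upload it to DockerHub so that CircleCI can pull it
docker push gaumala/blog-arch:0.4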
Running Haskell code in CircleCI
Unlike JavaScript, Haskell code needs to be compiled before it can be executed. This process can be very resource-intensive, which is a problem in constrained cloud environments. CircleCI builds run in virtualized machines with 2 CPUs and 4 GB RAM. If your build needs more memory, it will fail.
In my case, I want to run Hakyll in the cloud to generate the static pages and deploy them. So I need to run Stack to compile my static site generator. At first, I used this configuration:
jobs:
build:
- run:
name: Resolve/Update Static Site Dependencies
working_directory: './static'
command: |
stack setup
stack build --dependencies-only
This would fail after 30 minutes or so. It turns out that Hakyll pulls a lot of dependencies, generating a dependency tree of a size comparable to the one generated by your typical npm project. Since all these dependencies have to be compiled, the process takes significantly longer than running npm install.
Probably the biggest dependency here is Pandoc. The build would always fail while compiling Pandoc. Every single time. When I ran stack build on my local machine I noticed that RAM usage peaked at just over 4 GB while compiling Pandoc, just over the limit. I was about to give up and generate the pages on my local machine instead of the cloud, but then I found out that stack build supports a few flags that can reduce RAM usage:
- -j1: This limits Stack's concurrency to a single job, which means that Stack can only compile one package at any given time. Fewer concurrent processes means less RAM usage.
- --fast: This disables compiler optimizations. Since the compiler has to do less work, it uses less RAM. I don't need optimizations in my static site generator because I don't invoke it very often. In this case reducing RAM usage is far more important than speed.
Here is the updated configuration:
jobs:
build:
- run:
name: Resolve/Update Static Site Dependencies
working_directory: './static'
command: |
stack setup
# this is pretty big so disable optimizations and limit
# to one job to avoid running out of memory
stack build --dependencies-only -j1 --fast
This finally works. The build succeeds, but there is one little problem: it takes almost a full hour to compile the whole thing! Waiting an hour to have a new post published sounds like bad automation. Luckily, CircleCI supports caching dependencies and generated binaries to drastically speed up builds like this one.
Caching for faster builds
To avoid wasting time compiling Pandoc every single time, I can declare in my CircleCI config which paths on the file system should be cached for subsequent builds. Here's the configuration:
version: 2
jobs:
build:
docker:
- image: gaumala/blog-arch:0.4
steps:
- checkout
- restore_cache:
name: Restore Cached Dependencies
keys:
# Always cache everything
- deps-
# Build all Haskell code & pull all npm packages here
- save_cache:
name: Cache Dependencies
key: deps-
paths:
- ~/.stack
- ./api/.stack-work
- ./static/.stack-work
- ./frontend/node_modules
# Run tests, build the site and deploy it here
The steps can be summarized like this:
- Check out the code from Git.
- Restore the cache.
- Run the steps that generate things that should be cached (see the sketch after this list). In this case the steps are:
  - Download & compile the static site generator's dependencies.
  - Download & compile the Scotty server's dependencies.
  - Pull the npm dependencies used in the frontend.
- Store outputs in the cache. I just have to specify the paths that should be saved, and on the next build they will be added to the container when restoring the cache.
- After caching, I can finally run tests, build the site and deploy it.
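For reference, here is a rough sketch of what those elided build steps could look like. The step names and the ./api and ./frontend paths are assumptions on my part, pieced together from the snippets in this post rather than copied from my actual config file:

- run:
    name: Resolve/Update Static Site Dependencies
    working_directory: './static'
    command: |
      stack setup
      stack build --dependencies-only -j1 --fast
- run:
    name: Resolve/Update API Dependencies
    working_directory: './api'
    command: stack build --dependencies-only -j1 --fast
- run:
    name: Pull Frontend Dependencies
    working_directory: './frontend'
    command: yarn install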
You might be wondering why I chose these paths for caching, so I'll explain:
- The ~/.stack directory is where Stack stores the GHC compilers it uses. Caching it saves time during stack setup.
- The .stack-work directories are where Stack places the compiled binaries. Caching them enables incremental builds. If the next build doesn't change the Haskell source code, which is the common case when adding new posts because they are simply markdown files, nothing needs to be compiled. The binaries are already there. Zero time wasted.
- The ./frontend/node_modules directory contains the npm dependencies used by the frontend. Yarn will check this directory against its lockfile and avoid installing modules if everything is already there.
Now that everything is cached, deploying a new post can take less than 2 minutes. Nice!
Deploying with webhooks
After running tests and building the site, the final step is to deploy the thing. How do we achieve that? There are many approaches.
Probably the easiest one is to just copy the files to the server using scp. This is a very solid approach, but I personally dislike it because I'd have to manage SSH keys on the server, which means extra configuration that is not part of my repository. It would also be nice to have version control over the files that get deployed. Instead of blindly copying files to the server, I'd rather commit them to a new orphan branch and have the server pull from that branch during every deploy. This way, if something goes wrong I can use Git to quickly roll back to a previous version.
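If you haven't used orphan branches before: they start with no parent commits, so the generated files never pollute master's history. Creating one looks roughly like this (a sketch; the path to the generated site is hypothetical):

# create a branch with no history
git checkout --orphan server
# start from an empty tree
git rm -rf .
# copy in the generated pages (hypothetical path)
cp -r /path/to/_site .
git add .
git commit -m "initial deploy"
git push origin server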
What I ended up doing is having a server branch on GitHub with the blog's generated static files and the sources for the backend. This branch is synced with the server using GitHub webhooks. Every time a new post is pushed to the server branch, GitHub sends an HTTP POST request to the blog's server so that it can update itself.
Listening for GitHub events on the server
In order to use GitHub webhooks, I need a program on the server that listens for POST requests and pulls the latest code whenever the branch gets updated. I already have a Scotty server for my blog's backend, so I can add a new endpoint that runs a bash script that does a git pull every time it receives a POST request. Here's the code that runs the script:
module System.Scripts (runUpdateScriptAtDir) where

import System.Process
import System.IO (IOMode(WriteMode))
import System.Exit (ExitCode)
import GHC.IO.Handle.FD (openFile)

updateScript :: CreateProcess
updateScript = proc "bash" ["-eo", "pipefail", "update.sh"]

runUpdateScriptAtDir :: FilePath -> IO ExitCode
runUpdateScriptAtDir workingDir = do
  devNullHandle <- openFile "/dev/null" WriteMode
  (_, _, _, process) <- createProcess updateScript
    { cwd = Just workingDir
    , std_out = UseHandle devNullHandle
    , std_err = UseHandle devNullHandle
    }
  waitForProcess process
It expects an update.sh script to exist in the specified working directory. It spawns a child process that runs the script, ignoring stdout and stderr, waits for its completion, and returns the exit code. Here is the Scotty handler that runs this code:
updateStaticContent :: FilePath -> ActionM ()
updateStaticContent staticContentDir = do
  exitCode <- liftIO $ runUpdateScriptAtDir staticContentDir
  case exitCode of
    ExitSuccess -> status ok200
    ExitFailure code -> do
      let msg = pack $ "Script exited with code " ++ show code
      status internalServerError500
      text msg

runBlogAPI :: FilePath -> ScottyM ()
runBlogAPI contentDir = do
  post "/ci-hook" $ updateStaticContent contentDir
  defaultHandler errorHandler
This handler runs whenever a POST /ci-hook request is received. It runs the update script at the configured working directory and returns status code 200 if it succeeds; otherwise it returns status code 500.
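The update.sh script itself is not shown in this post, but since the server branch holds the already-generated pages, a minimal sketch of it could be as simple as this:

# update.sh: pull the latest generated pages from the server branch.
# Error handling comes from the "bash -eo pipefail" invocation above.
git pull origin server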
This works pretty well. The only problem is that anyone could trigger a git pull just by sending a POST request if this were a public endpoint. Fortunately, my Scotty server only listens to requests forwarded by my Nginx instance. In order to “hide” the endpoint, I use this configuration:
location = /ci/CI_SECRET {
rewrite ^ /ci-hook break;
proxy_pass http://localhost:API_PORT;
}
The = means that the path has to match /ci/CI_SECRET exactly, where CI_SECRET is a random string like “e7093f2c-4b04-421c-8c41-d81dba20aa0a”. Since the random string is known only by GitHub and the server, the endpoint stays hidden from potential attackers.
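To check that everything is wired up correctly, the hook can be triggered manually with curl, with blog.example.com standing in for the real domain:

# manually trigger a deploy (this really runs the update script!)
curl -X POST https://blog.example.com/ci/e7093f2c-4b04-421c-8c41-d81dba20aa0a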
Triggering the webhooks
The server is ready to receive payloads from GitHub, so now we need the code that triggers the GitHub webhooks. I just use a bash script that pushes the updated pages to the server branch. Note that to push to GitHub, CircleCI needs write permissions on the repo. By default it only has read permissions, because that's enough for building the project. This is easily configured from the project settings, though.
If we go back to the CircleCI configuration, the final two steps are to generate the static pages, and commit & push the updated pages to GitHub, triggering the webhooks. Here’s the result:
version: 2
jobs:
build:
steps:
# ...Initial steps here...
# These are the final two steps
- run:
name: Generate Pages
working_directory: './static'
command: stack exec site build
- run:
name: Deploy
command: bash .circleci/push_to_server.sh
If you are curious about push_to_server.sh, the bash script that does all the Git work, here it is:
if [ "${CIRCLE_BRANCH}" != "master" ]; then
echo "Only master branch can deploy. Bye!"
exit 0
fi
# Store the new static pages somewhere safe
# before checking out the server branch
mv ./static/_site /tmp
git checkout server
# update static files
rm -rf _site
mv /tmp/_site .
# Other paths should not diverge from master
git checkout master -- .circleci/config.yml .gitignore nginx api
# Only push to server if something did change
if [[ `git status --porcelain` ]]; then
git config --global user.email job@circleci.com
git config --global user.name CircleCI
git add .
git commit -m "deploy #$CIRCLE_BUILD_NUM"
git push origin server
else
echo "No changes to deploy. Bye!"
fi
And that is all! Here's a recap of the whole deploy process:
- CircleCI runs tests and builds the site.
- If new pages have to be deployed, they are pushed to GitHub.
- GitHub sends a POST request to the Scotty server after the repo is updated.
- The server pulls the latest code from the server branch and Nginx automatically serves the updated pages.
That's all I can say about my blog. It was pretty fun to build. Some parts are a bit too convoluted and/or over-engineered for such a small site, but I did it because I don't really get to play with these tools at my day job. Hope you enjoyed reading this. See you around!