How I build my blog (part 1)

It has been some time since I started using Jekyll, out of frustration with CMSs like Blogger and WordPress. I picked up Jekyll and learned it because I wanted complete control over how I build my site, where I write some fiction (go, check it out if it interests you). I wanted to control the design, the content, the features, pretty much everything. The code is available on GitLab.

The setup is rather long, so I split the process across two posts: build and deployment. This is the first part, which covers the build.

It took me a week or so to get the hang of Jekyll, and I managed to tweak the design and the build logic of my blog on my personal computer. The basic workflow was:

One-time setup

Use a pre-made template for the base design.
Tweak the includes (or partials if you are a Hugo person).
Build your own includes and tweak the template.

Day-to-day

1. Write posts in Markdown with the necessary metadata in YAML.
2. Commit the new file or the changes to the repository.
3. Push the commits to the remote repository so that I can work on them anywhere.
4. Build the site locally into a directory.
5. Push the contents of the directory to the target S3 bucket.

If you notice, a few of the steps above can be avoided. Also, there is a dependency at step 4: I have to have Jekyll installed on every computer where I work. Of course, if you use GitHub Pages, you do not have to worry about much; all you have to do is push the changes, and GitHub builds the site for you (that is how this site works). But there are a few compromises in doing so. First of all, you cannot schedule posts. Second, GitHub runs Jekyll in what is known as safe mode, which disallows any Ruby plugins that you may be using to enhance your site. You are stuck with what Jekyll considers “safe”, and that is restrictive.

Programmers are not comfortable with such limitations. I am such a person. I want freedom to spread my wings (whether I spread them wide in reality or not—the choice of spreading should be left to me). One way of doing this was building the site on my own machine. But again, would I always work on my machine? Enter: automation.

Take a moment to think about this. You write your code and commit the changes to the repository. You can use CI/CD (continuous integration and continuous delivery or deployment) to build your code. You need somewhere for this pipeline to run; we will get to that in a moment. Next, you want a place that can host the final product. In my case, it is a static site, and Amazon S3 is capable of hosting one. All that is left now is the how-to. And here is how I do it.

Set up Jekyll

Of course, we are looking for a way to build your site without dependencies on your machine. But you need a platform where you can make the initial build; it does not make sense to work directly with a container or the cloud right at the beginning. So you must install Jekyll on your computer. I use Linux, and setting up Jekyll on Linux is fairly simple. I use Jekyll on Manjaro as well as Ubuntu. The process is similar on both distros. I will write the steps for Ubuntu.

Install ruby, ruby-dev, make and build-essential.

sudo apt-get install ruby ruby-dev make build-essential

Define GEM_HOME and add the path to gems to PATH.

export GEM_HOME=$HOME/gems
export PATH=$HOME/gems/bin:$PATH
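
The two export lines above only last for the current shell session. As a minimal sketch, assuming a Bash setup where ~/.bashrc is read by new terminals (the helper name persist_gem_env is hypothetical, not part of any tool), the lines could be added to a profile file once, without duplicating them on repeated runs:

```shell
# persist_gem_env FILE: append the two export lines to FILE,
# skipping any line that is already present (so it is safe to run twice).
# The function name and the choice of profile file are assumptions.
persist_gem_env() {
  profile=$1
  touch "$profile"
  for line in 'export GEM_HOME=$HOME/gems' 'export PATH=$HOME/gems/bin:$PATH'; do
    # -x matches the whole line, -F treats the text literally
    grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
  done
}
```

Calling persist_gem_env ~/.bashrc and then opening a new terminal should make both variables available in every session.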

Install the gems, jekyll and bundler.
```
gem install jekyll bundler
```

Installing bundler is not necessary, per se. But doing so would make your gem management much easier, especially if you have multiple Jekyll sites. I use the bundler method because I have multiple Jekyll sites.

Build your site

If you are building a barebones site, you can use the jekyll new command to create a new site with the default Jekyll boilerplate template. Me, I built my theme as a gem (which you can use, too). I’ve added the instructions to use the theme in the theme documentation. I will refrain from adding those steps here, not only because it is redundant, but also because the document may change based on the changes I make to the gem.

Whether you use the default Jekyll theme or my theme, you add your posts to the _posts directory as Markdown files. Test the site build locally and tweak the code as needed. When everything looks good, go ahead and add the pipeline configuration. Remember that the configuration is different for different CI/CD providers. I use GitLab CI because my project is on GitLab, and GitLab CI is tightly integrated with GitLab repositories; all you have to do is add a valid configuration, with no need for additional hooks.
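
To make the _posts step concrete, here is a rough sketch of a helper that creates a post file. Jekyll expects post files to be named YEAR-MONTH-DAY-title.md and to start with YAML front matter. The function name new_post, the slug rule, and the layout name post are assumptions for illustration; your theme may use a different layout and need more metadata.

```shell
# new_post TITLE [DATE]: create _posts/DATE-slug.md with minimal front matter.
# DATE defaults to today (YYYY-MM-DD). Prints the path of the new file.
# Note: a title containing double quotes would need escaping in the YAML.
new_post() {
  title=$1
  post_date=${2:-$(date +%Y-%m-%d)}
  # Lowercase the title, turn every run of other characters into one hyphen,
  # then trim a leading or trailing hyphen.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  file="_posts/$post_date-$slug.md"
  mkdir -p _posts
  cat > "$file" <<EOF
---
layout: post
title: "$title"
---

Write the post body here, in Markdown.
EOF
  echo "$file"
}
```

Run from the site's base directory, new_post "Hello, World!" 2020-01-02 would create _posts/2020-01-02-hello-world.md.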

Configure the CI/CD pipeline

If you want to enable CI/CD using GitLab CI, you need the configuration placed in the base directory of the repository. Create a YAML file named .gitlab-ci.yml. The name is important (including the dot). I have two sites (one staging and one production), and therefore, there are two jobs. Details in a moment. Here is how you would configure the pipeline.

A brief note on pipelines

CI/CD pipelines are part of DevOps, and are used to perform testing and deployment. In large projects, integration testing is one of the most important aspects. The basic idea of a CI/CD pipeline is to test the build and the integration and if everything succeeds, deploy or deliver the product, based on how you use the pipeline.

Breaking down the configuration

GitLab CI builds your code on Docker container(s). Therefore, the very first step in your configuration would be specifying the container image. Docker (unlike Packer, if you know Packer) uses an image file as the base. On top of it, you add layers. Here, I use the image created by Benny Chew. You specify the image by identifying the user who published it, and then, the name of the image.

image: bchew/jekyll-build

Next, you specify what you would like to cache. In general, CI/CD pipelines have multiple jobs within them. For example, you could set up a single pipeline called “Build, test and deploy to staging”, and you could have three jobs under it, one each to build, test and deploy the code. There may be resources that can be used across jobs. They are specified under cache. In my case, I cache the vendor directory, where I would store the gems.

cache:
  paths:
    - vendor/

Every job has a script. In GitLab CI (and many other CI systems, for that matter), you can specify commands to run before every job's script. Specify these under the key before_script. This is an array in YAML, so each command is written as a hyphen followed by the command itself; here, bundle install --path vendor. This makes sense given that this is a Jekyll site: it needs Ruby and some gems. The gems are common to build and deployment, so we fetch them in before_script. We store them within vendor, which in turn is cached across jobs.

before_script:
  - bundle install --path vendor

Next, we specify the stages in the pipeline. A stage groups one or more jobs, and each job has its own script. I have created a single stage, build. The environment is specified later, within each job.

stages:
  - build

Next, we specify the job name and what the job has to do. I have specified parameters such as the stage and the name of the environment. Then comes the script: the set of commands that constitute what the job actually does. In our case, that is the build and the deployment. I use the jekyll doctor command to run a few checks on the site, such as looking for conflicting URLs. If a command in the script fails, the job fails and you are notified about it. That is the purpose of CI/CD.

I have four commands here. The first calls jekyll doctor, and the second performs the actual build. Once the site is built, the files are saved in the public directory within the main site directory (do not worry about the command for now; it is all Jekyll stuff). The third renames the files to remove the .html extension, because I like my links to be plain, without the extension. Now, in general, when you run the site locally, you could use a trailing / to have extension-free URLs to posts; in that case, your page is built as an index.html under the specific URL. For instance, a page at <siteURL>/welcome would be either a welcome file with no extension, or an index.html under <siteURL>/welcome/. S3 does not handle this consistently. Therefore, I chose to rename the files to remove the extension. Apache Tika is capable of finding the correct MIME type of the files even without the extension. Finally, the fourth command deploys the files within public to the S3 bucket specified in the repository configuration.
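
To see what the rename step does, it can be tried on a scratch directory first. The sketch below is a variant of that step, not the exact command from the job: it uses find with -exec instead of the read loop, so that it does not depend on Bash-specific read options. The function name strip_html_ext is made up for illustration.

```shell
# strip_html_ext DIR: remove the .html extension from every file under DIR,
# except index.html and 404.html (the same filter the job script uses).
strip_html_ext() {
  find "$1" -type f ! -iname 'index.html' ! -iname '404.html' -iname '*.html' \
    -exec sh -c 'for f; do mv "$f" "${f%.html}"; done' sh {} +
}
```

For example, after strip_html_ext public/, a file public/welcome.html becomes public/welcome, while public/index.html and anything that is not an .html file stay as they are.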

Finally, you specify the build artifacts. You tell GitLab CI which files are the artifacts that should be preserved from the build. To be honest, this is not necessary in the case of my blog, as the files are deployed to an S3 bucket. However, if you are building a piece of software that you would like to make available as a download, or would like to save the artifacts, you could do that. You can set how long the artifacts should be stored on the GitLab CI servers. I do not need the files, so I have specified a period of one day. Here is all the configuration we just discussed:

build and deploy to test:
  stage: build
  environment:
    name: staging
  script:
    - bundle exec jekyll doctor
    - JEKYLL_ENV=staging bundle exec jekyll build --future --unpublished --drafts -d public/ --config _config.yml,_config_test.yml
    - find public/ -type f ! -iname 'index.html' ! -iname '404.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
    - ENV=staging bundle exec s3_website push
  artifacts:
    paths:
      - public
    expire_in: 1 day

I would also like to have a separate configuration for build and deployment to the production environment. In this case, the environment name, the URL, and the Jekyll command would be different. I have also specified different buckets for the two sites. But that is configured separately; not as part of CI/CD. Here is the other configuration:

build and deploy to production:
  stage: build
  environment:
    name: production
  only:
    - master
  script:
    - bundle exec jekyll doctor
    - JEKYLL_ENV=production bundle exec jekyll build -d public/
    - find public/ -type f ! -iname 'index.html' ! -iname '404.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
    - ENV=production bundle exec s3_website push
  artifacts:
    paths:
      - public
    expire_in: 1 day

Notice the only parameter. This means that this job will be run only for commits to the master branch. The deployment to test does not have this parameter, which means that the build and deployment to the test site will happen regardless of the branch.

Of course, this configuration can be refactored with before_script and after_script. For instance, bundle exec jekyll doctor can join bundle install in before_script. Next, we could move the S3 deployment to a separate stage called deployment, and make the rename part of after_script for the build job. It is up to you to decide what is good for your project; you know the trade-offs better. Here is the complete config we just discussed:

image: bchew/jekyll-build

cache:
  paths:
    - vendor/

before_script:
  - bundle install --path vendor

stages:
  - build

build and deploy to test:
  stage: build
  environment:
    name: staging
  script:
    - bundle exec jekyll doctor
    - JEKYLL_ENV=staging bundle exec jekyll build --future --unpublished --drafts -d public/ --config _config.yml,_config_test.yml
    - find public/ -type f ! -iname 'index.html' ! -iname '404.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
    - ENV=staging bundle exec s3_website push
  artifacts:
    paths:
      - public
    expire_in: 1 day

build and deploy to production:
  stage: build
  environment:
    name: production
  only:
    - master
  script:
    - bundle exec jekyll doctor
    - JEKYLL_ENV=production bundle exec jekyll build -d public/
    - find public/ -type f ! -iname 'index.html' ! -iname '404.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
    - ENV=production bundle exec s3_website push
  artifacts:
    paths:
      - public
    expire_in: 1 day
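
Before pushing, it can help to walk through the same build commands by hand. The following is only a sketch: it takes the before_script line and the first two script lines of the production job, and runs them in order, stopping at the first failure. The DRY_RUN variable and the function names run and build_production are hypothetical additions; with DRY_RUN set, the commands are only printed, which is handy for checking the sequence on a machine without Ruby.

```shell
# run CMD...: print the command, then execute it unless DRY_RUN is non-empty.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# build_production: the gem install, the checks, and the production build,
# chained with && so that a failing step stops the rest (as a CI job would).
build_production() {
  run bundle install --path vendor &&
  run bundle exec jekyll doctor &&
  run env JEKYLL_ENV=production bundle exec jekyll build -d public/
}
```

The deployment command is left out on purpose, so that a local run never touches the S3 bucket.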

Building the site is only one half of the story. The deployment will be shown in a separate post.