Johnathan.org


Cloudflare Argo: 228 Days Later

It seems like such a random amount of time to review (and it kind of is), but I wanted to start 2019 off right and revisit a topic I touched on in 2018: Cloudflare’s smart-routing product, Argo.

In my previous post about Argo, I covered the vast improvement in response times from simply enabling the service: they were practically cut in half. Since then, I’ve made some more tweaks to my site, so it felt fair to check whether Argo is still picking up the slack it claims to. If you’re unsure of how Argo works, my previous post has a good explainer.

Considering Aggressive Caching

One improvement I made was to lean heavily on Cloudflare’s Page Rules functionality. I purchased a set of five additional Rules for $5.00/month and got to work, applying caching to everything that isn’t likely to change often, if ever. In this case, most static assets will live on Cloudflare’s servers and in a visitor’s browser for quite a while.
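A representative rule looks something like this (the pattern and TTL values are illustrative, not an exact dump of my settings):

If the URL matches: johnathan.org/wp-content/*
Then the settings are: Cache Level: Cache Everything
                       Edge Cache TTL: a month
                       Browser Cache TTL: a year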

When I first implemented this, I didn’t consider plugin JS, but in reality, most of what’s being caught by that rule is WordPress-related (read: Jetpack), and I haven’t experienced issues thus far.

With the majority of /wp-content being taken care of with page rules, it was time to re-evaluate the now decreased load and its effect on the benefits Argo provides.

Argo Post-Aggressive Caching

There’s a reason Cloudflare recommends Argo regardless of how you cache. Even with aggressive caching in place, I’m still seeing about 25% response time improvements:

The average runs between 23% and 27%, depending on the day I check, but the 23.28% in the image above is pretty close to “most of the time.” What’s also worth pointing out is that the peaks and valleys largely follow the same percentage improvement across the board, and it’s no wonder: 75% of requests end up going through Argo’s pipeline.

With the aggressive Page Rules and Argo, I’m comfortable saying Argo has a permanent home with this site and any future projects I take on. It’s a no-brainer and remains highly cost-effective.

Google Isn’t the Company That We Should Have Handed the Web Over to

Put simply:

This is a company that, time and again, has tried to push the Web into a Google-controlled proprietary direction to improve the performance of Google’s online services when used in conjunction with Google’s browser, consolidating Google’s market positioning and putting everyone else at a disadvantage. Each time, pushback has come from the wider community, and so far, at least, the result has been industry standards that wrest control from Google’s hands. This action might already provoke doubts about the wisdom of handing effective control of the Web’s direction to Google, but at least a case could be made that, in the end, the right thing was done.

Why should you care? For reasons like this (emphasis mine):

For no obvious reason, Google changed YouTube to add a hidden, empty HTML element that overlaid each video. This element disabled Edge’s fastest, most efficient hardware accelerated video decoding. It hurt Edge’s battery-life performance and took it below Chrome’s. The change didn’t improve Chrome’s performance and didn’t appear to serve any real purpose; it just hurt Edge, allowing Google to claim that Chrome’s battery life was actually superior to Edge’s. Microsoft asked Google if the company could remove the element, to no avail.

In any other industry, we’d call that grounds for antitrust lawsuits.

Microsoft isn’t blameless, either. They opted to take the easy way out and Firefox will likely have to pay the price:

By relegating Firefox to being the sole secondary browser, Microsoft has just made it that much harder to justify making sites work in Firefox. The company has made designing for Chrome and ignoring everything else a bit more palatable, and Mozilla’s continued existence is now that bit more marginal. Microsoft’s move puts Google in charge of the direction of the Web’s development. Google’s track record shows it shouldn’t be trusted with such a position.

At the end of the day, one thing’s clear: competition is good. We see it in all walks of life. With Microsoft turning tail and succumbing to the Chrome overlords, they’re admitting they don’t care about the openness of the Web… just their market share and numbers.


Deploying A Jekyll Static Site with Circle CI

One of the primary steps in making each iteration of this site happen is deploying its generated HTML files to my Digital Ocean server. Since these are just static files and there isn’t a CMS backing them up (in a traditional sense), there needs to be an automatic process that takes care of it after I make a change. If I had to manually push or build the site every time I added something, I’d:

  1. never do it
  2. go back to a CMS

This is where Circle CI enters the picture. Built mainly for software development, Circle CI lets you build, test, and deploy code. If it can run on a Linux command line, Circle CI can run it.

For the grand starting price of zero and the promise of keeping code open source, I’m offered up to four concurrent builds and 25 hours of build time. As the operator of a static site powered by Jekyll, this is way more than I’ll need, but I’m glad it’s there.

Goals

  • Break down how I make this site happen with Circle CI.
  • Take a look at my Circle CI config file and go over my workflow a bit.

It used to be way more complicated before I wrote this post, but as I was thinking about what to write, I realized I was doing far more work in the build process than I needed to.

If you’d like to follow along, this entire site is available to browse in GitHub repo form, and the Circle CI config file is here.


The Circle CI File

Starting off first, let’s take a look at the defaults I have set:

defaults: &defaults
  docker:
    - image: circleci/ruby:2.5.1-node-browsers
  working_directory: ~/repo

What you’re looking at here are values I’ll always need, no matter how many steps, jobs, etc. I end up with. Since I have only one job, this is more of my “do not touch” section, in that these values will never change and only new ones will be added. (You can find a more in-depth explanation of the purpose of defaults here.)

version: 2
jobs:
  the_only_job:
    <<: *defaults

Now we’re entering job territory. This is where I specify the actual tasks I need Circle CI to run. I only have one job now, the_only_job, but if I had more, they’d be broken down like this:

version: 2
jobs:
  the_only_job:
    <<: *defaults
  except_its_not:
    <<: *defaults
  a_third_job:
    <<: *defaults

Each job would call upon the defaults because Circle CI treats each job as a separate build and would need its own container. In multi-job scenarios, having a set of defaults to share across all jobs is truly a no-brainer.

Deployment

Inside our job, we have a set of steps:

Note: This is a list of tasks that should be performed by the container. Everything in this section is in the context of:

jobs:
  the_only_job:
    <<: *defaults
    steps:

So never mind the lack of full indentation. It saves me from repeating lines a dozen times.

Pre-game Tasks

- add_ssh_keys:
    fingerprints:
      - "69:fe:2c:df:c8:34:c5:e6:3f:6e:18:64:43:97:58:02"

The very first thing I have the container do is add an SSH key using the add_ssh_keys step. I’ve provided Circle CI with a key to the production server as a specific deploy-only user. This step adds the key to the container so it can connect to the server later without me needing to hardcode credentials, which would be a massive security hole given that my Circle CI builds are open to the public.
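For context, the server side of this is just a locked-down Unix user. Setting one up looks roughly like the sketch below; the deploy user name, key file name, and paths are placeholders, not my actual setup:

# On the production server; "deploy" and the key file are placeholders.
sudo adduser --disabled-password --gecos "" deploy
sudo mkdir -p /home/deploy/.ssh
sudo chmod 700 /home/deploy/.ssh
# Append the public half of the key pair handed to Circle CI.
cat circleci_deploy_key.pub | sudo tee -a /home/deploy/.ssh/authorized_keys
sudo chmod 600 /home/deploy/.ssh/authorized_keys
sudo chown -R deploy:deploy /home/deploy/.ssh
# The only thing it can usefully write to is the web root.
sudo chown -R deploy /var/www/johnathan.org/static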

- checkout

Once that’s good to go, I have Circle CI check out the latest code from the master branch of johlym/johnathan.org. Simple enough.

- attach_workspace:
    at: ~/repo

The third step is attach_workspace. This was more relevant when I had multiple jobs, but the idea is that we’re creating a persistent, consistent location within the job container to do all our task work. In this case, I’m making it clear that we’ll be doing all our work in ~/repo from here on out.
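For reference, in a multi-job setup the counterpart is a persist_to_workspace step at the end of the upstream job, so downstream jobs can attach what it produced. A sketch, using the _site folder this build generates:

- persist_to_workspace:
    root: ~/repo
    paths:
      - _site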

Cache Handling, Part 1

- restore_cache:
    keys:
      - v1-bundle-{{ checksum "Gemfile.lock" }}-{{ checksum "package.json" }}

This part is important if there’s even a stretch goal of having a speedy build process. The restore_cache step looks for a cache file we’ve already built (something we’ll do at the end) to save time with things like bundle and npm. Without it, we could spend a few minutes just installing Rubygems and Node modules. Bleh.

The cache key combines checksums of the Gemfile.lock and package.json files. If those files never change, the checksums won’t either, so the cache will remain valid. If I were to update a gem, for example, the cache would be invalidated and Rubygems and Node modules would be installed fresh.

One potential spot for improvement here is to break this out into two separate caches, but Circle CI doesn’t handle that well, so this’ll be fine.
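That said, restore_cache does accept a list of keys and will fall back to the most recent cache matching a key prefix when the exact key misses, which softens the cost of an invalidation:

- restore_cache:
    keys:
      - v1-bundle-{{ checksum "Gemfile.lock" }}-{{ checksum "package.json" }}
      # Fall back to the newest cache whose key starts with this prefix.
      - v1-bundle-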

Installations

- run: 
    name: Install Rubygems if necessary
    command: |
      bundle install --path vendor/bundle --jobs 4 --retry 3
- run: 
    name: Install Node modules if necessary 
    command: |
      cd ~/repo && npm install
- run: 
    name: Install Rsync
    command: |
      sudo apt install rsync

This part is pretty straightforward. I need to make sure all the required Rubygems, Node modules, and Rsync are installed. In this case, I’m making sure bundle puts everything in vendor/bundle (remember, this is relative to ~/repo since we declared that to be our workspace earlier) when it installs. The Node modules need less fuss: package.json lives in ~/repo since that’s where the code was checked out to, so we hop in there and run the install. It’ll plop its node_modules folder at ~/repo/node_modules as a result, which is totally acceptable. Lastly, we install rsync via apt. Nothing special.

Site building

- run: 
    name: Build site
    command: |
      bundle exec jekyll build --profile --verbose --destination /home/circleci/repo/_site

For those who’ve worked with Jekyll before, this command shouldn’t come as a surprise. We’re asking Jekyll to build out the site and place it at ~/repo/_site. The --profile and --verbose flags are for CI output only, in case there’s an error or my curiosity gets the better of me.

Site tweaking

- run:
    name: Install and run Gulp
    command: |
      cd ~/repo && npx gulp

When considering how I wanted to handle minification of HTML and JavaScript, I looked at the jekyll-assets plugin but decided against it because of the overhead and work required to implement it in my already moderately-sized site. This is where I decided to bring in Gulp, instead. I have a simple Gulpfile that’s set up to use a couple Gulp modules to minify all the HTML and local JavaScript. Over the 400-something pages I have, this saves me about 20% on the site size overall. Not too shabby.
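A minimal Gulpfile along these lines would do the job; gulp-htmlmin and gulp-uglify here are stand-ins for the “couple Gulp modules,” not a dump of my exact setup:

// Gulpfile.js: a minimal sketch (Gulp 4 style).
const gulp = require('gulp');
const htmlmin = require('gulp-htmlmin');
const uglify = require('gulp-uglify');

// Minify the HTML Jekyll emitted, in place.
function html() {
  return gulp.src('_site/**/*.html')
    .pipe(htmlmin({ collapseWhitespace: true, removeComments: true }))
    .pipe(gulp.dest('_site'));
}

// Minify local JavaScript, also in place.
function js() {
  return gulp.src('_site/assets/**/*.js')
    .pipe(uglify())
    .pipe(gulp.dest('_site/assets'));
}

exports.default = gulp.series(html, js);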

You’ll notice we need to use npx here. For some reason, I was never able to get Gulp to run on its own… it would look for the gulp binary in strange places I could not control. npx allows me to run gulp wherever, so long as it can find the corresponding node_modules folder for reference. Brilliant, eh? Portable Gulp.

Server Push

- run: 
    name: Deploy to prod server if triggered via master branch change
    command: |
      if [ $CIRCLE_BRANCH = 'master' ]; then rsync -e "ssh -o StrictHostKeyChecking=no" -va --delete ~/repo/_site [email protected]:/var/www/johnathan.org/static; fi

This is pretty straightforward as well, though it can look complicated to the untrained eye. What we’re doing here is first checking whether the branch this build is based on is master; we’ll find that value in the CIRCLE_BRANCH ENV variable. If it isn’t, we skip this step, but if it is, we run rsync to push the contents of ~/repo/_site over to the production Digital Ocean server. I’m using the IP here because of Cloudflare, though I have a TODO item to use a hostname instead.

Post-Deployment

For all intents and purposes, the deployment is done, but because of Cloudflare, we have one additional step to make sure everyone’s seeing the freshest code.

Cloudflare

(we’re still in the jobs context)

- run: 
    name: Bust Cloudflare cache if triggered via master branch change
    command: |
      if [ $CIRCLE_BRANCH = 'master' ]; then 
        curl -X POST "https://api.cloudflare.com/client/v4/zones/$CLOUDFLARE_ZONE_ID/purge_cache" \
        -H "X-Auth-Email: $CLOUDFLARE_API_EMAIL" \
        -H "X-Auth-Key: $CLOUDFLARE_API_KEY" \
        -H "Content-Type: application/json" \
        --data '{"purge_everything":true}'
      fi

Using the Cloudflare API, we’re submitting a POST request to dump the entire cache for the johnathan.org DNS zone. I’ve provided Circle CI with the necessary information as ENV variables and am calling upon them here. This keeps them safe and the job step functional.

I wouldn’t recommend this for high-volume sites, but because I have Cloudflare caching just about everything combined with the fact that I maybe do this a couple times a week, this feels like the right level of effort and precision.
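If purging everything ever becomes too blunt, the same endpoint accepts a list of specific URLs instead of the purge_everything flag; the URLs below are examples only:

curl -X POST "https://api.cloudflare.com/client/v4/zones/$CLOUDFLARE_ZONE_ID/purge_cache" \
  -H "X-Auth-Email: $CLOUDFLARE_API_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://johnathan.org/","https://johnathan.org/index.html"]}'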

Cache Handling, Part 2

- save_cache:
    key: v1-bundle-{{ checksum "Gemfile.lock" }}-{{ checksum "package.json" }}
    paths:
      - ~/repo/vendor/bundle
      - ~/repo/node_modules

Earlier, we called upon the generated cache; here is where we create it if necessary. If a cache already exists under the same key (meaning the checksums match), this step is skipped, but if it’s missing, we build it out, making sure to capture everything from the ~/repo/vendor/bundle and ~/repo/node_modules folders that Rubygems and NPM populated.

Workflow Management

workflows:
  version: 2
  build_site:
    jobs:
      - the_only_job

I used to have multiple jobs running in a breakout-combine pattern, and this is leftover from that. Although I only have one job now, I didn’t want to re-craft the config to not use Workflows, so I just operate with a one-job Workflow instead. XD
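For reference, the multi-job version gated later jobs on earlier ones with requires:. It looked something like this, with illustrative job names:

workflows:
  version: 2
  build_site:
    jobs:
      - build_job
      - deploy_job:
          requires:
            - build_job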

Wrap Up

That about does it for my overview. This process is turning out to work very well for me, and I’m glad I took the time to both develop it and explain it for posterity. Over time it’ll morph, I’m sure, but right now this feels like a really good base to work off of.

Thanks for taking the time to read this. Cheers!

Lazy-loading Retina Images in a Jekyll Site

Something I’ve wanted to touch on ever since a couple posts ago is lazy-loading images. Now that I’m trying to consciously serve 2x images to those with such a pixel density, I’m setting myself up for heavier pages, and folks may never even make it that far down the page. Since there’s no need to load what one won’t see, I set out to add lazy image loading support to the blog.

This doesn’t apply specifically to Jekyll; in fact, there’s nothing about it that’s unique to Jekyll. But during my searches, I initially felt I needed to find a Jekyll plugin to solve this problem. Since I had that idea, I’m certain others will, too, so I want to catch as many of those folks as I can and let them know it’s easier than that!

As far as the JavaScript goes, I’m using the vanilla-JavaScript-based lazyload.js. I just need to toss it into the footer (no point in loading it before everything else, which technically means we’re lazy-loading the lazy-loader):

<script>
  (function(w, d){
    var b = d.getElementsByTagName('body')[0];
    var s = d.createElement("script"); s.async = true;
    var v = !("IntersectionObserver" in w) ? "8.7.1" : "10.5.2";
    s.src = "https://cdnjs.cloudflare.com/ajax/libs/vanilla-lazyload/" + v + "/lazyload.min.js";
    w.lazyLoadOptions = {}; // Your options here. See "recipes" for more information about async.
    b.appendChild(s);
  }(window, document));
</script>

What we’re doing here (courtesy of the lazyload.js documentation) is selecting the best version of it based on the browser. Version 8.x plays better with everything and since not all browsers support IntersectionObserver, it’s important to be backwards compatible. Version 10.x will load for Firefox, Chrome and Edge, while 8.x will load for pretty much everything else.

Now that our JavaScript is in play, the only other step is to update the image tags. I have a TextExpander snippet to help me out here, since I’m also inserting retina (@2x) images. It looks like this:

[![](){: data-src="%filltext:name=1x image%" data-srcset="%filltext:name=2x field% 2x" data-proofer-ignore}](%filltext:name=full image%){: data-lightbox="%filltext:name=lightbox tag%"}

This creates four fields: the paths to the 1x small image, the 2x small image, and the full-size image, plus the tag needed by lightbox.js. The finished markdown product looks like this (taken from a previous post):

[![](){: data-src="/assets/images/2018/05/07/gtmetrix-0512-sm.jpg" data-srcset="/assets/images/2018/05/07/gtmetrix-0512-sm@2x.jpg 2x"}](/assets/images/2018/05/07/gtmetrix-0512.jpg){: data-lightbox="image-3"}

Pretty slick, eh? With this in place, some bytes will be saved every day.

You might have noticed the data-proofer-ignore attribute. That’s there to keep htmlproofer from tripping over these images, since they have no src or srcset attributes defined; it essentially skips them during its check. The downside is that if I ever need to audit images, I’ll have to bulk-remove data-proofer-ignore, at least temporarily.
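If that day comes, a quick-and-dirty sweep (assuming GNU sed and that the attribute only appears in _posts) would handle it:

grep -rl 'data-proofer-ignore' _posts | xargs sed -i 's/ data-proofer-ignore//g'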

Once this is all said and done, let the better test results come pouring in!

Easy Redirect Links Within Jekyll

I am buttoning up the final touches on my new Jekyll-powered blog. One thing I wanted to attempt without having to spin up yet another app or service was short link redirects.

My goal was to be able to take http://johnathan.org/goto/something and have it redirect to a full URL of my choice. Jekyll is trigger-happy about creating a page for everything, and I knew I had to find some way to manage the data long term.

Forestry offers the ability to manage arbitrary data sets within Jekyll. Since I’m using it, I knew it was going to be a breeze so that’s the path I took.

I first started by creating a new file in the _data folder called shortlinks.yml. Anything I plug into this YAML file becomes accessible through the site.data object. My data file has three fields per entry: title, key, and destination. The title and destination are self-explanatory. The key is the short URL keyword; this is what we’ll use after /goto/ in the path.

Having these fields in mind, my shortlinks.yml would look something like this:

- title: A cool page
  key: coolpage
  destination: https://google.com
- title: A mediocre page
  key: mehpage
  destination: https://bing.com

This means I can now access my links by iterating over the site.data.shortlinks array. Unfortunately, we still have a bit of a roadblock. Since everything about Jekyll is static, I can’t create a dynamic page that looks the data up at request time. Well, I could, with something like PHP, but I don’t want to. Instead, we’ll have to use the data as a base for a set of pages we’ll create that act as redirects.
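As a quick illustration, a page that lists every short link is just a Liquid loop over that array:

<ul>
  {% for link in site.data.shortlinks %}
    <li><a href="/goto/{{ link.key }}">{{ link.title }}</a></li>
  {% endfor %}
</ul>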

This is like how Jekyll Collections work, except we don’t want to create the pages ahead of time, only on jekyll build. This is where data_page_generator.rb comes into play. By placing it in _plugins and feeding it a few settings in _config.yml, we can instruct it to build pages based on shortlinks.yml.

That code would look something like this:

page_gen:
  - data: shortlinks
    template: redirect
    name: key
    dir: goto

Seems easy enough. Let’s break it down. data is the data file we want to use (the plugin makes assumptions about the file type). template tells the plugin which base template to use; what that template looks like is in the next section. name is the key from earlier and becomes the file name created within the dir directory. In this example, the redirect files land at _site/goto/key.html, provided the base directory is _site.

Now that we have the configuration squared away, we need to create an _includes/redirect.html template. Since we’re doing immediate redirects, it doesn’t need to be fancy, nor does it need to have style. This will do:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="refresh" content="0; url={{ page.destination }}" />
  <script type="text/javascript">
    window.location.href = "{{ page.destination }}"
  </script>
  <title>Redirecting...</title>
</head>
<body>
Redirecting to {{ page.destination }}. If it doesn't load, click <a href="{{ page.destination }}">here</a>.
</body>
</html>

data_page_generator.rb will take each object in the data file and pump the values into this template. In our example, we’re only interested in destination.

With our template set, there’s one last thing we need to do.

You might need to tweak your web server so it treats /goto/something and /goto/something.html as equivalent rather than returning a 404 (dependent on your web server configuration). In my case with Nginx, all I had to do was swap:

location / {
    try_files $uri $uri/ =404;
}

with

location / {
    try_files $uri $uri.html $uri/ =404;
}

For those following along at home, all I added was $uri.html. We’re instructing Nginx to take the presented URI (in the case of /goto/something, that’s something) and try it by itself, then as something.html (this is what we’ve added), and finally as something/ before giving up and looking for a 404 page. If one of those matches, Nginx renders it.

Now, we can push everything and give it a go. In my real-world example, I have a couple links already set up as I write this. My new favorite air purifier/filtration system is Molekule so clicking that link will take you straight there.

I’m uncertain about this solution’s long-term scalability. It will be fine on the scale of a few hundred links, as Jekyll can churn through a few hundred pages in a matter of seconds (especially when paired with Ruby >= 2.5.x). How well this works long term will come down to a couple things:

  • The effort required to manage the data file
  • The tools in place to automate the building of the Jekyll site

For the latter, I use Circle CI, so I’m fine with builds taking a handful of seconds or even a half-minute longer. For the former, I use Forestry. I haven’t pushed it to the point where it has several hundred items in the shortlinks.yml data file, so the limits of its capability are unknown in this regard.

My alternative plans were to move to Bit.ly (using a short domain of some sort) or to set up Polr on the server. I’ll also have to spend some time thinking about how I can track clickthroughs. As I wrapped this up, I pondered plopping Google Analytics on the redirect page, which would let me measure movements into /goto pages as clicks.
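If I go that route, the sketch below shows one way it could look: send the pageview first, then redirect from the hit callback. The property ID is a placeholder, and the meta refresh would need a short delay so the hit isn’t cut off:

<script>
  // Standard analytics.js async queue stub; UA-XXXXXXXX-1 is a placeholder.
  window.ga = window.ga || function () { (ga.q = ga.q || []).push(arguments); };
  ga.l = +new Date();
  ga('create', 'UA-XXXXXXXX-1', 'auto');
  ga('send', 'pageview', {
    page: window.location.pathname,
    // Only redirect once the hit has been recorded.
    hitCallback: function () { window.location.href = "{{ page.destination }}"; }
  });
</script>
<script async src="https://www.google-analytics.com/analytics.js"></script>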

I’m happy with how this turned out. Like all things I do, there’ll be persistent tweaking involved.
