Jekyll: Category and Tag Paging and Feeds

Jekyll is a very popular and very powerful static blog generator. Out of the box it can generate sophisticated site structures, and it is highly configurable. One area where I feel Jekyll lacks sophistication is its handling of categories and tags: both are core data sets in Jekyll, but very little functionality is actually built around them. Dynamic blogging platforms like WordPress, or possibly Drupal, by contrast use these two data points to drive much of a site's central navigation.

To be fair, Jekyll is really intended to be a framework for expansion into larger degrees of customization and sophistication, and thankfully it has a very powerful plugin model. Higher-level frameworks like Octopress and Jekyll Bootstrap have shown what you can do with a little extra tweaking - as has the long list of Jekyll plugins.

When I set out to move my site over to Jekyll, one of my main goals was to keep supporting all of the navigation my site offered with my custom platform code, and with WordPress before that. That pretty much amounts to:

  • A date-descending paged index of all blog entries at /index.html.
  • A matching Atom feed for the root index.
  • Static pages like /about.html and /contact.html.
  • Individual blog pages (I suppose this one is obvious).
  • Date-descending paged indexes for all categories and tags I use (for example: /category/article/ and /tag/jruby/).
  • Matching Atom feeds for each of the paged indexes above (for example: /category/article/atom.xml and /tag/jruby/atom.xml).

It was the last two where I hit a hurdle when converting over. Jekyll simply doesn’t have built-in support for this. What surprised me was that I couldn’t find an open source plugin that did it either. The closest I found was over at marran.com, where Keith Marran built category pagination using a Jekyll plugin. It wasn’t exactly what I was looking for, but by far the closest - so props to the author for leading me down the path to understanding a little better.

So long story short, I chose to build my own - and it turns out to be quite simple to do, with a significant amount of added navigation as the end result.

Let's recap what we want to achieve:

  • A page in the form /category/[category name]/index.html for every category, with a Jekyll style paginator available to the page at render time.
  • Subsequent pages in the form /category/[category name]/pageN/index.html for every extra page of entries for that category, again with a paginator available at render time.
  • A feed of the ten most recent entries for the category at /category/[category name]/atom.xml.
  • Similar support for every tag via /tag/[tag name]/index.html, /tag/[tag name]/pageN/index.html, and /tag/[tag name]/atom.xml.

The first step to generating new pages that Jekyll will output is to extend the Jekyll Generator class. We can do this in the Jekyll module so we have access to all of the context we need:

```ruby
module Jekyll
  class CatsAndTags < Generator
    def generate(site)
      # Generate pages here!
    end
  end
end
```

Next, we need to orient ourselves and iterate over what we want to generate. That turns out to be quite easy: the passed-in site object has everything we need:

```ruby
def generate(site)
  site.categories.each do |category|
    build_subpages(site, "category", category)
  end

  site.tags.each do |tag|
    build_subpages(site, "tag", tag)
  end
end

# Do the actual generation.
def build_subpages(site, type, posts)
  posts[1] = posts[1].sort_by { |p| -p.date.to_f }
  atomize(site, type, posts)
  paginate(site, type, posts)
end

def atomize(site, type, posts)
  # TODO
end

def paginate(site, type, posts)
  # TODO
end
```

So here you can see we're iterating over all of the site's categories and tags, and calling the new build_subpages method for each one. site.categories and site.tags are hashes keyed by name, and iterating over a hash with a single block variable yields a two-position array: position 0 is the category or tag name, and position 1 is the array of posts associated with that category or tag.

Now we can narrow down into what subpage generation actually looks like. As you can see, it does three main things: sort the posts by descending date, call atomize, and then call paginate.
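If the pair shape feels abstract, here is a small plain-Ruby sketch you can run without Jekyll. FakePost is a hypothetical stand-in for Jekyll's post objects, and the categories hash stands in for site.categories. The sketch shows a hash yielding [name, posts] pairs to a single block variable, and shows that negating the timestamp in sort_by gives newest-first order.

```ruby
# FakePost is a hypothetical stand-in for a Jekyll post; only #date matters here.
FakePost = Struct.new(:title, :date)

# Stand-in for site.categories: a hash of category name => array of posts.
categories = {
  "article" => [
    FakePost.new("Older", Time.utc(2012, 1, 5)),
    FakePost.new("Newer", Time.utc(2012, 3, 9))
  ],
  "jruby" => [FakePost.new("Only", Time.utc(2011, 7, 1))]
}

sorted_groups = []
categories.each do |group|
  # With a single block variable, Hash#each yields a two-element [key, value] array.
  name, posts = group[0], group[1]
  # Negating the float timestamp sorts newest first (descending date).
  newest_first = posts.sort_by { |post| -post.date.to_f }
  sorted_groups << [name, newest_first.map(&:title)]
end

puts sorted_groups.inspect
# prints [["article", ["Newer", "Older"]], ["jruby", ["Only"]]]
```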

Creating the atom pages is a little bit easier, so we’ll start there:

```ruby
def atomize(site, type, posts)
  path = "/#{type}/#{posts[0]}"
  atom = AtomPage.new(site, site.source, path, type, posts[0], posts[1])
  site.pages << atom
end

class AtomPage < Page
  def initialize(site, base, dir, type, val, posts)
    @site = site
    @base = base
    @dir = dir
    @name = 'atom.xml'

    self.process(@name)
    self.read_yaml(File.join(base, '_layouts'), "group_atom.xml")
    self.data[type] = val
    self.data["grouptype"] = type
    self.data["posts"] = posts[0..9]
  end
end
```

Let’s break this down:

  • We generate a path for the new page based on the passed in type, and position-0 of the posts array. This will generate our /category/[cat name] and /tag/[tag name] as desired.
  • We create a new AtomPage, which is a custom class we’ll get to momentarily.
  • We add the new atom page to the site’s pages collection.

The custom atom page we have created has a few specific goals. Since it’s not backed by an actual file (like most Jekyll pages would be), we need to mix and match pieces and parts to generate our page into existence:

  • The file name is hard-coded (if you were to use/extend this you may wish to change this or pull it from the site config)
  • The YAML front matter and page body are read from a file we expect to find at _layouts/group_atom.xml (again, the location of this atom layout could come from the site config if you desire).
  • We bind the tag or category name (val) to the page’s data hash, so the page source can know what it actually represents.
  • We bind the type of group we’re dealing with (“category” or “tag”) to the grouptype data element.
  • We bind the actual list of posts we want to render to the output - in this case the first 10 items in the list. How many this actually renders could also come from the site configuration if desired.
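To see what actually ends up on the generated page, here is a rough plain-Ruby sketch of the data that atomize and AtomPage set up, with no Jekyll classes involved. atom_page_data is a hypothetical helper. Its limit parameter is an assumption standing in for the hard-coded posts[0..9]; you might feed it from a site config key of your choosing.

```ruby
# Hypothetical helper mirroring what atomize/AtomPage set up, without Jekyll.
# `limit` is an assumption; the original hard-codes 10 via posts[0..9].
def atom_page_data(type, name, posts, limit = 10)
  {
    "dir"       => "/#{type}/#{name}",   # e.g. /category/article
    "name"      => "atom.xml",
    type        => name,                 # becomes page.category or page.tag in Liquid
    "grouptype" => type,                 # lets the layout switch its title
    "posts"     => posts.first(limit)    # newest entries; fewer if the group is small
  }
end

posts = (1..12).map { |i| "post-#{i}" }
data = atom_page_data("category", "article", posts)
puts data["dir"]           # prints /category/article
puts data["posts"].size    # prints 10
puts atom_page_data("tag", "jruby", posts.first(3))["posts"].size   # prints 3
```

For a limit of 10, first(limit) behaves like posts[0..9], and it simply returns everything when a group has fewer posts.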

Now we just need to look at the actual atom layout page itself - group_atom.xml:

```xml
---
title: nil
---
<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    {% if page.grouptype == 'tag' %}
    <title>RealJenius.com - Tag: {{page.tag}}</title>
    {% elsif page.grouptype == 'category' %}
    <title>RealJenius.com - Category: {{page.category}}</title>
    {% else %}
    <title>RealJenius.com</title>
    {% endif %}
    <link>http://realjenius.com</link>
    <description>I'm a software developer in the game industry, and have been (for better or worse) coding on the Java platform for the last decade. I also do all my own stunts.</description>
    <language>en-us</language>
    <managingEditor>R.J. Lorimer</managingEditor>
    <!-- rel="self" should point at this feed's own absolute URL -->
    <atom:link href="http://realjenius.com{{ page.url }}" rel="self" type="application/rss+xml" />

    {% for post in page.posts %}
    <item>
      <title>{{ post.title | xml_escape }}</title>
      <link>http://realjenius.com{{ post.url }}</link>
      <author>R.J. Lorimer</author>
      <!-- RSS 2.0 expects RFC 822 dates in pubDate -->
      <pubDate>{{ post.date | date_to_rfc822 }}</pubDate>
      <guid>http://realjenius.com{{ post.url }}</guid>
      <description><![CDATA[
        {{ post.content | expand_urls : site.url }}
      ]]></description>
    </item>
    {% endfor %}
  </channel>
</rss>
```

(Yes, I know this is RSS 2.0 and not Atom 1.0 - so sue me. I plan to fix it some day!)

In this case, all we do is lay out the main feed structure, and then iterate over the posts attached to the page data, printing item content for each.

Note that the title is switched based on the value of the page.grouptype element.

Note: This snippet uses the expand_urls Liquid filter, a custom filter that makes the URLs in a post body absolute. Feeds are read outside of your site, so relative links won't work in them. I'm not going to go into detail about it here, but it's available in my site source, and a variant is available in the Octopress platform.
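Since the filter itself isn't shown here, this is a hypothetical sketch of what such a filter might look like. It is not my actual implementation or the Octopress one. Liquid filters are just module methods, so the sketch can be tested as plain Ruby; in a Jekyll plugin you would register the module with Liquid::Template.register_filter. It only rewrites root-relative href and src attributes, which is an assumption about what needs expanding.

```ruby
# Hypothetical stand-in for an expand_urls filter (not the author's actual code).
# Liquid filters are plain module methods, so this runs without Liquid.
module ExpandUrlsSketch
  # Prefix root-relative href/src attributes with the site URL.
  # Protocol-relative ("//cdn...") and absolute URLs are left untouched.
  def expand_urls(input, url = "")
    base = url.to_s.chomp("/")
    input.gsub(/(href|src)=(["'])\/(?!\/)/) { "#{$1}=#{$2}#{base}/" }
  end
end

include ExpandUrlsSketch

html = %(<a href="/about.html">About</a> <img src='//cdn.example.com/x.png'>)
puts expand_urls(html, "http://realjenius.com/")
# prints <a href="http://realjenius.com/about.html">About</a> <img src='//cdn.example.com/x.png'>
```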

So that’s it for generating our feeds for every category and tag - not too tricky! Next, we need to look at how we do the paged indexes in the paginate method above:

```ruby
def paginate(site, type, posts)
  pages = Pager.calculate_pages(posts[1], site.config['paginate'].to_i)
  (1..pages).each do |num_page|
    pager = Pager.new(site.config, num_page, posts[1], pages)
    path = "/#{type}/#{posts[0]}"
    if num_page > 1
      path = path + "/page#{num_page}"
    end
    newpage = GroupSubPage.new(site, site.source, path, type, posts[0])
    newpage.pager = pager
    site.pages << newpage
  end
end

class GroupSubPage < Page
  def initialize(site, base, dir, type, val)
    @site = site
    @base = base
    @dir = dir
    @name = 'index.html'

    self.process(@name)
    self.read_yaml(File.join(base, '_layouts'), "group_index.html")
    self.data[type] = val
    # group_index.html switches on page.grouptype, so set it here as AtomPage does
    self.data["grouptype"] = type
  end
end
```

This is borrowed closely from Keith Marran’s plugin. Basically it does this:

  • Ask the paging system to count our pages based on the core paginate configuration value.
  • For each page, build a new Pager object and calculate a base path.
  • For every page except the first, append an extra element to the path so we get that pageN in the URL.
  • All sub-pages are named index.html, and use the file in the _layouts folder named group_index.html (look familiar from the Atom pages? It should!).
  • Attach the pager to each new page so the layout has a paginator at render time, then add the page to the site.
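The page-count and path rules are easy to check in isolation. In this plain-Ruby sketch, group_pages is hypothetical and each_slice stands in for Jekyll's Pager. It splits a group's posts into pages and computes each page's directory. It mirrors the idea behind Pager.calculate_pages, which rounds posts / per_page up, rather than Jekyll's exact code. The guard matters because site.config['paginate'].to_i is 0 when the paginate setting is missing, and dividing by zero would break the page count.

```ruby
# Hypothetical sketch of the paging rules used by paginate, without Jekyll.
def group_pages(type, name, posts, per_page)
  per = per_page.to_i   # nil.to_i is 0, like a missing paginate setting
  raise ArgumentError, "per_page must be a positive number" unless per > 0

  posts.each_slice(per).each_with_index.map do |slice, i|
    num_page = i + 1
    dir = "/#{type}/#{name}"
    dir += "/page#{num_page}" if num_page > 1   # only extra pages get pageN
    { "dir" => dir, "page" => num_page, "posts" => slice }
  end
end

pages = group_pages("tag", "jruby", (1..7).to_a, 3)
pages.each { |pg| puts "#{pg['dir']}/index.html #{pg['posts'].inspect}" }
# prints:
# /tag/jruby/index.html [1, 2, 3]
# /tag/jruby/page2/index.html [4, 5, 6]
# /tag/jruby/page3/index.html [7]
```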

I won’t bore you with a full wall of HTML for my article list, but here are the relevant bits for doing the output in group_index.html.

```html
{% if page.grouptype == 'tag' %}
 <!-- Print tag title here -->
{% elsif page.grouptype == 'category' %}
 <!-- Print category title here -->
{% endif %}

{% for post in paginator.posts %}
  <!-- Print post entry short stuff -->
{% endfor %}

{% if paginator.next_page %}
  <!-- Render next page link -->
{% endif %}

{% if paginator.previous_page %}
  <!-- Render previous page link -->
{% endif %}
```

If you’re familiar with doing a normal paginated index in Jekyll this should look pretty familiar; all of the mechanics are identical at the Liquid+HTML level.
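The link placeholders still need actual URLs. Jekyll's paginator exposes previous_page and next_page as page numbers (nil when there is no such page), so the link target has to be rebuilt with the same path rules as paginate: page 1 lives at the group root, and later pages live under pageN. Here is a minimal Ruby sketch of that mapping. group_page_url is a hypothetical helper and not part of Jekyll; you could wrap it in a custom Liquid filter, which is not shown.

```ruby
# Hypothetical helper: map a paginator page number back to the directory URL
# that paginate generates for it.
def group_page_url(type, name, num_page)
  return nil if num_page.nil?   # paginator uses nil when there is no such page
  base = "/#{type}/#{name}/"
  num_page.to_i > 1 ? "#{base}page#{num_page}/" : base
end

puts group_page_url("category", "article", 1)   # prints /category/article/
puts group_page_url("category", "article", 3)   # prints /category/article/page3/
p group_page_url("category", "article", nil)    # prints nil
```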

That’s about it for the custom page generation. Not too bad!

You can see the implementation I use on my actual site over on GitHub: RealJenius.com - cat_and_tag_generator.rb.

jekyll  ruby  site 

Clean it Up: Article Series List With Jekyll, Part 2

Previously I showed how you could build an article series list with Jekyll by scraping together some Liquid scriptlets and some clever looping. The implementation certainly works, but it's a little bit ugly, inefficient, and hard to maintain. The main goal was to see how far we could stretch it using only Liquid.

This time I'd like to show how you could achieve the same result by implementing a proper Liquid tag in Ruby.

There are a lot of ways you can extend Jekyll, including custom generators, filters, and converters. Another way is to add custom tags to Liquid, since it’s the foundation for Jekyll. In fact, Jekyll adds a couple out of the box, like the include and highlight tags.

We'll create a new tag called series_list, and the end result is that we can simply put it in our article like this:

Welcome to my article about Fish in the United States. This is the first entry in a series about fish throughout the world!

{% series_list %}

Obesity rates in the Fish in US have hit epidemic proportions...

In practice, not a whole lot different than what we had in part one, but the code will be oh-so-much-more-rewarding. So how do we get there?

First, we need to implement a new Liquid tag. We'll start by getting all of the declaration boilerplate out of the way:

```ruby
module Jekyll

  class SeriesTag < Liquid::Tag
    def initialize(tag_name, params, tokens)
      super
    end

    def render(context)
      ""
    end
  end
end

Liquid::Template.register_tag('series_list', Jekyll::SeriesTag)
```

We can put this new class definition in our Jekyll plugins folder as _plugins/series_tag.rb. You can put any Ruby code that you want loaded at site-generation time in the plugins folder. In practice it's going to extend Jekyll in some way, or there probably isn't much point to having it.

All we’re doing here is creating a new Liquid::Tag, and registering it in Liquid as series_list. The render method is expected to generate replacement markup that will be processed by the next stages of the generation process (markdown/textile, and then final HTML output).

Now we need to figure out how to get a handle on a few things:

  • Our current page so we know which post we’re currently rendering.
  • The series we’re supposed to be rendering.
  • All posts for the site, so we can generate the list.

Here’s what that looks like:

```ruby
site = context.registers[:site]
page_data = context.environments.first["page"]
series_name = page_data['series']
if !series_name
  # There is no `page` variable here, so report the title from page_data
  puts "Unable to find series name for page: #{page_data['title']}"
  return "<!-- Error with series tag -->"
end
```

We get the site object out of the context.registers hash. The page is more cryptic: in this case we collect the page data out of the environments, and then simply fetch the series from it. There isn't a lot of documentation about what is available where, but there are some good examples on the web already, and you always have access to the Jekyll source!

We also check that the series value is set, and fail fast if it isn't.

Next, we need to get our filtered list of posts, and make sure they’re ordered as we need:

```ruby
all_entries = []
site.posts.each do |p|
  if p.data['series'] == series_name
    all_entries << p
  end
end

# sort_by returns a new array, so keep the result (oldest first)
all_entries = all_entries.sort_by { |p| p.date.to_f }
```
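Note that sort_by returns a new array rather than sorting in place, so its result has to be kept (or sort_by! used instead). The same filter-and-sort step can also be written with select, as in this self-contained sketch. FakePost is a hypothetical stand-in for a Jekyll post, carrying only a data hash and a date.

```ruby
# FakePost is a hypothetical stand-in for a Jekyll post (data hash + date).
FakePost = Struct.new(:data, :date)

posts = [
  FakePost.new({ "title" => "Fish of Asia", "series" => "fish-series" }, Time.utc(2012, 5, 1)),
  FakePost.new({ "title" => "Unrelated" }, Time.utc(2012, 4, 1)),
  FakePost.new({ "title" => "Fish of the US", "series" => "fish-series" }, Time.utc(2012, 1, 1))
]

# select builds the filtered array; sort_by returns a new array ordered oldest first.
entries = posts.select { |post| post.data["series"] == "fish-series" }
               .sort_by { |post| post.date.to_f }

puts entries.map { |post| post.data["title"] }.inspect
# prints ["Fish of the US", "Fish of Asia"]
```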

Now that we have the subset of posts as well as our current post, it’s really just a matter of looping and rendering a bunch of HTML:

```ruby
text = "<div class='seriesNote'>"
list = "<ul>"
all_entries.each_with_index do |post, idx|
  list += "<li><strong>Part #{idx+1}</strong> - "
  if post.data['title'] == page_data['title']
    list += "This Article"
    text += "<p>This article is <strong>Part #{idx+1}</strong> in a <strong>#{all_entries.size}-Part</strong> Series.</p>"
  else
    list += "<a href='#{post.url}'>#{post.data['title']}</a>"
  end
  list += "</li>"
end
text += list + "</ul></div>"
# render must return the markup, so end with the finished string
text
```

And that’s it! We now have our finished HTML generating series list, fresh from Ruby code instead of a bunch of Liquid scripts.
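For comparison, here is a standalone sketch of the same rendering as a plain Ruby function, runnable without Jekyll or Liquid. series_list_html and its [title, url] input pairs are hypothetical simplifications of the post objects. Unlike the loop above, it also HTML-escapes titles and URLs with ERB::Util.html_escape, since a title containing & or < would otherwise produce broken markup.

```ruby
require "erb"

# Hypothetical helper mirroring the tag's output. Entries must already be sorted
# oldest first; each entry is a [title, url] pair in this sketch.
def series_list_html(entries, current_title)
  esc = ->(s) { ERB::Util.html_escape(s) }   # titles may contain &, <, quotes
  summary = ""
  items = entries.each_with_index.map do |(title, url), idx|
    part = "<li><strong>Part #{idx + 1}</strong> - "
    if title == current_title
      summary = "<p>This article is <strong>Part #{idx + 1}</strong> in a " \
                "<strong>#{entries.size}-Part</strong> Series.</p>"
      part + "This Article</li>"
    else
      part + "<a href='#{esc.(url)}'>#{esc.(title)}</a></li>"
    end
  end
  "<div class='seriesNote'>#{summary}<ul>#{items.join}</ul></div>"
end

html = series_list_html([["Fish & Chips", "/a.html"], ["Fish of the US", "/b.html"]], "Fish of the US")
puts html
```

Building the items with map and join avoids the chained += string concatenation, though the output shape is the same.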

You can get the entire tag by forking or downloading it from GitHub.

jekyll  ruby 

Dirty Tricks: Building an Article Series List With Jekyll

Jekyll is one of the most popular "static blogging" tools available right now, and is the foundation for a number of more sophisticated tools, including Octopress and Jekyll Bootstrap. Since the end result of a built Jekyll site is plain, vanilla HTML, it allows fairly complex sites to be built with a bare minimum of hosting requirements, and it's also pretty easy to secure and to make fast in the process!

That said, there are some features Jekyll just doesn’t have that big dynamic content management systems do; but that doesn’t mean they can’t be built. For the more sophisticated enhancements, you will likely need to look at implementing custom Jekyll plugins, tags, and filters – but for some features, you can get away with wrangling Liquid scripts into the shape you need. Liquid is a templating engine, and like many templating engines, it has a little bit of programming support mixed in with its ability to generate dynamic markup, and sometimes you can leverage that to hit your goal.

One of the features that RealJenius.com has is a “series” list - you can see this on my Distilling JRuby articles:

An article count, an ordered list, and links to all neighbors.

Obviously, if you’re running a site with server code behind it, you can simply fetch by an index in the database, or possibly iterate over relevant entries in memory, or something similar when you’re rendering the blog entry, and easily generate something like this. But if you’re generating a blog at build time, how can you fill this in?

It turns out it’s not all that different in Jekyll. One of the things Jekyll makes available at the global level when performing blog generation is a list of all posts, ordered by their post time in descending order, and with those you can get all sorts of data about the post. From that, I imagine you can probably see how to do this with a plugin to Jekyll; just:

  • Iterate the post list with some Ruby code
  • Pluck out the right entries
  • Sort them
  • … and render some HTML

But let’s say we want to take the role of “site designer”, and not write any Ruby code. How can we translate this raw list of posts into a series list, with all of the information above?

First off we need to identify the posts. When you write a post in Jekyll, you have to fill in the YAML front-matter. It’s just a block of YAML leading the entry that has property-like details for the post, like the title, the page layout to use, etc. You can add any custom fields you want as well. So let’s tag our articles as belonging to a particular series:

```yaml
---
title: The Fish of the United States
layout: post
series: fish-series
---
```

Next, we need to create a reusable chunk of Liquid+HTML for the logic. We can simply use an include in the `_includes` directory for this:

**`_includes/series.html`**:

```html
<div class="seriesNote">
<!-- A series block will go here -->
</div>
```

Now, in our article, we can simply include the series like this:

Welcome to my article about Fish in the United States. This is the first entry in a series about fish throughout the world!

{% include series.html %}

Obesity rates in the Fish in US have hit epidemic proportions...

Finally, we just need to implement our series block. There are a number of things we need to render the example above:

  • A count of all of the articles.
  • An index for the current article.
  • A list of posts in chronological order for the series.
  • The URL for each post.

Here is a big wall of hacky Liquid tags to achieve just that.

```html
{% assign count = '0' %}
{% assign idx = '0' %}
{% for post in site.posts reversed %}
  {% if post.series == page.series %}
    {% capture count %}{{ count | plus: '1' }}{% endcapture %}
    {% if post.url == page.url %}
      {% capture idx %}{{count}}{% endcapture %}
    {% endif %}
  {% endif %}
{% endfor %}

<div class="seriesNote">
  <p>This article is <strong>Part {{ idx }}</strong> in a <strong>{{ count }}-Part</strong> Series.</p>
  <ul>
  {% assign count = '0' %}
  {% for post in site.posts reversed %}
  {% if post.series == page.series %}
    {% capture count %}{{ count | plus: '1' }}{% endcapture %}
    <li>Part {{ count }} -
    {% if page.url == post.url %}
      This Article
    {% else %}
      <a href="{{post.url}}">{{post.title}}</a>
    {% endif %}
    </li>
  {% endif %}
  {% endfor %}
  </ul>
</div>

{% assign count = nil %}
{% assign idx = nil %}
```

You can download or fork this source directly from GitHub.

Let’s work through this. The script is roughly broken into three parts:

  1. The first part iterates over the full post list, counting all entries that match the current page's series key and recording the index of the current post.
  2. The second part generates the series summary text, and then iterates over the full post list again, generating links for each post.
  3. The final part clears out the variables we used, since they all float around in a global namespace.
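If the Liquid counting is hard to follow, this is the first part translated into plain Ruby. series_position is a hypothetical helper, and the hashes are stand-ins for Liquid's view of site.posts (newest first) and the current page. The helper compares each post against the current page's series, which is the comparison the include needs to make.

```ruby
# Hypothetical plain-Ruby mirror of the include's first pass.
# site_posts is newest first, like site.posts; page is the current page's data.
def series_position(site_posts, page)
  count = 0
  idx = 0
  site_posts.reverse.each do |post|           # like `for post in site.posts reversed`
    next unless post["series"] == page["series"]
    count += 1                                 # equivalent of the capture/plus: '1' increment
    idx = count if post["url"] == page["url"]
  end
  [idx, count]
end

site_posts = [   # newest first, as site.posts is
  { "url" => "/fish-asia.html", "series" => "fish-series" },
  { "url" => "/cats.html" },
  { "url" => "/fish-us.html", "series" => "fish-series" }
]
page = { "url" => "/fish-asia.html", "series" => "fish-series" }
idx, count = series_position(site_posts, page)
puts "Part #{idx} of #{count}"   # prints Part 2 of 2
```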

A few interesting things to note:

  • Properly incrementing and tracking a count while filtering in a for loop is pretty nasty in Liquid. It's done by using the plus filter and recapturing the value: {% capture count %}{{ count | plus: '1' }}{% endcapture %}. This is effectively like saying count = count + 1.
  • We use the special for loop keyword reversed because site.posts is in reverse-chronological order, and we want to list the posts in true chronological order.
  • We have to iterate twice because the series summary text comes before the actual list, and there aren't any Liquid constructs (that I know of) for creating a new array from the first iteration.

And that’s it! Could this be done cleaner and probably easier via some judicious use of Jekyll plugins? Absolutely! And what would be the fun of that?

jekyll  ruby  site  hacks