Last weekend I started looking at the problem of content categorisation on this website. More accurately, I was trying to work out how I wanted information to interconnect in order to implement site search; I needed to create an architecture – or taxonomy – into which I could slot everything I've ever written, recorded, or thought about. At the time, I referenced A List Apart as a really good example of what I was trying to achieve, but obviously there's a pretty big difference. ALA has a specific focus: web development. That means their taxonomy is a fairly simple affair, with relatively neat delineations. Sure, there will always be outliers, but for the most part they can be pretty specific whilst still maintaining a relatively sparse number of categories.
I am a person with an extremely wide range of interests (like most people). For instance, one of my interests is web development. Straight away, that means I would need to replicate the ALA taxonomy entirely and it would only cover one core interest. What about evolutionary biology? Music? Photography? Linguistics? Well, they would need their own categories, of course... but then how many would I have?
I ended my previous article saying:
There's an actionable path here, but also a lot more thinking to be done. I knew this was going to be a can of worms 🐛
I had no idea.
Dissecting The Past
I think I first introduced some form of categories back in version two. Certainly, if you go back any further they become sparse and almost non-existent. Still, that gave me a starting point, so for phase one I exported a JSON file of my entire Notes, Articles, and Journal feeds. With a little bit of data manipulation, I managed to get this into Excel in a vaguely useable format, from which I extracted all my current categories, along with a rough map of how these interacted. In other words, I could see whether "Web" and "Technology" were often paired together, or if they had never been.
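As a rough illustration of that pairing map, here is a minimal Python sketch of counting which categories turn up on the same entry. The entry shape (a `title` and a `categories` list) and the sample posts are assumptions made up for the example; the real export and the Excel work may well have looked different.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample of exported entries; the real export's field names may differ.
entries = [
    {"title": "Post A", "categories": ["Web", "Technology"]},
    {"title": "Post B", "categories": ["Web", "Technology", "Software"]},
    {"title": "Post C", "categories": ["Music"]},
]


def category_pairs(entries):
    """Count how often each pair of categories appears on the same entry."""
    pairs = Counter()
    for entry in entries:
        # Sort and de-duplicate so that (Web, Technology) and (Technology, Web)
        # are counted as the same pair.
        cats = sorted(set(entry["categories"]))
        pairs.update(combinations(cats, 2))
    return pairs


pair_counts = category_pairs(entries)
print(pair_counts.most_common(3))
```

Pairs with a high count are candidates for sitting in the same silo; pairs that never appear at all are just as informative.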
That let me begin grouping my old categories into... well, categories. Thinking like that became confusing, so I began referring to these top-level categories as silos. Some silos seemed intuitive: technology, web, software, and hardware naturally grouped together. Naming them was less easy. For example, take that "tech" grouping; you want to call it just that – "Tech" or "Technology" – but then one of the subcategories is also called "Technology", which is just asking for trouble. You can't have a taxonomic position of Technology technology (well, unless you're a gorilla, I guess). As a result, some old categories were elevated to the position of silos, which felt like a good compromise.
Part of the reason behind creating a new taxonomy, though, was to begin amalgamating all the different information stores that I've ended up trying to use over the years. Between Tumblr, Pocket, Evernote, and about a half dozen more (not to mention browser bookmarks, Twitter likes etc.), I've been trying to augment my memory for well over a decade at this point. Each one of those stores had its own way of categorising content. Some had genuine taxonomies, like Evernote; others just filtered on datestamps, like Pocket. Still, they now all needed to be combined, so I began pulling new silos and subcategories out of them and adding them to the spreadsheet.
If At First You Don't Succeed
My first draft had 11 silos (too many). Some of those silos had zero subcategories; others had almost ten (too big a difference). Whilst far from ideal, it at least gave me a baseline to begin testing against, so I created a new table with just my entry titles and blank columns for silo and subcategory; then I began sorting. This took a while. With nothing but a title to go on, and almost six years of accumulated content (227 entries in total), even categorising posts from theAdhocracy took several hours. It was a worthwhile exercise though, because it highlighted some glaring issues.
First, my new taxonomy was too stiff. Having kept ALA and similar site architectures in mind, I'd created the kind of categories you'd expect to see at a library: user experience, biology, anthropology, graphic design etc. These worked great for about 60% of my content (mainly the notes) but failed utterly for the remainder. Where does a taxonomy like that place a blog post on struggling with the anxiety of deadlines? It could go into psychology, or it could go into health, or possibly even blogging, but none of them really fit perfectly. And what about Journal articles that are just about my life? Or posts about blogging challenges?
My initial response was to create a new silo called "Meta" and pretty much stick every one of these problem articles into either a "Personal" or "Thoughts" category. That let me continue categorising everything else, but when I was done almost 40% of the content was sat in those two categories (and mainly in Personal). They'd become wastebasket taxa and that meant my taxonomy was failing.
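A quick way to spot a wastebasket is to look at each category's share of the total. The sketch below is only an illustration: the 25% threshold and the sample assignments are assumptions, not figures from my spreadsheet.

```python
from collections import Counter


def category_shares(assignments):
    """Return each category's share of all entries as a fraction between 0 and 1."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def wastebaskets(assignments, threshold=0.25):
    """List categories holding more than `threshold` of all entries, largest first.

    The default threshold is an arbitrary, illustrative cut-off.
    """
    shares = category_shares(assignments)
    flagged = [(cat, share) for cat, share in shares.items() if share > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up assignments: one category name per entry.
sample = ["Personal"] * 4 + ["Thoughts"] + ["Music"] * 2 + ["Biology"] * 2 + ["History"]
flagged = wastebaskets(sample)
print(flagged)
```

Any category that keeps showing up in that list is probably swallowing content that deserves a better home.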
The second issue was that the information structure was preventing high-level cross-pollination of ideas. The whole point of creating an information architecture is to map out the crosslinks between topics, making search navigation more robust and information discovery as easy as possible. But I couldn't do that, which was creating a kind of information dissonance. A post on going to watch a group play medieval instruments was being torn between fitting into the Music category, the Personal category, and the History category, when really it should sit in all three.
On top of which, when I'd first started mapping my existing taxonomy, I'd failed to truly take note of the full structure. Yes, I was using WordPress categories and tags, but I'd also created a higher level of categorisation; my content equivalent of a biologist's Kingdom. Content was also sorted into "types": article, note, review, journal. But that distinction had only been made relatively recently, which meant that older posts were often sitting in the wrong pot. Take that post on medieval instruments: today that would probably have been a Journal post, but I'd still want it to appear under the Music category. That meant an initial assumption was wrong: my taxonomy needed to be flexible enough to cater for all content types, not just Articles and Notes.
Try, Try, Try Again
At this stage, I went back to the drawing board. I created a new spreadsheet, once again copied over all my content titles, and began re-sorting. This time I also made a note of each entry's content type, including whether it should be updated as part of this cataloguing process. Of course, all the old MiMs have been waiting to be divided into reviews for a while, but plenty of other old articles quickly filtered into the Journal bucket, and a few were even earmarked to be split up into multiple Notes moving forward.
I then took a second shot at defining my silos, resulting in a few being merged or broadened, before going back through the list and categorising everything for a second time. Except, now I gave myself a further taxonomic division: secondary categories. The primary category was often the same one I had sorted content into the first time around, but adding secondary options allowed me to begin building up that web of connections I was after. For now I've set a soft limit of three secondary categories, but I actually think I'm okay with there being no limit at all.
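To make the primary/secondary idea concrete, here is a small Python sketch of how a single entry might be modelled. The `Entry` class, its field names, and the `entries_in` helper are all hypothetical; nothing like this exists on the site yet, and the soft limit is only reported, never enforced.

```python
from dataclasses import dataclass, field

SOFT_SECONDARY_LIMIT = 3  # soft limit only: reported, not enforced


@dataclass
class Entry:
    title: str
    content_type: str  # e.g. "Article", "Journal", "Note", "Review"
    primary: str
    secondary: list[str] = field(default_factory=list)

    def categories(self):
        """Primary category first, then secondaries with duplicates removed."""
        seen = [self.primary]
        for cat in self.secondary:
            if cat not in seen:
                seen.append(cat)
        return seen

    def over_soft_limit(self):
        """True when the entry has more secondaries than the soft limit."""
        return len(self.secondary) > SOFT_SECONDARY_LIMIT


def entries_in(category, entries):
    """Every entry filed under a category, whether as primary or secondary."""
    return [e for e in entries if category in e.categories()]


# The medieval instruments post, using the category names from earlier.
post = Entry(
    title="Medieval instruments",
    content_type="Journal",
    primary="Music",
    secondary=["Personal", "History"],
)
print(post.categories())
```

With this shape, the post no longer has to be torn between Music, Personal, and History: it appears under all three, while Music remains its main home.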
Those connections proved even more useful than I'd thought, as they quickly began to highlight that some of my silos contained content which rarely clustered together, whereas others frequently did. Take what I (at this stage) called the Media silo. This contained categories for Movies, TV, Music etc. I found that Movies and TV routinely ended up tagged on the same entry, and that Music would often also have a secondary category of one of those too. As a result, it made sense to merge Movies with TV into one category (which I dubbed Moving Pictures after the Terry Pratchett book). It also proved that these two categories (Moving Pictures and Music) made sense to sit alongside one another within a Media silo.
On the other hand, a Content silo that contained categories for UX, Web Culture, Photography, and Technology showed little-to-no interlinking. UX primarily linked with Coding categories – like Frontend and Accessibility – whilst Technology and Web Culture often linked with various Science categories. Photography, unsurprisingly in hindsight, shared a close relationship with the Personal category. As a result, the Content silo went the way of the dodo as I shuffled its categories into other silos with a higher level of shared interest. In turn, that began forcing some silos to be renamed; Science, for example, became Quite Interesting (yes, after the British TV show), a general mix of things going on outside of my personal sphere. As a result, it neatly absorbed categories like Politics and Philosophy which had been homeless up until now. Articles were starting to coalesce into a distinct shape.
It was around this point that I asked Alison to take a look. Her second opinion helped massively, as it forced me to kill some of my darlings and simplify a few other areas of the taxonomy. It also highlighted a few categories which were clearly right, even if they hadn't quite yet found the best silo. After an hour or so of back and forth, we realised that a large part of that was because my silos were still too narrowly focused. Coding was an obvious problem, as it didn't really leave enough scope. Alison came up with Building and a few more pieces fell into place. As they did, those wastebaskets began to shrink, a sure sign that we were on the right track.
The "Final" Taxonomic Web
As I continued to refine the overall structure in Excel I also began experimenting with other information stores. I ran through the entirety of Evernote and Tumblr in my head, quickly sorting posts into the structure I'd begun to settle on. For the most part, it worked. Where it didn't, I made notes or tweaked things slightly until it did. By the time I was done, a couple of major shifts had occurred:
- Most of my categories (and all of my silos) were now extremely broad, containing more than just one topic (usually half a dozen or so);
- I'd realised that the topics that I wrote about had clearly shifted over time, and would likely do so in the future.
These two points supported one another, clearly, but they also made me aware of the few topics which were still large enough to warrant their own categories. Topics like Photography had finally found homes within specific silos, but they were (largely) still one-trick ponies. That wasn't an issue, but it meant that, in the future, I might want to do the same thing again. Say I suddenly decide to shift my writing towards conservation; I'd probably want to further subdivide the Natural World category as it began to take over as the main priority.
That realisation was quite freeing, as it categorically (no pun intended) ruled out using taxonomic terms in any permanent way, such as URLs. With that potential issue firmly set aside, having to find the perfect name for each and every category began to feel silly. I felt happier about potentially changing names or redividing categories as-and-when needed in the future. Rather than create a definitive ruleset, I began focusing on one which served my needs right now without sacrificing scope to grow.
It did, however, also highlight that the same rules should apply to all levels of the taxonomic hierarchy, including content type. After all, I'd uncovered several old articles that needed to be converted into Reviews, Notes, and Journals already, so the chances are good there might be a similar need in the future. That means my current URL patterns are going to need to change, but I'd rather standardise that now than have to keep making adjustments with each taxonomic rebalance moving forward.
So, with all that said, here's what I landed on. Keep in mind (as I've said above) that this is an evolving architecture and will almost certainly be shifted around in the future. It will also require a certain level of structural change on the website to properly implement, but that's all part of the bigger picture and reasoning behind all this effort in the first place.
Content Types: Article, Journal, Note, Review
- HTML & CSS
- Nuts & Bolts
- Inclusion (a11y)
- Content Design
- User Experience (UX)
- Graphic Design
- Web Design
- Arts & Crafts
- World Building
- It's My Life
- Musical Notes
- Moving Pictures
- People & Places
- Notes from the Editor
- Quite Interesting
- The World Wide Web
- Natural World
- To Boldly Go
- Anthropocenic View
You'll notice that there's an unusual mixture right now of typical label-type categories (e.g. Photography) and more abstract (read: meaningless) ones, such as To Boldly Go. That's deliberate, to a degree. I increasingly found myself wanting to put more of, well, myself into the taxonomy. After all, this is the one website I'll ever work on where my personal UX is actually more valid than anyone else's.
The trend of adopting pop-culture references started with Moving Pictures and spread out from there. It began as just a bit of fun, but over time it proved extremely useful in broadening groups. Take Anthropocenic View. This covers every facet of human life and culture, from linguistics through to history through to philosophy. On the one hand that name is gibberish but, on the other, the category makes sense at a personal level as a specific, delineated cluster of knowledge. Just because it doesn't have a strict term in the English language doesn't mean I should scrap it, so I made one of my own.
Currently, I plan to merge Articles, Notes, and Journals all under this schema, with room for it to grow to include Reviews in the future. Oh, and as discussed in the original article, tagging will remain on all entries and will continue to be an arbitrary, unrefined smörgåsbord of whatever I feel that specific entry should link to. After all, a degree of taxonomic chaos and unconformity will allow for greater flexibility overall.
Now all that's left is to just, y'know, implement the damn thing. What was that about the whole can of worms again... 🐛