Book review: Structured Writing by Mark Baker

12 November 2018

Structured Writing when I was a few pages from the end.

This is a long review of Mark Baker’s new book Structured Writing, Rhetoric and Process (XML Press). Baker is also the author of a popular book, Every Page Is Page One (“EPPO”) from 2013, which has been influential in the tech comm world, at least among many tech writers I know of.

At almost twice the length of EPPO, Structured Writing is a big book (472 pages before you get to the backmatter). As far as I know, no other book covers the subject in this depth. It’s remarkable that a book this long could be written about structured writing, but the length is earned: the material is substantive throughout. At one point the book was the number 1 New Release on Amazon in the Technical Writing Reference category.

Baker is an opinion leader in the field of content (especially tech comm and the tools side of it) and I personally have found his ideas to be interesting and informative over the last few years, even if I did not agree with everything he said. Since this is right up my alley, I immediately bought the paper version of the book even though I usually wait for the ebook. (The ebook is now on Amazon, and it’s also on Safari Books Online if you subscribe to that.)

The intended audience for this book is a little uncertain. It does not say who it’s for, unlike EPPO. My opinion is that it would be good for information architects and doc tools people (“content engineers”) like me. A technical writer without a lot of interest in the subject might not find it very useful, unlike EPPO. But anyone with an interest in structured authoring could get something out of it.

This is not a beginner’s book, and it’s not a tutorial on using structured authoring tools. It’s more like an extended treatise that gives you a framework for thinking about why and how to implement structured authoring. Baker is smart and experienced in this area and he goes about his job in an organized and thorough manner.

Benefits of Structure

I’m a proponent of structure in content. Structure makes content much easier to process programmatically, which makes the content much more flexible than it would be without the structure. Even simple structure like a bit of metadata at the beginning of a topic describing its subject category can make it a lot easier to do something with it.
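As a rough illustration of that last point, here is a minimal Python sketch. It assumes a made-up topic format (a few "key: value" lines, a blank line, then the body); the file names, field names, and helper functions are all hypothetical and are not taken from the book or from any particular tool.

```python
from collections import defaultdict

# Hypothetical topics: a "key: value" metadata header, a blank line, then the body.
TOPICS = {
    "install.txt": "category: setup\ntitle: Installing the server\n\nRun the installer...",
    "backup.txt": "category: maintenance\ntitle: Backing up data\n\nSchedule a nightly job...",
    "upgrade.txt": "category: setup\ntitle: Upgrading the server\n\nStop the service first...",
}


def read_metadata(text):
    """Return the header fields of a topic as a dict."""
    header, _, _body = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def group_by_category(topics):
    """Map each category to the titles of the topics filed under it."""
    groups = defaultdict(list)
    for text in topics.values():
        meta = read_metadata(text)
        groups[meta["category"]].append(meta["title"])
    return dict(groups)


if __name__ == "__main__":
    for category, titles in sorted(group_by_category(TOPICS).items()):
        print(category, "->", ", ".join(titles))
```

With only one line of metadata per topic, a script can already build a per-category table of contents without anyone reading the body text.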

Today, lots of different people consider structured content to be a good thing. I have heard the following groups talk about it:

  • Tech writers
  • Web developers
  • Other kinds of programmers (like Jeff Eaton)
  • Content strategists

These different groups often use entirely different tools and processes, and they may not even be aware of each other. To me this indicates there is wide value in structured content. By the way, I have rarely heard marketing people talk about structure, even though they could use it too.

Baker covers these aspects of structure and adds something I had not really thought much about: rhetoric. He says rhetoric can be improved as a function of structured writing. He gives a lot of importance to maintaining the quality of rhetoric. His definition of rhetoric:

“…rhetoric is figuring out what to say and how to say it to persuade, inform, entertain, or enable the reader to act.”

I read Structured Writing with tech comm content in mind because that’s where I do most of my work, and so his use of the word rhetoric is odd to me. I don’t hear people calling user guides rhetoric. But he includes “to inform” and “enable to act” in his definition, so I can work with that for my purposes. And the book is written generically enough to be useful for marketing and other types of content.

Baker thinks quality rhetoric is a business asset, and this is one of those things that nearly everyone would profess to agree with. But it seems that the quality of the rhetoric almost always takes a back seat to things like saving visible money. This is probably because it’s easy to quantify that you saved some money by implementing process XYZ or tool ABC. The effects of good rhetoric are not as easy to quantify. You will have to come up with your own measurement of the benefits of good rhetoric.

This book provides a lot of raw material for constructing your ROI arguments. Baker does not provide any easy spreadsheet method for justifying a focus on the subject domain on a content project, acknowledging that it is a complex calculation. But the book makes an extended, sensible argument for just that.

Interlocking Structure, Process, and Rhetoric

He lays out clearly and at great length how structure, process (programming, publishing, search, administrative, planning) and rhetoric are all interlocking. Each one has an effect on the others, and it’s usually a complex situation. You can move the complexity around but you can’t get rid of all of it. The best ways to partition this complexity are covered again and again in the book.

Difficulties of Structure

The eternal conflict of structure versus easy editing.

Getting the structure into the content can be difficult. The human side of it is probably the hardest. People want the good things that structure can bring, but often they don’t want to do the necessary things to add the structure that would enable it. One of the things they don’t want to do is use some hideous XML editor, or deal with XML files. Baker doesn’t get into this aspect of it very much.

XML editor – many people run screaming from editors like this (apologies to Oxygen).

If you have technical content that has some repeatability, you should be able to use ideas in this book to add structure that will make your content production process more effective, and at the same time keep the quality of the rhetoric high, and also make things easier for your writers. By high quality rhetoric I mean content that informs the reader easily because it’s easy to find and has consistent presentation, enabling the reader to find and absorb the information with the least amount of effort.

To make adding structure worth the trouble (and it can be a lot of trouble), the content must have aspects that are repeatable. Baker uses the example of a recipe (extensively), which has components that are generally expected by people who use recipes. These repeatable components can become part of the structure of your recipes, and this structure can have several beneficial effects: The writer has less to remember, the reader gets what they need, and the recipe is easier to use programmatically.

Standards

Baker doesn’t mention standards for structured writing as such. He often mentions markup languages that have gone through a standards body, like DocBook and DITA, but he does not say whether he sees any value in having a standard of that nature.

Baker’s own structured authoring tools, SAM and SPFE, are in public GitHub repos. If they catch on and become popular, I’m interested to see what might happen from a standards perspective.

The Big Idea

Here I will attempt to summarize the main multifaceted idea of the book in its ideal form. But first let me define a couple of Baker’s terms in a simplified manner.

media domain – This would be writing where you are also formatting the content as you write. Example: Microsoft Word in its usual usage.

document domain – Writing without having to specify formatting, while identifying structures like lists, headings and steps. Example: DocBook.

subject domain – This is the ideal according to Baker’s ideas. It’s writing without specifying formatting, and also without specifying structures like lists and tables. You record raw information about your subject, aided by structures that apply to the subject you are writing about. So if you are writing a recipe, you might have fields called ingredients and quantity. Of course, this needs to be set up for the writer, based on the subject being dealt with. Example: SAM (Baker’s markup language, slightly similar to Markdown but much more flexible).
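To make the difference between the domains a little more concrete, here is a small Python sketch of the recipe case. It is not SAM and not Baker’s tooling; the class names, field names, and sample recipe are assumptions made for illustration. The writer records only subject data (ingredients and quantities), and the decision to present them as a list or a table is left to separate rendering functions.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Ingredient:
    name: str
    quantity: str


@dataclass
class Recipe:
    title: str
    ingredients: list = field(default_factory=list)
    steps: list = field(default_factory=list)


def render_text(recipe):
    """Presentation choice 1: plain text with a bulleted ingredient list."""
    lines = [recipe.title, ""]
    lines += [f"  - {i.quantity} {i.name}" for i in recipe.ingredients]
    lines.append("")
    lines += [f"{n}. {step}" for n, step in enumerate(recipe.steps, start=1)]
    return "\n".join(lines)


def render_html(recipe):
    """Presentation choice 2: the same data as an HTML table."""
    rows = "".join(
        f"<tr><td>{escape(i.name)}</td><td>{escape(i.quantity)}</td></tr>"
        for i in recipe.ingredients
    )
    return f"<h1>{escape(recipe.title)}</h1><table>{rows}</table>"


# Hypothetical sample data: no formatting or list/table markup anywhere in it.
EXAMPLE = Recipe(
    title="Pancakes",
    ingredients=[Ingredient("flour", "200 g"), Ingredient("eggs", "2")],
    steps=["Mix the ingredients.", "Fry in a hot pan."],
)

if __name__ == "__main__":
    print(render_text(EXAMPLE))
    print(render_html(EXAMPLE))
```

The same record feeds both outputs, which is the practical payoff of keeping lists and tables out of the source.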

Here is my version of the ideal, according to the book:

Writers should have subject domain editing set up for them to keep their tool distractions to a minimum (this has to be done by someone who knows what they’re doing). SAM is the best for this because it can be fully in the subject domain. Subject domain content is the easiest to automate and process with algorithms. It’s the easiest to search for as well. So the rest of the content system (publishing, search, etc.) would be set up to take full advantage of it.

That’s oversimplified but I hope it gives you an idea. Of course, this is the ideal form and Baker goes to great lengths to describe how you might implement his ideas inside the context of many other systems, so he acknowledges the real world.

So what content systems do meet Baker’s detailed specifications? Well, none that I can see. That’s why he had to make his own (SAM and SPFE).

He puts these things together: rhetoric, authoring without extraneous technical distractions, publishing algorithms, search algorithms. You have to analyze your situation and judge the best balance between author experience, automatability, and rhetorical quality. He calls this “partitioning complexity.” You move the complexity to the people who have the skills and time to handle that piece of the complexity.

Looking for Trouble

Here are some things that Baker is up against with his ideas about structured writing:

  • CMS vendors
  • Existing mindshare
  • Existing popular tools and markup languages
  • The Microsoft Word UI effect

I mention MS Word because it appears that Baker wants writers to use his subject domain markup language, SAM, in plain text form, similar to the way you’d write with Markdown in plain text. This would be fine with me but to many, many people, the immutable expectation is that editing software will act like MS Word. It seems crazy to me but no one has found a way around it yet. Maybe it’s with us forever.

AI and Content

Baker writes only one footnote about artificial intelligence in relation to structured writing, but I wanted to mention it. With all the hype about AI, you’d think that a proper machine learning setup could figure out an unstructured text blob and give you everything that a laboriously structured text could give. But that’s apparently not the case, at least not yet, and probably not anytime soon. In support of this, one of the acknowledged leaders in AI, Google, still needs you to add certain structured info to your web pages to help it understand them, so it can add features like answer boxes and definition boxes, and so you can improve SEO. That said, I know of a CMS vendor who is currently touting AI and content with their systems.
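For readers who have not seen this kind of structured info, here is a hedged Python sketch that builds a JSON-LD block using the schema.org Recipe vocabulary. The recipe values and the function names are made up for illustration, and the sketch does not claim to cover everything a given search engine requires; check the search engine’s own documentation for that.

```python
import json


def recipe_json_ld(name, ingredients, steps):
    """Build a JSON-LD string using the schema.org Recipe type."""
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": ingredients,
        "recipeInstructions": [{"@type": "HowToStep", "text": s} for s in steps],
    }
    return json.dumps(data, indent=2)


def as_script_tag(json_ld):
    """Wrap the JSON-LD so it can be embedded in an HTML page."""
    return f'<script type="application/ld+json">\n{json_ld}\n</script>'


if __name__ == "__main__":
    # Hypothetical sample values.
    block = recipe_json_ld(
        "Pancakes",
        ["200 g flour", "2 eggs"],
        ["Mix the ingredients.", "Fry in a hot pan."],
    )
    print(as_script_tag(block))
```

If the source content already has fields for ingredients and steps, generating this block is a few lines of code. If the source is an unstructured blob, someone has to write it by hand.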

Even when AI is more fully developed, more structure should still be better than less structure.

DITA Note

Baker has a reputation as a critic of DITA, but this book is not about DITA. It mentions DITA many times but is fairly evenhanded about it. He even says a few things that seem complimentary of DITA. Personally, I see so many parallels between what Mark Baker promotes and DITA that an argument does not seem productive. A love fest seems more appropriate. This book could help me improve a DITA system.

A Few Nits

  • The book really needs a glossary for Baker’s many new terms. A few examples of his terms: document domain, subject domain, functional lucidity, differential single sourcing, and others. To his credit, he uses his terms consistently throughout the book.
  • I was hoping for a really beefy example of a complex technical document that implements his structured writing ideas. Like a setup guide for an enterprise database.
  • There is no bibliography and there are few references to outside works. A bibliography might be useful.
  • I read the paperback and the page headers really need to have the chapter number, because in the text he often refers to “Chapter 19,” “Chapter 14,” etc.

A Few More Points of Note

  • Chapter 13, Reuse, is a good detailed treatment of this subject.
  • Chapter 15, Extract, is an excellent discussion of extracting information. It might be a description of the future of docs.
  • Chapter 18, Linking, had several interesting ideas. Generic markup of “subject affinities” might give more flexibility in linking, if you come up with tools. I had never heard of this kind of linking before. I don’t know if the idea is original to Baker but I plan to experiment with it.
  • Interesting discussion of the markup idea of “mixed content” in Chapter 22, Blocks, Fragments, Paragraphs, and Phrases.
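To make the “subject affinities” idea from the Chapter 18 note a little more concrete, here is one possible reading of it as a Python sketch. This is an assumption-laden illustration, not Baker’s implementation: the index, the fragment format, and the function name are all hypothetical. The writer marks a phrase with the type and value of the subject it refers to, and a publishing step decides later whether a link target exists.

```python
# Hypothetical index built at publish time: (subject type, subject value) -> page.
SUBJECT_INDEX = {
    ("command", "backup"): "reference/backup.html",
    ("config-file", "server.conf"): "reference/server-conf.html",
}

# A sentence stored as fragments: plain strings, or (text, type, value) annotations.
SENTENCE = [
    "Run ",
    ("backup", "command", "backup"),
    " after editing ",
    ("the server configuration", "config-file", "server.conf"),
    " or ",
    ("restore", "command", "restore"),
    ".",
]


def resolve_links(fragments, index):
    """Turn annotated phrases into links when the index has a matching page."""
    out = []
    for frag in fragments:
        if isinstance(frag, str):
            out.append(frag)
            continue
        text, subject_type, value = frag
        target = index.get((subject_type, value))
        # No matching page: keep the plain text instead of a broken link.
        out.append(f'<a href="{target}">{text}</a>' if target else text)
    return "".join(out)


if __name__ == "__main__":
    print(resolve_links(SENTENCE, SUBJECT_INDEX))
```

Because the source records what a phrase is about and not where it should point, adding a page for the restore command later would produce a new link without touching the sentence.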

Structure Nirvana

Mark Baker’s Structured Writing may not get us all the way to Structure Nirvana, but it should help.

Goal: Structure Nirvana that pays for itself.

Related Books

If you’re interested in Structured Writing, here are some other books you might find useful.

  • Designing Connected Content: Plan and Model Digital Products for Today and Tomorrow by Carrie Hane, Mike Atherton
  • Author Experience by Rick Yagodich
