Bible Microformats

Sean from Blogos proposes a microformat for marking up Bible references on the web.

About Microformats

Microformats are a way of marking up semantic data in HTML without inventing new elements or attributes. For example, here’s how you might mark up geographic coordinates:

<div class="geo">GEO: <span class="latitude">37.386013</span>, <span class="longitude">-122.082932</span></div>

In this way, computer programs can figure out without any ambiguity that the above sequence of numbers refers to latitude and longitude. Browsers, for example, might automatically link the coordinates to Google Maps or your mapping application of choice. Firefox 3 is evolving along these lines.
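
To make "without any ambiguity" concrete, here is a minimal Python sketch of how a program might read the coordinates out of that markup. It uses only the standard library's html.parser; the GeoParser class name and its attributes are my own hypothetical choices, not part of any microformats toolkit, and the sketch assumes each latitude and longitude span contains a plain decimal number.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Hypothetical helper: collects (latitude, longitude) pairs from class="geo" markup."""

    def __init__(self):
        super().__init__()
        self.field = None       # "latitude" or "longitude" while inside that span
        self.current = {}       # values gathered for the geo element being read
        self.coordinates = []   # completed (latitude, longitude) pairs

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "geo" in classes:
            self.current = {}   # a new geo element starts a fresh pair
        for name in ("latitude", "longitude"):
            if name in classes:
                self.field = name

    def handle_data(self, data):
        # Assumption: the span holds nothing but a decimal number.
        if self.field:
            self.current[self.field] = float(data.strip())

    def handle_endtag(self, tag):
        if self.field:
            self.field = None
            if len(self.current) == 2:
                self.coordinates.append(
                    (self.current["latitude"], self.current["longitude"]))
                self.current = {}


parser = GeoParser()
parser.feed('<div class="geo">GEO: <span class="latitude">37.386013</span>, '
            '<span class="longitude">-122.082932</span></div>')
print(parser.coordinates)
```

The class names do all the work here: the parser never has to guess which number is which.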

Bible Microformat

I’ve been thinking for a while about the best syntax to use for a Bible microformat. The problem I kept running into was in coming up with the One True Representation of a Bible verse (i.e., is it “John 3:16” or “Jn 3:16” or “John.3.16” or something else).

Sean neatly sidesteps the problem with a “good enough” solution. He proposes a format akin to the following:

<abbr class="bibleref" title="John 3:16">Jn 3:16</abbr>

The crucial aspect is that it doesn’t matter exactly how you specify the Bible verse—once you do the hard part of indicating that a string of text is a verse reference (the class="bibleref"), any decent reference parser should be able to figure out which verses you mean. It’s so simple it’s brilliant.

Now let’s push it a little further.

I suggest that the microformat should take advantage of the underused and, in this case, semantically more meaningful <cite> tag rather than the <abbr> tag. You are, after all, citing the Bible.

<cite class="bibleref">John 3:16</cite>

However, you also have to account for people who link the verse to their favorite online Bible. You could double-up the tags:

<a href="…"><cite class="bibleref">John 3:16</cite></a>

But it’s messier than need be. Since the practice of linking is widespread, why not overload the <a> tag with the appropriate class:

<a href="…" class="bibleref">John 3:16</a>

Both <cite> and <a> have a title attribute in which you can place a human- (and machine-) readable version of the verse if you choose. The title is optional as long as the verse reference is the only text inside the tag. Indeed, a title is required only if the element’s text is ambiguous (a verse without a chapter and book, for example, or completely unrelated text). (The practice of not recording duplicate information is the Don’t Repeat Yourself principle.) For example:

<p>God <a href="…" class="bibleref" title="John 3:16">loves</a> us.</p>

Corner Cases

So how would you specify a Bible translation if a specific translation were germane to the citation’s context? (In theory, when you don’t specify a translation, people consuming the microformat could choose to see the passage in the translation of their choice, similar to how some people prefer to look up an address in Google Maps, while others prefer Yahoo Maps.) I’m sympathetic to the OSIS practice of indicating the translation first, followed by the reference. For example:

<cite class="bibleref" title="ESV: John 3:16">Jn 3:16</cite>

This practice follows the logical progression of going from general to specific:

[Implied Language] → Translation → Book → Chapter → Verse

The title is also human-readable, though it departs from the standard practice of placing the translation identifier after the reference.

Sean mentions two other cases of note: verse ranges (e.g., “John 3:16-17”) and compound verses (e.g., “John 3:16, 18, 20”). Personally, I see no reason for a biblerefrange attribute as he suggests. A Bible reference parser should be able to handle a continuous range as easily as a single verse.
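
As a rough illustration of why a range needs no special markup, here is a small Python sketch that expands a reference into individual verses. The expand_reference function and its regex are hypothetical and far simpler than a real reference parser: they assume a single book and chapter, a colon before the verses, and that every number after a comma is another verse in the same chapter.

```python
import re

# Assumed shape: optional leading digit (e.g. "1 John"), book name or
# abbreviation, chapter, colon, then verses separated by commas or hyphens.
REFERENCE = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s*)?[A-Za-z]+)\.?\s+"
    r"(?P<chapter>\d+):(?P<verses>[\d\s,\-–]+?)\s*$"
)


def expand_reference(text):
    """Return a list of (book, chapter, verse) tuples for a simple reference."""
    match = REFERENCE.match(text)
    if match is None:
        raise ValueError(f"not a reference this sketch understands: {text!r}")
    book, chapter = match["book"], int(match["chapter"])
    verses = []
    for part in match["verses"].split(","):
        part = part.strip()
        if not part:
            continue
        # "16-17" becomes 16..17; a lone "18" becomes 18..18.
        start, _, end = part.replace("–", "-").partition("-")
        first = int(start)
        last = int(end) if end else first
        verses.extend((book, chapter, verse) for verse in range(first, last + 1))
    return verses


print(expand_reference("John 3:16-17"))
print(expand_reference("John 3:16, 18, 20"))
```

A single verse, a range, and a comma-separated list all go through the same code path, which is the point: the markup does not need to tell the parser which kind of reference it is looking at.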

But compound verses present a more complex problem. How do you mark them up? The above examples all stand on their own, which is one of the principles of microformats: you parse the element and get everything you need. But let’s say you have the text “John 3:16, 18.” Treating the compound reference as a unit is easy:

<cite class="bibleref">John 3:16, 18</cite>

Any parser will handle that text; though it could be ambiguous (do you mean John 3:16 and John 18?), in practice it rarely is. But what if you mark them up separately?

<cite class="bibleref">John 3:16</cite>, <cite class="bibleref">18</cite>

In this case, the “18” doesn’t communicate enough information to the parser. The parser could maintain state and remember that the previous reference was to John 3:16, but that requirement increases the parser’s complexity, which defeats the purpose of the microformat in the first place. In such cases, then, I would argue that a title attribute is necessary:

<cite class="bibleref">John 3:16</cite>, <cite class="bibleref" title="John 3:18">18</cite>

Putting It All Together

Here’s my Bible microformat proposal:

Citing a Bible Verse without Linking to It

<cite class="bibleref">[reference]</cite>

Citing a Bible Verse while Linking to It

<a href="…" class="bibleref">[reference]</a>

Citing a Bible Verse Indirectly (or When the Text Is Ambiguous) without Linking to It

<cite class="bibleref" title="[reference]">[any text]</cite>

Citing a Bible Verse Indirectly (or When the Link Text Is Ambiguous) while Linking to It

<a href="…" class="bibleref" title="[reference]">[any text]</a>

Verse Reference Format

The [reference] in the above examples refers to a machine-parsable and human-readable representation of a single verse, a range of verses, or a series of verses. You should use unambiguous abbreviations if you use abbreviations. See Appendix C in the OSIS spec (pdf) for a list of possible abbreviations.

When you’re in doubt about whether the reference text is parsable, use the title attribute to encode a fuller representation. In particular, when the reference doesn’t include all the text necessary to produce an unambiguous book/chapter/verse reference, place an unambiguous reference in title.

About the title Attribute

The title attribute, when present, takes precedence over the contents of the element (<cite> or <a>). When the title is not present, the contents of the element are assumed to be the verse reference. The title attribute contains an unambiguous machine-parsable representation of the verse reference.
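
Here is a minimal Python sketch of that precedence rule, using only the standard library's html.parser. The BibleRefParser name and its attributes are hypothetical, and the sketch assumes well-formed markup with no bibleref element nested inside another one.

```python
from html.parser import HTMLParser


class BibleRefParser(HTMLParser):
    """Hypothetical helper: collects the reference string of each bibleref element."""

    def __init__(self):
        super().__init__()
        self.references = []
        self._tag = None     # tag name of the bibleref element being read
        self._title = None   # its title attribute, if any
        self._text = []      # text seen inside the element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self._tag is None and tag in ("cite", "a") and "bibleref" in classes:
            self._tag = tag
            self._title = attributes.get("title")
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            content = "".join(self._text).strip()
            # The title, when present, wins over the element's contents.
            self.references.append(self._title or content)
            self._tag = None


parser = BibleRefParser()
parser.feed('<cite class="bibleref">John 3:16</cite>, '
            '<cite class="bibleref" title="John 3:18">18</cite>')
print(parser.references)
```

Each element is resolved on its own, with no memory of the previous one, so the bare “18” only becomes usable because its title spells out the full reference.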

The attribute can also contain an optional translation identifier at the beginning of the value, followed by a colon. Appendix D in the OSIS spec (pdf) has a list of translation identifiers. For example:

<cite class="bibleref" title="ESV: John 3:16">…</cite>

To be comprehensive, you would ideally include a language identifier (e.g., “en:ESV: John 3:16”) before the translation identifier. I would argue that a language identifier is only necessary if you’re using a non-standard abbreviation.

However, you should only include a translation identifier if it is important that your readers see a particular translation or language. Otherwise, you should allow the parsing software to use your readers’ preferred translation and language.

Here is a Perl regex for allowed formats in the title. $1 is the optional language identifier. $2 is the optional translation identifier. $3 is the verse reference, which is deliberately wide-open to accommodate many different reference formats.

title="(?:([\w\-]+:)?([\w\-]+:))?\s*([^"]+)"
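
For anyone not working in Perl, here is a Python sketch of the same split, applied to the attribute value rather than to the raw HTML. The split_title function and the group names are hypothetical. The two optional prefixes are nested so that a lone identifier such as “ESV:” is read as the translation rather than the language, and the colons are left out of the captured values.

```python
import re

# language and translation are both optional, but a language identifier is
# only recognized when a translation identifier follows it.
TITLE = re.compile(
    r"^(?:(?:(?P<language>[\w\-]+):)?(?P<translation>[\w\-]+):)?"
    r"\s*(?P<reference>.+)$"
)


def split_title(title):
    """Return (language, translation, reference); missing parts are None."""
    match = TITLE.match(title)
    if match is None:
        raise ValueError(f"empty or unusable title: {title!r}")
    return match["language"], match["translation"], match["reference"]


for value in ("John 3:16", "ESV: John 3:16", "en:ESV: John 3:16"):
    print(value, "->", split_title(value))
```

One limitation to be aware of: a title of just “3:16” would be split into a translation of “3” and a reference of “16”. That should not come up if the title holds an unambiguous reference with a book name, but a stricter parser might check identifiers against a known list.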

6 Responses to “Bible Microformats”

  1. OpenBible.info picked up on my post about a microformat for Scripture references, and very helpfully spelled out a […]

  2. Axel says:

    It’s a good idea! I am working on a Bible project and ran into a problem: book, chapter, and verse do not form a unique identifier. For example, the Psalms have different verse numbering in different translations and languages. Do you have a unique identifier? (I also posted this at Blogos.)

  3. openbible says:

    Sean has a good reply to Axel at Blogos.

  4. […] Update: Chris Heard has recommended Bible microformats. You can read about it here: Bible Microformats. […]

  5. Hi,

    Just want to let you know that I read this post and the related ones about 1.5 years ago and they helped me to basically consolidate what I was thinking anyway with using the cite tag. When building the Christian Assemblies website (http://www.cai.org/) I had another class in mind but we opted to go for the “bibleref” class in the name of standardisation. Check the site out for a fairly intensive example of use of the microformat! We have 10 languages and it’s working pretty well for us.

    There’s only one thing where we’ve broken the standard – we fill the title attribute with the full verse text. This is needed so that we can produce the cool pop-ups with the full text when you hover your mouse over a scripture reference. But if you check the code you’ll see that we at least start with the reference.

    Additionally we use the full book names since the site is aimed largely at non-Christians and we want them to understand what we are talking about! We also use [q class=”bibletxt”] to mark up our actual scripture text. Perhaps this will become the next standard?? 🙂

    I think it’s a good microformat.

    Regards,
    John

  6. […] good work has been done on citation of online Bible quotes, including BibleRef and OpenBible.info. There are even some WordPress plugins for BibleRef. Bibleref is a simple approach to automatically […]