Yahoo!, Bibleref, and RDFa

Saturday, March 15th, 2008

Yahoo! last week announced that it’s going to start indexing semantic data, including support for certain microformats.

Bibleref isn’t one of those microformats. Should Bibleref proponents lobby Yahoo! to index Bibleref, or should Bibleref change its syntax to be compatible with RDFa or another semantic web standard?

Background

Earlier, Sim.plified.com noted a Yahoo! semantic-web announcement and mused about the possibility of using the Yahoo! engine for Bibleref.

Then David Peterson, author of a Sitepoint article about the earlier Yahoo! announcement, wrote a comment that encapsulates the chicken/egg problem inherent in getting a new microformat off the ground:

Currently the search engine only indexes 3 microformats (hCard, hCalendar, hReview). So if you started indexing your hBible [i.e., Bibleref] it wouldn’t pick it up.

The problem with microformats is that each time a new one is created the search indexer needs to develop a custom extractor to make sense of your microformat. That is why Yahoo microsearch is only indexing 3 of the most popular formats.

So what should Bibleref’s proponents do? It’s possible we could convince Yahoo! to index Bibleref, giving it the traction it needs to take off. However, I wouldn’t necessarily expect Yahoo! to do a good job understanding the data, in part because of the looseness of the standard (which I see as a good thing). And if Yahoo! doesn’t understand it well, then search results based on Bibleref won’t be very high quality. But a lot depends on how Yahoo! exposes the data. (And they may not even want to index Bibleref.)

RDFa

Another possibility is to change Bibleref to be compatible with RDFa, an emerging standard that Yahoo! does understand. The RDFa syntax fits Bibleref well:

  1. <a property="br:ref" href="#">John 3:16-17</a>
  2. <span property="br:ref" content="John 3:16-17">God loves us</span>

Compare to standard Bibleref markup:

  1. <a class="bibleref" href="#">John 3:16-17</a>
  2. <cite class="bibleref" title="John 3:16-17">God loves us</cite>

As you can see, the markup is similar. However, the RDFa version shares a weakness with the current one: as written, neither example gives an unspecialized search engine like Yahoo! an unambiguous, machine-readable representation from which to extract meaning.
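To make the extraction problem concrete, here is a minimal sketch in Python of the kind of custom extractor an indexer would need for the two flavors of markup. It is only an illustration under some loud assumptions: the class name BiblerefExtractor and the function extract_refs are hypothetical, the br: prefix is matched as a literal string rather than resolved against a namespace the way a real RDFa processor would, and nested reference elements aren't handled.

```python
from html.parser import HTMLParser


class BiblerefExtractor(HTMLParser):
    """Hypothetical extractor: collects raw reference strings from either flavor."""

    def __init__(self):
        super().__init__()
        self.refs = []          # reference strings, in document order
        self._open_tag = None   # tag name of the reference element being read
        self._text = []         # text pieces collected inside that element

    def handle_starttag(self, tag, attrs):
        if self._open_tag is not None:
            return  # already inside a reference element; just keep its text
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Assumption: "br:ref" is compared literally, with no namespace lookup.
        if "bibleref" not in classes and attr.get("property") != "br:ref":
            return
        # An explicit attribute (RDFa content, Bibleref title) wins over the text.
        explicit = attr.get("content") or attr.get("title")
        if explicit:
            self.refs.append(explicit.strip())
        else:
            self._open_tag = tag
            self._text = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.refs.append("".join(self._text).strip())
            self._open_tag = None


def extract_refs(html):
    """Return every reference string found in an HTML fragment."""
    parser = BiblerefExtractor()
    parser.feed(html)
    parser.close()
    return parser.refs
```

Pulling the string out is the easy part. Nothing in this sketch tells an indexer what "John 3:16-17" actually means, or that a differently abbreviated reference points to the same passage.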

I’m not sure whether that’s a real problem for either the current or the RDFa flavor of Bibleref. Without an enforced formal representation, such as an OSIS identifier, one Bible passage can be written any number of ways (the “John 3:16” vs. “Jn 3.16” problem). But Yahoo! plans to release tools that let developers build on semantic search data, so Yahoo!’s own inability to understand a reference may not matter. It’s too early to tell.
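For a sense of what resolving the “John 3:16” vs. “Jn 3.16” problem involves, here is a rough Python sketch that maps a few loosely written references to one OSIS-style identifier. Treat everything in it as an assumption for illustration: the to_osis function, the REF_PATTERN expression, and the tiny BOOK_ALIASES table are all hypothetical, and the pattern only covers a single chapter with an optional verse range.

```python
import re

# Hypothetical alias table, deliberately tiny; a real one would cover every
# book and far more spelling variants.
BOOK_ALIASES = {
    "john": "John", "jn": "John", "jhn": "John",
    "genesis": "Gen", "gen": "Gen", "gn": "Gen",
}

# Book name, chapter, verse, and an optional end verse in the same chapter.
# Accepts ":" or "." between chapter and verse.
REF_PATTERN = re.compile(
    r"^\s*(?P<book>[A-Za-z]+)\.?\s*"
    r"(?P<chapter>\d+)[:.](?P<verse>\d+)"
    r"(?:\s*-\s*(?P<end>\d+))?\s*$"
)


def to_osis(reference):
    """Return an OSIS-style identifier, or None if the text isn't understood."""
    match = REF_PATTERN.match(reference)
    if match is None:
        return None
    book = BOOK_ALIASES.get(match.group("book").lower())
    if book is None:
        return None
    chapter = match.group("chapter")
    start = f"{book}.{chapter}.{match.group('verse')}"
    if match.group("end") is None:
        return start
    # A verse range is written as two full identifiers joined by a hyphen.
    return f"{start}-{book}.{chapter}.{match.group('end')}"
```

With this sketch, to_osis("John 3:16") and to_osis("Jn 3.16") both give "John.3.16", and to_osis("John 3:16-17") gives "John.3.16-John.3.17". Anything the pattern doesn't recognize comes back as None, which is exactly where the looseness of the current standard starts to cost something.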

The second example, with its explicit content attribute, can carry a formal, machine-readable representation of a Bible reference (like an OSIS identifier). But providing one is a lot more work for software that lets people write documents using Bibleref: it needs to understand references and create such identifiers. On the other hand, would retaining the looseness of the current Bibleref standard defeat the purpose of using RDF, which thrives on strict interoperability?

Again, I don’t know. A lot depends on the tools and API that Yahoo! is making available.

But the central question is whether Bibleref syntax should move to be compatible with RDFa. The advantage is automatic pickup by new semantic search engines. The disadvantages are the syntax change and increased complexity (including DOCTYPE and namespace changes that I haven’t discussed here). It may be too early to tell.