2010 « OpenBible.info Blog

Archive for 2010

Quantifying Traditional vs. Contemporary Language in English Bibles Using Google NGram Data

Monday, December 27th, 2010

Using data from Google’s new ngram corpus, here’s how English Bible translations compare in their use of traditional vs. contemporary vocabulary:

Relative Traditional vs. Contemporary Language in English Bible Translations
* Partial Bible (New Testament except for The Voice, which only has the Gospel of John). The colors represent somewhat arbitrary groups.

Here’s similar data with the most recent publication year (since 1970) as the x-axis:

Relative Traditional vs. Contemporary Language in English Bible Translations by Publication Year

Discussion

The result accords well with my expectations of translations. It generally follows the “word for word/thought for thought” continuum often used to categorize translations, suggesting that word-for-word (formal-equivalence) translations tend toward traditional language, while thought-for-thought (dynamic-equivalence) translations sometimes find replacements for traditional words. For reference, here’s how Bible publisher Zondervan categorizes translations along that continuum:

A word-for-word to thought-for-thought continuum lists about twenty English translations, from an interlinear to The Message.

I’m not sure what to make of the curious NLT grouping in the first chart above: the five translations are more similar than any others. In particular, I’d expect the new Common English Bible to be more contemporary–perhaps it will become so once the Old Testament is available and it’s more comparable to other translations.

In the chart with publication years, notice how no one tries to occupy the same space as the NIV for twenty years until the HCSB comes along.

The World English Bible appears where it does largely because it uses “Yahweh” instead of “LORD.” If you ignore that word, the WEB shows up between the Amplified and the NASB. (The word Yahweh has become more popular recently.) Similarly, the New Jerusalem Bible would appear between the HCSB and the NET for the same reason.

The more contemporary versions often use contractions (e.g., you’ll), which pulls their score considerably toward the contemporary side.

Religious words (“God,” “Jesus”) pull translations to the traditional side, since a greater percentage of books in the past dealt with religious subjects. A religious text such as the Bible therefore naturally tends toward older language.

If you’re looking for translations largely free from copyright restrictions, most of the KJV-grouped translations are public domain. The Lexham English Bible and the World English Bible are available in the ESV/NASB group. The NET Bible is available in the NIV group. Interestingly, all the more contemporary-style translations are under standard copyright; I don’t know of a project to produce an open thought-for-thought translation–maybe because there’s more room for disagreement in such a project?

Not included in the above chart is the LOLCat Bible, a non-academic attempt to translate the Bible into LOLspeak. If charted, it appears well to the contemporary side of The Message:

Methodology

I downloaded the English 1-gram corpus from Google, normalized the words (stripping combining characters and making them case insensitive), and inserted the five million or so unique words into a database table. I combined individual years into decades to lower the row count. Next, I ran a percentage-wise comparison (similar to what Google’s ngram viewer does) for each word to determine when they were most popular.
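
As a rough sketch of the normalization and decade-bucketing steps described above: the code below assumes the 1-gram rows have already been parsed into (word, year, count) tuples. The function names and the tiny sample data are made up for illustration and are not the code used for this post.

```python
import unicodedata
from collections import defaultdict


def normalize(word):
    """Lowercase a word and strip combining characters (e.g. accents)."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def decade_shares(rows):
    """Turn (word, year, count) rows into per-decade shares for each word.

    A word's share for a decade is its count divided by the count of all
    words in that decade, i.e. a percentage-wise comparison.
    """
    counts = defaultdict(lambda: defaultdict(int))  # word -> decade -> count
    totals = defaultdict(int)                       # decade -> all-word count
    for word, year, count in rows:
        decade = year - year % 10
        counts[normalize(word)][decade] += count
        totals[decade] += count
    return {
        word: {decade: n / totals[decade] for decade, n in by_decade.items()}
        for word, by_decade in counts.items()
    }


# Tiny made-up sample, only to show the shape of the data.
sample_rows = [
    ("Thou", 1843, 30), ("thou", 1847, 30), ("you", 1845, 40),
    ("thou", 1995, 5), ("You", 1998, 95),
]
shares = decade_shares(sample_rows)
# The decade in which each word has its largest share.
peak_decade = {word: max(s, key=s.get) for word, s in shares.items()}
```

Dividing by the decade total matters because far more books were printed in recent decades; raw counts alone would make almost every word look contemporary.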

Then, I created word counts for a variety of translations, dropped stopwords, and multiplied the counts by the above ngram percentages to arrive at a median year for each translation.
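
The post doesn’t spell out exactly how the counts and percentages combine into a median, so the following is only one plausible reading, not the actual calculation. It assumes each occurrence of a word spreads its weight across decades in proportion to that word’s popularity, and then takes a weighted median. The `median_year` name, the sample shares, and the word counts are all hypothetical.

```python
def median_year(word_counts, shares, stopwords=frozenset()):
    """Estimate a weighted median decade for one translation.

    word_counts: {word: occurrences in the translation}
    shares:      {word: {decade: relative popularity in that decade}}
    """
    weight_by_decade = {}
    for word, count in word_counts.items():
        if word in stopwords or word not in shares:
            continue
        total = sum(shares[word].values())
        if total == 0:
            continue
        for decade, share in shares[word].items():
            # Spread this word's occurrences across decades by popularity.
            weight_by_decade[decade] = (
                weight_by_decade.get(decade, 0.0) + count * share / total
            )
    if not weight_by_decade:
        return None
    # Weighted median: first decade where the running weight reaches half.
    half = sum(weight_by_decade.values()) / 2
    running = 0.0
    for decade in sorted(weight_by_decade):
        running += weight_by_decade[decade]
        if running >= half:
            return decade
    return max(weight_by_decade)


# Made-up shares and counts, for illustration only.
example_shares = {
    "thou": {1840: 0.6, 1990: 0.05},
    "you": {1840: 0.4, 1990: 0.95},
    "the": {1840: 0.5, 1990: 0.5},
}
traditional = {"thou": 90, "you": 10, "the": 500}
contemporary = {"you": 100, "the": 500}
traditional_decade = median_year(traditional, example_shares, {"the"})
contemporary_decade = median_year(contemporary, example_shares, {"the"})
```

This sketch only resolves to a decade. Getting a specific year would need per-year data or interpolation within the median decade.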

The year scale (x-axis on the first chart, y-axis on the second) runs from 1838 to 1878, largely, as mentioned before, because Bibles use religious language. Even the LOLCat Bible dates to 1921 because it uses words (e.g., “ceiling cat”) that don’t particularly tie it to the present.

Caveats

The data doesn’t present a complete picture of a translation’s suitability for a particular audience or overall readability. For example, it doesn’t take into account word order (“fear not” vs. “do not fear”). (I wanted to use Google’s two- or three-gram data to see what differences they make, but as of this writing, Google hasn’t finished uploading them.)

I work for Zondervan, which publishes the NIV family of Bibles, but the work here is my own and I don’t speak for them.

Posted in Bible, Linguistics, Visualizations | 3 Comments »

Evaluating Bible Reading Levels with Google

Saturday, December 11th, 2010

Google recently introduced a “Reading Level” feature on their Advanced Search page that allows you to see the distribution of reading levels for a query.

If we constrain a search to Bible Gateway and restrict URLs to individual translations, we get a decent picture of how English translations stack up in terms of reading levels:

According to this methodology, the Amplified Bible is the hardest to read (probably because its nature is to have long sentences), and the NIrV is the easiest.

Caveats abound:

URLs don’t have a 1:1 correspondence to passages, so some passages get counted twice while others don’t get counted at all.
Google doesn’t publish its criteria for what constitutes different reading levels.
These numbers are probably best thought of in relative, rather than absolute, terms.
Searching translation-specific websites yields different numbers. For example, constraining the search to esvonline.org results in 57% Basic / 42% Intermediate results for the ESV, massively different from the 18% Basic / 80% Intermediate results above.

Download the raw spreadsheet if you’re interested in exploring more.

Posted in Bible | 1 Comment »

Venn Diagram of Google Bible Searches

Monday, October 25th, 2010

Technomancy.org just released a Google Suggest Venn Diagram Generator, where you enter a phrase and three ways to finish it: for example, “(Bible, New Testament, Old Testament) verses on….” It then creates a Venn diagram showing you how Google autocompletes the phrase and where the suggestions overlap.

The diagram below shows the result for “(Bible, New Testament, Old Testament) verses on….” The overlapping words–faith, hope, love, forgiveness, prayer–present a decent (though incomplete) summary of Christianity.

Posted in Collective Intelligence, Topics, Visualizations | Comments Off on Venn Diagram of Google Bible Searches

Procedurally Generating Archaeological Sites

Tuesday, October 12th, 2010

Walking around an archaeological site–whether an active dig or excavated ruins–makes you wonder what it would be like to see the site in its glory days. Existing computer tools make it possible to model small-scale sites virtually (a building, perhaps), but anything larger than a city block would take a long time to create. Even a small city is beyond the capabilities of any but the most dedicated team.

One solution is procedural generation, where a human designer lays down a few rules–a basic city plan, for example–and a computer fills in the rest according to those rules. The result is a complete rendering of a city filled with buildings that plausibly inhabit the space, with a human only having to set up the initial parameters. Consider this reconstruction of Pompeii:

The creators of this video started with street plans and a variety of historically correct architectural models. A computer then generated buildings that fit the excavated ruins, resulting in a city that you can tour virtually. While it undoubtedly has inaccuracies, the result is compelling.

Pompeii is better preserved than most ancient cities, but you can apply a similar technique to any archaeological site. Archaeologists have partially excavated many biblical cities, so they know at least part of each city’s layout. Even if they don’t know the whole thing, they can guess at what features a city of a given size needs; by starting with what archaeologists know, a computer can extrapolate a plausible street plan for the rest of the city. (I suppose that you could run the simulation many times and generate a probability of where a certain building–such as a synagogue–is likely to be.)
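
The “run it many times” idea can be sketched with a toy Monte Carlo loop. Everything here is hypothetical: the grid, the excavated lots, and especially the placement rule (lots nearer the center are favored) are invented stand-ins for a real procedural generator.

```python
import random
from collections import Counter


def generate_city(unknown_lots, center, rng):
    """Toy stand-in for a procedural generator: pick a lot for the synagogue.

    Lots closer to the center get more weight; that rule is invented for
    illustration only.
    """
    weights = [
        1.0 / (1 + abs(x - center[0]) + abs(y - center[1]))
        for x, y in unknown_lots
    ]
    return rng.choices(unknown_lots, weights=weights, k=1)[0]


def placement_probabilities(unknown_lots, center, runs=10_000, seed=0):
    """Run the generator many times and report how often each lot is chosen."""
    rng = random.Random(seed)
    hits = Counter(generate_city(unknown_lots, center, rng) for _ in range(runs))
    return {lot: hits[lot] / runs for lot in unknown_lots}


# A 4x4 grid; excavated lots are already known, so they are excluded.
excavated = {(0, 0), (0, 1), (1, 0)}
unknown = [(x, y) for x in range(4) for y in range(4) if (x, y) not in excavated]
probabilities = placement_probabilities(unknown, center=(2, 2))
most_likely = max(probabilities, key=probabilities.get)
```

The resulting probabilities only reflect the generator’s rules, so they are as good as the archaeological assumptions built into it.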

These projects don’t often generate interior spaces or simulate objects like furniture, both of which dramatically increase the complexity of the simulation for only a modest benefit. But there’s no reason why we couldn’t model interior spaces. A forthcoming game called Subversion, for example, uses procedural generation on both macro and micro scales: it generates both complete cityscapes and architectural floorplans of the buildings that it creates.

A screenshot from Subversion shows a building's procedurally generated floorplan.

Recreating interiors for ancient houses is fairly straightforward: floorplans weren’t nearly as complicated as they are today. Imagine walking around ancient Capernaum, for example, and visiting the house where people lowered a paralytic through the roof. Architecture plays a crucial role in the story, a role that a virtual-reality model would help illuminate.

Bible Geography in Tableau

Saturday, September 25th, 2010

Robert Rouse combines Bible geo data with Tableau to produce an interactive map that uses size to emphasize important places and allows you to filter the data by book:

Bible Places

He’s also written a blog post that goes into detail about the map. His blog is about technology and the Bible, so take a look at his other posts if those topics interest you.

Posted in Geo | Comments Off on Bible Geography in Tableau

Visualizing Pericope Similarity in the New Testament

Monday, September 13th, 2010

This diagram plots the similarity of pericopes (sections) in the New Testament based on their linguistic similarity in Greek:

Blue = Gospels, Purple = Acts, Green = Paul’s Epistles, Red = General Epistles, Gray = Revelation

If you don’t have Silverlight installed (or are reading this post via RSS–I suggest you click through to the original post), here’s a thumbnail:

Download the full-size PDF (300KB) or PNG (22 MB, 12,000 pixels wide).

Do we actually learn anything from this kind of diagram? The most interesting part to me is how the gospels on the right flow primarily through the Gospel of John to the epistles on the left. I wonder why that is.

Methodology

I calculated the cosine similarity between the full text of the pericopes using the Greek lemmas (after removing about forty stopwords). The pericope titles come from the ESV. I produced the diagram with Cytoscape. The widget at the top of the post comes from zoom.it, Microsoft’s Deep-Zoom-as-a-Service.
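
For readers unfamiliar with cosine similarity, here is a minimal sketch of the calculation on bags of lemmas. The lemma lists and the two stopwords are made up (and transliterated for readability); the real analysis used the full Greek text and about forty stopwords.

```python
import math
from collections import Counter


def cosine_similarity(lemmas_a, lemmas_b, stopwords=frozenset()):
    """Cosine of the angle between two lemma-count vectors (0.0 to 1.0)."""
    a = Counter(lemma for lemma in lemmas_a if lemma not in stopwords)
    b = Counter(lemma for lemma in lemmas_b if lemma not in stopwords)
    # Only lemmas present in both pericopes contribute to the dot product.
    dot = sum(a[lemma] * b[lemma] for lemma in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, transliterated lemma lists for two short pericopes.
stop = {"ho", "kai"}
pericope_1 = ["ho", "logos", "eimi", "theos", "kai", "logos"]
pericope_2 = ["theos", "agapao", "ho", "kosmos"]
similarity = cosine_similarity(pericope_1, pericope_2, stop)
```

Computing this for every pair of pericopes gives the edge weights for a graph, which a tool like Cytoscape can then lay out.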

Bill Mounce’s excellent free New Testament Greek dictionary served as the source of the lemmas.

Posted in Linguistics, Visualizations | Comments Off on Visualizing Pericope Similarity in the New Testament

Apologetics and Anti-Apologetics Apps

Friday, July 2nd, 2010

The New York Times discusses the latest iPhone trend: apps that provide talking points for the existence of God, pro and con.

In a dozen new phone applications, whether faith-based or faith-bashing, the prospective debater is given a primer on the basic rules of engagement — how to parry the circular argument, the false dichotomy, the ad hominem attack, the straw man — and then coached on all the likely flashpoints of contention….

Users can scroll from topic to topic to prepare themselves or, in the heat of a dispute, search for the point at hand — and the perfect retort.

I expect that, eventually, we’ll just let our phones argue the basic questions of existence among themselves, and then they’ll let us know what they’ve decided.

A screenshot from the LifeWay Fast Facts app

Posted in Technology | Comments Off on Apologetics and Anti-Apologetics Apps

Automatically Deciphering Ugaritic

Thursday, July 1st, 2010

Let’s say you have text in front of you written in a language you don’t read–worse, no one has understood the language for thousands of years. How can you begin to understand it? MIT researchers have created a way for a computer to decipher Ugaritic in a few hours without knowing anything about the language besides its general similarity to Hebrew.

One of the researchers says:

The decipherment of Ugaritic [in the mid-twentieth century] took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic. “The output of our system would have made the process orders of magnitude shorter.”

I love that someone felt the need to write “axe” on an axe. Maybe it was the original brand name.

I’m most curious whether the program can help crack the Minoan language written in Linear A.

Incidentally, the article What’s Ugaritic Got to Do with Anything? explains why Ugaritic is important to understanding the Hebrew of the Old Testament.

Via.

Posted in Linguistics | Comments Off on Automatically Deciphering Ugaritic

Church Names in the U.S.

Saturday, June 5th, 2010

If you were to name a church, how would you go about it? Let’s see how people have named churches in the U.S. by looking at a random sample of 300,000 church names gathered using the Yahoo! Local Search API.

The Word “Church”

You might start with the word “church” in your name. About 2/3 of churches in the dataset have the word “church.” (The number of churches with the word “church” in reality is higher, as the dataset often truncates names.) Other popular nouns: center, fellowship, chapel, assembly, ministries (or ministry), temple, tabernacle, and iglesia.

Most Common Words in Church Names

Here’s a wordle of the most common words that appear in church names (excluding the word “church”):

Most Common Words in Church Names, Excluding Denominations

Denominations overpower the raw list. Here’s a wordle of the same data without denomination names:

Full Church Names

Here are the most common church names in the U.S. Also download the top 1,000 church names in the U.S. (29 KB Excel document), covering 95,000 churches, if you want more information.

Church Name	Churches
first baptist church	5,115
church of christ	2,854
first united methodist church	2,149
first presbyterian church	1,960
united methodist church	1,488
seventh-day adventist church	1,478
first christian church	1,309
calvary baptist church	1,197
church of the nazarene	915
trinity lutheran church	892
salvation army	867
first assembly of god	744
church of god	677
faith baptist church	663
st john’s lutheran church	601
grace baptist church	600
first congregational church	575
assembly of god church	565
new hope baptist church	540
zion lutheran church	523
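
A tally like the one above, along with the share of names containing “church,” could be produced roughly as follows. This is a sketch under assumptions: the names are taken to be a plain list of strings, the sample list is invented, and the real dataset would need more cleanup (truncated names, punctuation, abbreviations).

```python
from collections import Counter


def normalize_name(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def summarize(names):
    """Return (counts of each full name, share of names with the word 'church')."""
    cleaned = [normalize_name(n) for n in names]
    name_counts = Counter(cleaned)
    with_church = sum(1 for n in cleaned if "church" in n.split())
    share = with_church / len(cleaned) if cleaned else 0.0
    return name_counts, share


# Invented sample, only to show the shape of the input and output.
sample_names = [
    "First Baptist Church", "first  baptist church", "Church of Christ",
    "Salvation Army", "Calvary Chapel", "Grace Fellowship",
]
name_counts, church_share = summarize(sample_names)
top_names = name_counts.most_common(20)
```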

Saints

The word “St.” (Saint) is common in more traditional churches, such as Catholic, Episcopalian, and Lutheran. Here are the most popular saint names used for churches:

Saint	Churches
John	3,713
Paul	3,210
Mary	1,832
Peter	1,362
James	1,270
Joseph	1,153
Mark	1,062
Luke	1,053
Andrew	789
Matthew	724
Stephen	582
Michael	532
Francis	530
Thomas	511
Patrick	431
Anthony	381
George	329
Ann(e)	282
Nicholas	253
Elizabeth	220

Mountains

Mountains and hills are common names for churches. I suspect that “Spring Hill” and the assorted “Pleasant”s come from the city names where the churches stand.

Mountain	Churches
mt zion	1,587
mt olive(t)	1,122
mt calvary	619
mt carmel	528
mt pleasant	475
pleasant hill	394
mt moriah	329
mt sinai	316
zion hill	200
mt pisgah	191
mt nebo	139
mt tabor	136
mt pilgrim	112
mt hope	108
mt gilead	95
mt bethel	90
spring hill	90
mt hermon	85
mt lebanon	74
mars hill	59

Denominations

Here are the most-common two-word phrases (including denomination names) used in some of the more popular denominations. “First” appears in 12% of Baptist church names, 10% of Methodist church names, only 3% of Lutheran church names, and fully 21% of Presbyterian church names.

Baptist	Methodist	Lutheran	Presbyterian
first baptist	united methodist	trinity lutheran	first presbyterian
missionary baptist	first united	st john’s	united presbyterian
freewill baptist	chapel united	st paul’s	cumberland presbyterian
grove baptist	free methodist	evangelical lutheran	korean presbyterian
calvary baptist	trinity united	zion lutheran	westminster presbyterian
hill baptist	memorial united	grace lutheran	covenant presbyterian
zion baptist	st paul	our savior’s	community presbyterian
hope baptist	grace united	faith lutheran	memorial presbyterian
creek baptist	grove united	immanuel lutheran	trinity presbyterian
faith baptist	wesley united	christ lutheran	orthodox presbyterian
mt zion	zion united	peace lutheran	reformed presbyterian
bethel baptist	hill united	redeemer lutheran	grace presbyterian
new hope	christ united	first lutheran	hill presbyterian
grace baptist	bethel united	good shepherd	faith presbyterian
bible baptist	faith united	hope lutheran	hope presbyterian
chapel baptist	hope united	calvary lutheran	christ presbyterian
community baptist	street united	bethlehem lutheran	central presbyterian
mt olive	asbury united	st peter’s	valley presbyterian
memorial baptist	park united	bethany lutheran	creek presbyterian
southern baptist	salem united	messiah lutheran	park presbyterian
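
The two-word phrases and the “First” percentages could be computed along these lines. It is a sketch, not the original code: it assumes a name belongs to a denomination if it contains that word, and it drops the word “church” before pairing words, which is a guess based on “baptist church” not appearing in the table. The sample names are invented.

```python
from collections import Counter


def denomination_bigrams(names, denomination):
    """Count two-word phrases in names that mention the given denomination.

    Returns (bigram counts, share of matching names containing 'first').
    """
    bigrams = Counter()
    matching = 0
    with_first = 0
    for name in names:
        # Assumption: ignore the word "church" when forming phrases.
        words = [w for w in name.lower().split() if w != "church"]
        if denomination not in words:
            continue
        matching += 1
        if "first" in words:
            with_first += 1
        bigrams.update(zip(words, words[1:]))
    share_first = with_first / matching if matching else 0.0
    return bigrams, share_first


# Invented sample, only to show the shape of the input and output.
sample = [
    "First Baptist Church", "Calvary Baptist Church", "Mt Zion Baptist Church",
    "First Baptist Church of Springfield", "Trinity Lutheran Church",
]
baptist_bigrams, baptist_first_share = denomination_bigrams(sample, "baptist")
```

Because every adjacent pair is counted, phrases such as “mt zion” show up under a denomination even though they don’t contain the denomination’s name.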

Posted in Churches | 5 Comments »

Most Highlighted Bible Passages on Kindle

Thursday, May 20th, 2010

Ray Fowler analyzes Amazon’s Kindle data to find the most-highlighted Bible passages on Kindle.

Ten of the fourteen verses overlap the thirty most popular verses on Twitter, a higher percentage than I expected.

Posted in Collective Intelligence | Comments Off on Most Highlighted Bible Passages on Kindle
