
Archive for the ‘Visualizations’ Category

Quantifying Traditional vs. Contemporary Language in English Bibles Using Google NGram Data

Monday, December 27th, 2010

Using data from Google’s new ngram corpus, here’s how English Bible translations compare in their use of traditional vs. contemporary vocabulary:

Relative Traditional vs. Contemporary Language in English Bible Translations
* Partial Bible (New Testament except for The Voice, which only has the Gospel of John). The colors represent somewhat arbitrary groups.

Here’s similar data with the most recent publication year (since 1970) as the x-axis:

Relative Traditional vs. Contemporary Language in English Bible Translations by Publication Year

Discussion

The result accords well with my expectations of translations. It generally follows the “word for word/thought for thought” continuum often used to categorize translations, suggesting that word-for-word (formal-equivalence) translations tend toward traditional language, while thought-for-thought (dynamic-equivalence) translations sometimes find replacements for traditional words. For reference, here’s how Bible publisher Zondervan categorizes translations along that continuum:

A word-for-word to thought-for-thought continuum lists about twenty English translations, from an interlinear to The Message.

I’m not sure what to make of the curious NLT grouping in the first chart above: the five translations in it are more similar to one another than those in any other group. In particular, I’d expect the new Common English Bible to be more contemporary. Perhaps it will become so once the Old Testament is available and it’s more comparable to other translations.

In the chart with publication years, notice how no one tries to occupy the same space as the NIV for twenty years until the HCSB comes along.

The World English Bible appears where it does largely because it uses “Yahweh” instead of “LORD.” If you ignore that word, the WEB shows up between the Amplified and the NASB. (The word Yahweh has become more popular recently.) Similarly, the New Jerusalem Bible would appear between the HCSB and the NET for the same reason.

The more contemporary versions often use contractions (e.g., you’ll), which pulls their score considerably toward the contemporary side.

Religious words (“God,” “Jesus”) pull translations to the traditional side, since a greater percentage of books in the past dealt with religious subjects. A religious text such as the Bible therefore naturally tends toward older language.

If you’re looking for translations largely free from copyright restrictions, most of the KJV-grouped translations are public domain. The Lexham English Bible and the World English Bible are available in the ESV/NASB group. The NET Bible is available in the NIV group. Interestingly, all the more contemporary-style translations are under standard copyright; I don’t know of a project to produce an open thought-for-thought translation–maybe because there’s more room for disagreement in such a project?

Not included in the above chart is the LOLCat Bible, a non-academic attempt to translate the Bible into LOLspeak. If charted, it appears well to the contemporary side of The Message:

The KJV is on the far left, The Message is in the middle, and the LOLCat Bible is on the far right.

Methodology

I downloaded the English 1-gram corpus from Google, normalized the words (stripping combining characters and making them case insensitive), and inserted the five million or so unique words into a database table. I combined individual years into decades to lower the row count. Next, I ran a percentage-wise comparison (similar to what Google’s ngram viewer does) for each word to determine when it was most popular.
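The normalization and decade-bucketing steps could look roughly like the following. This is a minimal sketch, not the code I ran: the function names and the (word, year, count) row shape are assumptions for illustration, and it derives each decade's total from the rows it is given, whereas the real corpus publishes yearly totals separately.

```python
import unicodedata
from collections import defaultdict


def normalize(word):
    """Strip combining characters (accents) and fold case."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def decade_shares(rows):
    """Turn (word, year, count) rows into {word: {decade: share}}.

    A share is the word's count in a decade divided by the total count
    of all supplied words in that decade (an assumption of this sketch).
    """
    word_decade = defaultdict(lambda: defaultdict(int))
    decade_total = defaultdict(int)
    for word, year, count in rows:
        decade = year - year % 10  # 1847 -> 1840
        word_decade[normalize(word)][decade] += count
        decade_total[decade] += count
    return {
        word: {d: c / decade_total[d] for d, c in decades.items()}
        for word, decades in word_decade.items()
    }
```

Folding case and accents before counting means that "Lord" and "lord" land in the same row, which is what keeps the table down to a manageable number of unique words.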

Then, I created word counts for a variety of translations, dropped stopwords, and multiplied the counts by the above ngram percentages to arrive at a median year for each translation.
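One possible reading of that last step, sketched under stated assumptions: each word's decade shares are rescaled into a profile that sums to one, the profile is weighted by how often the word occurs in the translation, and the weighted median decade is reported. The stopword set, the function name, and the rescaling choice are all hypothetical; the post doesn't spell out these details.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list; the real list is not given in the post.
STOPWORDS = {"the", "and", "of"}


def median_decade(text_words, shares):
    """Weighted median decade for a text.

    text_words: normalized words from a translation.
    shares: {word: {decade: share}} as produced from the ngram data.
    Returns None if no word in the text has ngram data.
    """
    counts = Counter(w for w in text_words if w not in STOPWORDS)
    weight = defaultdict(float)
    for word, n in counts.items():
        profile = shares.get(word)
        if not profile:
            continue
        total = sum(profile.values())
        for decade, share in profile.items():
            # Rescale so each occurrence of a word contributes one unit.
            weight[decade] += n * share / total
    if not weight:
        return None
    half = sum(weight.values()) / 2
    running = 0.0
    for decade in sorted(weight):
        running += weight[decade]
        if running >= half:
            return decade
    return max(weight)  # guard against floating-point shortfall
```

With this weighting, a word like "thee" that peaks in early decades pulls the median earlier every time it appears, while a contraction that peaks recently pulls it later.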

The year scale (x-axis on the first chart, y-axis on the second) runs from 1838 to 1878, largely, as mentioned before, because Bibles use religious language. Even the LOLCat Bible dates to 1921 because it uses words (e.g., “ceiling cat”) that don’t particularly tie it to the present.

Caveats

The data doesn’t present a complete picture of a translation’s suitability for a particular audience or overall readability. For example, it doesn’t take into account word order (“fear not” vs. “do not fear”). (I wanted to use Google’s two- or three-gram data to see what differences they make, but as of this writing, Google hasn’t finished uploading them.)

I work for Zondervan, which publishes the NIV family of Bibles, but the work here is my own and I don’t speak for them.

Venn Diagram of Google Bible Searches

Monday, October 25th, 2010

Technomancy.org just released a Google Suggest Venn Diagram Generator, where you enter a phrase and three ways to finish it: for example, “(Bible, New Testament, Old Testament) verses on….” It then creates a Venn diagram showing you how Google autocompletes the phrase and where the suggestions overlap.

The below diagram shows the result for “(Bible, New Testament, Old Testament) verses on….” The overlapping words–faith, hope, love, forgiveness, prayer–present a decent (though incomplete) summary of Christianity.

A Venn diagram shows completions for (X Verses on...): Bible (courage, death, friendship, patience), New Testament (divorce, homosexuality, justice, tithing), Old Testament (Jesus), NT + Bible (hope, strength), OT + Bible (faith), OT + NT (marriage), and all three (forgiveness, love, prayer).
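The overlaps in a diagram like this are plain set arithmetic. Here is a minimal sketch that groups each suggestion by the combination of phrases that produced it; the function name is hypothetical, and the suggestion sets are reconstructed from the diagram above rather than fetched from Google.

```python
from collections import defaultdict


def venn_regions(suggestions):
    """Group words by the exact set of labels whose suggestions include them."""
    membership = defaultdict(set)
    for label, words in suggestions.items():
        for word in words:
            membership[word].add(label)
    regions = defaultdict(set)
    for word, labels in membership.items():
        regions[frozenset(labels)].add(word)
    return dict(regions)


# Suggestion lists as read off the diagram above.
suggestions = {
    "Bible": {"courage", "death", "friendship", "patience", "hope",
              "strength", "faith", "forgiveness", "love", "prayer"},
    "New Testament": {"divorce", "homosexuality", "justice", "tithing", "hope",
                      "strength", "marriage", "forgiveness", "love", "prayer"},
    "Old Testament": {"Jesus", "faith", "marriage", "forgiveness", "love",
                      "prayer"},
}

regions = venn_regions(suggestions)
for labels, words in sorted(regions.items(), key=lambda kv: len(kv[0])):
    print(" + ".join(sorted(labels)), "->", ", ".join(sorted(words)))
```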

Visualizing Pericope Similarity in the New Testament

Monday, September 13th, 2010

This diagram plots the similarity of pericopes (sections) in the New Testament based on their linguistic similarity in Greek:

Blue = Gospels, Purple = Acts, Green = Paul’s Epistles, Red = General Epistles, Gray = Revelation

If you don’t have Silverlight installed (or are reading this post via RSS–I suggest you click through to the original post), here’s a thumbnail:

Pericope similarity in the New Testament (thumbnail).

Download the full-size PDF (300KB) or PNG (22 MB, 12,000 pixels wide).

Do we actually learn anything from this kind of diagram? The most interesting part to me is how the gospels on the right flow primarily through the Gospel of John to the epistles on the left. I wonder why that is.

Methodology

I calculated the cosine similarity between the full text of the pericopes using the Greek lemmas (after removing about forty stopwords). The pericope titles come from the ESV. I produced the diagram with Cytoscape. The widget at the top of the post comes from zoom.it, Microsoft’s Deep-Zoom-as-a-Service.

Bill Mounce’s excellent free New Testament Greek dictionary served as the source of the lemmas.
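For readers unfamiliar with cosine similarity, here is a minimal sketch of the calculation on two lists of lemmas. It assumes raw lemma counts as the vector weights (the post doesn't say whether any other weighting was used), and the function name and sample lemmas are illustrative only.

```python
import math
from collections import Counter


def cosine_similarity(lemmas_a, lemmas_b, stopwords=frozenset()):
    """Cosine of the angle between two lemma-count vectors (0.0 to 1.0)."""
    a = Counter(lemma for lemma in lemmas_a if lemma not in stopwords)
    b = Counter(lemma for lemma in lemmas_b if lemma not in stopwords)
    # Only lemmas present in both pericopes contribute to the dot product.
    dot = sum(a[lemma] * b[lemma] for lemma in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Pericopes that share no lemmas after stopword removal score 0.0, and identical pericopes score 1.0 (up to floating-point rounding). A graph tool can then draw an edge for every pair whose score clears a chosen threshold.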

Bible Cross References Visualization

Friday, April 16th, 2010

Here’s a visualization of 340,000 Bible cross references:

Visualization of Bible cross references.
Larger version (2,000 x 1,600 pixels).

Does anything strike you as intriguing? A few trends jump out at me:

  1. The frequency of dense New Testament streaks in the Old Testament, especially in Leviticus and Deuteronomy; I didn’t expect to see them there.
  2. The loops in Samuel / Kings / Chronicles and in the Gospels indicating parallel stories.
  3. The sudden increased density of New Testament references in Psalms through Isaiah.
  4. The eschatological references in Isaiah and Daniel.
  5. The density of references from the Minor Prophets back to both the Major Prophets and earlier in the Old Testament.
  6. The surprising density of cross references in Hebrews-Jude.
  7. The asymmetry. If verse A cites verse B, verse B doesn’t necessarily cite verse A. I wonder if I should make the data symmetrical.
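Making the data symmetrical, as the last item wonders about, would amount to adding the reverse of every reference. A minimal sketch, assuming references are stored as (from_verse, to_verse) pairs; the verse-ID format and function names here are hypothetical.

```python
def symmetrize(references):
    """Return the references plus the reverse of each one."""
    pairs = set(references)
    return pairs | {(b, a) for a, b in pairs}


def one_way(references):
    """Return references whose reverse does not appear in the data."""
    pairs = set(references)
    return {(a, b) for a, b in pairs if (b, a) not in pairs}
```

Running `one_way` first shows how much of the data is asymmetric before deciding whether symmetrizing would change the picture.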

You can also download the full-size image (10,000 x 8,000 pixels, 75 MB PNG). It’s a very large image that could crash your browser. If you want it, I strongly recommend that you save it to your computer rather than trying to open it in your browser.

This visualization uses data from the Bible Cross References project. I used PHP’s GD library to create the graphic.

Inspired by Chris Harrison and Christian Swinehart’s wonderful Choose Your Own Adventure work.

Presentation on Tweeting the Bible

Friday, March 26th, 2010

Here’s a presentation I just gave at the BibleTech 2010 conference about how people tweet the Bible:

Also: PowerPoint, PDF.

I distributed the following handout at the presentation, showing the popularity of Bible chapters and verses cited on Twitter. It displays a lot of data: darker chapters are more popular, the number in the middle of each box is the most popular verse in the chapter, and sparklines in each box show the distribution of the popularity in each chapter. (Genesis 1:1 is by far the most popular verse in Genesis 1, while Genesis 3:15 is only a little more popular than other verses in the chapter.)

The grid shows the popularity of chapters and verses in the Bible as cited on Twitter.
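The three things each box displays (chapter popularity, top verse, and per-verse distribution) can all be derived from a flat list of citations. A minimal sketch, assuming the tweets have already been parsed into (book, chapter, verse) tuples; the function name and output shape are hypothetical.

```python
from collections import Counter, defaultdict


def chapter_summary(citations):
    """Summarize (book, chapter, verse) citations per chapter.

    Returns {(book, chapter): {"total", "top_verse", "distribution"}}.
    """
    verse_counts = defaultdict(Counter)
    for book, chapter, verse in citations:
        verse_counts[(book, chapter)][verse] += 1
    return {
        key: {
            "total": sum(counts.values()),              # drives box darkness
            "top_verse": counts.most_common(1)[0][0],   # number in the box
            "distribution": dict(counts),               # feeds the sparkline
        }
        for key, counts in verse_counts.items()
    }
```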

Delving into Lent Data

Sunday, March 7th, 2010

Let’s look a little more at some of the data on what Twitterers are giving up for Lent.

Categories of Things Given up by Location

As I only track in English what people are giving up, there are concentrations in English-speaking countries.

Categories by Country
Size indicates the relative number of Twitterers in each country giving up something for Lent.

Categories by Location

Categories of Things Given up by State

These visualizations show the differences (or lack thereof) in what people are giving up among U.S. states.

Categories by State
Size indicates the relative number of Twitterers in each state giving up something for Lent. Sorry, Alaska and Hawaii.

Categories by State (%)
The composition of each state’s categories of tweets shows mostly minor variations among states. Some states (like Wyoming on the far right) have small numbers of tweets. I would have liked to use opacity or width to indicate this disparity but couldn’t figure out how to do it.

Comparison between 2009 and 2010

This treemap shows how the data changed between 2009 and 2010. The size of the box shows the number of people giving up each category and thing, while color indicates the percentage change from last year: dark blue indicates the steepest drop; dark orange indicates the steepest rise. The second chart shows the same data more conventionally expressed.

Categories and Terms: Term Changes: 2009-2010

Categories and Terms: Term Changes: 2009-2010
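The color scale rests on a simple year-over-year percentage change. As a minimal sketch (the function name is hypothetical, and the counts in the usage line are made up for illustration, not taken from the data):

```python
def percent_change(last_year, this_year):
    """Percentage change from last year's count to this year's.

    Returns None when there is no count for last year, since the
    change is undefined for terms that are new to the list.
    """
    if not last_year:
        return None
    return (this_year - last_year) / last_year * 100


# Hypothetical counts: 200 mentions last year, 250 this year.
print(percent_change(200, 250))
```

Terms that are new this year have no baseline, so they need a separate color or marker rather than a spot on the blue-to-orange scale.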

About the Visualizations

I created these charts mostly to explore how the new data-analysis software Tableau Public works. One of its claims to fame is that you can publish interactive visualizations to the web, a feature I didn’t take advantage of here. Tableau doesn’t do treemaps, so I used Many Eyes to create the treemap; the closest Tableau equivalent appears below the treemap.

What Twitterers Are Giving up for Lent (2010 Edition)

Tuesday, February 23rd, 2010

The top 100 things that Twitterers are giving up for Lent in 2010.

Snow makes the list this year, understandable given the Snowpocalypse and Snowmageddon that gripped much of the Eastern U.S. in the weeks preceding Ash Wednesday. iPods also made the list after the Bishop of Liverpool asked people to consider praying instead of listening to them. This year a celebrity, Justin Bieber, cracks the top 100. He beat out the Jonas Brothers, 64 votes to 11; draw your own conclusions.

The list largely tracks last year’s list. It draws from 40,000 tweets retrieved February 14-20, 2010.

Complete List of the Top 100

Rank Word Count Change from last year’s rank
1. Twitter 2089 +1
2. Facebook 1874 -1
3. Chocolate 1323 0
4. Alcohol 1258 +1
5. Swearing 1158 +5
6. Soda 1126 0
7. Lent 792 -3
8. Meat 720 0
9. Sex 701 +7
10. Fast food 695 +7
11. Sweets 627 0
12. Coffee 445 -5
13. iPod 437  
14. Candy 325 +18
15. Religion 305 -6
16. Catholicism 264 -4
17. Smoking 254 +5
18. Junk food 251 +34
19. Giving up things 241 -6
20. Beer 241 -5
21. Chips 234 +24
22. You 233 +13
23. Stuff 217 -3
24. Fried food 199 +33
25. Red meat 193 +19
26. Bread 187 +13
27. Sugar 183 -8
28. Work 176 -14
29. Shopping 174 +11
30. Food 162 -7
31. Shame 150  
32. Social networking 147 -2
33. Caffeine 136 -6
34. Rice 136 +44
35. Procrastination 127 -11
36. Internet 126 -11
37. Cheese 120 +1
38. Coke 120 +41
39. Starbucks 119 +14
40. School 118 +36
41. Ice cream 118 +13
42. Booze 117 -21
43. Texting 114 +28
44. Masturbation 111  
45. Cookies 110 +11
46. TV 97 -18
47. Christianity 96 0
48. Snow 96  
49. Wine 92 -13
50. Pizza 91 +12
51. MySpace 91 +4
52. Men 90 +31
53. Giving up 89 -19
54. Sobriety 89 -13
55. Liquor 87  
56. Desserts 87  
57. Lint 87 -20
58. Pancakes 82 -29
59. Homework 81 +28
60. Marijuana 80  
61. Diet Coke 80 -28
62. Hope 78 +15
63. Virginity 76  
64. French fries 75 -15
65. Laziness 71 +5
66. Boys 67  
67. Nothing 67 -19
68. Carbs 66 -4
69. Justin Bieber 64  
70. Pork 64  
71. Porn 63 +9
72. Me 62 0
73. Sleep 61 -42
74. Complaining 58 -16
75. Eating out 58 -8
76. Jesus 55 -26
77. McDonald’s 55  
78. Beef 54 +18
79. Church 54 +6
80. God 53 -21
81. Abstinence 53 -39
82. Cake 52  
83. Negativity 52  
84. Him 49  
85. Juice 47  
86. Celibacy 44 +13
87. Chicken 42  
88. Lying 42  
89. New Year’s resolutions 42 -29
90. Sarcasm 42 -39
91. Snacking 41  
92. My wife 39  
93. Tea 37  
94. iPhone 37  
95. Exercise 36 -6
96. Sweet tea 35  
97. People 35  
98. Vegetables 34  
99. Pasta 33  
100. Self control 33  

Image created using Wordle.

Interview with Yingyan Huang

Wednesday, August 26th, 2009

Yingyan Huang, who created the two visualizations on the unity of Scripture from the last post, graciously agreed to answer a few questions for me about her work:

Where did you get the inspiration for your pieces? What was your motivation to create them?

Initially I had started with the idea of creating an information visualization on Paul’s journeys and the spread of Christianity. As I explored the topic further, I realized that the focus, as a Christian, should not be on Paul but really on Christ Himself, since the objective of Paul’s journeys and ministry was Christ. So I began exploring the Gospels and the role of Christ in the Bible. Ultimately my motivation is to visually demonstrate, given the Bible as a historical relic and literature, Christ’s centrality in the Bible and His reality because of the sheer number of references to Him found in the OT even before His birth.

What was the process you used to create them? Where did you get the data? What tools did you use?

My data was obtained through much reading and primarily comes from the Tyndale Life Application Study Bible, ESV Study Bible, and The Christ of the Covenants by O. Palmer Robertson. I cross-referenced the passages from ESV’s history of salvation and Palmer’s book against the 250 events in the life of Christ from the Tyndale study Bible. Some computer programs I used include Adobe Illustrator, Acrobat, and Microsoft Excel.

With infographics, there’s often a tension between making something attractive and clearly portraying the information you want to communicate. As a designer, how did you resolve that tension with these pieces?

As mentioned above, I basically cross-referenced the Bible verses/passages to create data points. I did not create my own data; I merely synthesized available data that have been around for centuries to create the data sets.

Why do graphic designers create infographics about the Bible?

Perhaps the Bible remains a choice subject matter because it is rich with intricacies and complexities that come together so harmoniously.

What are you up to these days?

I am currently interning/working at a magazine as an art/photo assistant and assisting in teaching an information design class at Parsons. I am also cooking and baking to test and create recipes. Additionally, I am involved in starting a new college ministry (City Campus Ministry) and am a teaching assistant for the Children’s Ministry at my church, Redeemer Presbyterian Church in NYC.

Two Bible Visualizations on the Unity of Scripture

Sunday, August 2nd, 2009

Yingyan Huang, a graphic designer, recently created two Bible visualizations that emphasize the unity of Scripture.

Harmony of the Gospels. The first visualization, “The Harmony of the Gospels—250 Events in the Life of Christ,” identifies 250 events recorded in the Gospels, arranges them chronologically, and plots them to reveal which events occurred in which Gospels. In effect, it’s a visualization of the Composite Gospel Index, though the visualization is apparently using a different data set.

A Single Story. The second visualization, “A Single Story—The Bible through the Lens of 250 Events,” starts with the same 250 events but extends references to them into the Old Testament and the remainder of the New Testament. This visualization shows the centrality of Christ in the Bible.

Both visualizations convey useful information. I wish they were available in larger sizes, but even the low-resolution versions give you a sense of the message behind the images.

Phrase Net Bible Visualizations

Wednesday, March 25th, 2009

The social data visualization site Many Eyes just unveiled a new visualization type, the Phrase Net, which illustrates phrase connections in textual content. The hard part is finding the right phrase to produce a good visualization.

Here are a couple of visualizations of the Old and New Testaments in the KJV Bible, using the pattern “[word] of [word].”
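To get a feel for what the “[word] of [word]” pattern picks up, the pairs can be counted directly from the text. This is a minimal sketch and not how Many Eyes implements Phrase Net: the function name is hypothetical, and the tokenizer is deliberately crude (it lowercases everything and ignores sentence boundaries).

```python
import re
from collections import Counter


def phrase_pairs(text, connector="of"):
    """Count (left, right) word pairs joined by the connector word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(
        (left, right)
        for left, middle, right in zip(words, words[1:], words[2:])
        if middle == connector
    )


sample = "the children of Israel and the king of Israel; the children of Israel"
for (left, right), n in phrase_pairs(sample).most_common():
    print(left, "of", right, n)
```

The most frequent pairs become the thickest edges in the network, which is why phrases anchored on “Israel” or “God” dominate the pictures.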

Old Testament

The Old Testament visualization illustrates the centrality of ideas like “Children of Israel,” “King of Israel,” and “Land of Israel.”

See the Old Testament visualization at Many Eyes (requires Java)

New Testament

The New Testament visualization shows the shift in emphasis to “Son of God” and “Kingdom of God.”

See the New Testament visualization at Many Eyes (requires Java)

Both visualizations require Java.