
Archive for the ‘Twitter’ Category

The Bible on Twitter in 2014

Tuesday, December 30th, 2014

Bible Gateway recently shared their most-popular Bible verses of 2014, and I wanted to discuss this chart a little more:

Popular Bible verses by day in 2014 on Bible Gateway

The chart stems from the idea that if someone is equally likely to see a verse on any day of the year, each day should have 1/365, or 0.27%, of a verse’s yearly popularity. This chart marks the days when a verse’s pageviews spiked, which I define as any day that accounted for more than 0.4% of the verse’s annual total.
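Here is a minimal Python sketch of that spike test. The 0.4% threshold and the 1/365 baseline come from the description above; the function name, the one-count-per-day layout, and the toy numbers are assumptions for illustration, not the code behind the chart.

```python
# Hypothetical sketch: flag the days on which a verse receives an outsized
# share of its yearly pageviews. Only the two constants come from the post.
SPIKE_THRESHOLD = 0.004   # 0.4% of the annual total
UNIFORM_SHARE = 1 / 365   # about 0.27% if every day were equally likely


def spike_days(daily_views):
    """Return the 0-based day indexes whose share of the total exceeds the threshold."""
    total = sum(daily_views)
    if total == 0:
        return []
    return [day for day, views in enumerate(daily_views)
            if views / total > SPIKE_THRESHOLD]


# Toy data: 10 views every day, except 30 views on day 100.
views = [10] * 365
views[100] = 30
print(spike_days(views))  # [100]
```

An ordinary day in the toy data has a share of 10/3670, or about 0.27%, so only the tripled day clears the threshold.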

The theme of the chart is that people follow certain paths through the Bible during the year; I labeled a few of them on the chart. But there are definitely a few patterns I can’t explain:

  1. At the beginning of the year, two lines emanate from Genesis that look like they’re on track to read the full Bible in a year, but one of them is faster than the other. Why are there two?
  2. At the bottom right of the chart is a shallow line that looks like it involves reading Genesis and Exodus starting in May and ending in December. There’s a similar line in the New Testament running through Matthew from June to November. What are those?

I was curious whether the same patterns would appear in Twitter for the year, so I ran a similar analysis on the 43 million tweets this year that mentioned Bible verses. The answer is that, yes, you can see many of the same paths in both charts:

Popular Bible verses by day in 2014 on Twitter

They even include the same two (or three or four) fast readings of the Bible at the beginning of the year and the slow reading of Genesis and Exodus in the second half of the year. You can see similar peaks around the Passion stories leading to Easter and the Nativity story leading to Christmas. (Christmas is the last day that appears on this chart.) The Twitter chart more clearly shows the weekly rhythms of the devotional life, with vertical lines just barely visible every Sunday. The main difference is that there’s not as clear a path through the New Testament.

The Twitter chart also shows some horizontal bands where sharing is pretty light. These “sharing shadows” appear in the opening chapters of Numbers, 1 Kings, and 1 Chronicles.

Prolific Verse Sharers

A quirk of the Twitter chart is that some Twitterers tweet (and are retweeted) a lot. I suspect many of them are bots, but it’s hard to say whether they constitute “Bible spam”–many people do appear to find them helpful, judging by how often they retweet them. The top fifty or so Twitterers are responsible for 16 million of the 43 million tweets this year. The chart doesn’t look too different if you remove them (mostly, the frequent repetition of Matthew disappears), but that could just be because I didn’t remove enough users to affect the results meaningfully. For all I know, this chart mostly just shows how Twitter bots share the Bible during the year. The consistency with the Bible Gateway data (in which I have more confidence), however, leads me to think that this picture is reasonably accurate.

Here are the top non-bot (as far as I can tell) sharers of Bible verses–these people tweeted the most Bible verses (and, more importantly, were retweeted most) throughout the year. Some of these people I recognize, and others… not so much. The “tweet” numbers reflect only tweets containing Bible verses and include others’ retweets of their tweets.

  1. JohnPiper (105,836 tweets)
  2. DangeRussWilson (87,382 tweets)
  3. WeLiftYourName (52,638 tweets)
  4. JosephPrince (50,889 tweets)
  5. BishopJakes (49,109 tweets)
  6. siwon407 (48,994 tweets)
  7. RickWarren (42,637 tweets)
  8. JoyceMeyer (39,703 tweets)
  9. jeremycamp (32,003 tweets)
  10. DaveRamsey (28,173 tweets)
  11. RCCGworldwide (26,731 tweets)
  12. AdamCappa (25,976 tweets)
  13. Creflo_Dollar (24,422 tweets)
  14. sadierob (20,068 tweets)
  15. Carson_Case (19,846 tweets)
  16. TimTebow (18,303 tweets)
  17. Kevinwoo91 (17,230 tweets)
  18. levimitchell (16,355 tweets)
  19. jesse_duplantis (15,755 tweets)
  20. kutless (14,806 tweets)

Most-Popular Verses

Here are the most-popular verses shared on Twitter in 2014:

  1. Phil 4:13 (613,161 tweets)
  2. 1Pet 5:7 (261,417 tweets)
  3. Prov 3:5 (218,019 tweets)
  4. John 14:6 (212,883 tweets)
  5. John 13:7 (207,084 tweets)
  6. 1Cor 13:4 (197,379 tweets)
  7. Matt 28:20 (187,407 tweets)
  8. Ps 118:24 (183,475 tweets)
  9. 2Tim 1:7 (182,758 tweets)
  10. Ps 56:3 (180,139 tweets)

You can also download a text file (411 KB) with the complete list of 2014’s popular verses.

John 13:7 (“Jesus replied, ‘You do not realize now what I am doing, but later you will understand.’”) is the oddball here, but it turns out that it’s mostly from over 100,000 retweets of a single tweet in April. (Since it was a one-off, I omitted him from the list of top sharers above, although his tweet count of 163,497 would put him in first place.)

How do the year’s most-popular verses compare among Bible Gateway, YouVersion, and Twitter? The answer: there’s a good deal of variation. Below are the top ten from each service; only Proverbs 3:5 appears in all three lists, and YouVersion and Twitter only have one verse that overlaps, which surprises me (given that they’re both based on sharing).

If we look only at Bible Gateway and Twitter, the average verse differs in its ranking by about 3,000 places, or nearly 10% of the Bible. The largest differences in rank: 1 Kings 20:14 is much more popular on Twitter (rank 4,380) than on Bible Gateway (rank 27,119), while Ezra 5:14 is way more popular on Bible Gateway (rank 13,995) than Twitter (rank 30,018).
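One plausible reading of “the average verse differs in its ranking” is a mean absolute difference in rank. The sketch below assumes that reading; the function name is hypothetical, and the only real numbers are the two pairs of ranks quoted above.

```python
# Hypothetical sketch: mean absolute rank difference between two services.
# Assumes each service's ranking is a dict of {verse: rank}.
def mean_rank_difference(ranks_a, ranks_b):
    """Average of |rank_a - rank_b| over the verses both services rank."""
    shared = ranks_a.keys() & ranks_b.keys()
    if not shared:
        return 0.0
    return sum(abs(ranks_a[v] - ranks_b[v]) for v in shared) / len(shared)


# Only the two extreme verses quoted in the post, so the result is far
# above the roughly 3,000-place average across the whole Bible.
twitter = {"1 Kings 20:14": 4380, "Ezra 5:14": 30018}
bible_gateway = {"1 Kings 20:14": 27119, "Ezra 5:14": 13995}
print(mean_rank_difference(twitter, bible_gateway))  # 19381.0
```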

Ranking Bible Gateway YouVersion Twitter
1. John 3:16 Rom 12:2 Phil 4:13
2. Jer 29:11 Phil 4:8 1Pet 5:7
3. Phil 4:13 Phil 4:6 Prov 3:5
4. Rom 8:28 Jer 29:11 John 14:6
5. Ps 23:4 Matt 6:33 John 13:7
6. Phil 4:6 Phil 4:7 1Cor 13:4
7. 1Cor 13:4 Prov 3:5 Matt 28:20
8. Prov 3:5 Isa 41:10 Ps 118:24
9. 1Cor 13:7 Matt 6:34 2Tim 1:7
10. Rom 12:2 Prov 3:6 Ps 56:3

Bold entries appear in at least two lists.

Data Source

The Twitter data is from Bible Verses on Twitter. A program connects to the Twitter Streaming API with a query for every chapter of the Bible (“Gen 1”, “Genesis 1”, and so on). I run a Bible reference parser on the tweet to pull out all the references. Then an SVM algorithm tries to guess whether the tweet is actually talking about a Bible verse or just happens to contain a string that looks like a Bible reference (“Gen 1 XBox for sale,” where “Gen” is short for “Generation”).

Sidenote: How I Calculate Verse Views

A note on methodology: I’ve never documented how I determine a particular verse’s popularity; now’s a good time, because you can do it a number of ways to reach different answers. Let’s say that someone is looking at Genesis 1, which has 31 verses. That counts as one pageview, but if you’re looking for the number of pageviews that, say, Genesis 1:1 receives, how do you attribute a chapter-length view like this? You could give each verse credit for a full pageview, but then verses in long chapters will appear to have a disproportionately high number of pageviews. Instead, I prefer to divide the pageview by the number of verses in the passage: in this case, each verse in Genesis 1 receives 1/31, or 0.032 pageviews.

Now, what if someone is looking at, say, Genesis 1:1 and Matthew 1 (25 verses) on the same page? In this case, I divide the pageview by the number of separate passages: Genesis 1:1 receives credit for a full 0.5 pageviews, as does Matthew 1. Each verse in Matthew 1 therefore receives 0.5/25, or 0.02 pageviews.

I feel that this approach best respects people’s intentions whether they want to look at multiple verses, several independent passages, or just individual verses.
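The attribution rule is easy to express in code. This is a minimal sketch of the two steps described above (split the pageview across passages, then split each passage’s share across its verses); the function name and the choice to represent a passage as a list of verse labels are assumptions.

```python
# Hypothetical sketch of the attribution rule: one pageview is divided evenly
# among the passages on the page, and each passage's share is divided evenly
# among its verses.
from collections import defaultdict


def attribute_pageview(passages):
    """Return {verse: fractional pageviews} for a single pageview."""
    credit = defaultdict(float)
    passages = [verses for verses in passages if verses]  # ignore empty passages
    if not passages:
        return {}
    passage_share = 1.0 / len(passages)
    for verses in passages:
        for verse in verses:
            credit[verse] += passage_share / len(verses)
    return dict(credit)


genesis_1 = [f"Gen 1:{v}" for v in range(1, 32)]   # 31 verses
matthew_1 = [f"Matt 1:{v}" for v in range(1, 26)]  # 25 verses

whole_chapter = attribute_pageview([genesis_1])
mixed_page = attribute_pageview([["Gen 1:1"], matthew_1])
print(round(whole_chapter["Gen 1:1"], 3))  # 0.032
print(mixed_page["Gen 1:1"])               # 0.5
print(round(mixed_page["Matt 1:1"], 2))    # 0.02
```

However the page is split, the credits for a single pageview always add up to one.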

What Twitterers Are Giving up for Lent (2014 Edition)

Saturday, March 8th, 2014

The top 100 things that people on Twitter are giving up for Lent in 2014.

This year, “School” topped the list of things Twitterers are giving up for Lent, up 44 places from last year. Remaining in the top ten from last year are Swearing, Alcohol, Soda, Social Networking, and Fast Food. Chocolate, Twitter, Sweets, and Lent round out the new additions to the top ten.

I don’t have a great explanation for why School is #1 this year–it could be that Ash Wednesday is later this year, so spring break is closer (for some it even starts today). It’s also possible that Twitter’s audience is skewing younger than it used to, or that younger Twitter users are more likely to tweet about Lent.

Timely topics this year are Boosie, referring to rapper Lil Boosie, who was released from prison this week (people joked that the prison was giving him up for Lent); and Electricity, referring to a widespread power outage in South Africa.

This list draws from 646,000 tweets from March 2 to 8, 2014, that mention giving up something for Lent, and excludes retweets.

Rank Word Count Change from last year’s rank
1. School 11,757 +44
2. Chocolate 9,515 +15
3. Twitter 8,642 +8
4. Swearing 7,132 -2
5. Alcohol 6,325 0
6. Soda 5,446 -3
7. Social networking 4,197 -3
8. Sweets 4,188 +8
9. Fast food 4,088 0
10. Lent 2,842 +118
11. Meat 2,790 +26
12. Homework 2,760 +61
13. Junk food 2,723 +8
14. Coffee 2,678 +123
15. Sex 2,392 +112
16. Chips 2,129 -10
17. Bread 2,020 +114
18. You 2,016 +21
19. Facebook 1,926 0
20. Pizza 1,628 +122
21. Starbucks 1,566 +120
22. Candy 1,412 +87
23. Instagram 1,212 -13
24. Religion 1,147 +104
25. Virginity 1,143 -18
26. Cookies 1,053 -14
27. Work 1,031 +4
28. Ice cream 1,025 +27
29. Boys 1,021 +99
30. Marijuana 1,018 -22
31. Smoking 994 -9
32. Beer 939 +104
33. Life 933 -8
34. Food 930 +27
35. McDonalds 926 +20
36. Winter 853  
37. Netflix 851 -7
38. College 819 +16
39. My phone 777 +6
40. Shopping 748 +103
41. Stuff 733 +100
42. Selfies 731 +1
43. Chipotle 726 +100
44. Masturbation 725 -30
45. Sugar 682 +82
46. Cheese 670 +94
47. Me 656 +87
48. Sobriety 655 -17
49. Wine 652 +94
50. Carbs 648 +81
51. Boosie 581  
52. Fried food 574 -23
53. Caffeine 563 +70
54. Rice 562 +86
55. Catholicism 561 -28
56. Snapchat 543 +11
57. Coke 541 -22
58. Procrastination 517 -18
59. People 516 +54
60. Snow 506  
61. Desserts 486 -37
62. Fizzy drinks 480 +81
63. French fries 475 -29
64. Takeout 464 -49
65. Obama 452 +74
66. Makeup 451 -25
67. Taco Bell 434 +39
68. Feelings 434 -32
68. Porn 430  
69. Nothing 427 +74
70. My swag 420 -47
71. Negativity 417 +28
72. Red meat 396 +59
73. Diet Coke 390 +69
74. Sarcasm 380  
75. Breathing 369  
76. Caring 357 +66
77. Complaining 354  
78. Tea 352 +64
79. Pancakes 340 +63
80. Peanut butter 336  
81. Sweet tea 335  
82. Booze 325 +61
83. Sleep 320 +33
84. Hope 316 +46
85. Cake 313 -13
86. Pasta 303 +57
87. TV 302 +30
88. Texting 297 +52
89. Eating out 275 -29
90. Exercise 274 -47
91. Pants 270 +5
92. Electricity 268 +41
93. The gym 258 +16
94. Liquor 245  
95. Church 243 -46
96. Tinder 237 +35
97. Tumblr 236 +46
98. Math 236 +20
98. Juice 232 +35
99. Being mean 230  
100. Chick Fil A 228 -38

Categories

Rank Category Number of Tweets
1. food 62,453
2. school/work 18,148
3. technology 17,615
4. habits 16,616
5. smoking/drugs/alcohol 12,665
6. irony 7,319
7. relationship 6,563
8. sex 5,483
9. health/hygiene 3,476
10. religion 2,784
11. generic 2,504
12. entertainment 1,959
13. weather 1,496
14. shopping 1,183
15. celebrity 961
16. sports 780
17. politics 547
18. clothes 540
19. money 492
20. habit 393
21. possessions 217
22. clothing 62

Historical Trends

This year I added a new Historical Lent Tracker that you can use to investigate Lenten trends on your own over the past six years.

Here are some of my favorite graphs:

Second-Wave Social Media

Tumblr peaked in 2011, and WhatsApp, which Facebook recently paid $19 billion for, doesn’t register highly.

Instagram is highest, followed by Snapchat, Tumblr, Tinder, and WhatsApp.

Fast Food Restaurants

Chipotle is much higher on the list than I expected–is that because people love it or because they hate it?

McDonald's is highest, followed by Chipotle, Taco Bell, Chick-Fil-A, Dunkin Donuts, Whataburger, KFC, and Subway.

One Direction vs. Justin Bieber

One Direction has been outpacing Justin Bieber since 2012.

Snack Foods

Congratulations, Hot Cheetos, on being the snack the most people want to give up.

Hot Cheetos is highest, followed by popcorn, Doritos, potato chips, and Cheetos.

Media Coverage

The Lent Tracker got some media attention this year. In roughly chronological order:

Finally, this Wall Street Journal article doesn’t talk about the Lent tracker, but it discusses the fraught phenomenon of Ash Wednesday selfies: Selfies Bring Ashtags to Lent. (This article may or may not be behind a paywall for you.)

Track What People Are Giving Up in 2014 for Lent in Real Time

Monday, March 3rd, 2014

See the top 100 things people are giving up in 2014 for Lent on Twitter, continually updated until March 7, 2014.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are: “Netflix,” “Flappy Bird,” and “Getting an Oscar.” “Social Networking” is currently way out in front, with twice as many tweets as perennial favorites “Swearing” and “Alcohol.” (Last year, Social Networking came in at #4.)

Look for the usual post-mortem on March 8, 2014.

What Twitterers Are Giving up for Lent (2013 Edition)

Saturday, February 16th, 2013

The top 100 things that people on Twitter are giving up for Lent in 2013.

This year saw a lot of churn in the top 100 things people were giving up for Lent.

The pope announced his resignation on Monday, leading many to say that he was giving up “being pope” for Lent. It came in at #1. (Related, at #18, people said they were giving up “the pope” for Lent.)

Specific social networking sites like Twitter and Facebook generally dropped this year, with the generic term “social networking” (#4) taking over as a catchall. Instagram (#10), Pinterest (#52), and Snapchat (#78) were all new to the top 100.

With Valentine’s Day falling on the day after Ash Wednesday this year, it came in at #13. My wife suggests that the timing may also have contributed to the drop in “chocolate” from #2 last year to #17 this year. “Valentines” is #97.

“Horse meat” (#20) refers to the ongoing European scandal.

The only celebrity to make the list was British boy band One Direction, up substantially at #41.

I learned several new words this year: “twerking” (#34), a type of dance move; “selfies” (#46), or self-shot photos taken with a phone; “subtweeting” (#57), or tweeting about someone without mentioning them by name; “oomf” (#71), or “one of my [Twitter] followers”; and “Nando’s” (#76), a chicken restaurant.

This list draws from 263,000 tweets from February 10-15, 2013, and excludes most retweets.

Rank Word Count Change from last year’s rank
1. Being pope 5,654  
2. Swearing 4,944 +1
3. Soda 2,648 +2
4. Social networking 2,264 +19
5. Alcohol 2,217 -1
6. Chips 1,690 +8
7. Virginity 933 +23
8. Marijuana 784 +17
9. Fast food 776 -2
10. Instagram 755 +270
11. Twitter 672 -10
12. Cookies 643 +19
13. Valentine’s day 514  
14. Masturbation 510 +18
15. Takeout 465 +59
16. Sweets 444 -7
17. Chocolate 417 -15
18. The pope 394 +10,224
19. Facebook 380 -13
20. Horse meat 375  
21. Junk food 362 -8
22. Smoking 355 -3
23. My swag 331 +373
24. Desserts 325 +21
25. Life 325 +40
26. New year’s resolutions 313 +47
27. My boyfriend 309 +99
28. Catholicism 255 +11
29. Straightening my hair 228 +89
30. Fried food 225 +5
31. Netflix 216 +255
32. Work 216 -5
33. Sobriety 213 +4
34. Twerking 185 +698
35. The playoffs 184 +3,556
36. French fries 173 +19
37. Coke 168 +1
38. Feelings 168 +207
39. Laziness 160 +28
40. Meat 158 -30
41. Onedirection 155 +103
42. You 154 -24
43. Procrastination 153 +1
44. Makeup 150 +16
45. Internet 149 +61
46. Selfies 149 +2,328
47. Exercise 144 +58
48. School 141 -36
49. My phone 135 +15
50. Classes 129 +84
51. Dip 127 +132
52. Pinterest 125 +133
53. Church 124 +33
54. Emotions 122 +397
55. Going to school 119 +163
56. My girlfriend 111 +207
57. Subtweeting 110 +253
58. College 106 +5
59. My face 106 +4,168
60. Ice cream 106 -27
61. McDonald’s 102 -32
62. Being ugly 101 +256
63. Snacking 99 +19
64. Spending 96 +89
65. Dunkin Donuts 96 +475
66. Chew 95 +418
67. Eating out 94 +28
68. Elevators 94 +99
69. Food 93 -47
70. Moaning 93 +123
71. Oomf 93 +78
72. Chick Fil A 90 +135
73. Healthy food 88 +180
74. Football 87 +145
75. Swimming 87 +200
76. Nando’s 86 +72
77. DVDs 84 +1,326
78. Snapchat 84  
79. Broccoli 83 +206
80. Ranch 81 +250
81. The snooze button 80 +176
82. Crystal meth 80 +219
83. Dignity 79 +116
84. Cake 77 -13
85. Unhealthy food 77 +34
86. Homework 76 -65
87. Busyness 75  
88. Schoolwork 74 +88
89. Chemistry 74 +34,949
90. Frozen yogurt 72 +480
91. iPhone 72 +100
92. FIFA 71 +143
93. Betting 70 +315
94. Doing homework 69 +158
95. Myself 68 +267
96. Supermarkets 67 +1,797
97. Valentines 66  
98. Domino’s 63 +323
99. Being negative 63 +212
100. Hookah 63 +340

Categories

Rank Category Number of Tweets
1. food 11,642
2. habits 8,083
3. religion 6,519
4. technology 4,782
5. smoking/drugs/alcohol 3,928
6. sex 1,771
7. relationship 1,399
8. health/hygiene 1,270
9. school/work 1,095
10. irony 792
11. sports 648
12. entertainment 392
13. celebrity 246
14. clothes 235
15. money 133
16. shopping 111

The image is a Wordle.

Track What People Are Giving Up in 2013 for Lent in Real Time

Monday, February 11th, 2013

See the top 100 things people are giving up in 2013 for Lent on Twitter, continually updated until February 15, 2013.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are: meowing, Valentine’s Day, and Snapchat.

Look for the usual post-mortem on February 16, 2013.

What Twitterers Are Giving up for Lent (2012 Edition)

Saturday, February 25th, 2012

The top 100 things that people on Twitter are giving up for Lent in 2012.

This year, Twitter continues to take top honors. Facebook drops a few places compared to last year–has it become less central to people’s lives? This year’s hot new site, Pinterest, almost makes the list, showing up at #118. (Incidentally, Pinterest has a number of Lent-related boards.)

Chocolate comes in at #2–however, if you add up all the mentions of chocolate in its various forms (“chocs,” “chocolate chips,” etc.), it totals over 14,000 mentions, enough to put it at #1.

This year’s biggest gainers are “breathing” and “makeup,” both of which jumped up more than 30 places in the list.

No celebrities make the top 100 this year. Boy band One Direction (aka #1D) is at #144, followed by Justin Bieber at #194 and Tim Tebow at #221. Last year’s curiosity, Charlie Sheen, only got two mentions; he dropped to #10,000 or so.

Overall, food was by far the most popular thing given up.

This list draws from about 300,000 tweets from February 19-25, 2012, and excludes retweets.

Rank Word Count Change from last year’s rank
1. Twitter 13,937 0
2. Chocolate 13,001 +1
3. Swearing 11,737 +1
4. Alcohol 9,998 +1
5. Soda 9,942 +2
6. Facebook 9,025 -4
7. Fast food 6,529 +3
8. Sex 6,146 -2
9. Sweets 4,973 +2
10. Meat 4,444 -1
11. Lent 4,171 -3
12. School 3,976 +1
13. Junk food 3,388 +6
14. Chips 3,150 +4
15. Coffee 2,263 0
16. Candy 2,217 +6
17. Bread 2,124 +3
18. You 2,056 -2
19. Smoking 2,002 +2
20. Giving up things 2,001 -8
21. Homework 1,908 +11
22. Food 1,800 +5
23. Social networking 1,754 -6
24. Religion 1,701 -10
25. Marijuana 1,594 +4
26. Beer 1,359 +4
27. Work 1,331 -3
28. Stuff 1,302 -3
29. McDonald’s 1,249 +21
30. Virginity 1,152 +7
31. Cookies 1,137 +3
32. Masturbation 1,134 +4
33. Ice cream 1,113 +15
34. Shopping 1,068 -6
35. Fried food 993 -4
36. Boys 956 +6
37. Sobriety 910 +7
38. Coke 899 +3
39. Catholicism 881 -13
40. Cheese 858 -7
41. Nothing 831 +5
42. Carbs 818 +16
43. Red meat 758 -8
44. Procrastination 738 +1
45. Desserts 733 +26
46. Pizza 714 +15
47. Pancakes 650 -9
48. Sugar 645 -5
49. Rice 633 -10
50. Breathing 631 +34
51. Me 628 +12
52. Texting 627 +3
53. Starbucks 623 +1
54. Fizzy drinks 595 +12
55. French fries 593 +7
56. Diet Coke 572 +21
57. Porn 562 +10
58. Tumblr 548 +12
59. Wine 546 -7
60. Makeup 539 +31
61. Liquor 534 -5
62. Booze 530 -22
63. College 524 +18
64. My phone 508 +30
65. Life 486  
66. Caffeine 466 -17
67. Laziness 453 +11
68. Chipotle 452 +30
69. Tea 445 +6
70. Chicken 442 +2
71. Cake 440 +3
72. Sarcasm 429 +4
73. New Year’s resolutions 422 +15
74. Takeout 417 +11
75. Men 412 -10
76. Pork 394 -3
77. Christianity 388 -18
78. Sleep 386 +1
79. People 384 +8
80. Caring 377  
81. Juice 357 +11
82. Snacking 345  
83. Lying 333  
84. TV 332 -31
85. Complaining 331 -2
86. Church 328 -35
87. Him 327 +2
88. Sweet tea 326  
89. Lint 326 -21
90. Vegetables 324  
91. Talking 323  
92. Bacon 321  
93. Being mean 320  
94. Pasta 316  
95. Eating out 316 +5
96. Negativity 314 -39
97. Eating 298  
98. Biting my nails 294  
99. Nutella 291  
100. Being nice 258  

Categories

Rank Category Number of Tweets
1. food 79,977
2. habits 21,836
3. technology 19,190
4. smoking/drugs/alcohol 19,073
5. health/hygiene 11,101
6. sex 9,948
7. irony 9,352
8. school/work 8,567
9. relationship 6,919
10. religion 4,157
11. generic 2,841
12. shopping 1,491
13. entertainment 1,344
14. money 526
15. sports 512
16. celebrity 461
17. possessions 376
18. clothes 111
19. politics 105

The image is a Wordle.

Track What People Are Giving Up for Lent in Real Time

Wednesday, February 22nd, 2012

See the top 100 things people are giving up for Lent on Twitter, continually updated for the next few days.

Look for the usual post-mortem later this week.

Bible Annotation Modeling and Querying in MySQL and CouchDB

Thursday, September 1st, 2011

If you’re storing people’s Bible annotations (notes, bookmarks, highlights, etc.) digitally, you want to be able to retrieve them later. Let’s look at some strategies for how to store and look up these annotations.

Know What You’re Modeling

First you need to understand the shape of the data. I don’t have access to a large repository of Bible annotations, but the Twitter and Facebook Bible citations from the Realtime Bible Search section of this website provide a good approximation of how people cite the Bible. (Quite a few Facebook posts appear to involve people responding to their daily devotions.) These tweets and posts are public, and private annotations may take on a slightly different form, but the general shape of the data should be similar: nearly all (99%) refer to a chapter or less.

Large dots at the bottom indicate many single-verse references. Chapter references are also fairly prominent. See below for more discussion.

Compare Bible Gateway reading habits, which are much heavier on chapter-level usage, but 98% of accesses still involve a chapter or less.

The Numbers

The data consists of about 35 million total references.

Percent of Total Description Example
73.5 Single verse John 3:16
17.1 Verse range in a single chapter John 3:16-17
8.4 Exactly one chapter John 3
0.7 Two or more chapters (at chapter boundaries) John 3-4
0.1 Verses spanning two chapters (not at chapter boundaries) John 3:16-4:2
0.1 Verses spanning three or more chapters (not at chapter boundaries) John 3:16-5:2

About 92.9% of posts or tweets cited only one verse or verse range; 7.1% mentioned more than one verse range. Of the latter, 77% cited exactly two verse ranges; the highest had 323 independent verse ranges. Of Facebook posts, 9.1% contained multiple verse ranges, compared to 4.2% of tweets. When there were multiple ranges, 43% of the time they referred to verses in different books from the other ranges; 39% referred to verses in the same book (but not in the same chapter); and 18% referred to verses in the same chapter. (This distribution is unusual; normally close verses stick together.)

The data, oddly, doesn’t contain any references that span multiple books. Less than 0.01% of passage accesses span multiple books on Bible Gateway, which is probably a useful upper bound for this type of data.

Key Points

  1. Nearly all citations involve verses in the same chapter; only 1% involve verses in multiple chapters.
  2. Of the 1% spanning two or more chapters, most refer to exact chapter boundaries.
  3. Multiple-book references are even more unusual (under 0.01%) but have outsize effects: an annotation that references Genesis 1 to Revelation 22 would be relevant for every verse in the Bible.
  4. Around 7% of notes contained multiple independent ranges of verses—the more text you allow for an annotation, the more likely someone is to mention multiple verses.

Download

Download the raw social data (1.4 MB zip) under the usual CC-Attribution license.

Data Modeling

A Bible annotation consists of arbitrary content (a highlight might have one kind of content, while a proper note might have a title, body, attachments, etc., but modeling the content itself isn’t the point of this piece) tied to one or more Bible references:

  1. A single verse (John 3:16).
  2. A single range (John 3:16-17).
  3. Multiple verses or ranges (John 3:16, John 3:18-19)

The Relational Model

One user can have many rows of annotations, and one annotation can have many rows of verses that it refers to. To model a Bible annotation relationally, we set up three tables that look something like this:

users

user_id name
1

annotations

user_id annotation_id content
1 101
1 102
1 103

annotation_verses

The verse references here are integers to allow for easy range searches: 43 = John (the 43rd book in the typical Protestant Bible); 003 = the third chapter; the last three digits = the verse number.

I like using this approach over others (sequential integer or separate columns for book, chapter, and verse) because it limits the need for a lookup table. (You just need to know that 43 = John, and then you can find any verse or range of verses in that book.) It also lets you find all the annotations for a particular chapter without having to know how many verses are in the chapter. (The longest chapter in the Bible has 176 verses, so you know that all the verses in John 3, for example, fall between 43003001 and 43003176.) The main disadvantage is that you don’t necessarily know how many verses you’re selecting until after you’ve selected them. And using individual columns, unlike here, does allow you to run group by queries to get easy counts.

annotation_id start_verse end_verse
101 43003016 43003016
102 43003016 43003017
103 43003016 43003016
103 43003019 43003020

Querying

In a Bible application, the usual mode of accessing annotations is by passage: if you’re looking at John 3:16-18, you want to see all your annotations that apply to that passage.

Querying MySQL

In SQL terms:

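-- Find user 1's annotations that overlap John 3:16-18.
-- Grouping (rather than "select distinct") keeps the "order by" valid on
-- MySQL versions that reject ordering by columns missing from the select list.
select annotations.annotation_id
from annotations
join annotation_verses on annotations.annotation_id = annotation_verses.annotation_id
where annotation_verses.start_verse <= 43003018 and
annotation_verses.end_verse >= 43003016 and
annotations.user_id = 1
group by annotations.annotation_id
order by min(annotation_verses.start_verse) asc, max(annotation_verses.end_verse) desc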

The quirkiest part of the SQL is the first part of the “where” clause, which at first glance looks backward: why is the last verse in the start_verse field and the first verse in the end_verse field? Because the start_verse and end_verse can span any range of verses, you need to make sure that you get any range that overlaps the verses you’re looking for: in other words, the start_verse is before the end of the range, and the end_verse is after the start.

Visually, you can think of each start_verse and end_verse pair as a line: if the line overlaps the shaded area you’re looking for, then it’s a relevant annotation. If not, it’s not relevant. There are six cases:

Start before, end before: John 3:15 / Start before, end inside: John 3:15-17 / Start before, end after: John 3:15-19 / Start inside, end inside: John 3:16-18 / Start inside, end after: John 3:17-19 / Start after, end after: John 3:19
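The six cases can be checked with a small Python version of the overlap test. The function name is hypothetical; the comparison is the same one the “where” clause makes, and the example ranges are the ones listed above for a lookup of John 3:16-18.

```python
# Sketch of the overlap test from the "where" clause, checked against the
# six cases. The function name is hypothetical.
def overlaps(start_verse, end_verse, query_start, query_end):
    """True if [start_verse, end_verse] shares any verse with the query range."""
    return start_verse <= query_end and end_verse >= query_start


QUERY = (43003016, 43003018)  # John 3:16-18

CASES = {
    "start before, end before": ((43003015, 43003015), False),  # John 3:15
    "start before, end inside": ((43003015, 43003017), True),   # John 3:15-17
    "start before, end after":  ((43003015, 43003019), True),   # John 3:15-19
    "start inside, end inside": ((43003016, 43003018), True),   # John 3:16-18
    "start inside, end after":  ((43003017, 43003019), True),   # John 3:17-19
    "start after, end after":   ((43003019, 43003019), False),  # John 3:19
}

for label, ((start, end), expected) in CASES.items():
    result = overlaps(start, end, *QUERY)
    print(f"{label}: {result}")
```

Only the two cases that fall entirely outside the passage are rejected; the four that touch it at any point are relevant.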

The other trick in the SQL is the sort order: you generally want to see annotations in canonical order, starting with the longest range first. In other words, you start with an annotation about John 3, then to a section inside John 3, then to individual verses. In this way, you move from the broadest annotations to the narrowest annotations. You may want to switch up this order, but it makes a good default.

The relational approach works pretty well. If you worry about the performance implications of the SQL join, you can always put the user_id in annotation_verses or use a view or something.
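To try the relational approach without setting up a server, here is a self-contained sketch that uses Python’s built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the tables above; the helper function, the second user, and the annotation_id tiebreak in the sort are assumptions added for the demonstration.

```python
# Sketch: the three-table model and the overlap query, run against an
# in-memory SQLite database (a stand-in for MySQL, used here only because
# it ships with Python).
import sqlite3


def annotations_for_passage(conn, user_id, query_start, query_end):
    """Return a user's annotation ids that overlap the passage, broadest first."""
    rows = conn.execute(
        """
        select a.annotation_id
        from annotations a
        join annotation_verses av on av.annotation_id = a.annotation_id
        where av.start_verse <= ? and av.end_verse >= ? and a.user_id = ?
        group by a.annotation_id
        order by min(av.start_verse) asc, max(av.end_verse) desc,
                 a.annotation_id asc
        """,
        (query_end, query_start, user_id),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table annotations (
        user_id integer, annotation_id integer primary key, content text);
    create table annotation_verses (
        annotation_id integer, start_verse integer, end_verse integer);
    insert into annotations values
        (1, 101, ''), (1, 102, ''), (1, 103, ''), (2, 201, '');
    insert into annotation_verses values
        (101, 43003016, 43003016),
        (102, 43003016, 43003017),
        (103, 43003016, 43003016),
        (103, 43003019, 43003020),
        (201, 43003016, 43003016);
    """
)

# John 3:16-18 for user 1: the two-verse range sorts ahead of the single verses.
print(annotations_for_passage(conn, 1, 43003016, 43003018))  # [102, 101, 103]
```

Annotation 201 belongs to a different user and never appears, and annotation 103 is returned once even though it has two rows of verses.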

Querying CouchDB

CouchDB is one of the oldest entrants in the NoSQL space and distinguishes itself by being both a key-value store and queryable using map-reduce: the usual way to access more than one document in a single query is to write JavaScript to output the data you want. It lets you create complex keys to query by, so you might think that you can generate a key like [start_verse,end_verse] and query it like this: ?startkey=[0,43003016]&endkey=[43003018,99999999]

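But no. Views are one-dimensional: CouchDB sorts keys into a single ordered list and only consults the second element of a key to break a tie on the first. Any key whose first element falls strictly between 0 and 43003018 therefore matches, whatever its second element. For example, an annotation with both a start and end verse of 19001001 (Psalm 1:1) matches the above query, which isn’t useful for this purpose.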
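You can see the problem without a CouchDB instance. Python compares lists element by element, which is a reasonable approximation of how CouchDB orders array keys made of numbers, so the sketch below uses plain list comparison as a stand-in for a startkey/endkey range. The function name is hypothetical.

```python
# Approximation: Python's element-by-element list comparison stands in for
# CouchDB's ordering of numeric array keys. This is not CouchDB code.
def in_key_range(key, startkey, endkey):
    """True if key sorts between startkey and endkey, inclusive."""
    return startkey <= key <= endkey


startkey = [0, 43003016]
endkey = [43003018, 99999999]

psalm_1_1 = [19001001, 19001001]  # nowhere near John 3:16-18
john_3_16 = [43003016, 43003016]

print(in_key_range(psalm_1_1, startkey, endkey))  # True (a false positive)
print(in_key_range(john_3_16, startkey, endkey))  # True
```

The Psalm 1:1 key is accepted on the strength of its first element alone; its end verse is never compared with 43003016.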

I can think of two ways to get around this limitation, both of which have drawbacks.

GeoCouch

CouchDB has a plugin called GeoCouch that lets you query geographic data, which actually maps well to this data model. (I didn’t come up with this approach on my own: see Efficient Time-based Range Queries in CouchDB using GeoCouch for the background.)

The basic idea is to treat each start_verse,end_verse pair as a point on a two-dimensional grid. Here’s the above social data plotted this way:

A diagonal line starts in the bottom left corner and continues to the top right. Large dots indicate popular verses, and book outlines are visible.

The line bisects the grid diagonally since an end_verse never precedes a start_verse: the diagonal line where start_verse = end_verse indicates the lower bound of any reference. Here are some points indicating where ranges fall on the plot:

This chart looks the same as the previous one but has points marked to illustrate that longer ranges are farther away from the bisecting line.

To find all the annotations relevant to John 3:16-18, we draw a region starting in the upper left and continuing to the point 43003018,43003016:

This chart looks the same as the previous one but has a box from the top left ending just above and past the beginning of John near the upper right of the chart.

GeoCouch allows exactly this kind of bounding-box query: ?bbox=0,43003016,43003018,99999999
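The bounding box asks for the same thing as the SQL overlap test, just geometrically. This sketch checks points against the box in plain Python; it doesn’t call GeoCouch, and the function names are hypothetical. The only detail taken from GeoCouch is the bbox ordering shown in the query above (minimum x, minimum y, maximum x, maximum y), with x as start_verse and y as end_verse.

```python
# Sketch: treat each annotation as the point (start_verse, end_verse) and
# test it against the bounding box for a passage. Plain Python, no GeoCouch.
MAX_VERSE = 99999999


def passage_bbox(query_start, query_end):
    """Box covering every (start, end) point that overlaps the passage."""
    return (0, query_start, query_end, MAX_VERSE)


def in_bbox(point, bbox):
    """True if the point lies inside the box, edges included."""
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


box = passage_bbox(43003016, 43003018)  # John 3:16-18
print(box)                                  # (0, 43003016, 43003018, 99999999)
print(in_bbox((43003015, 43003017), box))   # True: John 3:15-17 overlaps
print(in_bbox((19001001, 19001001), box))   # False: Psalm 1:1 ends too early
print(in_bbox((1001001, 66022021), box))    # True: Genesis 1-Revelation 22
```

Unlike the one-dimensional view key, the box constrains the end verse independently of the start verse, so Psalm 1:1 is correctly left out.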

You can even support multiple users in this scheme: just give everyone their own, independent box. I might occupy 1×1 (with an annotation at 1.43003016,1.43003016), while you might occupy 2×2 (with an annotation at 2.43003016,2.43003016); queries for our annotations would never overlap. Each whole number to the left of the decimal acts as a namespace.

The drawbacks:

  1. The results aren’t sorted in a useful way. You’ll need to do sorting on the client side or in a show function.
  2. You don’t get pagination.

Repetition at Intervals

Given the shape of the data, which is overwhelmingly chapter-bound (and lookups, which at least on Bible Gateway are chapter-based), you could simply repeat chapter-spanning annotations at the beginning of every chapter. In the worst-case annotation (Genesis 1-Revelation 22), you end up with about 1,200 repetitions.

For example, in the Genesis-Revelation case, for John 3 you might create a key like [43003000.01001001,66022021] so that it sorts at the beginning of the chapter; if you have multiple annotations with different start verses, they stay sorted properly.

To get annotations for John 3:16-18, you’d query for ?startkey=[43003000]&endkey=[43003018,{}]

The drawbacks:

  1. You have to filter out all the irrelevant annotations: if you have a lot of annotations about John 3:14, you have to skip through them all before you get to the ones about John 3:16.
  2. You have to filter out duplicates when the range you’re querying for spans multiple chapters.
  3. You’re repeating yourself, though given how rarely a multi-chapter span (let alone a multi-book span) happens in the wild, it might not matter that much.
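
The first two drawbacks come down to a filtering pass over the results. A minimal sketch in Python, assuming each row carries the annotation's document id along with its true start and end verses (a hypothetical row shape; you might emit those as the view's value):

```python
def relevant(rows, query_start, query_end):
    """Drop annotations that miss the queried range and repeated copies."""
    seen = set()
    kept = []
    for row in rows:
        if row["id"] in seen:
            continue  # a repeated copy of an annotation we already kept
        if row["start"] <= query_end and row["end"] >= query_start:
            seen.add(row["id"])
            kept.append(row)
    return kept


rows = [
    {"id": "gen-rev", "start": 1001001, "end": 66022021},
    {"id": "john-3-14", "start": 43003014, "end": 43003014},
    {"id": "john-3-16", "start": 43003016, "end": 43003016},
    {"id": "gen-rev", "start": 1001001, "end": 66022021},
]
for row in relevant(rows, 43003016, 43003018):
    print(row["id"])
```

In this sample, the John 3:14 annotation is skipped and the Genesis-Revelation annotation appears once even though it was returned twice.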

Other CouchDB Approaches

Both these approaches assume that you want to make only one query to retrieve the data. If you’re willing to make multiple queries, you could create different list functions and query them in parallel: for example, you could have one for single-chapter annotations and one for multi-chapter annotations. See interval trees and geohashes for additional ideas. You could also introduce a separate query layer, such as elasticsearch, to sit on top of CouchDB.

What Twitterers Are Giving up for Lent (2011 Edition)

Thursday, March 10th, 2011

The top 100 things that people on Twitter are giving up for Lent in 2011.

Congratulations, I guess, go this year to Charlie Sheen, who came in at both #23 and, with “tiger blood,” at #90. Justin Bieber is up several spots this year, so he hasn’t quite crested yet. The next-highest celebrity, who didn’t make the top 100, is British boy band One Direction.

“Trophies,” at #69, refers to the English soccer club Arsenal’s recent defeat, or something.

The later start to Lent this year means that “snow” doesn’t appear on the list; last year, it was #48. Myspace hangs on at #99, dropping 48 places.

This list draws from 85,000 tweets from March 7-10, 2011, and excludes retweets.

Rank Word Count Change from last year’s rank
1. Twitter 4297 0
2. Facebook 4060 0
3. Chocolate 3185 0
4. Swearing 2527 +1
5. Alcohol 2347 -1
6. Sex 2093 +3
7. Soda 1959 -1
8. Lent 1493 -1
9. Meat 1352 -1
10. Fast food 1303 0
11. Sweets 1252 0
12. Giving up things 778 +7
13. School 768 +27
14. Religion 745 +1
15. Coffee 707 -3
16. You 675 +6
17. Social networking 665 +15
18. Chips 664 +3
19. Junk food 594 -1
20. Bread 571 +6
21. Smoking 555 -4
22. Candy 541 -8
23. Charlie Sheen 511  
24. Work 482 +4
25. Stuff 467 -2
26. Catholicism 436 -10
27. Food 395 +3
28. Shopping 363 +1
29. Marijuana 358 +31
30. Beer 346 -10
31. Fried food 307 -7
32. Homework 306 +27
33. Cheese 297 +4
34. Cookies 293 +11
35. Red meat 285 -10
36. Masturbation 285 +8
37. Virginity 253 +26
38. Pancakes 252 +20
39. Rice 236 -5
40. Booze 235 +2
41. Coke 234 -3
42. Boys 229 +24
43. Sugar 229 -16
44. Sobriety 226 +10
45. Procrastination 226 -10
46. Nothing 219 +21
47. Winning 219  
48. Ice cream 211 -7
49. Caffeine 203 -16
50. McDonald’s 195 +27
51. Church 188 +28
52. Wine 188 -3
53. TV 184 -7
54. Starbucks 183 -15
55. Texting 182 -12
56. Liquor 181 -1
57. Negativity 180 +26
58. Carbs 179 +10
59. Christianity 177 -12
60. Justin Bieber 176 +9
61. Pizza 175 -11
62. French fries 159 +2
63. Me 157 +9
64. Losing 155  
65. Men 152 -13
66. Fizzy drinks 151  
67. Porn 147 +4
68. Lint 147 -11
69. Trophies 144  
70. Tumblr 144  
71. Desserts 142 -15
72. Chicken 140 +15
73. Pork 139 -3
74. Cake 132 +8
75. Tea 127 +19
76. Sarcasm 127 +14
77. Diet Coke 119 -16
78. Laziness 118 -13
79. Sleep 117 -6
80. Jesus 115 -4
81. College 111  
82. Internet 110 -46
83. Complaining 108 -9
84. Breathing 103  
85. Takeout 98  
86. Beef 98 -8
87. People 96 +11
88. New Year’s resolutions 96 +1
89. Him 94 -5
90. Tiger blood 92  
91. Makeup 91  
92. Juice 90 -7
93. Clothes 89  
94. My phone 88  
95. God 87 -15
96. Abstinence 85 -15
97. Stress 84  
98. Chipotle 82  
99. Myspace 81 -48
100. Eating out 81 -25

Image created using Wordle.

Presentation on Tweeting the Bible

Friday, March 26th, 2010

Here’s a presentation I just gave at the BibleTech 2010 conference about how people tweet the Bible:

Also: PowerPoint, PDF.

I distributed the following handout at the presentation, showing the popularity of Bible chapters and verses cited on Twitter. It displays a lot of data: darker chapters are more popular, the number in the middle of each box is the most popular verse in the chapter, and sparklines in each box show the distribution of the popularity in each chapter. (Genesis 1:1 is by far the most popular verse in Genesis 1, while Genesis 3:15 is only a little more popular than other verses in the chapter.)

The grid shows the popularity of chapters and verses in the Bible as cited on Twitter.