Top 100 Linguistic Indicators of Bible-Related Tweets

Sunday, October 25th, 2009

When people tweet about Bible verses, what words do they use? Here are the top 100 words that most strongly indicate a tweet refers to a Bible verse:

  1. bible
  2. lord
  3. christ
  4. gospel
  5. psalm
  6. god
  7. psalms
  8. corinthians
  9. preach
  10. shall
  11. heaven
  12. readings
  13. church
  14. spirit
  15. righteous
  16. verse
  17. lectionary
  18. verses
  19. spiritual
  20. ministry
  21. pray
  22. enemies
  23. thou
  24. tongue
  25. creation
  26. wisdom
  27. deuteronomy
  28. testament
  29. strength
  30. refuge
  31. therefore
  32. kingdom
  33. romans
  34. holy
  35. thankful
  36. thy
  37. reading
  38. rejoice
  39. understanding
  40. faithful
  41. message
  42. earth
  43. blessed
  44. exodus
  45. deut
  46. faith
  47. wise
  48. beginning
  49. pastor
  50. chapel
  51. chapter
  52. survey
  53. anger
  54. resurrection
  55. risen
  56. read
  57. hearts
  58. chronicles
  59. salvation
  60. flesh
  61. servant
  62. glory
  63. praying
  64. kings
  65. sheep
  66. praise
  67. trust
  68. prosperity
  69. bless
  70. heavens
  71. deeds
  72. toward
  73. discussion
  74. whoever
  75. speaks
  76. ye
  77. hath
  78. amen
  79. teaching
  80. thess
  81. apostles
  82. preparing
  83. eph
  84. eccl
  85. path
  86. fear
  87. upon
  88. presence
  89. inspire
  90. search
  91. zechariah
  92. seek
  93. teach
  94. wrath
  95. commandments
  96. believers
  97. humility
  98. spoke
  99. thee
  100. devo

Background

Extracting Bible references from text means deciding whether a given piece of text refers to a Bible verse or to something else. For example, the meaning of “Acts 2” depends on context:

  • Referring to Bible passage: Acts 2 recounts the early church.
  • Not referring to Bible passage: She’s 5 years old but acts 2.

When you encounter a phrase that could be a Bible reference, you have to look at its context to decide whether it actually is one. Humans make this leap easily, but computers need rigorous models and lots of training data to guess whether an ambiguous phrase is a Bible reference. In the example above, the phrase “early church” strongly indicates that “Acts 2” is a Bible reference, while “years old” points the other way.
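As a rough illustration of that first step, here is a minimal Python sketch: a regular expression finds phrases shaped like “Book Chapter” (optionally “Book Chapter:Verse”), and the few words on either side are collected as context for a model to judge. The tiny `BOOKS` list, the `find_candidates` function, and the default window of four words are all hypothetical choices for illustration, not OpenBible.info’s actual code.

```python
import re

# Hypothetical, abbreviated book list; a real extractor would need all 66
# books plus common abbreviations such as "deut", "eph", or "thess".
BOOKS = ["acts", "john", "jeremiah", "matthew", "psalm", "romans"]

# "<book> <chapter>" with an optional ":<verse>", matched case-insensitively.
CANDIDATE = re.compile(
    r"\b(" + "|".join(BOOKS) + r")\s+(\d+)(?::(\d+))?\b",
    re.IGNORECASE,
)

# Lowercase words, keeping contractions like "she's" or "she’s" as one token.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)?")


def find_candidates(text, window=4):
    """Return (matched phrase, context words) for each possible reference.

    Context is up to `window` words before and after the match (numbers are
    dropped by the tokenizer); the matched phrase itself is excluded.
    """
    results = []
    for m in CANDIDATE.finditer(text):
        before = WORD.findall(text[: m.start()].lower())
        after = WORD.findall(text[m.end():].lower())
        context = before[max(0, len(before) - window):] + after[:window]
        results.append((m.group(0), context))
    return results


for sentence in ["Acts 2 recounts the early church.",
                 "She’s 5 years old but acts 2."]:
    print(find_candidates(sentence))
# [('Acts 2', ['recounts', 'the', 'early', 'church'])]
# [('acts 2', ['she’s', 'years', 'old', 'but'])]
```

A pattern this loose deliberately over-matches (it would also flag “Matthew 4 dinner”); the point is to hand every plausible phrase, along with its context words, to a classifier that decides which ones are real references.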

Twitter, with its high volume of content and decent search engine, provides lots of training data.

Methodology

Using the Twitter Search API, I downloaded 30,000 tweets possibly containing Bible references (e.g., [john 3], [jeremiah 29]) and then categorized them by hand as referring to a Bible verse or not.

I then ran a Naive Bayes classifier over the hand-categorized tweets to produce the list above, which contains the words that most strongly indicate the presence of a Bible reference.
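The post doesn’t say exactly how each word was scored. One common way to rank indicator words with a multinomial Naive Bayes model is by the log ratio of a word’s smoothed probability in the “reference” class versus the “not a reference” class. The sketch below assumes that approach, add-one (Laplace) smoothing, a small hypothetical stopword list, and made-up toy tweets; none of these details come from the original methodology.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z]+")
# Hypothetical stopword list; filler words would otherwise dominate a tiny sample.
STOPWORDS = {"a", "an", "and", "at", "but", "in", "is", "my", "of", "the", "this", "to"}


def tokenize(text):
    return [w for w in WORD.findall(text.lower()) if w not in STOPWORDS]


def train(labeled):
    """Count words and documents per class; `labeled` holds (text, is_reference) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_ref in labeled:
        counts[is_ref].update(tokenize(text))
        docs[is_ref] += 1
    return counts, docs


def log_likelihood_ratios(counts, alpha=1.0):
    """log P(w | reference) - log P(w | not reference), with add-alpha smoothing."""
    vocab = set(counts[True]) | set(counts[False])
    total_ref = sum(counts[True].values()) + alpha * len(vocab)
    total_not = sum(counts[False].values()) + alpha * len(vocab)
    return {
        w: math.log((counts[True][w] + alpha) / total_ref)
        - math.log((counts[False][w] + alpha) / total_not)
        for w in vocab
    }


def top_indicators(labeled, n=10):
    """Words whose presence most favors the 'reference' class (ties broken alphabetically)."""
    counts, _ = train(labeled)
    llr = log_likelihood_ratios(counts)
    return sorted(llr, key=lambda w: (-llr[w], w))[:n]


def is_reference(text, counts, docs, alpha=1.0):
    """Naive Bayes decision: log prior ratio plus per-word log ratios.

    Assumes both classes have at least one training tweet; words never seen
    in training are ignored.
    """
    llr = log_likelihood_ratios(counts, alpha)
    score = math.log(docs[True] / docs[False])
    score += sum(llr.get(w, 0.0) for w in tokenize(text))
    return score > 0


labeled = [  # toy, made-up examples
    ("Acts 2 recounts the early church", True),
    ("Reading John 3 in church this morning, praise the lord", True),
    ("Psalm 23 the lord is my shepherd", True),
    ("She's 5 years old but acts 2", False),
    ("Meeting Matthew 4 dinner tonight", False),
    ("Watching john 3 times this week lol", False),
]
print(top_indicators(labeled, n=2))  # ['church', 'lord']
```

With 30,000 labeled tweets instead of six, the same ranking would surface words like the ones in the list above; on a sample this small it mostly shows how sensitive the output is to which tweets happen to be included.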

This list suffers from sample bias, of course: a different set of tweets would produce a different list. In addition, the list is Twitter-centric; the results may not carry over to blogs or other media. (For example, people on Twitter substitute the number “2” for the word “to” and “4” for “for” more often than they do elsewhere, so the searches oversample non-reference content like “I’m meeting Matthew 4 dinner.”)

See It in Action

Search for Bible references on Twitter. Use the relevant and not relevant buttons to improve the filtering. I haven’t formally announced this new feature of OpenBible.info yet; consider the link a preview.