{"id":742,"date":"2013-03-16T00:00:54","date_gmt":"2013-03-16T04:00:54","guid":{"rendered":"http:\/\/www.openbible.info\/blog\/?p=742"},"modified":"2017-12-29T11:33:10","modified_gmt":"2017-12-29T15:33:10","slug":"how-to-train-your-franken-bible","status":"publish","type":"post","link":"https:\/\/www.openbible.info\/blog\/2013\/03\/how-to-train-your-franken-bible\/","title":{"rendered":"How to Train Your Franken-Bible"},"content":{"rendered":"<p>This is the outline of a talk that I gave earlier today at the <a href=\"http:\/\/www.bibletechconference.com\/\">BibleTech 2013<\/a> conference.<\/p>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide2.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>There are two parts to this talk\n<ul>\n<li>Inevitability of algorithmic translations. An algorithmic translation (or \u201cFranken-Bible\u201d) means a translation that\u2019s at least partly done by a computer.<\/li>\n<li>How to produce one with current technology.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide4.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.6990709254359257\">\n<li dir=\"ltr\">The number of English translations has grown over the past five centuries and has accelerated recently. 
This growth mirrors the overall trend in publishing as costs have diminished and publishers have proliferated.<\/li>\n<li dir=\"ltr\">I argue that this trend of ever-more translations will not only continue but will accelerate, that we\u2019re heading for a post-translation world, where the number of English translations becomes so high, and the market so fragmented, that Bible translations as distinct identities will have much less meaning than they do today.\n<ul>\n<li dir=\"ltr\">This trend isn\u2019t specific to Bibles, but Bible translations participate in the larger shift to more diverse and fragmented cultural expressions.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide5.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">Like other media, Bible translations are subject to a variety of pressures.\n<ul>\n<li dir=\"ltr\">Linguistic (e.g., as English becomes more gender-inclusive, pressure rises on Bible translations to also become more inclusive).<\/li>\n<li dir=\"ltr\">Academic (discoveries that shed new light on existing translations). My favorite example is <a href=\"http:\/\/www.biblegateway.com\/verse\/en\/Matthew%2017:15\">Matthew 17:15<\/a>, where some older translations say, \u201cMy son is a lunatic,\u201d while newer translations say, \u201cMy son has epilepsy.\u201d<\/li>\n<li dir=\"ltr\">Theological \/ doctrinal (conform to certain understandings or agendas).<\/li>\n<li dir=\"ltr\">Social (decline in public religious influence leads to a loss of both shared stories and religious vocabulary).<\/li>\n<li dir=\"ltr\">Moral (pressure to address whatever the pressing issue of the day is).<\/li>\n<li dir=\"ltr\">Institutional (internally, where Bible translators want control over their translations; and externally, with the wider, increasing distrust of institutions).<\/li>\n<li dir=\"ltr\">Market. 
Bible translators need to make enough money (directly or indirectly) to sustain translation operations.<\/li>\n<li dir=\"ltr\">Technological. Technological pressure increases the variability and intensity of other pressures and is the main one we\u2019ll be investigating today.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide6.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">If you\u2019re familiar with Clayton Christensen\u2019s <a href=\"https:\/\/www.amazon.com\/Innovators-Dilemma-Technologies-Cause-Great\/dp\/0875845851\/\"><em>The Innovator\u2019s Dilemma<\/em><\/a>, you know that the basic premise is that existing dominant companies are eventually eclipsed by newer companies who release an inferior product at low prices. The dominant companies are happy to let the new company operate in this low-margin area, since they prefer to focus on their higher-margin businesses. The new company then steadily encroaches on the territory previously staked out by the existing companies until the existing companies have a much-diminished business. Eventually the formerly upstart company becomes the incumbent, and the cycle begins again.\n<ul>\n<li dir=\"ltr\">One of the main drivers of this disruption is technology, where a technology that\u2019s vastly inferior to existing methods in terms of quality comes at a much lower price and eventually supersedes existing methods.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">I argue that English Bible translation is ripe for disruption, and that this disruption will take the form of large numbers of specialized translations that are, from the point of view of Bible translators, vastly inferior. 
But they\u2019ll be prolific and easy to produce and will eventually supplant existing modern translations.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide7.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">For an analogy, let\u2019s look at book publishing (or any media, really, like news, music, or movies. But book publishing is what I\u2019m most familiar with, so it\u2019s what I\u2019ll talk about). In the past twenty years, it\u2019s gone through two major disruptions.\n<ul>\n<li dir=\"ltr\">The first is a disruption in distribution, with Amazon.com and other web retailers. National bookstore chains consolidated or folded as they struggled to figure out how to compete with the lower prices and wider selection offered online.<\/li>\n<li dir=\"ltr\">This change hurt existing retailers but didn\u2019t really affect the way that content creators like publishers and authors did business. From their perspective, selling through Amazon isn\u2019t that different from selling through Barnes &amp; Noble.<\/li>\n<li dir=\"ltr\">The second change is more disruptive to content creators: this change is the switch away from print books to ebooks. At first, this change seems more like a difference in degree rather than in kind. 
Again, from a publisher\u2019s perspective, it seems like selling an ebook through Amazon is just a more-convenient way of selling a print book through Amazon.<\/li>\n<li dir=\"ltr\">But ebooks actually allow whole new businesses to emerge.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide8.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">I\u2019d argue that these are the main functions that publishers serve for authors&#8211;in other words, why would I, as an author, want a publisher to publish my book in exchange for some small cut of the profit?\n<ul>\n<li dir=\"ltr\">Gatekeeping (by publishing a book, they\u2019re saying it\u2019s worth your time and has a certain level of quality: it&#8217;s been edited and vetted).<\/li>\n<li dir=\"ltr\">Marketing (making sure that people know about and buy books that are interesting to them).<\/li>\n<li dir=\"ltr\">Distribution (historically, shipping print books to bookstores).<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">Ebooks most-obviously remove the distribution pillar of this model&#8211;when producing and distributing an epub only involves a few clicks, it\u2019s hard to argue that a publisher is adding a whole lot of value in distribution.<\/li>\n<li dir=\"ltr\">That leaves gatekeeping and marketing, which I\u2019ll return to later in the context of Bibles.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide9.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">But beyond just affecting these pillars, ebooks also allow new kinds of products:\n<ul>\n<li dir=\"ltr\">First, a wider variety of content becomes more economically viable&#8211;content that\u2019s too long or too short to print as a book can work great as an ebook, for example.<\/li>\n<li dir=\"ltr\">Second, self-publishing becomes more 
attractive: when traditional publishing shuns you because, say, your book is terrible, just go direct and let the market decide just how terrible it is. And if no one will buy it, you can always give it away&#8211;it\u2019s not like it\u2019s costing you anything.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">So ebooks primarily allow large numbers of low-quality, low-priced books into the market, which fits the definition of disruption we talked about earlier.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide10.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">Let\u2019s talk specifically about Bible translations.<\/li>\n<li>Traditionally, Bible translations have been expensive endeavors, involving teams of dozens of people working over several years.\n<ul>\n<li dir=\"ltr\">The result is a high-quality product that conforms to the translation\u2019s intended purpose.<\/li>\n<li dir=\"ltr\">In return for this high level of quality, Bible publishers charge money to, at a minimum, recoup their costs.<\/li>\n<\/ul>\n<\/li>\n<li>What would happen if we applied the lessons from the ongoing disruption in book publishing to Bible translations?\n<ul>\n<li dir=\"ltr\">First, like Amazon and physical bookstores, we disrupt distribution.<\/li>\n<li dir=\"ltr\">Bible Gateway in 1993 on the web first disrupted distribution by letting people browse translations for free.<\/li>\n<li dir=\"ltr\">YouVersion in 2010 on mobile then took that disruption a step further by letting people download and own translations for free.<\/li>\n<li dir=\"ltr\">But we\u2019re really only talking about a disruption in distribution here. 
Just like with print books, this type of disruption doesn\u2019t affect the core Bible-translation process.<\/li>\n<\/ul>\n<\/li>\n<li>That second type of disruption is still to come; I\u2019m going to argue that, as with ebooks, the disruption to translations themselves will be largely technological and will result in an explosion of new translations.\n<ul>\n<li dir=\"ltr\">I believe that this disruption will take the form of partially algorithmic personalized Bible translations, or Franken-Bibles.<\/li>\n<li dir=\"ltr\">Because Bible translation is a specialized skill, these Franken-Bibles won\u2019t arise from scratch&#8211;instead, they\u2019ll build on existing translations and will be tailored to a particular audience&#8211;either a single individual or a group of people.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide11.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>In its simplest form, a Franken-Bible could involve swapping out a footnote reading for the reading that\u2019s in the main text.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide12.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">A typical Bible has around 1100 footnotes indicating alternate translations. What if a translation allowed you to decide which reading you preferred&#8211;either by setting a policy (for example, you might say, \u201calways translate <em>adelphoi<\/em> in a particular way\u201d) or by deciding on a case-by-case basis?<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide13.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">By including footnote variants at all, translations have already set themselves up for this approach. 
Some translations go even further&#8211;the Amplified and Expanded Bibles embrace variants by embedding them right in the text. Here we see a more-extensive version of the same idea.<\/li>\n<li dir=\"ltr\">But that\u2019s the simplest approach. A more-radical approach, from a translation-integrity perspective, would allow people to edit the text of the translation itself, not merely to choose among pre-approved alternatives at given points.\n<ul>\n<li dir=\"ltr\">In many ways, the pastor who says in a sermon, \u201cThis verse might better be translated as&#8230;\u201d is already doing this; it just isn\u2019t propagated back into the translation.<\/li>\n<li dir=\"ltr\">People also do this on Twitter all the time, where they alter a few words in a verse to make it more meaningful to them.<\/li>\n<li dir=\"ltr\">The risk here, of course, is that people will twist the text beyond all recognition, either through incompetence or malice.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide14.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">That risk brings us back to one of the other two functions that publishers serve: gatekeeping.\n<ul>\n<li dir=\"ltr\">A translation is nothing if not gatekeeping: a group of people have gotten together and declared, \u201cThis is what Scripture says. 
These are our translation principles, and here\u2019s where we stand in relation to other translations.\u201d<\/li>\n<li dir=\"ltr\">What happens to gatekeeping&#8211;to the seal of authority and trust&#8211;when anyone can change the text to suit themselves?<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide15.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">In other words, what happens if the \u201cWord of God\u201d becomes the \u201cWiki of God?\u201d\n<ul>\n<li dir=\"ltr\">After all, people who aren\u2019t interested in translating their own Bible still want to be able to trust that they\u2019re reading an accurate, or at least non-heretical, translation.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide16.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">I suggest that new axes of trust will form. 
Whereas today a translation \u201cbrand\u201d&#8211;NIV, ESV, or whatever&#8211;carries certain trust signals, those signals will shift to new parties, and in particular to groups that people already trust.\n<ul>\n<li dir=\"ltr\">The question of whom you\u2019d trust to steward a Bible translation probably isn\u2019t that different from whomever you already trust theologically, a group that probably includes some of the following:\n<ul>\n<li dir=\"ltr\">Social network.<\/li>\n<li dir=\"ltr\">Teachers or elders in your church.<\/li>\n<li dir=\"ltr\">Pastors in your church.<\/li>\n<li dir=\"ltr\">Your denomination.<\/li>\n<li dir=\"ltr\">More indirectly, a megachurch pastor you trust.<\/li>\n<li dir=\"ltr\">A parachurch organization or other nonprofit.<\/li>\n<li dir=\"ltr\">Maybe even a corporation.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">The point is not that trust in a given translation would go away, but rather that trust would become more networked, complicated, and fragmented, before eventually solidifying.\n<ul>\n<li dir=\"ltr\">We already see this happening somewhat with existing Bible translations, where certain groups declare some translations as OK and others as questionable. Like your choice of cellphone, the translation you use becomes an indicator of group identity.<\/li>\n<li dir=\"ltr\">The proliferation of translations will allow these groups who are already issuing imprimaturs to go a step further and advance whole translations that fit their viewpoints.<\/li>\n<li dir=\"ltr\">In other words, they\u2019ll be able to act as their own Magisteriums. Their own self-published Magisteriums.<\/li>\n<li dir=\"ltr\">This, by the way, also addresses the third function that publishers serve: marketing. 
By associating a translation with a group you already identify with, you reduce the need for marketing.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide17.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">I said earlier that I think technology will bring about this situation, but there are a couple ways it could happen.\n<ul>\n<li dir=\"ltr\">First, an existing modern translation could open itself up to such modifications. A church could say, \u201cI like this translation except for these ten verses.\u201d Or, \u201cThis translation is fine except that it should translate this Greek word this way instead of this other way.\u201d<\/li>\n<li dir=\"ltr\">Such a flexible translation could provide the tools to edit itself&#8211;with enough usage, it\u2019s possible that useful improvements could be incorporated back into the original translation.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">A second alternative is to use technology to produce a new, cheap, low-quality translation with editing in mind from the get-go, to provide a base from which a monstrous hydra of translations can grow. Let\u2019s take a look at what such a hydra translation, or a Franken-Bible, could look like.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide18.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">The basic premise is this: there are around thirty modern, high-quality translations of the Bible into English. Can we combine these translations algorithmically into something that charts the possibility space of the original text?\n<ul>\n<li dir=\"ltr\">Bible translators already consult existing translations to explore nuances of meaning. 
What I propose is to consult these translations computationally and lay bare as many nuances as possible.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide19.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">You can explore the output of what I\u2019m about to discuss at <a href=\"http:\/\/www.adaptivebible.com\">www.adaptivebible.com<\/a>. I\u2019m going to talk about the process that went into creating the site.<\/li>\n<li dir=\"ltr\">This part of the talk is pretty technical.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide20.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">In all, there are fifteen steps, broken into two phases: alignment of existing translations and generation of a new translation.<\/li>\n<li dir=\"ltr\">It took about ten minutes of processor time for each verse to produce the result.<\/li>\n<li dir=\"ltr\">The total cost in server time on Amazon EC2 to translate the New Testament was about $10. Compared to the millions of dollars that a traditional translation costs, that\u2019s a big savings&#8211;five or six orders of magnitude.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide21.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">The first phase is alignment.<\/li>\n<li dir=\"ltr\">First step. Collect as many English translations as possible. Obviously there are copyright implications with doing that, so it\u2019s important to deal with only one verse at a time, which is something that all translations explicitly allow. For this project, we used around thirty translations.<\/li>\n<li dir=\"ltr\">Second. Normalize the text as much as possible. 
For this project, we\u2019re not interested in any formatting, for example, so we can deal with just the plain text.<\/li>\n<li dir=\"ltr\">Third. Tokenize the text and run basic linguistic analysis on it.\n<ul>\n<li dir=\"ltr\">Off-the-shelf open-source software from Stanford called <a href=\"http:\/\/nlp.stanford.edu\/software\/corenlp.shtml\">Stanford CoreNLP<\/a> tokenizes the text, identifies lemmas (base forms of words) and analyzes how words are related to each other syntactically.<\/li>\n<li dir=\"ltr\">In general, it\u2019s about 90% accurate, which is fine for our purposes; we\u2019ll be trying to enhance that accuracy later.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">Fourth. Identify Wordnet similarities between translations.\n<ul>\n<li dir=\"ltr\"><a href=\"http:\/\/wordnet.princeton.edu\/\">Wordnet<\/a> is a giant database of word meanings that computers can understand.<\/li>\n<li dir=\"ltr\">We take the lemmas from step 3 and identify how close in meaning they are to each other. The thinking is that even when translations use different words for the same underlying Greek word, the words they choose will at least be similar in meaning.<\/li>\n<li dir=\"ltr\">For this step, we used Python\u2019s <a href=\"http:\/\/nltk.org\/\">Natural Language Toolkit<\/a>.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">Fifth. Run an off-the-shelf translation aligner.\n<ul>\n<li dir=\"ltr\">We used another open-source program called the <a href=\"http:\/\/code.google.com\/p\/berkeleyaligner\/\">Berkeley Aligner<\/a>, which is designed to use statistics to align content between different languages. But it works just as well for different translations of the same content in the same language. It takes anywhere from two to ten minutes for each verse to run.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">Sixth. 
Consolidate all this data for future processing.\n<ul>\n<li dir=\"ltr\">By this point, we have around 4MB of data for each verse, so we consolidate it into a format that\u2019s easy for us to access in later steps.<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">Seventh. Run a machine-learning algorithm over the data to identify the best alignment between single words in each pair of translations.\n<ul>\n<li dir=\"ltr\">We used another Python module, <a href=\"http:\/\/scikit-learn.org\/stable\/\">scikit-learn<\/a>, to execute the algorithm.<\/li>\n<li dir=\"ltr\">In particular, we used <a href=\"http:\/\/scikit-learn.org\/dev\/modules\/generated\/sklearn.ensemble.RandomForestClassifier.html\">Random Forest<\/a>, which is a supervised-learning system. That means we need to feed it some data we know is good so that it can learn the patterns in the data.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide22.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">Where did we get this good data? We wrote a <a href=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/translations-with-friends.html\">simple drag-and-drop aligner<\/a> to feed the algorithm, where there are two lists of words and you drag them on top of each other if they match; it\u2019s actually kind of fun: if you juiced it up a little, I can totally see it becoming a game called \u201cTranslations with Friends.\u201d\n<ul>\n<li dir=\"ltr\">In total, we hand-aligned around 30 pairs of translations across 25 verses. There are about 8,000 verses in the New Testament, so it doesn\u2019t need a lot of training to get good results.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide23.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">What the algorithm actually runs on is a big vector matrix. 
These are the ten factors we included in our matrix.\n<ul>\n<li dir=\"ltr\">1. One translation might begin a verse with the words \u201cJesus said,\u201d while another might put that same phrase at the end of the verse. All things being equal, though, translations tend to put words in similar positions in the verse. When all else fails, it\u2019s worth taking position into account.<\/li>\n<li dir=\"ltr\">2. Similarly, even when translations rearrange words, they\u2019ll often keep them in the same sentence. Again, all things being equal, it\u2019s more likely that the same word will appear in the same sentence position across translations.<\/li>\n<li dir=\"ltr\">3. If we know that a particular word is in a prepositional phrase, for example, it\u2019s not unlikely that it will serve a similar grammatical role in another translation.<\/li>\n<li dir=\"ltr\">4. If words in different translations are both nouns or both verbs, it\u2019s more likely that they\u2019re translating the same word than if one\u2019s a noun and another\u2019s an adverb.<\/li>\n<li dir=\"ltr\">5. Here we use the output from the Berkeley Aligner we ran earlier. The aligner is bidirectional, so if we\u2019re comparing the word \u201cJesus\u201d in one translation with the word \u201che\u201d in another, we look both at what the Berkeley Aligner says \u201cJesus\u201d should line up with in one translation and with what \u201che\u201d should line up with in the other translation. It provides a fuller picture than just going in one direction.<\/li>\n<li dir=\"ltr\">6. Here we go more general. Even if the Berkeley Aligner didn\u2019t match up \u201cJesus\u201d and \u201che\u201d in the two translations we\u2019re currently looking at, if other translations use \u201che\u201d and the Aligner successfully aligned them with \u201cJesus\u201d, we want to take that into account.<\/li>\n<li dir=\"ltr\">7. 
This is similar to grammatical context but looks specifically at dependencies, which describe direct relationships between words. For example, if a word is the subject of a sentence in one translation, it\u2019s likely to be the subject of a sentence in another translation.<\/li>\n<li dir=\"ltr\">8. Wordnet similarity looks at the similarities we calculated earlier&#8211;words with similar meanings are more likely to reflect the same underlying words.<\/li>\n<li dir=\"ltr\">9. This step strips out all words that aren\u2019t nouns, pronouns, adjectives, verbs, and adverbs and compares their sequence&#8211;if a different word appears between two identical words across translations, there\u2019s a good chance that it means the same thing.<\/li>\n<li dir=\"ltr\">10. Finally, we look at any dependencies between major words; it\u2019s a coarser version of what we did in #7.<\/li>\n<li dir=\"ltr\">The end result is a giant matrix of data&#8211;ten vectors for every word-combination in every translation in every verse&#8211;and we run our machine-learning algorithm on it, which produces an alignment between every word in every translation.<\/li>\n<li dir=\"ltr\">At this point, we\u2019ve generated between 50 and 250MB of data for every verse.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide24.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>Eighth. Now that we have the direct alignment, we supplement it with indirect alignment data across translations. In other words, to reuse our earlier example, the alignment between two translations may not align \u201cJesus\u201d and \u201che,\u201d but alignments in other translations might strongly suggest that the two should be aligned.<\/li>\n<li>At this point, we have a reasonable alignment among all the translations. It\u2019s not perfect, but it doesn\u2019t have to be. 
Now we shift to the second phase: generating a range of possible translations from this data.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide25.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">First. Consolidate alignments into phrases, where we look for runs of parallel words. You can see that we\u2019re only looking at lemmas here&#8211;dealing with every word creates a lot of noise that doesn\u2019t add much value, so we ignore the less-important words. In this case, the first two have identical phrases even though the words differ slightly, while the third structures the sentence differently.<\/li>\n<li dir=\"ltr\">Second. Arrange translations into clusters based on how similar they are to each other structurally. In this example, the first two form a cluster, and the third would be part of a different cluster.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide26.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">Third. Insert actual keyword text. I\u2019ve been using words in the examples I\u2019ve been giving, but in the actual program, we use numerical ids assigned to each word. Here we start to introduce actual words.<\/li>\n<li dir=\"ltr\">Fourth. Fill in the gaps between keywords. We add in small words like conjunctions and prepositions that are key to producing recognizable English.<\/li>\n<li dir=\"ltr\">Fifth. Add in punctuation. Up to this point, we\u2019ve been focusing on the commonalities among translations. Now we\u2019re starting to focus on differences to produce a polished output.<\/li>\n<li dir=\"ltr\">Sixth. Reduce the possibility space to accept only valid bigrams. \u201cBigrams\u201d just means two words in a row. We remove any two-word combinations that, based on our algorithm thus far, look like they should work but don\u2019t. 
We check each pair of words to see whether they exist anywhere in one of our source translations. If they don\u2019t, we get rid of them.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide27.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>Seventh. Produce rendered output.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide28.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li dir=\"ltr\">In this case, the output is just for the <a href=\"http:\/\/www.adaptivebible.com\/\">Adaptive Bible website<\/a>. It shows the various translation possibilities for each verse.\n<ul>\n<li dir=\"ltr\">Hovering over a reading shows what the site thinks are valid next words based on what you\u2019re hovering over. (That\u2019s the yellow.)<\/li>\n<li dir=\"ltr\">You can click a particular reading if you think it\u2019s the best one, and the other readings disappear. Clicking again restores them. 
(That\u2019s the green.)<\/li>\n<li dir=\"ltr\">The website shows a single sentence structure that it thinks has the best chance of being valid, but most verses have multiple valid structures that we don\u2019t bother to show here.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide29.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">To consider a verse successfully translated, this process has to produce readings supported by two independent translation streams (e.g., having a reading supported only by ESV and RSV doesn\u2019t count because ESV is derived from RSV).\n<ul>\n<li dir=\"ltr\">Using this metric, the process I\u2019ve described produces valid output for 96% of verses in the New Testament.<\/li>\n<li dir=\"ltr\">On the current version of adaptivebible.com, I use stricter criteria, so only 91% of verses show up.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide30.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">Limitations\n<ul>\n<li dir=\"ltr\">Just because a verse passes the test, that doesn\u2019t mean it\u2019s actually grammatical, and it certainly doesn\u2019t mean that every alternative presented within a verse is valid.<\/li>\n<li dir=\"ltr\">Because we use bigrams for validity, we can get into situations like what you see here, where all these are valid bigrams, but the result (&#8220;Jesus said, &#8216;Be healed,&#8217; Jesus said&#8221;) is ridiculous.<\/li>\n<li dir=\"ltr\">There\u2019s no handling of inter-verse transitions; even if a verse is totally valid, it may not read smoothly into the next verse.<\/li>\n<li dir=\"ltr\">Since we removed all formatting at the beginning of the process, there\u2019s no formatting.<\/li>\n<\/ul>\n<\/li>\n<li 
dir=\"ltr\">Despite those limitations, the process produced a couple of mildly interesting byproducts.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide31.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>Probabilistic Strong\u2019s-to-WordNet sense alignment. Given a single Strong\u2019s alignment and a variety of translations, we can explore the semantic range of a Strong\u2019s number. Here we have <em>dunamis<\/em>. This seems like a reasonably good approximation of its definition in English.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide32.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul>\n<li>Identifying translation similarity. This slide explores how structurally similar translations are to each other, based on the phrase clusters we produced. The results are pretty much what I\u2019d expect: translations that are derived from each other tend to be similar to each other.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" alt=\"\" src=\"https:\/\/a.openbible.info\/blog\/2013-03-bibletech\/slide33.png\" width=\"800\" height=\"600\" \/><\/p>\n<ul id=\"internal-source-marker_0.2742449094670464\">\n<li dir=\"ltr\">What I\u2019ve just described is one pretty basic approach to what I think is inevitable: the explosion of translations into Franken-Bibles as technology gets better. In the future, we won\u2019t be talking about particular translations anymore but rather about trust networks.<\/li>\n<li dir=\"ltr\">To be clear, I\u2019m not saying that I think this development is a particularly great one for the church, and it\u2019s definitely not good for existing Bible translations. But I do think it\u2019s only a matter of time until Franken-Bibles arrive. 
At first they\u2019ll be unwieldy and ridiculously bad, but over time they\u2019ll adapt, improve, and need to be taken seriously.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This is the outline of a talk that I gave earlier today at the BibleTech 2013 conference. There are two parts to this talk Inevitability of algorithmic translations. An algorithmic translation (or \u201cFranken-Bible\u201d) means a translation that\u2019s at least partly done by a computer. How to produce one with current technology. The number of English [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,17,6],"tags":[],"_links":{"self":[{"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/posts\/742"}],"collection":[{"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/comments?post=742"}],"version-history":[{"count":9,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/posts\/742\/revisions"}],"predecessor-version":[{"id":1413,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/posts\/742\/revisions\/1413"}],"wp:attachment":[{"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/media?parent=742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/categories?post=742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.openbible.info\/blog\/wp-json\/wp\/v2\/tags?post=742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}