Past as Probability: Rethinking the Bible Atlas for 2021

November 1st, 2021

Introduction

When Bob Pritchett asked me at the BibleTech 2019 conference what I was working on at the time, I answered that I’d returned to the idea that originally launched this site: Bible geography. When I told him that, I’d intended to present the results of this effort at a presumptive BibleTech 2021 conference. Since the global pandemic has put in-person conferences and travel on hold, however, here are 4,800 words about it instead of a presentation.

If you prefer to jump straight into the data, see the interface or browse the data.

What does rethinking the Bible Atlas mean?

This project is a Bible atlas (technically, a gazetteer) that (1) comprehensively identifies the possible modern locations of every place mentioned in the Bible as precisely as possible, (2) expresses a data-backed confidence level in each identification, and (3) links to open data to fit into a broader data ecosystem. The goal is to provide a baseline for future Bible geography projects to use.

In my original design document for this project, I have the following guiding principles; I’ll discuss their implementation below:

  1. Comprehensively reflect current scholarship.
  2. Use linked data.
  3. Be accurate and precise.
  4. Quantify uncertainty.
  5. Handle non-point data.
  6. Include media.
  7. Open the data.

Background

I started this site in 2007 with the idea of creating a place that embraced Google Earth as a way of exploring places mentioned in the Bible. I relied on freely available and older public-domain sources to disambiguate and identify modern locations of biblical places, and then I picked the location that looked likeliest to me. The 1899 Morrish Bible Dictionary proved especially helpful because it included latitude and longitude coordinates for many of its entries. Of course, it also reflects a late-nineteenth-century view of biblical and archaeological scholarship.

Over time, I became frustrated with the limited nature of the dataset; I wanted to incorporate modern scholarship, which meant one thing: I needed a budget.

Used Bible atlases are surprisingly cheap; you can find most of them on Amazon for around $10 each. Bible dictionaries and encyclopedias are surprisingly expensive: even digitally, they run $100 or more. Commentaries also add up because there are so many of them. Fortunately, I know people with commentary libraries that I could use, helping keep the cost down:

Four bookcases full of Bible reference works.
Look at all those commentaries.

In the end, my content budget for this project ran about $1,000, and I consulted what I believe to be every significant Bible atlas, dictionary, and encyclopedia published since 1980 (and several before then).

Two stacks of Bible atlases on the floor.
These are just the print atlases, excluding the ten-pound one I’m using to prop up my laptop. (Having now owned both print and digital atlases, I recommend print. Atlases on Kindle are the worst, especially when they have a non-hyperlinked index.)

Goal 1: Comprehensively reflect current scholarship

With this variety of sources—over seventy in all—it becomes possible to paint a full(ish) picture of where modern scholars believe biblical places may have been.

My basic method was simple:

  1. Start with a place mentioned in the Bible.
  2. Consult a dictionary/encyclopedia article, atlas/gazetteer entry, commentary note, or other reference work related to that place.
  3. Record every suggestion of possible locations for the place, including the confidence of the suggestion.
  4. Research coordinates for any modern locations mentioned.
  5. Repeat steps 2-4 across 70 or so sources.
  6. Repeat steps 1-5 across all 1,300 places mentioned in the Bible.

In this way, you end up with a set of “votes” from authors that, in principle, reflects the current consensus on where a place was. This approach also removes me from the position of deciding which identifications are more valid than others.
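
For illustration, the vote-recording step can be modeled as simple structured records plus a tally. This is a minimal sketch of the idea; the vote records below are hypothetical examples, not entries from the dataset:

```python
from collections import Counter

# Hypothetical vote records: one per (source, place, identification).
votes = [
    {"place": "Gath", "identification": "Tell es-Safi", "confidence": "Likely"},
    {"place": "Gath", "identification": "Tell es-Safi", "confidence": "Yes"},
    {"place": "Gath", "identification": "Tel Erani", "confidence": "Unlikely"},
]

def tally(votes):
    """Count how many sources voted for each (place, identification) pair."""
    return Counter((v["place"], v["identification"]) for v in votes)

counts = tally(votes)
# counts[("Gath", "Tell es-Safi")] == 2
```

The confidence labels carry through to the scoring described under Goal 4; the tally alone just shows which identifications have broad support.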

A typical Bible atlas makes around 500 identifications; a large dictionary or encyclopedia might suggest around 1,000. This project records over 3,200 identifications, tallying 23,000 votes across 1,340 ancient places tied to nearly 1,600 possible modern locations.

I use the word “identification” because a source might say that a particular place is another name for a different ancient place, or maybe not even a place at all, so not every vote is for a modern location. The ratio of modern-to-ancient suggestions generally runs around 2:1—in other words, there are usually around twice as many modern suggestions (identifying a specific modern location) as ancient suggestions (identifying a biblical place with another ancient place).

For the purposes of this project, authors have nineteen ways to propose an identification. I particularly want to highlight the different rhetorical strategies of “confidence” vs. “authority.” About 82% of the time, authors will express their own opinion that a certain identification is likely or unlikely. The rest of the time, they’ll appeal to the prevailing consensus, use the passive voice to avoid expressing their own view, or note that someone else has made an identification.

Here’s the schema I used to model this rhetoric.

| Category | Keyword | Common words used |
| --- | --- | --- |
| Confidence | Yes | is, undoubtedly, no doubt, confirmed, is to be identified with, convincingly identified, surely, corresponds to, has been fixed to, no good reason to question, almost universal agreement, is represented by, should be connected with |
| | Likely | probably, likely, gazetteer entry without qualification, plausibly identified, evidently, fairly certain, quite possibly, popular, attractive, strongly suggest, suits well, serious candidate, seems to fit, presumably, may well be, strong evidence |
| | Most likely | most likely, most probable, best fit, most suitable, strongest candidate, best current suggestion, least objectionable, best identified, best explained |
| | Map | (coordinates unambiguously match on a map) |
| | Possible | possibly, may, (gazetteer entry with qualification, e.g., “?”), tentatively, preliminarily, provisionally, some scholars, apparently, sought at |
| | Unlikely | unlikely, much less likely, doubtful, tenuous, dubious, problematic |
| | No | not, reject, little to commend it, rule out, abandon, unacceptable, out of the question, impossible, unthinkable |
| Identified | Is identified | is identified, is believed, is associated with, now identified, is identifiable |
| | Has been identified | has been identified, has been inferred, has been linked with |
| | Identified (as an adjective) | identified, suggest, associated with |
| Authority | Old | formerly, previously |
| | Parallel | parallel passage |
| | Preserved | name has been preserved, name is preserved, name survives |
| | Scholar | (another scholar is cited by name) |
| | Traditional | traditionally, historically, (if the tradition predates the nineteenth century) |
| | Usually | generally, usually, most scholars, many scholars, most often, most, customarily, commonly, consensus, gained favor, is thought to be, prevailing identification |
| | Variant | certain translations read |
| Unknown | Unknown | Unknown |
| | Uncertain | |
A screenshot of Google Earth shows hundreds of placemarks near the Dead Sea.
The resulting KML file has about 7,100 placemarks in Google Earth.

Goal 2: Use Linked Data

A modern Bible atlas doesn’t exist in a data desert; it participates in a network of scholarship focused both on the Bible and on general history. Linked Data is a way to map data between different datasets—a way to participate computationally in this network of scholarship. This project contains 6,536 Linked Data connections for the ancient places mentioned in the Bible (though, to be fair, 1,228 of them are to the 2007 version of this dataset).

About half of the places mentioned in the Bible have a Wikidata entry; I also bring in other ontologies—notably the Bible-oriented Logos Factbook and TIPNR—and Pleiades, the broadly historically focused “community-built gazetteer and graph of ancient places.”

But linking ancient places reflects only half the story: modern locations also have identifiers, and usually coordinates associated with an identifier. Here again I draw on Wikidata but also on modern gazetteers like GeoNames and OpenStreetMap.

In all, 6,473 Linked Data connections support the coordinates for modern locations.

The following are my most-cited sources: Wikidata, Digital Archaeological Atlas of the Holy Land, GeoNames, OpenStreetMap, Amud Anan, Palestine Open Maps, MEGAJordan, Israel Antiquities Authority, and Cultnat.

Satellite view of the area around Mount Carmel overlaid with dots showing photo and article locations.
The WikiShootMe tool was especially helpful for this project. Here’s what it looks like around Mount Carmel. The green dots represent Wikidata items with photos, the red dots are Wikidata items without photos, the blue dots are photos without Wikidata items, the yellow dots are Wikipedia articles, and the purple dots are links to other datasets.

Goal 3: Be accurate and precise

Most reference works, when they provide coordinates at all, use a 1 km (0.6 mile) resolution—in other words, they get you within 1 km of the actual coordinates of what you’re looking for. Often they just say that a modern location is “near” another one or provide a rough distance and (if you’re lucky) a direction from a larger city. Sometimes, if they’re drawing from nineteenth-century travelogues, a source might say that a location is a certain number of hours (by foot or horse) along a road between two other cities, or they just provide a name with no indication of where it is. The point is that they’re usually (but not always!) accurate—they provide a general sense of where a location is. Unfortunately, they’re not usually precise—telling you exactly where it is.

Fortunately, with modern, public data and gazetteers powered in part by local knowledge, we can do better. Digital sources often provide precise coordinates for ruins visible on the ground. In this dataset, 17% of locations have coordinates within 10 m, 57% within 100 m, 87% within 250 m (matching a modern settlement), and 97% within 1 km. All locations are as precise as I can find support for; some places in Jerusalem (e.g., the Millo) potentially have specific archaeological remains inside the city with exact coordinates.

Google Maps satellite view of Tel Abel Beth Maacah.
About 190 modern locations, such as Tel Abel Beth Maacah above, even have geometry (provided by OpenStreetMap) that outlines the extent of ruins, illustrating both point and polygon precision.

Additionally, I’ve provided over 1,200 comments on how I came to decide on the coordinates for certain locations (the hard ones to find, or ones with seemingly contradictory sources) so that you have a starting point to validate my decisions beyond the Linked Data. Finding coordinates for modern locations represents the bulk of the time I spent on this project; many of these locations took hours of research to locate, and I cite over 400 different sources.

Goal 4: Quantify uncertainty

Even after this research, however, it wasn’t always possible to find exact coordinates for modern locations. Therefore, I also provide my estimate of how close the coordinates in this dataset are to the actual coordinates. Sometimes the best-available coordinates aren’t exactly where they should be because archaeological datasets predate GPS, or the datasets just don’t try to be more precise.

On the other hand, while modern locations have positional uncertainty, ancient places come with their own set of uncertainties.

About 150 ancient identifications are described as “near” somewhere else, which I attempt to quantify based on context (in some cases, “near” might mean 1 km, and in others it might mean 10 km or more).

But more significantly, scholars just aren’t sure where many biblical places are. Since a goal of this project is to document all the possibilities, the question then becomes how to decide which identifications are most likely. My solution, which has substantial room for improvement, involves looking at the change in confidence over time.

I assign a numerical score for each confidence level (e.g., a “Yes” vote from a source counts for 30 points, while a “Likely” vote counts for 24), sum the scores from each source, normalize the scores (where 1,000 represents near-certainty), and then group them together into decades. In theory, the resulting composite score reflects the confidence of scholarship for a particular identification during that decade. Then, if you draw a best-fit (linear regression) line across the decades, you can see where the scholarship is trending. The number I use for determining confidence levels reflects the value of this best-fit line in 2020.
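
In code, this scoring-and-trend approach might look like the sketch below. The point values for “Yes” (30) and “Likely” (24) come from the text; the other scores, the sample votes, and the omission of the normalization step are my simplifications:

```python
# Scores per confidence level: "Yes" = 30 and "Likely" = 24 per the text;
# the rest are illustrative stand-ins.
SCORES = {"Yes": 30, "Likely": 24, "Possible": 10, "Unlikely": -10}

def decade_scores(votes):
    """Sum confidence scores per decade for one identification."""
    totals = {}
    for year, confidence in votes:
        decade = (year // 10) * 10
        totals[decade] = totals.get(decade, 0) + SCORES[confidence]
    return totals

def trend_at(totals, target_year=2020):
    """Fit a least-squares line across the decades and evaluate it at target_year."""
    xs, ys = zip(*sorted(totals.items()))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my + slope * (target_year - mx)

# Made-up votes for one identification, trending more confident over time:
votes = [(1955, "Unlikely"), (1987, "Possible"), (1999, "Likely"), (2014, "Yes")]
score_2020 = trend_at(decade_scores(votes))  # rising trend, about 39.6
```

A rising slope means scholarship is converging on the identification; the value of the fitted line at 2020 is what feeds the displayed confidence level.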

When we look at a chart expressing the data for Gath of the Philistines below, for example, the trendlines largely reflect what you’d expect if you’re familiar with the controversies over its identification: around the 1950s, it was identified with Tel Erani, but later archaeological discoveries have shifted the identification definitively away from there and likely to Tell es-Safi. Other good examples of shifting confidences are Ai and Anaharath.

A line chart shows Tell es Safi gaining steady confidence over time.

The main problem with this approach is that a line doesn’t necessarily model the data well: 77% of the time, the r-squared value is less than 0.5, meaning that confidence doesn’t trend consistently higher or lower over time but rather jumps around. See Abel-keramim, for example.

There are other options for generating confidence. For example, I could sum the overall confidences over time so that the lines move higher as time goes on. However, because the number of sources varies by decade, the lines jump in ways that don’t necessarily reflect the underlying confidence levels. I decided that using line charts was the least-misleading way to approach the data, imperfect though it may be.

Goal 5: Handle non-point data

This vote-based approach to modeling uncertainty also extends to regional geographic data. Regions in the past didn’t necessarily have the sharp political boundaries we think of today, and they varied over time as their territory waxed and waned. Thus, while it’s nice to present tidy historical boundaries on a map, reality was messier. We can still try to quantify the messiness, however.

I chose to approach the problem of regions not from a historical or political perspective but from a cartographic one. Instead of answering the question, “What are the boundaries of this region?” I instead answer the question, “Where should I put a label for this region on the map?” And it turns out that this latter question is answerable using a methodology like the one I used for place identifications.

I took about 70 different Bible atlases and other reference works and recorded polygons roughly where the region labels or boundaries appeared. (This approach involved a lot of interpretive looseness on my part: I wanted to emphasize the commonalities among the different sources.) Here’s what the labels for the region of Ammon look like, for example:

A satellite image shows polygons around modern Amman.

Then I combined the polygons from the different sources to create a confidence heatmap: for example, “four sources suggest this point was in the region, but only three sources support this point over here.” The resulting concave-hull isobands (with each line indicating the number of sources supporting the position of the region, so inner isobands reflect a higher confidence) yield useful information about not only where to put a label but also the rough—very rough—extent of a region. The following isobands for Ammon, for example, draw from sixteen sources and show its core territory around Rabbah, as you’d expect. The outermost isoband reflects confidence levels from two sources, while the innermost reflects confidence from ten or more. Thus, if you were looking to place a label on this map for “Ammon,” your best bet would be in the innermost polygon, but anywhere inside the isobands would reflect a consensus placement.

A satellite view shows isobands around modern Amman, Jordan.

Ultimately, I created 3,545 polygons to generate 77 isoband files. Not every ancient region is controversial enough to justify isobands; this dataset contains 238 region files in total.
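
The per-point counting behind these heatmaps can be sketched with a plain ray-casting point-in-polygon test. This is a minimal illustration with made-up polygons; the real pipeline goes on to build concave-hull isobands from counts like these:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Toggle on each edge the horizontal ray from (x, y) crosses.
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def coverage(x, y, polygons):
    """How many source polygons contain this point."""
    return sum(point_in_polygon(x, y, p) for p in polygons)

# Two hypothetical source polygons overlapping around (1.5, 1.5):
polys = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(1, 1), (3, 1), (3, 3), (1, 3)],
]
# coverage(1.5, 1.5, polys) == 2; coverage(0.5, 0.5, polys) == 1
```

Contouring the coverage counts (e.g., the boundary between "2 sources" and "1 source") is what produces the isoband lines in the figure above.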

In addition to regions, there’s also path data, especially for rivers and wadis. The previous version of this dataset treated them as points, but OpenStreetMap already had paths for many of the relevant rivers, and when the OSM data didn’t already exist, I created it there. In total, about 120 paths have geometry in this dataset; I made 91 edits to OSM as part of this project.

For example, here are the paths of possible identifications for the Brook of Egypt (for display, I reduce all paths to around 100 segments, but the dataset also contains the full geometry from OSM):

A satellite view shows four possible identifications for the biblical Brook of Egypt.
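
The article doesn’t name the algorithm used to reduce display paths to around 100 segments, so the sketch below shows one naive approach: keep roughly evenly spaced vertices plus both endpoints. A production version might prefer a shape-aware algorithm like Douglas–Peucker:

```python
def downsample(path, max_segments=100):
    """Reduce a path (list of (x, y) vertices) to at most max_segments
    segments by keeping evenly spaced vertices, always retaining both
    endpoints. Short paths are returned unchanged."""
    if len(path) <= max_segments + 1:
        return list(path)
    step = (len(path) - 1) / max_segments
    return [path[round(i * step)] for i in range(max_segments)] + [path[-1]]
```

A 501-vertex river path, for example, comes back as 101 vertices (100 segments), which is plenty for rendering at map scale while keeping the full OSM geometry in the dataset.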

Goal 6: Include media

The final major component of this dataset is the inclusion of an image to represent every modern location: there are about 1,650 512×512-pixel “thumbnail” images that are freely available to use. Just over 1,000 of them are drawn from Wikimedia Commons, while the rest are satellite images (with a resolution of 10 meters per pixel) illustrating the area surrounding the location. Honestly, Wikimedia Commons had more images than I was expecting: many of these locations are pretty obscure.

About 50 images illustrate an ancient region. (For example, the best thumbnail for the region of Gilead probably isn’t one of an archaeological site that happens to be in the region.)

Locations with Wikidata items often had images associated with them. In other cases, the WikiShootMe tool provided a way to find untagged images of the location (or sometimes an artifact associated with the location). Once I had a viable image, I cropped it into a square and potentially adjusted the colors or used Photoshop’s Content-Aware Fill tool to edit it for consistency and to be more legible at small sizes. The dataset also records license and attribution data as presented on the Wikimedia Commons page for each image.

To create the satellite images, I used Sentinel-2 composites from 2019. Sentinel-2 is a European satellite; its 10-meter-per-pixel resolution is just enough to illustrate the general character of a location (e.g., desert vs. wooded, hilly vs. flat). The resolution is lower than what you see on Google Maps, but the imagery is freely reusable.

All images have descriptive alt text since accessibility is a modern application of Christian charity. The images themselves also have accessibility and license and attribution data embedded in them.

I struggled most with finding images to illustrate regions. Below, for example, is what I chose for the Sharon Plain; it shows that the region is coastal, flat, and agriculturally fertile, but depicting a whole region using a single image will necessarily be reductive.

A view looking east across the Sharon Plain.
Credit: Ori~ (modified).

Goal 7: Open the data

All the data is available in a GitHub repository with a 7,000-word readme describing the (unfortunately complex) data structure. My hope is for you to do something interesting with it. It’s under a CC-BY license, though the images are under a variety of Wikimedia-approved free licenses (as noted in the metadata), and the OpenStreetMap data is under a CC-BY-SA-style ODbL license. GitHub isn’t ideal for data projects like this (though it does have an integrated GeoJSON viewer), but I wanted a stable, long-term host that doesn’t make you jump through hoops to get the data.

The dataset currently contains files for (the totals may change over time):

  1. Ancient places (1,342 entries).
  2. Modern locations (1,596 entries).
  3. Geometry (6,621 GeoJSON and KML files) and metadata (588 entries for non-point data) for both ancient places and modern locations.
  4. Images (1,650 jpegs) and image metadata (2,424 entries—there’s metadata for images considered but not included, as well as pointers to some copyrighted images, especially Google Street View).
  5. Sources consulted (442 entries).
  6. JSON Schemas to validate the rest of the data (5 schemas).
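
The dataset’s actual JSON Schemas are more involved, but the validation step boils down to checks like this minimal, hypothetical one (the field names here are illustrative, not the dataset’s real schema):

```python
import json

def valid_location(record):
    """Minimal sanity check for a modern-location record:
    a string id plus in-range latitude and longitude."""
    return (
        isinstance(record.get("id"), str)
        and isinstance(record.get("lat"), (int, float))
        and -90 <= record["lat"] <= 90
        and isinstance(record.get("lon"), (int, float))
        and -180 <= record["lon"] <= 180
    )

record = json.loads('{"id": "tell-es-safi", "lat": 31.7, "lon": 34.85}')
# valid_location(record) == True
```

In practice you’d point a JSON Schema validator at the five published schemas rather than hand-roll checks, but the goal is the same: catch malformed records before they enter the build.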

Beyond the 175 MB of images, you’ll find 100 MB of geometry data and 16 MB of core data.

The only data omitted from the repository is the raw vote data (“Anchor Yale Bible Dictionary found this particular identification to be ‘likely’”) because I want to supplement—not replace—my sources; if you want to find out how Anchor Yale Bible Dictionary identified a place, the data shows you where you can look it up in your own copy. This project transforms and interprets raw data, bringing together different perspectives and uniting different datasets, both in print and online. To save you time in your research, I want to point you to scholarly sources where you can learn more about the arguments for various identifications.

Limitations

This dataset has a number of limitations:

It builds on the work I did in 2007 to disambiguate places mentioned in the Bible. While I reworked many of the places (e.g., I didn’t realize in 2007 that there’s more than one Rabbah), I didn’t revisit the disambiguations from scratch.

The disambiguations also showcase a limitation to the data model. If two places in the Bible share a name, but my sources disagree about which one a certain verse is referring to, then I created a third place with identifications pointing to the other two. There aren’t really three possible places; the third one exists only because the data model requires it. For example, Ibzan’s hometown of Bethlehem 3 is likely Bethlehem 2 or possibly Bethlehem 1, not a third Bethlehem somewhere.

Similarly, some sources may consider a place the same as another one, while other sources may mention the modern location associated with the other place but not make the ancient connection explicit. I recorded the identification as described in the sources, but it leads to situations where the same modern location appears in multiple identifications, as part of both “another name for an ancient place” and “at this modern location” identifications. For example, Timnah 3 has Khirbet et Tabbaneh both as a possible identification for Timnah 2 and as a separate, modern identification.

While I’m confident that the spelling of biblical place names correctly reflects the text of the translations they appear in, spellings of modern locations are less validated. They could easily contain typos (where a typo exists in a source, I note it). They also reflect a variety of transliteration approaches from the nineteenth century to today. The primary modern name depends on whether it’s unique in the dataset, not whether it reflects the dominant name used in the sources. (For example, Bethsaida could be at at-Tell while Ai could be at et-Tell, names that reflect different transliteration approaches from Arabic into English of the same underlying name.) Additionally, I don’t know Hebrew or Arabic, so the transliterated diacritics I preserved could be wrong. I focused on simpler transliterations and thus didn’t usually preserve some more scholarly aspects of transliterations like breathing marks.

River geometry is always a single path, even if the river has branches. This limitation is most apparent on large rivers like the Nile, whose geometry doesn’t reflect the delta as it approaches the Mediterranean.

The identifications depend on my reading comprehension, interpretation, and conversion of freeform text into structured data, not to mention my limited sources and research abilities. The identifications don’t have the kinds of checks and balances a more-formal research project would entail. My goal was 90% accuracy.

Originally, I’d hoped to provide a wiki-style interface to the data, where people could contribute. But the data is so interrelated—it relies on a 45-step build process from raw spreadsheet to final form—that such functionality went beyond the scope of what I could accomplish. The identifiers also rely on static data snapshots and don’t incorporate improvements to, say, Wikidata that have appeared since I first pulled the data from there. The result is less dynamic than I’d originally hoped.

Advancements over the current state of the world

This project attempts to advance the state of the art along the following dimensions:

  1. Number of identifications. With 23,000 votes for the locations of ancient places, this project reflects a broad (if heavily evangelical) survey of current thinking. And with 3,259 distinct identifications, it has around six times as many identifications as the typical Bible atlas. However, Bible atlases provide context and interpretation that this project lacks; I’m not trying to replace a Bible atlas, only to provide supporting data.
  2. Precision of coordinates. With 87% of coordinates identified to within 250 m, the precision in this project outshines the best-case 1 km precision (often just a prose description) found in most Bible reference works.
  3. Number of Bible translations. Ten Bible translations reflect a variety of spellings and translation approaches, reducing reliance on the ESV (compared to the 2007 version of this dataset). The NLT, for example, likes to use place names in verses where other translations instead might say something like “there.” In total, 5,616 verses contain at least one place name, with 87,988 total name instances tracked across all ten translations.
  4. Integration with external resources. Linking to existing databases like Wikidata creates an outward focus and enables future work, with 6,536 ancient connections and 6,473 modern connections.
  5. Images. Every modern location has a thumbnail image, with 1,014 of the 1,650 photos being an on-the-ground, close aerial, or artifact photo, and the remainder being a medium-resolution satellite image.
  6. Transparency of data. With sources cited and data available, you should have what you need to trace my conclusions if you have the relevant sources available to you. (I tried to include links to online sources where possible.) I provide 1,251 comments providing more information in difficult cases.

Benefits to you

If a place is mentioned in the Protestant Bible, it should be in this dataset. You can look it up in the interface by passage reference or name and quickly get a sense of where modern scholars think it could be, as well as how confident they are in the identification. Often the possibilities cluster tightly (Penuel, for example, is probably on one of two neighboring hills, even though one is more likely than the other); if you only need to know an approximate location (which, unless you’re doing in-depth study, is almost certainly all you need), then you’re set. But if you want to know more about the possible locations and how confident my sources are in their identifications, that data is also available to you. I want to encourage you to purchase print or digital books that you find helpful to deepen your study of Bible geography; the dataset includes links to Amazon and other sites like Best Commentaries, sites whose stable URLs also provide a further kind of Linked Data.

Interface

The data went live on this site in January 2021, and an interface that showcases it went live in September 2021. Looking at how people use the site has led to some data changes. For example, when I first launched, I didn’t provide proposed locations of the Garden of Eden, just calling it “unknown.” But after seeing how often people look it up, I treated it like any other place and provided potential identifications.

The 2007 version of the interface emphasized downloading Google Earth KMLs for individual books and chapters. For this update, I want to encourage exploration of the data even if you don’t have Google Earth. Therefore, I’ve changed the interface to let you interact with the data in five ways:

1. Book/chapter view

This view lets you see a bunch of places at once so that you can get a sense of how geography fits into the narrative. It includes:

  • A map of all the places mentioned in the book or chapter.
  • A way to drill down into a single chapter if you’re looking at a whole book.
  • A clickable list of all the places (including verse references and potential modern identifications) to explore on another view or on the map on the same page.
A map of places in Lamentations, along with the features described above.

2. Ancient place view

This view presents data about ancient places mentioned in the Bible. It includes:

  • A map of possible identifications, with placemark pushpins varying in opacity depending on the confidence of the identification.
  • Any alternate names found in Bible translations.
  • Geo data (KML and GeoJSON) for the identifications.
  • All the identifications and their corresponding confidence levels.
  • Bible verses where the place appears.
  • Linked Data identifiers.
  • Sources supporting the identifications.
  • A graph of confidence trends over time.
  • Links to similarly named places (especially since the numbering system I use—“Aphek 1,” “Aphek 2,” etc.—is fairly opaque).
  • A thumbnail image when the identification is noncontroversial.
The interface for Adullam.

3. Modern location view

This view presents data about modern locations identified with ancient places. It includes:

  • A map showing geometry (point, path, polygon, or isobands), including polygon geometry of an archaeological site if it’s available (e.g., Colossae).
  • Alternate names (usually different transliterations but sometimes names in different languages, such as Hebrew or Arabic).
  • Latitude and longitude coordinates.
  • Palestine 1928 grid coordinates, where relevant; most Bible reference works use this coordinate system.
  • Coordinate precision (how close to the actual location the coordinates are).
  • Sources supporting the coordinates.
  • Accuracy and precision notes.
  • Biblical places associated with the location.
  • A thumbnail image.
The interface for modern Khirbet Id el Minya.

4. Atlas view

This view is an alphabetical list of all the ancient places. It includes:

  • A thumbnail image of the most-likely modern location.
  • A list of possible identifications.
  • Bible verses where the place appears.
Alphabetical entries starting at Zaanan.

5. Photo view

This view is mostly for legacy compatibility. Previously, I used the Flickr and now-defunct Panoramio APIs to pull in photos geographically near the coordinates, but the results were hit-or-miss. Now there’s a single photo for each modern location associated with an ancient place. I wouldn’t include this view except for historical reasons; I don’t consider it very useful.

Preview of photos of possible identifications of Kedesh-naphtali.

Conclusion

It’s 2021. Bible reference works shouldn’t have to say that a ruin is, for example, “on the left side of the road from Salt to Amman, 1.5 hours [by horse] northwest of Amman.” And then you shouldn’t have to spend three hours figuring out exactly where that is, consulting the original 1822 travelogue that makes this claim, and then trying to determine exactly where the road ran in 1822 compared to modern roads. Instead, we should be able to say that a ruin is at particular coordinates with a certain level of precision. And then we should be able to tie that ruin back to a biblical location with a quantifiable degree of confidence. This project lets us do that.

Environmental impact

Creating this data required about 800 kWh of electricity in computing power and local travel. All the electricity used was solar-generated; carbon emissions totaled about 40 kg. Book shipping generated an additional 103 kg of carbon emissions. The total carbon impact for this project is therefore approximately 143 kg; I don’t have tools at my disposal to be more precise than that. I include this statement just to note that data, even Bible data, has an environmental impact.

Next steps

Browse the interface or get the data.

What Twitterers Are Giving up for Lent (2021 Edition)

February 20th, 2021
A word cloud from wordart.com features "Twitter," "Alcohol," "Social Networking," "Chocolate," and "Meat" most prominently.

This year, the usual trio of Twitter, alcohol, and social networking led the list. But current events influenced the list this year more than most. Radio personality Rush Limbaugh’s death on Ash Wednesday led many to comment that they were giving him up for Lent, landing him at #7. A major winter storm (#62, snow and #77, winter) in Texas (#34) knocked out electricity (#15), water (#35), and heat (#68) for millions of people. Finally, the global pandemic (#23, covid) elevated topics like going to the pub (#24) and suppressed others, like eating out (#75). Zoom (#63), a videoconferencing service, appeared for the first time. “Executive orders” (#52) refers to a satirical tweet about U.S. President Biden.

This year’s list draws from 24,312 tweets out of 557,560 total tweets mentioning Lent.

COVID-19

The global coronavirus pandemic led to jumps in several topics, most notably “going to the pub,” which was almost as popular as “COVID,” and was followed closely by food pickup and delivery.

COVID + the pandemic is at the top, followed by going to the pub, takeout + doordash + uber eats, masks, and lockdown.

Social Media

In the battle of the social networks, TikTok has almost overtaken Instagram, while Snapchat fell below Tinder. Facebook, not shown on this chart because it makes the chart harder to read, was once dominant and is now only just above Instagram.

Instagram is just ahead of TikTok; both are far ahead of Tinder and Snapchat.

Fast Food

Fast food is down across the board.

Chick-fil-A is down but retains its lead, followed by Chipotle, McDonald's, Dunkin, and Taco Bell.

Top 103 Things Twitterers Gave Up for Lent in 2021

Rank Word Count Change from last year’s rank
1. Twitter 911 +2
2. Alcohol 895 -1
3. Social networking 687 -1
4. Lent 515 +2
5. Chocolate 476 0
6. Meat 337 -2
7. Rush Limbaugh 306
8. Swearing 250 +3
9. Men 242 +4
10. Sex 225 -2
11. Coffee 214 -4
12. Sweets 213 -2
13. Giving up things 188 +14
14. Catholicism 185 +9
15. Electricity 178 +99
16. Soda 174 -7
17. Fast food 167 -5
18. Religion 154 +6
19. School 140 -2
20. Marijuana 138 0
21. Sugar 136 +1
22. Chips 123 -3
23. Covid 116
24. Going to the pub 115
25. Bread 115 -7
26. Smoking 114 +5
27. Facebook 113 +3
28. You 111 -13
29. Work 102 -13
30. Life 99 -2
31. Instagram 99 -2
32. Tiktok 96 +1
33. Booze 94 +12
34. Texas 92
35. Water 87 +61
36. Masks 85
37. Online shopping 85 +24
38. Homework 83 +5
39. Hope 82 +15
40. Living 80 +36
41. Takeout 77 +26
42. Breathing 75 +17
43. Beer 71 -22
44. Wine 71 -4
45. Power 61
46. Candy 60 -12
47. Food 59 -1
48. Him 59 +6
49. Depression 58 -14
50. Shopping 55 +7
51. Lying 53 -3
52. Executive orders 53
53. Virginity 51 -28
54. Liquor 51 -16
55. Cookies 51 -2
56. Starbucks 51 -26
57. Red meat 50 -18
58. Christianity 49 +18
59. Junk food 49 -17
60. Procrastination 49 0
61. Anxiety 47 -18
62. Snow 46 +47
63. Zoom 44
64. Rice 44 -14
65. College 42 -28
66. Sobriety 41 -8
67. Lockdown 41
68. Heat 40 +44
69. Carbs 40 -28
70. Pancakes 40 +3
71. Caffeine 40 -24
72. Donald Trump 40 -22
73. Masturbation 39 -24
74. Desserts 38 -3
75. Eating out 38 -22
76. Simping 36 -34
77. Winter 36 +28
78. Ice cream 34 -14
79. Lint 34 +1
80. God 33 +5
81. Cheese 33 -30
82. Negativity 33 -22
83. People 32 -34
84. Complaining 32 -28
85. Being alive 32 +19
86. Church 31 -16
87. Coke 29 -25
88. Pussy 28 -22
89. Boys 28 -57
90. Pizza 28 -29
91. Sleep 28 -8
92. My will to live 28 -11
93. Amazon 27 -1
94. Sarcasm 26 -5
95. Hate 26 +15
96. Snacking 25 -15
97. Sanity 24 0
98. My job 24 -44
99. Cake 23 -22
100. Dairy 22 -56
101. The pandemic 22
102. Chick Fil A 22 -47
103. Baseball 22 +9

Top Categories

1. food 3,419
2. technology 2,305
3. smoking/drugs/alcohol 1,613
4. irony 1,341
5. habits 1,198
6. relationship 804
7. religion 528
8. sex 476
9. health/hygiene 442
10. school/work 414
11. celebrity 313
12. politics 242
13. weather 213
14. shopping 172
15. money 125
16. entertainment 106
17. sports 48
18. clothes 15
19. possessions 12

Track in Real Time What People Are Giving Up for Lent in 2021

February 15th, 2021

See the top 100 things people are giving up for Lent in 2021 on Twitter, continually updated through February 19, 2021. You can also use the Historical Lent Tracker to see trends since 2009, though 2021 is still in flux, so I wouldn’t draw any conclusions about 2021 yet.

As I write this post, with about 593 tweets analyzed, perennial favorites “alcohol,” “twitter,” and “social networking” lead the list. I expect to see tweets this year related to the global pandemic. And will TikTok make the top ten this year?

Look for the usual post-mortem on February 20, 2021.

What Twitterers Are Giving up for Lent (2020 Edition)

February 29th, 2020

This year's word cloud is from wordart.com because wordle.net no longer works for me.

This year alcohol topped the list for the first time since 2017, followed by social networking and Twitter. New to the top 100 this year are “trolling” (#14) and “being toxic” (#94), following Pope Francis’s call to give up online insults. Also new are “TikTok” (#33), “simping,” or acting obsequiously on TikTok (#41), “coronavirus” (#73) (related: shaking hands), and “the streets” (#95).

This year’s list draws from 35,817 tweets out of 540,684 total tweets mentioning Lent.

Plastic

Plastic has been appearing near the top of the list for the past two years as some churches, especially in the UK, have encouraged people to give it up for Lent. This year, mentions of plastic fell precipitously, suggesting that either giving it up has become less fashionable or that people inclined to give it up already did so over the past two years. In particular, “straws” received no mentions.

Plastic dropped from over 1% in 2019 to 0.1% in 2020.

Social Media

As noted above, TikTok is the big winner here.

Snapchat continues its decline.

Fast Food

Chick-fil-A continues its march upward, while McDonald’s continues its decline.

Domino's wasn't mentioned this year.

Top 100 Things Twitterers Gave Up for Lent in 2020

Rank Word Count Change from last year’s rank
1. Alcohol 1,533 +1
2. Social networking 1,236 -1
3. Twitter 1,191 0
4. Meat 570 +2
5. Chocolate 534 -1
6. Lent 468 -1
7. Coffee 444 +1
8. Sex 432 +2
9. Soda 426 0
10. Sweets 409 +2
11. Swearing 403 -4
12. Fast food 356 -1
13. Men 336 +1
14. Trolling 330 +106
15. You 294 +3
16. Work 288 -1
17. School 284 -4
18. Bread 254 -1
19. Chips 235 +5
20. Marijuana 214 +7
21. Beer 210 +2
22. Sugar 207 -2
23. Catholicism 202 -2
24. Religion 198 -7
25. Virginity 196 +7
26. Giving up things 177 -4
27. Life 165 -2
28. Instagram 156 +3
29. Facebook 156 -3
30. Starbucks 151 +6
31. Smoking 146 +3
32. Boys 138 -2
33. TikTok 131 +89
34. Candy 122 +1
35. Depression 119 +16
36. College 112 -20
37. Liquor 112 +22
38. Red meat 106 +5
39. Wine 105 +8
40. Carbs 104 +8
41. Simping 104 +82
42. Junk food 100 -5
43. Homework 100 -3
44. Anxiety 97 +9
45. Dairy 94 +11
46. Booze 93 +10
47. Food 92 -1
48. Fried food 92 +1
49. Caffeine 86 +15
50. Lying 85 -6
51. Masturbation 85 -1
52. People 84 +3
53. Donald Trump 84 -10
54. Rice 83 -14
55. Cheese 81 -13
56. Eating out 78 +3
57. Cookies 75 -3
58. Hope 75 -19
59. Him 75 -6
60. My job 73 -7
61. Chick Fil A 70 +10
62. Complaining 67 +10
63. Shopping 65 +1
64. Sobriety 64 +3
65. Breathing 62 -24
66. Negativity 62 -1
67. Procrastination 61 -10
68. Online shopping 61 0
69. Pizza 60 -15
70. Coke 59 +3
71. Feelings 58 +24
72. Ice cream 58 -19
73. Coronavirus 58
74. Porn 56 +3
75. Juice 54 +5
76. Boba 54 -15
77. Pussy 52 -3
78. Takeout 51 -15
79. Bills 48 -9
80. French fries 48 -10
81. Church 46 -8
82. Desserts 45 -10
83. Being gay 44 +2
84. Pancakes 44 -5
85. Being a jerk online 44
86. Being single 41 +4
87. Hot Cheetos 40 -13
88. Cheating 40 -6
89. Chicken 39 -8
90. Crying 39 -10
91. Living 39 -4
92. Christianity 39 -30
93. Gambling 39 +6
94. Being toxic 38 +29
95. The streets 38 +28
96. Cake 38 -9
97. Energy drinks 37 -2
98. TV 37 -13
99. Women 37 0
100. Pasta 36 -16

Top Categories

1. food 6,259
2. technology 3,125
3. smoking/drugs/alcohol 2,767
4. relationship 2,163
5. habits 1,882
6. irony 1,364
7. sex 1,019
8. school/work 882
9. religion 647
10. health/hygiene 352
11. money 259
12. entertainment 166
13. politics 162
14. shopping 160
15. sports 84
16. weather 18
17. celebrity 17
18. clothes 17
19. possessions 16

Media Coverage

The Lent Tracker received some media attention this year.

Track in Real Time What People Are Giving Up for Lent in 2020

February 24th, 2020

See the top 100 things people are giving up for Lent in 2020 on Twitter, continually updated until February 29, 2020. You can also use the Historical Lent Tracker to see trends since 2009, though 2020 is still in flux, so I wouldn’t draw any conclusions about 2020 yet.

As I write this post, with about 1,200 tweets analyzed, perennial favorites “social networking,” “alcohol,” and “twitter” lead the list. I’ve already learned a new word: simping, describing “the type of person who, instead of trying to attract the opposite sex through being attractive and interesting, is more sycophantic and fawning,” commonly on TikTok. It’s currently at #12, though I assume it will fall as more people start posting.

Look for the usual post-mortem on March 1, 2020.

Using Declassified Spy Satellite Photos to Enhance the Resolution of Bible Maps

November 15th, 2019

In previous posts, I talked about using a digital terrain model for high-resolution Bible maps and using AI to increase the resolution of satellite photos. In this post, I’ll talk about how you can use old black-and-white but high-resolution satellite photos to enhance lower-resolution modern satellite photos, converting this:

A ten-meter Sentinel-2 satellite photo near the Dead Sea.

to this:

The same image panchromatically sharpened to an approximate two-meter resolution.

In 1995, President Clinton declassified images taken by Corona spy satellites from 1959 to 1972. These satellites operated at a resolution of up to six feet (around two meters) per pixel, a big improvement over the ten-meter imagery that the current free-and-highest-resolution Sentinel-2 program provides. However, the high-resolution Corona imagery is black-and-white, while the lower-resolution Sentinel imagery is in color. What if it were possible to combine the two?

Not only is it possible; it’s a common practice called pansharpening that you often see (unknowingly) in satellite imagery. The Landsat 8 satellite, for example, takes color pictures at a thirty-meter resolution and black-and-white pictures at a fifteen-meter resolution; when you combine them, you get a fifteen-meter output.
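A minimal numpy sketch of the idea (a simplified Brovey-style transform, shown only as an illustration, not the exact algorithm any particular pipeline uses): each upsampled color band is rescaled by the ratio of the high-resolution panchromatic band to the color bands' average intensity.

```python
import numpy as np

def brovey_pansharpen(rgb, pan):
    """rgb: (3, H, W) color bands upsampled to the pan band's grid, values in [0, 1].
    pan: (H, W) high-resolution panchromatic band on the same grid.
    Rescales each color band by pan / intensity, injecting the pan band's detail."""
    intensity = rgb.mean(axis=0)
    ratio = pan / np.maximum(intensity, 1e-6)  # avoid dividing by zero
    return rgb * ratio  # broadcasts the (H, W) ratio across all three bands

# Toy example: a flat gray color image takes on the pan band's detail.
rgb = np.full((3, 4, 4), 0.5)
pan = np.linspace(0.25, 0.75, 16).reshape(4, 4)
sharpened = brovey_pansharpen(rgb, pan)
```

The colors stay anchored to the low-resolution bands while the spatial detail comes from the panchromatic band, which is why the combined output inherits the pan band's resolution.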

So if you take the ten-meter Sentinel imagery and pansharpen it with two-meter Corona imagery, you get something like the above image. I combined these images by hand using GDAL Pansharpen; merging them at scale is a more complicated problem. But others have worked on it: the Corona Atlas and Referencing System, run by the Center for Advanced Spatial Technologies (CAST) at the University of Arkansas, uses Corona imagery to assist in Middle East archaeology. Its atlas lets you explore the high-resolution imagery as though it were Google Maps. The imagery’s age is actually an asset for this purpose; urban and agricultural development throughout the Middle East in the last fifty years obscures some archaeological sites in modern satellite imagery. CAST has georeferenced many Corona images and makes the data available for noncommercial use. The GAIA lab at UCSD also makes georeferenced imagery available as part of their Digital Archaeological Atlas of the Holy Land.

Designing for Agency in Bible Study

April 13th, 2019

Here are the slides from a talk I gave today at the BibleTech conference in Seattle. Download the accompanying handout or explore the Expanded Bible interface mentioned in the presentation.

Read on Slideshare.

What Twitterers Are Giving up for Lent (2019 Edition)

March 9th, 2019
Social networking tops the list of what Twitterers are giving up for Lent in 2019.

This year social networking topped the list, as it did last year, followed by alcohol, Twitter, chocolate, and, ironically, Lent. Swearing fell to #7 this year from #5 last year. With the absence of a major political or social event, 2019 was a fairly typical year for what people said they would give up for Lent.

This year, 44,291 tweets (excluding retweets) specifically mentioned giving up something, up from last year’s 29,609. In all, this year’s analysis covers 491,069 tweets, up from 427,810 last year.

Plastic

Giving up plastic has become increasingly popular in the past two years. In all, 464 tweets this year mentioned plastic of some sort, which would almost bring it into the top ten.

Over 1% of tweets this year mentioned plastic.

Brexit

The one major political event occurring over Ash Wednesday involved the ongoing Brexit debate. When British Prime Minister Theresa May accepted a suggestion that British lawmakers give up the EU for Lent, it led others to tweet the opposite:

Tweets about leaving the EU and Brexit outnumber tweets about the EU.

Depression and Anxiety

It was a banner year for those who said they were giving up both:

Tweets about both depression and anxiety increased substantially this year.

Winter

Tweets about cold weather were up this year, as they are cyclically depending on the severity of winter weather:

Tweets about both cold weather last peaked in 2015.

Gossip

Pope Francis this year suggested giving up gossip for Lent, leading to an increase in the number of tweets about it:

Tweets about gossip reached a new high this year.

Relationships

Even though last year Ash Wednesday fell on Valentine’s Day, this year the percentage of people saying they were going to give up a significant other rose:

The generic 'love' fell overall, however.

Fast Food

Chick-fil-A finally surpassed McDonald’s this year, and Chipotle continues its decline:

Taco Bell could surpass McDonald's next year.

Other Updates from Last Year

Hot Cheetos finally declined. Smoking and Juuling both rose. Tide Pods look to be a one-year phenomenon, along with Fortnite. Snapchat dropped off a cliff.

Top 100 Things Twitterers Gave Up for Lent in 2019

Rank Word Count Change from last year’s rank
1. Social networking 1,529 0
2. Alcohol 1,498 +1
3. Twitter 1,409 -1
4. Chocolate 818 0
5. Lent 770 +6
6. Meat 684 0
7. Swearing 606 -2
8. Coffee 563 +1
9. Soda 561 -1
10. Sex 511 +3
11. Fast food 473 -1
12. Sweets 460 -5
13. School 414 +2
14. Men 374 +6
15. Work 367 +11
16. College 346 +9
17. Religion 346 +15
18. Bread 336 -4
19. You 327 0
20. Plastic 312 0
21. Sugar 294 0
22. Catholicism 290 +15
23. Giving up things 289 +10
24. Beer 274 -6
25. Chips 269 -9
26. Life 258 +3
27. Facebook 227 -15
28. Marijuana 224 +3
29. Brexit 212 +47
30. Boys 204 -8
31. Instagram 195 -4
32. Virginity 187 +28
33. Smoking 175 +7
34. Candy 161 -11
35. Starbucks 144 -1
36. Junk food 138 -3
37. Hope 128 +22
38. Homework 128 +8
39. Rice 127 +8
40. Breathing 125 +32
41. Cheese 122 -5
42. Donald Trump 122 +18
43. Red meat 121 -8
44. Lying 118 0
45. Food 113 +24
46. Wine 111 -5
47. Carbs 111 -6
48. Winter 110 +48
49. Fried food 109 +1
50. Masturbation 109 +6
51. Gossip 108 +41
52. Depression 105 +26
53. Anxiety 103 +30
54. Ice cream 103 -4
55. My job 103 +26
56. Him 98 +10
57. Cookies 98 -5
58. Pizza 96 -20
59. People 94 -5
60. Dairy 94 -15
61. Booze 92 -13
62. Procrastination 91 -3
63. Single use plastic 84 -10
64. Eating out 84 0
65. Liquor 81 -7
66. Juuling 80 -1
67. Boba 79 +4
68. Christianity 76 +15
69. Takeout 75 -10
70. Caffeine 75 -15
71. Shopping 74 -17
72. Negativity 73 -46
73. My will to live 69 -1
74. Sobriety 68 -16
75. Online shopping 66 -1
76. Bills 64 +19
77. French fries 64 -16
78. Lint 63 +6
79. Chick Fil A 61 -16
80. Complaining 61 -29
81. Sleep 61 -11
82. Desserts 60 -15
83. Church 60 -5
84. Coke 59 -16
85. Pussy 59 -6
86. Hot Cheetos 56 -25
87. Netflix 55 -25
88. God 54 -3
89. Porn 53 -24
90. Snapchat 50 -73
91. Stress 50 -3
92. Oxygen 50 +4
93. Spending 50 -3
94. Pancakes 46 -21
95. Crying 46 -1
96. Diet coke 46 -19
97. Juice 45 -15
98. Chicken 44 -19
99. Cheating 43 -19
100. F***boys 43 -43

Top Categories

This year, the top celebrity was BTS, a Korean boy band / all-consuming lifestyle.

1. food 8,004
2. technology 3,688
3. habits 2,963
4. smoking/drugs/alcohol 2,820
5. irony 2,097
6. relationship 1,800
7. school/work 1,490
8. sex 1,164
9. religion 1,016
10. politics 440
11. generic 427
12. money 353
13. health/hygiene 348
14. entertainment 224
15. shopping 182
16. weather 171
17. sports 165
18. possessions 54
19. celebrity 24
20. clothes 16

Media Coverage

The Lent Tracker received some media attention this year.

Track in Real Time What People Are Giving Up for Lent in 2019

March 4th, 2019

See the top 100 things people are giving up for Lent in 2019 on Twitter, continually updated until March 9, 2019. You can also use the Historical Lent Tracker to see trends since 2009, though 2019 is still in flux, so I wouldn’t draw any conclusions about 2019 yet.

As I write this post, with about 1,500 tweets analyzed, perennial favorites “social networking,” “alcohol,” and “twitter” lead the list. If I had to guess, with an unusually cold February across much of the U.S., weather might feature more prominently this year than last year, when Ash Wednesday coincided with Valentine’s Day.

Look for the usual post-mortem on March 10, 2019.

Using Machine Learning to Enhance the Resolution of Bible Maps

March 1st, 2019

In a previous post, I discussed how 3D software could improve the resolution of Bible maps by fractally enhancing a digital elevation model and then synthetically creating landcover. In this post I’ll look at how machine learning can increase the resolution of freely available satellite images to generate realistic-looking historical maps.

Acquiring satellite imagery

The European Sentinel-2 satellites take daily photos of much of the earth at a ten-meter optical resolution (i.e., one pixel represents a ten-meter square on the ground). The U.S. operates a similar system, Landsat 8, with a fifteen-meter resolution. Commercial vendors offer much higher-resolution imagery, similar to what you find in Google Maps, at a prohibitive cost (thousands of dollars). By contrast, both Sentinel-2 and Landsat are government-operated and have freely available imagery. Here’s a comparison of the two, zoomed in to level 16 (1.3 meters per pixel), or well above their actual resolution:

Sentinel-2 shows more washed-out colors at a higher resolution than Landsat 8.

The Sentinel-2 imagery looks sharper thanks to its higher resolution, though the processing to correct the color overexposes the light areas, in my opinion. Because I want to start with the sharpest imagery, for this post I’ll use Sentinel-2.

I use Sentinel Playground to find a scene that doesn’t have a lot of clouds and then download the L2A, or atmosphere- and color-corrected, imagery. If I were producing a large-scale map that involved stitching together multiple photos, I’d use something like Sen2Agri to create a mosaic of many images, or a “basemap” (as in Google Maps). (Doing so is complicated and beyond the scope of this post.)

I choose a fourteen-kilometer-wide scene from January 2018 showing a mix of developed and undeveloped land near the northwest corner of the Dead Sea at a resolution of ten meters per pixel. I lower the gamma to 0.5 so that the colors approximately match the colors in Google Maps to allow for easier comparisons.

The Sentinel-2 scene.
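Sentinel Playground exposes gamma as a slider; as a rough numpy illustration of the same kind of adjustment (using the common out = in ** (1/gamma) convention, which may not match Playground's exact math), a gamma below 1 darkens the midtones:

```python
import numpy as np

def adjust_gamma(image, gamma):
    """Gamma-adjust an image with values in [0, 1].
    Under the out = in ** (1 / gamma) convention, gamma < 1 darkens midtones
    while leaving pure black and pure white unchanged."""
    return np.clip(image, 0.0, 1.0) ** (1.0 / gamma)

pixels = np.array([0.0, 0.25, 0.5, 1.0])
darkened = adjust_gamma(pixels, 0.5)  # exponent 2: squares each value
```

Because the endpoints 0 and 1 are fixed, the adjustment reins in washed-out light areas without crushing the overall dynamic range.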

Increasing resolution

“Enhance!” is a staple of crime dramas, where a technician magically increases the resolution of a photo to provide crucial evidence needed by the plot. Super-resolution doesn’t work as well in reality as it does in fiction, but machine-learning algorithms have grown markedly more sophisticated in the past two years, and I thought it would be worth seeing how they performed on satellite photos. Here’s a detail of the above image, as enlarged by four different algorithms, plus Google Maps as the “ground truth.”

Comparison of four different super-resolution algorithms plus Google Maps, as discussed in the following paragraphs.

Each algorithm increases the original resolution by four times, providing a theoretical resolution of 2.5 meters per pixel.

The first, “raw pixels,” is the simplest; each pixel in the original image now occupies sixteen pixels (4×4). It was instantaneous to produce.
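"Raw pixels" is plain nearest-neighbor enlargement, which takes only a couple of lines of numpy:

```python
import numpy as np

def upscale_nearest(image, factor=4):
    """Nearest-neighbor upscaling: each source pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

tile = np.array([[1, 2],
                 [3, 4]])
big = upscale_nearest(tile)  # 8x8; the value 1 fills the top-left 4x4 block
```

No detail is invented, which is exactly why the result looks blocky: every algorithm that follows is trying to do better than this baseline.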

The second, “Photoshop Preserve Details 2.0,” uses the machine-learning algorithm built into recent versions of Photoshop. This algorithm took a few seconds to run. Generated image (1 MB).

The third, ESRGAN as implemented in Runway, reflects a state-of-the-art super-resolution algorithm for photos, though it’s not optimized for satellite imagery. This algorithm took about a minute to run on a “cloud GPU.” Generated image (1 MB).

The fourth, Gigapixel, uses a proprietary algorithm to sharpen photos; it also isn’t optimized for satellite imagery. This algorithm took about an hour to run on a CPU. Generated image (6 MB).

The fifth, Google Maps, reflects actual high-resolution (my guess is around 3.7 meters per pixel) photography.

Discussion

To my eye, the Gigapixel enlargement looks sharpest; it plausibly adds detail, though I don’t think anyone would mistake it for an actual 2.5-meter resolution satellite photo.

The stock ESRGAN enlargement doesn’t look quite as good to me; however, in my opinion, ESRGAN offers a lot of potential if tweaked. The algorithm already shows promise in upscaling video-game textures (a use its creators didn’t envision), and I think that taking the existing model developed by the researchers and training it further on satellite photos could produce higher-quality images.

I didn’t test the one purpose-built satellite image super-resolution algorithm I found because it’s designed for much-higher-resolution (thirty-centimeter) input imagery.

Removing modern features

One problem with using satellite photos as the base for historical maps involves dealing with modern features: agriculture, cities, roads, etc., that weren’t around in the same form in the time period the historical map is depicting. Machine learning presents a solution for this problem, as well; Photoshop’s content-aware fill allows you to select an area of an image for Photoshop to plausibly fill in with similar content. For example, here’s the Gigapixel-enlarged image with human-created features removed by content-aware fill:

Modern features no longer appear in the image.

I made these edits by hand, but at scale you could use OpenStreetMap’s land-use data to mask candidate areas for content-aware replacement:

Data from OpenStreetMap shows roads, urban areas, farmland, etc.
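Photoshop's content-aware fill is proprietary, but the masking idea can be sketched with a crude diffusion-based fill (a deliberately naive stand-in, nowhere near content-aware quality): pixels under the land-use mask are repeatedly replaced by the average of their neighbors until the surrounding terrain bleeds in.

```python
import numpy as np

def naive_fill(image, mask, iterations=50):
    """Crude stand-in for content-aware fill on a single-band image.
    Seeds masked pixels with the background mean, then lets neighbor
    averages diffuse inward. mask is True where modern features sit."""
    filled = image.astype(float).copy()
    filled[mask] = filled[~mask].mean()  # seed with the overall background tone
    for _ in range(iterations):
        neighbors = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0) +
                     np.roll(filled, 1, 1) + np.roll(filled, -1, 1)) / 4.0
        filled[mask] = neighbors[mask]  # only masked pixels change
    return filled

# Toy example: erase a bright "road" crossing a uniform desert tile.
desert = np.full((8, 8), 0.3)
desert[4, :] = 0.9
road_mask = np.zeros((8, 8), dtype=bool)
road_mask[4, :] = True
restored = naive_fill(desert, road_mask)
```

At scale, the road_mask here would come from rasterized OpenStreetMap land-use polygons rather than being drawn by hand.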

Conclusion

If you want to work with satellite imagery to produce a high-resolution basemap for historical or Bible maps, then using machine learning both to sharpen it and to remove modern features could be a viable, if time-consuming, process. The image in this post covers about 100 square kilometers; modern Israel is over 20,000 square kilometers. And this scene contains a mostly undeveloped area; large-scale cities are harder to erase with content-aware fill because there’s less surrounding wilderness for the algorithm to work with. But if you’re willing to put in the work, the result could be a free, plausibly realistic, reasonably detailed map over which you can overlay your own data.