Analyse the contents of over 25 million books in a matter of seconds
When I was at college, finding specific information involved a pretty low-tech and inefficient process. You had to scan through book titles on a standalone computer (from my recollection you could at least search by keywords) to find the correct lookup code, then walk to the correct section of the library to physically track down the book. You then had to decide which particular words you were looking for and flick through the index to find pages where those words were mentioned. Then (whilst carefully holding your finger in the relevant index pages) you needed to navigate to the page… and normally discover that the content was completely irrelevant. And repeat. For hours.
The world has changed a lot since then, not least in how accessible and searchable information has become. One amazing example I’ve just discovered is the Google Books Ngram Viewer, which lets you instantly analyse word occurrences across the huge number of books Google has digitised (over 25 million at the time of writing).
Since approximately 96.7%* of all dictionary lookups are for insults or profanities, I thought I’d use it to gain insight into the popularity of a few different terms over the years.
- Step 1: Head to the Google Books Ngram Viewer
- Step 2: Enter a few words or phrases in the search box
- Step 3: Tweak any other settings as you see fit
- Step 4: Try to work out what might be driving the trends and big changes in word occurrences
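If you would rather skip the typing, steps 1 to 3 can be roughly sketched in a few lines of Python that build a link to the viewer. This is only a sketch: the parameter names (`content`, `year_start`, `year_end`, `smoothing`) are an assumption based on what the viewer's own address bar tends to show and could change at any time, and `build_ngram_url` is a made-up helper name, not anything Google provides.

```python
from urllib.parse import urlencode

# Address of the viewer's graph page. The query parameters used below are
# assumptions taken from URLs the viewer itself produces, not a documented API.
NGRAM_VIEWER_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2019, smoothing=3):
    """Return a link that opens the viewer with the given terms pre-filled.

    `build_ngram_url` is a hypothetical helper written for this post.
    """
    query = {
        "content": ",".join(terms),   # Step 2: the words or phrases to compare
        "year_start": year_start,     # Step 3: tweak the date range...
        "year_end": year_end,
        "smoothing": smoothing,       # ...and how much the lines are smoothed
    }
    # urlencode percent-escapes the commas and any spaces in the terms.
    return f"{NGRAM_VIEWER_URL}?{urlencode(query)}"


url = build_ngram_url(["cockscomb", "halfwit", "muppet"])
print(url)
```

Paste the printed link into a browser and you should land on the chart, ready for step 4. If the link stops working, the safest fallback is to enter the terms by hand and copy whatever the address bar shows.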
I am now intrigued, especially to find that:
- ‘Cockscomb’ is/was a very popular insult (I had no idea it existed as a word until about 2 hours ago, in a discussion with somebody considerably more literary than me). It is defined as ‘the crest or comb of a domestic cock’, so I’m still in the dark about why this is considered insulting.
- ‘Halfwit’ only started gaining in popularity around the 1920s, and for some reason peaked around the time of World War Two, at exactly the same time that ‘cockscomb’ experienced a significant trough. Were these authors using ‘halfwit’ as a direct replacement?! Why did ‘cockscomb’ experience such a big recovery just as ‘halfwit’ began to wane in popularity?
- Whilst ‘muppet’ usage really started to take off in the 1970s and 1980s (apparently they were created in 1955, but The Muppet Show debuted in 1976 and the films followed from 1979 onwards), ‘peak muppet’ was in 1994. What happened in 1994?!
I know this tool has many genuinely useful (and less facile) applications, but I do find this quite interesting. And it makes me laugh too, which is a good thing as far as I’m concerned.
*This number is obviously a complete fabrication. However, this article was written in May 2020, which is officially in the post-truth era (and deep in what will probably be considered the pre-post-certainty era too), so I make no apology.