while searching for the next book to read, i stumbled upon the google books ngram viewer / it searches for key phrases in the “google books” corpus <books that google has scanned & included in its library> & graphs how often each phrase occurs, as a percentage, over a time period you specify <as far back as 1800!> \ it also lists the books published in each range of years that contain those phrases <this was my purpose> / super awesome tool! you can also use your own dataset
what is an n-gram? from wikipedia:
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.
An n-gram model is a type of probabilistic language model for predicting the next item in such a sequence in the form of a (n – 1)–order Markov model. n-gram models are now widely used in probability, communication theory, computational linguistics (for instance, statistical natural language processing), computational biology (for instance, biological sequence analysis), and data compression.
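to make the definition concrete, here's a tiny python sketch / it pulls word n-grams out of a sentence & builds the simplest n-gram model, a bigram one <n = 2, so a 1st-order markov model> that guesses the next word from the previous one \ the function names & the toy sentence are made up for illustration, this is not how google or wikipedia implement anything

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every contiguous run of n items from tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigram_model(tokens):
    """For each word, count which words follow it (a 1st-order Markov model)."""
    followers = defaultdict(Counter)
    for prev, nxt in ngrams(tokens, 2):
        followers[prev][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of word, or None if word was never seen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


# toy sentence, invented for this example
tokens = "the cat sat on the mat and the cat slept".split()

print(ngrams(tokens, 2)[:3])  # first three bigrams
model = train_bigram_model(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

a real model would turn the counts into probabilities & smooth them for unseen words / the counting idea is the same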
here’s the fun part / you can make inferences about how the popularity of <or current thinking about> ideas, subjects, people etc. has changed over time \ pretty much like “trending topics” / here’s what i trended <click images to enlarge>
ngram music: got unrelated books with genre keywords such as blues, bluegrass, rock <but jazz is distinct> | ngram sports <“art of swimming” by benjamin franklin 1810>
ngram concerns <heaven & hell have caught up> | ngram isms: find your most hated ism <capitalism rules, at least in books>
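for the curious, a rough python sketch of the kind of number such a graph plots / for each year, the share of a phrase among all n-grams of the same length in that year's books \ the toy corpus, the function name & the exact normalisation are my assumptions, google's own pipeline may well differ

```python
from collections import Counter


def yearly_share(corpus, phrase):
    """Map each year to the percentage of that year's n-grams equal to phrase.

    corpus: dict of year -> list of texts (a hypothetical toy format).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    shares = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            tokens = text.lower().split()
            # count every contiguous run of n words in this text
            counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        total = sum(counts.values())
        shares[year] = 100.0 * counts[target] / total if total else 0.0
    return shares


# made-up mini corpus: year -> "books"
toy_corpus = {
    1810: ["the art of swimming", "the art of war"],
    1900: ["the art of jazz", "jazz and blues and rock"],
}

print(yearly_share(toy_corpus, "jazz"))
print(yearly_share(toy_corpus, "art of"))
```

plot the values per year & you get a small version of the trend lines above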