Secrecy & the Google Books Ngram Viewer

On a lark, I’ve been playing with the Google Books Ngram Viewer, using the words secret, secrecy, transparency, publicity, and conspiracy, as well as various combinations such as government secrecy, freedom of information, government transparency, top secret, and the paired query secrecy, transparency. I toggled between levels of smoothing and the raw data, and limited the searches to English fiction, American-English, British-English, and the category of “English One Million,” which is defined as:

The “Google Million” are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980). Books with low OCR quality were removed, and serials were removed.

The Google Books Ngram Viewer charts the prevalence of specific words – trends, really, in publishing and social thought – through items included in Google Books and popular magazines such as Life. N-grams are defined as a “sequence of units drawn from a longer sequence; in the case of text, the unit in question is usually a character or a word.” A shorter definition is simply “a sequence of variable characters that stands for a word or string of words in a corpus.” The Ngram Viewer offers a few revealing historical trends:
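To make the definition concrete, here is a minimal Python sketch of pulling word n-grams out of a phrase. The function name word_ngrams and the sample phrase are my own illustrations, not anything taken from Google's tooling, and the sketch simply splits on whitespace, which is surely cruder than whatever tokenizing the Viewer does.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical sample phrase built from the search terms used in this post.
sample = "freedom of information and government secrecy"

unigrams = word_ngrams(sample, 1)  # single words, e.g. "secrecy"
bigrams = word_ngrams(sample, 2)   # two-word phrases, e.g. "government secrecy"
trigrams = word_ngrams(sample, 3)  # three-word phrases, e.g. "freedom of information"
```

In these terms, secrecy is a 1-gram, government secrecy is a 2-gram, and freedom of information is a 3-gram. With that in mind, on to the searches themselves.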

secrecy : no smoothing/American-English, year range = 1800-2000 ~ notice the peaks in 1800 (Adams/Jefferson presidencies) and roughly 1830 (Adams/Jackson presidencies).

secrecy : smoothing of 10, year range = 1800-2000 ~ one sees the use of secrecy on the increase in the 1820s – the Monroe and John Q. Adams presidencies.

secrecy : smoothing of 10, but look what happens when the search is limited to the years 1800-1830 – secrecy begins to increase during the Madison administration and continues rising. I suspect that after the War of 1812, secrecy became an important tool for diplomacy and decision-making in American politics and, as such, is reflected in the texts of the time.

A smoothing of 2 gives us a peak of the word secrecy in 1806 – the Jefferson presidency:

Switching the search to English One Million also shows a peak in secrecy during 1806, and in looking at Google Books for the period 1805-1822, it’s clear that secrecy very much involves affairs of government both in the U.S. and England.
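Why do the peaks shift or flatten as the smoothing setting changes? As I understand it, the Viewer's smoothing is a centered moving average: a smoothing of 1 averages each year with one year on either side, a smoothing of 10 with ten years on either side, and a smoothing of 0 shows the raw data. The Python sketch below works under that assumption. The function name smooth and the numbers in raw are made up for illustration and are not real Ngram data.

```python
def smooth(values, window):
    """Centered moving average: each point is averaged with up to
    `window` neighbors on either side (fewer at the ends of the series)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighborhood = values[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed

# Invented yearly frequencies with a single one-year spike (illustrative only).
raw = [1.0, 1.0, 1.0, 7.0, 1.0, 1.0, 1.0]

flat = smooth(raw, 0)  # smoothing of 0 leaves the raw series untouched
soft = smooth(raw, 1)  # the spike is spread over three years and lowered
```

A sharp one-year spike in the raw data becomes a lower, wider bump once smoothed, which would explain why a peak that is obvious at a smoothing of 0 or 2 can melt into a gradual rise at a smoothing of 10.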

Searching American English with a smoothing of 0 for 1946-1960, the years of the Atomic Energy Act (1946), the National Security Act (1947), the amended Atomic Energy Act (1954), and the Truman and Eisenhower administrations with their various Executive Orders on national security, indicates a peak in secrecy during the year 1953. While materials from the years 1947-1955 indicate discussion of atomic and military secrecy (it’s an interesting sidenote that the Kurt Wolff edition of Georg Simmel’s work was published in 1950), materials from 1953 are fascinating from a cultural perspective, especially in terms of the Eisenhower administration’s Atoms for Peace program.

Taking the Ngram patterns even further, the term conspiracy shows an increase around the time of the War of 1812. It’s fun to see how conspiracy plays out by changing the dates to 1940-2000 and the smoothing to 0 to see the trends of the word in texts. This exercise suggests the McCarthy era and its spyhunt. Then notice the peak in the late 1960s!

The Ngram viewer is surely a wonderful tool for researchers to capture ideas and patterns through the publishing record of specific time periods. The Ngram is really a cultural snapshot. As the original Mr. Spock of Star Trek often remarked: fascinating.


Written by S.

October 31, 2011 at 8:41 pm