Culturomics, Google Books, and Campus Assets

A really interesting piece explains a new field of ‘cultural’ research, culturomics, which is being done via quantitative analysis of 5 million books in Google Books! There are lots of problems with this approach, but the possibilities are also pretty amazing given the size of the data set (4% of all books ever printed). From Geoffrey Nunberg at The Chronicle of Higher Education.

Humanities scholars may someday count as a watershed the paper that appeared on Wednesday in Science, titled “Quantitative Analysis of Culture Using Millions of Digitized Books.” But they’ll have certain things to get past before they can appreciate that.

The paper describes some examples of quantitative analysis performed on what is by far the largest corpus ever assembled for humanities and social-science research. Culled from Google Books, it contains more than five million books published between 1800 and 2000—at a rough estimate, 4 percent of all books ever published—of which two-thirds are in English and the others distributed among Chinese, French, German, Hebrew, Russian, and Spanish. The English corpus alone contains some 360 billion words, a size that permits analyses on a scale that aren’t possible with collections like the Corpus of Historical American English, at Brigham Young University, which tops out at a mere 410 million words.

Not everyone will find these statistics bracing. A lot of scholars have reservations about studying literature en bloc, mindful of Seneca’s admonition that distrahit animum librorum multitudo, or loosely, “Too many books spoil the prof.” And they’re apprehensive about the prospect of turning literary scholarship into an engineering problem.

It is well worth reading the entire piece and also checking out some of the tools (Ngrams.GoogleLabs.com and Culturomics.org). You can play around and compare search terms such as ‘economics’ vs ‘entrepreneurship education’, just as you would search Google itself.

While I am a qualitative analyst and see major limits to this kind of data mining, there are also huge yields available through this interplay between university resources (libraries, databases, professors, students) and entrepreneurial entities such as Google. Let’s not forget that Google was born when Larry Page met Sergey Brin on Stanford’s campus and the two applied academic ranking principles to web pages.

Counting on Google Books – The Chronicle Review – The Chronicle of Higher Education.
