[This post appeared originally on the collaborative blog hist.net.]
This summer the Center for History and New Media at George Mason University will begin work on a two-year study of the potential of text-mining tools for historical (and by extension, humanities) scholarship. The project, entitled “Scholarship in the Age of Abundance: Enhancing Historical Research With Text-Mining and Analysis Tools,” aims to determine how historians might begin to take advantage of the incredible abundance of historical content now available in online databases.
Many millions of original sources (texts, images, etc.) have now been placed online in these databases, but historians have yet to work out how to handle such vast quantities of information effectively. Ironically, more and more historians are finding themselves overwhelmed by the very abundance of digital sources. As a result, no one has yet found a way to reach the new insights about the past that may lurk in these databases or in the intersections between them.
Smaller efforts, like those of programming historian Bill Turkel at the University of Western Ontario, have yielded very interesting preliminary results. The CHNM project intends to build on work like Turkel’s and the MONK project to determine what historians need on a grander scale. Funded by the National Endowment for the Humanities, the project will encompass a variety of research activities, among them focus groups with historians who will be asked to test the efficacy of various text-mining methods in their research.