
Is Digitizing Historical Texts a Bad Idea (II)?

My previous post about digital historical text generated some very interesting comments, both here and on Twitter. I met with my students again last night and we had an extended discussion about those discussions, so thanks to everyone who chimed in. What follows is a summary, more or less, of our conversation last night.

We were particularly taken by Steve Ramsey’s critique of my post, especially the following paragraph:

If so, your problem clearly necessitates access to the original work. But if you are concerned merely to read it, it seems to me very hard to argue against a digital copy. And the truth is that even digital copies can rival the originals for problems that apparently involve the “thingness” of the thing. Scans of the Beowulf manuscript — which no responsible scholar should ever touch — are of such density that one can see the hills and valleys of the vellum. I’m unable to imagine what it is about scans of the War Papers that make the original “disappear from view” or resistant to prioritization as historical sources. Are you prepared to argue that Spencerian handwriting moves documents up and down the hierarchy of importance?

None of us was arguing that digitizing texts was, in and of itself, bad. We all agreed that access to the content of those texts was an unqualified good. And I’ve gone back into the original post and clarified my language about the War Department project, because the way I wrote one sentence made it sound as though I was unhappy with the scans of the documents (which are copies of the originals due to a fire that destroyed the originals — see the project page for more on this issue).

Nevertheless, we all agreed that as historians, we care about the “thingness” of the source, and we care a lot about that. Not because of some “thinly veiled nostalgia” for the thing itself, but because texts are both texts and historical artifacts, and so students of the past need access to that thingness if they are to understand both aspects of the source: its content and its materiality.

The importance of the text itself is pretty obvious and so doesn’t need clarification. But the materiality does. We discussed, for instance, the problems posed in teaching using historical newspapers via a database like ProQuest Historical Newspapers. The ProQuest search delivers the story requested abstracted from the page that it appeared on. The full page is available as well, but unless students are taught what a newspaper is, how the arrangement of content on the page and its placement in a section is the result of a dynamic process involving editors, writers, and layout staff, they will have no sense for why the placement of that story matters sometimes as much as the content of the story itself. “Above the fold” or “below the fold” become meaningless when a database serves up only the story.

ProQuest at least returns a PDF of the original story, so students can see the typeface and (often but not always) the images that went along with the story. And they can examine the headline and consider why a headline might be more sensational than the content of the story warrants, again as a result of the dynamic process involving several actors that I just described.

As for the hierarchy we assign to sources, we also agreed that sometimes we might assign a different importance to a source based on things other than the words in the text, and that all sorts of other factors, most of them material, might convince us that this or that source was of greater import. Everything we know about the source (not just its words, but also the marginalia, its placement in a collection, or where it was found) can shed potentially important light on what the source means and meant to others at the time it was created or later.

Given all of that, we wanted some sort of best practices for digitizers, including common standards about such things as providing images of the original to go with the plain text on a white screen. As Sarah Werner wrote in her comment, creating such standards will require historians, bibliographers, archivists, and technologists to get together and discuss, among other things, what we (and our students) aren’t seeing when all we get is black pixels on a white screen.

Is Digitizing Historical Texts a Bad Idea?

Several years ago I took a group of Mason students to Prague, Vienna, and Budapest. Among the things I’d planned for them was a visit to the Klementinum in Prague where the Codex Gigas (the “Devil’s Bible”) was on display. Needless to say, when I told them we were going to a library to look at a book, they were decidedly underwhelmed. Until they saw it up close and personal.

At roughly 90 cm x 50 cm and weighing in at about 75 kilograms (roughly 165 pounds), it’s quite a book and was unlike anything they had seen or expected. More intriguing to them, though, was the legend surrounding the work. The bible was created sometime between 1200 and 1230 in a monastery in Bohemia, and the story that goes with it is that the devil himself helped a monk create it in just one night. In exchange, the monk included an image of the devil as part of the text decoration. Despite their earlier reluctance to go look at a book, the students pronounced the whole thing kind of cool.

I was reminded of that trip the other night during a tutorial I’m leading with four of our most talented doctoral students. One of those four, Jeri Wieringa, asked one of those questions students ask us with regularity that makes us think really hard. I’ll paraphrase what she asked: “If we digitize texts and present them to students as just so many pixels, are they losing an essential connection to the text as a historical artifact?”

This question led to an energetic discussion around our table. On the one hand, there are obvious advantages to digitizing texts. Most obviously, the texts, especially those from before the age of the typewriter, become much more legible and therefore accessible to a wide audience. Anyone who has taught pre-typewriter texts knows just how reluctant students can be when it comes to trying to make sense of handwriting from back in the day. Even excellent tutorials like the one on decoding Martha Ballard’s diary can reinforce the notion that such handwriting is essentially unreadable except by experts or code breakers.

A second obvious advantage is that the text becomes fully searchable in ways that it can’t be when it is just an image of a document. Our Papers of the War Department project here at RRCHNM is a great example of the advantages of having transcribed texts to sort through and analyze using the text analysis algorithm of your choice.

Finally, making the text available in this way opens up any digitized collection to crawling by the various search engines, thereby opening up the collection to a much larger audience.

But, and this was the but that we got stuck on in our discussion, the artifact itself can disappear from the view of the researcher if an image of the original is not also available to the researcher. We really liked the War Department project because that image is there for users to see any time they want. [NB: I edited this paragraph because in the original, my wording made it sound as though images weren’t available on the War Department site.]

To put it another way, the coolness of the text as artifact disappears when all the researcher/student sees is black pixels on a white screen. Yes, it’s much more readable and accessible. But there is a bigger potential problem, and this is the one that really troubled Jeri. An essential task of the historian is to assign greater or lesser value to a particular historical source based on his/her growing expertise in a given subject. Some documents are just more important to a given problem or interpretation than others, and it’s up to us to help others see that.

But if all documents are reduced to black pixels on a white screen, they start to seem all the same. Given that students/novice historians often have a difficult time placing sources in a hierarchy of importance that they are developing, if all texts look the same, are we making it more difficult for them to develop this skill of prioritizing some sources over others?

We arrived at no answer in our conversation and despite two weeks of ruminating on the issue, I still don’t have one. I’m just going to have to worry about this one for a while longer.