Is Digitizing Historical Texts a Bad Idea (II)?

My previous post about digitized historical texts generated some very interesting comments, both here and on Twitter. I met with my students again last night and we had an extended discussion about those comments, so thanks to everyone who chimed in. What follows is a summary, more or less, of our conversation last night.

We were particularly taken by Steve Ramsey’s critique of my post, especially the following paragraph:

If so, your problem clearly necessitates access to the original work. But if you are concerned merely to read it, it seems to me very hard to argue against a digital copy. And the truth is that even digital copies can rival the originals for problems that apparently involve the “thingness” of the thing. Scans of the Beowulf manuscript — which no responsible scholar should ever touch — are of such density that one can see the hills and valleys of the vellum. I’m unable to imagine what it is about scans of the War Papers that make the original “disappear from view” or resistant to prioritization as historical sources. Are you prepared to argue that Spencerian handwriting moves documents up and down the hierarchy of importance?

None of us was arguing that digitizing texts was, in and of itself, bad. We all agreed that access to the content of those texts was an unqualified good. And I’ve gone back into the original post and clarified my language about the War Department project, because the way I wrote one sentence made it sound as though I was unhappy with the scans of the documents (the documents themselves are copies, because a fire destroyed the originals; see the project page for more on this issue).

Nevertheless, we all agreed that as historians, we care about the “thingness” of the source, and we care about it a lot. Not because of some “thinly veiled nostalgia” for the thing itself, but because texts are both texts and historical artifacts, and so students of the past need access to that thingness if they are to understand both aspects of the source: its content and its materiality.

The importance of the text itself is pretty obvious and so doesn’t need clarification. But the materiality does. We discussed, for instance, the problems posed in teaching with historical newspapers via a database like ProQuest Historical Newspapers. The ProQuest search delivers the requested story abstracted from the page on which it appeared. The full page is available as well, but unless students are taught what a newspaper is, and how the arrangement of content on the page and its placement in a section are the result of a dynamic process involving editors, writers, and layout staff, they will have no sense of why the placement of that story sometimes matters as much as the content of the story itself. “Above the fold” and “below the fold” become meaningless when a database serves up only the story.

ProQuest at least returns a PDF of the original story, so students can see the typeface and (often but not always) the images that went along with the story. And they can examine the headline and consider why a headline might be more sensational than the content of the story warrants, which is again a result of the dynamic process involving several actors that I just described.

As for the hierarchy we assign to sources, we also agreed that sometimes we might assign a different importance to a source based on things other than the words in the text, and that all sorts of other factors, most of them material, might convince us that this or that source was of greater import. Everything about the source, not just its words but also its marginalia, its placement in a collection, or where it was found, can shed potentially important light on what the source means and meant to others at the time it was created or later.

Given all of that, we wanted some sort of best practices for digitizers, including common standards for such things as providing images of the original to go with the plain text on a white screen. As Sarah Werner wrote in her comment, creating such standards will require historians, bibliographers, archivists, and technologists to get together and discuss, among other things, what we (and our students) aren’t seeing when all we get is black pixels on a white screen.