Richard White’s excellent “What is Spatial History?” argues convincingly that digital (specifically spatial) history isn’t just a technique for making pretty representations of research, but rather:
It is a means of doing research; it generates questions that might otherwise go unasked, it reveals historical relations that might otherwise go unnoticed, and it undermines, or substantiates, stories upon which we build our own versions of the past.
However, as I discussed in my last post, I have some concerns about how scholars (and scholars-in-training) learn these new techniques: not just how to do them, but how to do them well, as experts.
Take me, for example. I’m a reasonably well-educated person, with a good academic background in the humanities and strong computing skills, and yet when I encounter some of these discussions of digital humanities techniques, I really struggle to understand what’s going on. Neither my undergrad days nor my graduate studies trained me in statistical analysis or statistical software, nor did they give me reason to do any programming. Does breaking into digital humanities work (to build something, as Stephen Ramsay demands) mean spending my (very limited) free time teaching myself new skills, or does it require even more years of schooling? If I only work on part of a project, am I contributing substantively, or am I myself just a tool?
When Ben Schmidt says “most humanists who do what I’ve just done—blindly throwing data into MALLET—won’t be able to give the results the pushback they deserve” he is warning us that these tools are, like other research methods, fallible. Yet when we draw specious or simply wrong conclusions in traditional historical research, we can catch ourselves, or our advisors/peer reviewers/editors can catch us. Megan Brett encourages us: “Don’t be afraid to fail or to get bad results, because those will help you find the settings which give you good results.”
But how do we learn what a bad result looks like? Some are obvious, but the really tricky ones buried in the data may be invisible to the undertrained eye – and everyone in this field seems to have an undertrained eye.
Furthermore, there’s an old computer science cliché: Garbage In, Garbage Out. In other words, no matter how elegant your program, if you put in bad data (or bad rules, or bad metadata), your output is going to be crap.
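To make the cliché concrete, here is a hypothetical sketch (the keyword and the texts are invented for illustration, not drawn from any real corpus): the counting logic is perfectly sound, but a single OCR error in the input silently corrupts the result.

```python
# Garbage In, Garbage Out: identical logic, different data quality.

def count_keyword(text: str, keyword: str) -> int:
    """Count occurrences of a keyword among the lowercased words of a text."""
    return text.lower().split().count(keyword)

clean_text = "influenza spread quickly and influenza deaths rose"
# The same passage as an OCR engine might mangle it ("fl" read as "fi"):
ocr_text = "infiuenza spread quickly and influenza deaths rose"

print(count_keyword(clean_text, "influenza"))  # 2
print(count_keyword(ocr_text, "influenza"))    # 1 -- one hit silently lost
```

Nothing in the program flags the missing hit; the “bad result” looks exactly like a good one, which is why a trained eye on the data matters as much as the code.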
This article on data analysis techniques used to research the 1918 flu epidemic demonstrates the difficulty of getting good data into a digital system:
Human beings recognize tone. Algorithms are better suited to sifting through data in search of keywords—like “influenza” and “kissing.” But “when we see a word or something being highlighted with an algorithm, we don’t know what it means,” says Mr. Ramakrishnan.
Mr. Ewing came armed with a set of “tone categories” to focus on: Were newspaper reports alarming, reassuring, factual? The group talked through the analysis that members wanted to do. “Our goal was to mimic it in an algorithm,” Mr. Ramakrishnan says.
How do you make ‘tone’ mathematically analyzable? This isn’t a rhetorical question; these kinds of syntactical and semantic classifications underlie good text analysis (as Brett points out, to do topic modeling, you have to prepare your textual corpus for analysis and already have a good understanding of what’s in it). If you’re working with a programmer instead of programming things yourself, will important content get missed? If you do the programming, will important analysis get missed? After all, encoding “aboutness” is hard – just ask any library cataloguer or metadata librarian (like me!). Encoding the nuance of human language is even harder.
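To see how crude the mapping from words to “tone” can be, consider a minimal keyword-based classifier. The categories echo Ewing’s, but the keyword lists and the sentences are hypothetical stand-ins of my own, not the flu project’s actual rules:

```python
# A naive tone classifier: label a sentence by the first tone category
# whose keywords appear in it. Keyword lists are invented for illustration.

TONE_KEYWORDS = {
    "alarming":   {"epidemic", "deadly", "panic", "crisis"},
    "reassuring": {"contained", "recovering", "safe", "improving"},
}

def classify_tone(sentence: str) -> str:
    """Return the first tone whose keywords overlap the sentence's words."""
    words = {w.strip(".,") for w in sentence.lower().split()}
    for tone, keywords in TONE_KEYWORDS.items():
        if words & keywords:
            return tone
    return "factual"  # default when no tone keywords match

print(classify_tone("Deadly epidemic sweeps the city."))          # alarming
print(classify_tone("Officials say the outbreak is contained."))  # reassuring
# A reassuring sentence that mentions the epidemic is mislabeled,
# because "epidemic" matches the alarming list first:
print(classify_tone("The epidemic is contained, officials insist."))  # alarming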
Digital humanities and digital history are growing fields for a reason; when these tools work, and when they’re used well, they can give us insights into the human experience that are simply unreachable with traditional methods. But the skills needed to use these tools well are not innate to the humanist’s methods of analysis. (Nor to the computer scientist’s, the statistician’s, or the librarian’s, for that matter.) If we as scholars want to realize the full potential of the digital humanities, we’re going to have to stretch our brains even further than our PhD studies and our traditional research already ask us to.
Ben Schmidt, “When You Have a MALLET, Everything Looks Like a Nail,” Sapping Attention, November 2, 2012, http://sappingattention.blogspot.com/2012/11/when-you-have-mallet-everything-looks.html.
Megan Brett, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2 (Winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/.
Jennifer Howard, “Big-Data Project on 1918 Flu Reflects Key Role of Humanists,” Chronicle of Higher Education, February 27, 2015, http://chronicle.com/article/Big-Data-Project-on-1918-Flu/190457/.