Helpful Links for Tonight’s Text Encoding Lesson

Recommended text editors: Notepad++, TextPad, TextWrangler (for Mac)

TEI Lite: http://www.tei-c.org/release/doc/tei-p5-exemplars/html/tei_lite.doc.html#faces

A transcribed letter fully encoded in TEI XML: http://nzetc.victoria.ac.nz/tm/scholarly/tei-JCB-001.html

Letter from John Adams to Abigail Adams, 3 May 1789: http://www.masshist.org/digitaladams/archive/doc?id=L17890503ja&rec=sheet&archive=&hi=&numRecs=&query=&queryid=&start=&tag=&num=10&bc=/digitaladams/archive/browse/letters_1789_1796.php

Letter from John Adams to Abigail Adams, 1 May 1789: http://www.masshist.org/digitaladams/archive/doc?id=L17890501ja&rec=sheet&archive=&hi=&numRecs=&query=&queryid=&start=&tag=&num=10&bc=/digitaladams/archive/browse/letters_1789_1796.php

About the transcription of the Adams letters: http://www.masshist.org/digitaladams/archive/about/transdetail.php 

Civil War Love Letters: http://www.historyhappenshere.org/archives/7642 

Project Proposal: Mapping Chinatowns

One of the recurring themes of our semester so far has been the “why” of digital projects – yes, you can put a historical idea into a digital medium, but why would you? How does it change or enhance what you’re trying to discover or show about the topic in question?

[Image: map]

As part of a research project for a course on the 19th-century American West, I discovered that St. Louis not only had a Chinatown (known locally as “Hop Alley”) but had also been home to one of the earliest Chinese communities in a large American urban center not on the Pacific coast. I produced a digital map (above) with some simple color coding to represent the spread of the Chinese-American population. However, the map's lack of movement and interactivity, its weak connection to the rest of the research presentation, and the complexity of the underlying data made the facts, and their importance, much less clear than they should have been.

For my digital history project, therefore, I’m going to use the University of Virginia’s VisualEyes platform and editing tools to turn my static map into something that better shows movement over time and ties individual points to images and descriptions of what was happening at each moment.

Even though the project follows a fairly traditional linear narrative (the passage of time), like a research paper or chronology, my hope is that watching a more literal representation of these communities' spread will let viewers see how the Chinese population in 19th-century America moved not only with the railroads but also with other demographic and economic shifts.

stretching the humanities brain

Richard White’s excellent “What is Spatial History?” argues convincingly that digital (specifically spatial) history isn’t just a technique for making pretty representations of research, but rather

It is a means of doing research; it generates questions that might otherwise go unasked, it reveals historical relations that might otherwise go unnoticed, and it undermines, or substantiates, stories upon which we build our own versions of the past.

However, as I discussed in my last post, I have some concerns about how scholars (and scholars-in-training) learn these new techniques: not just how to do them, but how to do them well, as experts.

Take me, for example. I’m a reasonably well-educated person, with a good academic background in the humanities and strong computing skills, and yet when I encounter some of these discussions of digital humanities techniques, I really struggle to understand what’s going on. Neither my undergraduate nor my graduate studies trained me in statistical analysis or statistical software, and neither gave me any reason to learn programming. Does breaking into digital humanities work (to build something, as Stephen Ramsay demands) mean spending my very limited free time teaching myself new skills, or does it require even more years of schooling? If I only work on part of a project, am I contributing substantively, or am I myself just a tool?

When Ben Schmidt says that “most humanists who do what I’ve just done—blindly throwing data into MALLET—won’t be able to give the results the pushback they deserve,”[1] he is warning us that these tools are, like other research methods, fallible. Yet when we draw specious or simply wrong conclusions in traditional historical research, we can catch ourselves, or our advisors, peer reviewers, and editors can catch us. Megan Brett encourages us: “Don’t be afraid to fail or to get bad results, because those will help you find the settings which give you good results.”[2]

But how do we learn what a bad result looks like? Some are obvious, but the really tricky ones buried in the data may be invisible to the undertrained eye – and everyone in this field seems to have an undertrained eye.

Furthermore, there’s an old computer science cliché: Garbage In, Garbage Out. In other words, no matter how elegant your program, if you put in bad data (or bad rules, or bad metadata), your output is going to be crap.

A Chronicle of Higher Education article on the data analysis techniques used to research the 1918 flu epidemic shows how difficult it is to get good data into a digital system:

Human beings recognize tone. Algorithms are better suited to sifting through data in search of keywords—like “influenza” and “kissing.” But “when we see a word or something being highlighted with an algorithm, we don’t know what it means,” says Mr. Ramakrishnan.

Mr. Ewing came armed with a set of “tone categories” to focus on: Were newspaper reports alarming, reassuring, factual? The group talked through the analysis that members wanted to do. “Our goal was to mimic it in an algorithm,” Mr. Ramakrishnan says.[3]

How do you make “tone” mathematically analyzable? This isn’t a rhetorical question; these kinds of syntactic and semantic classifications underlie good text analysis (as Brett points out, to do topic modeling you have to prepare your textual corpus for analysis and already have a good understanding of what’s in it). If you’re working with a programmer instead of programming things yourself, will important content get missed? If you do the programming yourself, will important analysis get missed? After all, encoding “aboutness” is hard – just ask any library cataloguer or metadata librarian (like me!). Encoding the nuance of human language is even harder.
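To make the question concrete, here is a minimal, hypothetical sketch in Python of the most naive approach one could take. It counts keywords for each of the tone categories Ewing brought to the project (alarming, reassuring, factual) and picks whichever category gets the most hits. The keyword lists and the `naive_tone` function are my own invention for illustration. They are not how the 1918 flu project actually worked.

```python
# Hypothetical keyword lists for the three tone categories named in the
# Chronicle article. These word lists are invented for illustration only;
# they are NOT the 1918 flu project's actual method.
TONE_KEYWORDS = {
    "alarming": {"epidemic", "deaths", "panic", "spreading", "danger"},
    "reassuring": {"recovering", "safe", "improving", "calm", "control"},
    "factual": {"reported", "cases", "according", "board", "health"},
}


def naive_tone(text: str) -> str:
    """Label a passage with the tone whose keywords appear most often.

    Words are lowercased and stripped of surrounding punctuation, then
    matched one at a time, so negation and context are ignored entirely.
    """
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    # Count keyword hits per tone category.
    scores = {
        tone: sum(w in keywords for w in words)
        for tone, keywords in TONE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hits at all, refuse to guess.
    return best if scores[best] > 0 else "unknown"


print(naive_tone("There is no danger of panic; the epidemic is under control."))
```

Given the sentence “There is no danger of panic; the epidemic is under control,” this sketch labels it alarming. It counts “danger,” “panic,” and “epidemic” but has no way to notice the “no” or the reassuring sense of the sentence as a whole. In a single sentence, that error is easy to spot. Buried in a corpus of thousands of newspaper pages, it is exactly the kind of bad result that could slip past an undertrained eye.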

Digital humanities and digital history are growing fields for a reason: when these tools work, and when they’re used well, they can give us insights into the human experience that are simply unreachable with traditional methods. But the skills needed to use these tools well are not native to the humanist’s methods of analysis. (Nor are they native to the computer scientist’s, the statistician’s, or the librarian’s, for that matter.) If we as scholars want to realize the full potential of the digital humanities, we’re going to have to stretch our brains even further than our PhD studies and our traditional research already ask us to.

[1] Ben Schmidt, “When You Have a MALLET, Everything Looks Like a Nail,” Sapping Attention (November 2, 2012), http://sappingattention.blogspot.com/2012/11/when-you-have-mallet-everything-looks.html

[2] Megan Brett, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2 (Winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/

[3] Jennifer Howard, “Big-Data Project on 1918 Flu Reflects Key Role of Humanists,” Chronicle of Higher Education, February 27, 2015, http://chronicle.com/article/Big-Data-Project-on-1918-Flu/190457/

what is the economic model of the digital humanities?

I ask this question a bit tongue-in-cheek, but I think a more serious version of it has not yet been adequately answered: who does the work that produces digital humanities output? This conundrum underlies most questions about evaluating digital humanities work (particularly for tenure credit), much of the altmetrics conversation, and, to some extent, the question of what training our advanced graduate students require.

To be sure, there are other important questions that are continuously being asked of DH: what are we doing? To what end? Particularly in history, is the tool that we use for our project contributing substantively to the scholarship it purports to represent?

Yet as I read the short introduction to DH from the book Digital_Humanities,[1] which speaks of DH projects in terms of project management, deliverables, and development cycles, I can’t help but wonder: who is responsible for this work? Who is teaching history PhDs to write technical documentation? To evaluate, choose, and implement metadata standards? To do database administration? Are those instructors departmental faculty? Does the single-apprenticeship model of scholarly training still hold up? Is coursework on proper LAMP implementation taught the same semester as paleography?

Or, alternatively, does DH methodology allow the humanities to borrow more efficiently from the model used in the physical sciences, with small armies of grad students and postdocs doing “grunt work” (text encoding or map reading) in support of a principal investigator’s research and output?

As for a third option, bringing in experts: if professionals are used (programmers, information architects, designers, and publishers are all possible collaborators identified by Roy Rosenzweig[2]), who pays them? For whom do they work? Does the future of humanities research depend on the grant-funded research structure? What is the scholar’s role in the production of a high-tech undertaking that requires professionals to develop it? If scholars are merely writing content, why not write a book?

[1] Anne Burdick, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp, Digital_Humanities (Cambridge, MA: MIT Press, 2012).

[2] Douglas Seefeldt and William G. Thomas III, “What Is Digital History? A Look at Some Exemplar Projects” (2009), 3.

is digital scholarship a medium?

The theme of this week’s readings is the future (or lack thereof) of “traditional” or linear narrative. In response to this theme, I began reading with a greater-than-usual degree of skepticism – which is saying something.

I’ve been online regularly since 1997, and intermittently for a few years before then. I have seen blogging/publishing platforms come and go, and slapped together more than a few website assignments. Most “innovative” online scholarship presentations are overblown, hard to read, hard to navigate, often impossible to save or cite, and generally frustrating. Yet in reading the following, I finally found the real reason for my knee-jerk instinct to roll my eyes at the premise of non-linear, “futuristic,” “Web 2.0” or “interactive” scholarship:

“Our presumption that readers would want to engage with our article in the way we intended was probably misplaced.”

Respected historian William Thomas wrote this rather damning line in his excellent essay on the development of “The Differences Slavery Made,” the digital history project he created with Edward Ayers.

The traditional journal article or book layout is not just a narrative construction; it is a collection of powerful signaling devices that indicate (at least to an experienced reader) which parts of a piece of writing are “important.” Over the last 25+ years of the World Wide Web, we have also been trained to look for certain importance markers online, such as logos and consistent navigation labels (Contact, About Us, FAQ). In my experience, a great deal of digital scholarship fails to make effective use of the signals of either its print antecedents or the active web, leaving users to flail about helplessly or, potentially, to give up without learning anything.

Communication theorist Marshall McLuhan famously argued that “the medium is the message.” As he puts it:

“…[T]he content of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph. If it is asked, ‘What is the content of speech?,’ it is necessary to say, ‘It is an actual process of thought, which is itself nonverbal.’”[1]

I confess I still don’t understand what the content of Thomas and Ayers’ project is. What is the content or medium of this?

[Image: diagram from Thomas’ essay]

(Illustration of “the article’s linkages and structure” from Thomas’ essay, linked above.) In what way is this an article? What understanding or knowledge must the reader come to the table with to be able to read this site?

Scholars – and librarians, and archivists, and museum specialists – are well-trained in their respective crafts, to the point where they can develop a curious reversed myopia about what other people can see in their own work. It is important to keep in mind that the web’s potential for non-linear narration gives us new and more powerful ways to completely fail to connect with the people we are trying to communicate with.

[1] Marshall McLuhan, “The Medium Is the Message,” in The Anthropology of Media, ed. Kelly Askew and Richard R. Wilk (Malden, MA: Blackwell, 2002), 18.

lies, damn lies, and metrics

In his 2014 Code|Words essay, Michael Peter Edson waxes prophetic about the “dark matter of the Internet,” heralding the beginning of a new age in which everything is interactive, collaborative, and creative. In the list of factoids he uses to support his theory, he cites number after number: 140 billion Facebook images; 185.4 million Tumblr accounts with 83.1 billion posts; a billion TED talk videos “served.” (Having been on Tumblr and Facebook for some time now, I can tell you that likely only a minuscule percentage of those posts and images are original, but copyright in this “new” web community is a topic for another time.)

Even while touting his measurements of these huge digital impacts, Edson seems to look down on traditional GLAM (galleries, libraries, archives, and museums) institutions for measuring their non-digital impacts:

“Some of the disconnect between what institutions could do online and what they do do online can be attributed to the clothesline paradox, a term environmental pioneer Steve Baer coined to describe the phenomenon in which activity that can be measured easily (e.g., running a clothes dryer) is valued over equally important activity that eludes measurement (e.g., drying clothes outside.) The same can be said for the way in which institutions habitually value activity such as visits to museums or journal articles published by their scholars over equally meaningful but more difficult to measure activity such as the sharing of museum-related materials on social media sites or the creation of Wikipedia pages.”[1]

The core concept is this: everything that matters is measurable, and old-timey institutions like museums and libraries are just measuring the wrong thing.

A recent case study of an impressive suite of digital offerings from the Cleveland Museum of Art seems to have taken this idea to heart. The authors are clearly and justly proud of the work the team at CMA has done, and their responsive, dynamic Collection Wall sounds like something I’d like to play with. But in the enumeration of all the digital tools they use and all the data they gather, the story seems to shift slightly from “gather data to help find new ways to reach our patrons with our collection” to “gather data… to gather data.” As they put it:

The visitor-experience backbone captures “user” moments (i.e. transactions) that document a person’s relationship with a museum – from parking garage entrances and special exhibition entrances, to attendance at museum cocktail nights, children’s art-class registrations, donor-circle renewals, and volunteer shift check-ins.[2]

I can’t be the only person who finds this a little creepy.

Counting things – patrons, tickets, blog posts, images, whatever – makes it easy for us to tell a story about “progress”: “One year we had x widget uses logged, and the following year we had x + 100 widget uses logged. We’re aiming for an even bigger number this year – maybe as high as x + 250!” These stories make it easier to justify our continued existence, help us find (or keep) funding, and make great bullet points in marketing materials. But what are they really measuring? At the risk of sounding incredibly trite, can you measure inspiration? Is there a metric for Eureka moments?

A few days ago I read[3] an entertainingly foulmouthed essay by instructional designer Sean Michael Morris on critical pedagogy in the age of digital learning. Morris warns that in the age of student metrics and constant tracking, it is too often the case that “Learners […] are very much the objects of the learning process, and not the subjects. Learning is being done to them while they’re observed to see how well the technology is working, or the content… to see, to put it frankly, how well learners are responding to the treatment.”[4]

While the world of digital initiatives and user-data analytics in libraries and museums is more ethnographic (what are they doing?) than experimental (what do they do if I do this?), it feels as though there is still a distinct separation between the content-holding institution and the user or patron. Are patrons learners? Are museum staff and librarians teachers? We should be. And if Edson’s utopian internet dark matter is to exert its force, the patrons will also need to be teachers and the staff and librarians learners – which is difficult, if not impossible, to build analytics for.

Just as it is important to ask ourselves why we want to use this digital tool or that interactive design, it is important to ask – and to justify to our patrons – why we gather the data we do, and how we plan to use it. What will our patrons walk away with? Can we and should we measure that?

[1] Michael Peter Edson, Dark Matter: The Dark Matter of the Internet is Open, Social, Peer-to-Peer and Read/Write—And It’s the Future of Museums, 2014, https://medium.com/tedx-experience/dark-matter-a6c7430d84d1#.n7rbitvn9

[2] Jane Alexander and Elizabeth Bolander, “A Digital Road Map: Developing and Evaluating Museum-wide Digital Strategy,” in Technology and Digital Initiatives, ed. Juilee Decker (Lanham, MD: Rowman & Littlefield, 2015).

[3] On Twitter, where I have 755 followers as of today – how’s that for a countable thing?

[4] Sean Michael Morris, “Critical Pedagogy in the Age of Learning Management,” January 24, 2016, http://www.seanmichaelmorris.com/blog/2016/1/24/critical-pedagogy-learning-management