Automatic Transcription for Oral Histories: One Step Closer at NYPL?

Amy Starecheski is Associate Director of OHMA. In this post, she reports back from an event jointly curated by OHMA, the New York Public Library and Columbia's Digital Humanities Center, in which oral historians tested a new transcript correction tool developed by NYPL Labs.

Oral historians are rarely found gathered in a room together huddled over laptops. But that was the scene last week in the Studio@Butler, an experimental digital humanities space at Columbia University known for hosting hackathons and open lab hours. Engineers from NYPL Labs, the New York Public Library’s team of digital preservationists, curators, and developers, and Alex Kelly, who has been running NYPL’s Community Oral History Project since 2013, came to Columbia to test out their new software tool on a group of oral historians and digital humanists in a sneak preview before their April 5 launch. We, who are so attached to the one-on-one face-to-face interview, are learning from the digital humanities folks how to research in teams.

The Open Transcript Correction Tool, developed by NYPL through the Together We Listen partnership with The Moth and Pop Up Archive, and supported by the Knight Prototype Fund, is an online tool that lets the public collaboratively close the gap between what voice recognition software hears and what our narrators are saying. In this case, the automatic transcription is by Pop Up Archive, and it’s often quite good. Still, a human touch is needed to make the leap from at times unintelligible strings of words to a polished transcript. With the Open Transcript Correction Tool, anyone can go to the site and, without even logging in, start listening to audio and correcting a transcript right away. There is a tutorial and an FAQ, but the program is pretty intuitive: you click on a segment to play an audio clip of up to five seconds in length, and correct any errors in the accompanying transcript just by typing over what’s there. A succinct style guide is available if you have questions.

Users have the option of logging in to do more sustained work on a project, keeping track of their accomplishments and watching their edited transcripts accumulate. A line is considered complete once three users have approved the same version of the text. If there are conflicts, users can weigh in, choosing the version they find to be the most accurate or adding another option to the pool. In this way, NYPL hopes to find a balance between crowdsourcing and quality control, which is a challenge even for tasks less detail-oriented and subjective than what we at Columbia’s Center for Oral History Research call “audit-editing” transcripts.
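For readers who think in code, the three-approval rule described above can be sketched roughly as follows. This is only an illustration of the idea as I understood it from the demo, not NYPL's actual implementation: the function name `line_status`, the status labels, and the choice to ignore leading and trailing whitespace when comparing versions are all my own assumptions.

```python
from collections import Counter

# From the post: a line is complete once three users approve the same text.
APPROVALS_NEEDED = 3


def line_status(submissions, approvals_needed=APPROVALS_NEEDED):
    """Classify one transcript line from the versions users have submitted.

    `submissions` is a list of strings, one per user edit or approval.
    Returns a (status, leading_text) pair. The status labels are
    hypothetical and chosen only for this sketch.
    """
    if not submissions:
        return ("untouched", None)

    # Assumption: versions differing only in surrounding whitespace match.
    counts = Counter(text.strip() for text in submissions)
    leading_text, votes = counts.most_common(1)[0]

    if votes >= approvals_needed:
        return ("complete", leading_text)
    if len(counts) > 1:
        # Competing versions exist; more users need to weigh in.
        return ("conflict", leading_text)
    return ("in_review", leading_text)
```

Under this sketch, a line with two matching versions and one competing version stays in the conflict state until a third user sides with one of them or adds another option to the pool.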

The oral historians and digital humanists gathered at Columbia had few major critiques of the beta version of the Open Transcript Correction Tool. We thought it was kind of fun! Because you have to actively click on each tiny segment to play and check it, I experienced none of the frantic feeling I get when trying to transcribe with the audio constantly getting ahead of my ability to type. And there is a neat tool to track your progress, based on the percentage of total lines in the transcript you’ve corrected. In less than a half hour of effort, I did about five minutes of tape and about ten percent of the total interview. Yes, a good transcriber could do it faster from scratch, but I am not a good transcriber, and many projects can’t afford one. It was satisfying to watch the lines turn green and my percentages grow, and I enjoyed listening to the oral history, conducted with a Harlem resident of Jamaican heritage.
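For anyone planning a project around this, the arithmetic behind those numbers can be sketched as below. The progress formula follows the description above (lines corrected out of total lines); the effort estimate is my own back-of-the-envelope extrapolation, which assumes the rest of the interview goes at the same pace, and both function names are hypothetical.

```python
def percent_corrected(corrected_lines, total_lines):
    """Progress as the percentage of transcript lines corrected."""
    if total_lines <= 0:
        return 0.0
    return 100.0 * corrected_lines / total_lines


def estimate_total_effort_minutes(effort_minutes, percent_done):
    """Extrapolate total correction time, assuming a constant pace."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return effort_minutes * 100.0 / percent_done


# Roughly my session: at most 30 minutes of effort for about 10 percent.
upper_bound = estimate_total_effort_minutes(30, 10)
print(f"At most about {upper_bound / 60:.0f} hours for the whole interview")
```

By that rough measure, one interview is an afternoon's work for a single volunteer, or a lunch break each for a dozen.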

NYPL’s Community Oral History Project is already scaling up the process of oral history, having trained volunteers to conduct hundreds of oral histories in eleven different neighborhood projects over the past several years. Oral historians have begun experimenting with Pop Up Archive for transcription and have found that, like all the automatic transcription tools that came before it, it is good but needs some human help. The Community Oral History Project has a proven track record of involving large teams of volunteers in the oral history process. Now that the tool is live, they are hosting a series of in-person events for people to use it together. And they imagine regular people taking a few minutes on a lunch break or at home in the evening to listen to, and fix up, a little oral history. If people take it up and use it, this tool may be part of the solution to a problem faced by managers of community oral history projects everywhere: keeping volunteers engaged after the interviewing phase is over. And it certainly seems to be one more step in our incremental move towards automating large parts of the transcription process.

Try it out here or in person at an upcoming event in NYC. RSVP today!