Loretta Merlo, circulation manager, transcribes 19th-century case files at the Transcribathon.
Article credit: WCM Central
This summer, the WCM community came together to give artificial intelligence a human boost by transcribing handwritten medical notes from the 19th century into computer-readable files. The transcribed documents are part of a project by Cornell Tech master’s candidate Praveen Kumar Govindaraj to teach computers to decipher the florid script preferred by society at that time.
Govindaraj is using machine learning, a technique that allows a computer to learn from data without being explicitly programmed. For the handwriting recognition project, the computer analyzes a set of “gold standard” transcriptions that have been verified as accurate, and the more of these it has to learn from, the better.
The Transcribathon gave human participants a fascinating view into medicine in the 1800s, with cases ranging from an inebriated sailor who gashed his head after falling off the dock to a young woman who had been badly burned when a candle ignited her clothing. Treatments at the time varied from remedies still in use today to therapies such as bloodletting that were not very helpful.
Transcribathon participants weren’t working from the original case files. The fragile and sometimes decaying files had first been scanned and saved as digital documents by the Medical Center Archives staff as part of a project spearheaded by Dr. Curtis Cole, chief information officer, with a grant from the Frank Naeymi-Rad and Theresa A. Kepic Foundation. With digital copies, the documents will be both preserved and more accessible to those unable to visit the archives in person.
The hope for the machine learning project is that the computer will learn to generate keywords from the scanned handwritten documents so that they can be organized in a searchable database. It will likely take many years of refinement before the technology can generate complete, accurate transcriptions of the script.
Fortunately, five more Cornell Tech students have signed on to advance the project: Young Sang Choi, Evan Yates, Kelly Wang, Aaron Yingxiang Lu and Rohun Tripathi are spending this semester tackling the handwriting-recognition problem as part of a Product Studio challenge to develop a technology-driven solution to a business need.
The Transcribathons took place in the Samuel J. Wood Library computer lab on July 27 and Aug. 17. Additional events will be scheduled in the future.