Handwriting Recognition and Retrieval

A page from George Washington's letters

A large quantity of information of historical and scientific interest remains locked up in archives of handwritten papers. While such collections are gradually being made available in the form of scanned images, creating transcripts by human annotation is prohibitively expensive in most cases. At the same time, historical document collections present special challenges that make standard handwriting recognition technology ineffective.

Abrogans 1st page

Portion of a Latin manuscript page

I have investigated techniques that make handwritten document collections searchable without full transcription. One approach, called word spotting, seeks to locate every occurrence of an arbitrary query word in a document. Several of my papers develop word spotting techniques built on different underlying methods.
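To make the idea concrete, here is a minimal sketch of query-by-example word spotting. It assumes the page has already been segmented into binary word images, and it uses dynamic time warping (DTW) over column ink profiles as a simple stand-in matching technique. This is an illustration only, not the part-structured models mentioned below, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a word image is a binary grid, rows of 0/1 with 1 = ink.
Image = Sequence[Sequence[int]]


def column_profile(img: Image) -> List[float]:
    """Fraction of ink pixels in each column (a simple projection profile)."""
    height = len(img)
    width = len(img[0])
    return [sum(img[r][c] for r in range(height)) / height for c in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW cost between two profiles, divided by len(a) + len(b)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def spot(query: Image, candidates: Sequence[Image]) -> List[Tuple[int, float]]:
    """Rank candidate word images by DTW distance to the query, best first."""
    q = column_profile(query)
    scored = [(i, dtw_distance(q, column_profile(c))) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda t: t[1])


# Toy data: the second candidate is a horizontally stretched copy of the query.
query = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]
different = [[1, 0, 0, 1],
             [1, 0, 0, 1],
             [0, 0, 0, 1]]
stretched = [[0, 0, 1, 1, 1, 0],
             [0, 0, 1, 1, 0, 0],
             [0, 0, 1, 1, 0, 0]]

ranking = spot(query, [different, stretched])
print(ranking[0][0])  # index of the best-matching candidate
```

Because DTW tolerates local stretching along the writing direction, the stretched copy ranks ahead of the unrelated image even though its width differs from the query. Real manuscripts would need richer features and a way to handle segmentation errors.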

I began research in this area in collaboration with researchers at the UMass Amherst Center for Intelligent Information Retrieval, working with digitized copies of George Washington's letters obtained from the Library of Congress. More recently I have worked on manuscripts in Latin, medieval High German, and early Syriac (a dialect of Aramaic).

See a list of papers on word spotting and handwriting recognition.

Download MATLAB reference code for part-structured models used in several of my papers.