Transmog: from MS Word documents to structured XML

Like many institutions, the University of Virginia has in the past done archival description using Microsoft Word. To better manage this descriptive and structural metadata (finding aids), it became necessary to migrate to a structured format. To save time, we developed a tool that facilitates converting MS Word documents into EAD XML finding aids, which can then be imported into ArchivesSpace. The tool helps in two main ways. First, it allows rule-based assignment of sections of the Word document and exposes an XML-based rule language so that arbitrary rules can be written. For example, it's fairly easy to write a rule that says "treat the first line as the title" or "treat all paragraphs below the heading 'scope and content' as paragraphs within a scopecontent tag". Second, it exposes a powerful user interface that allows sections of the document to be assigned their place in the structured format, including drag-and-drop reordering and nesting as well as bulk handling of content that appears to be tabular. Through initial testing and use, many convenience features were added to the user interface. The tool has since been adapted to support conversion of MS Word documents in other contexts. http://github.com/uvalib/transmog
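
To make the two example rules concrete, here is a minimal sketch in Python of what applying them to a list of paragraphs could look like. It is not Transmog's code and does not use its XML rule language; the function name apply_rules and the element names findingaid, title, and unassigned are assumptions made for illustration. Only scopecontent comes from the description above.

```python
import xml.etree.ElementTree as ET


def apply_rules(paragraphs):
    """Assign plain-text paragraphs to XML elements using two hard-coded rules.

    Hypothetical illustration only: the real tool expresses rules in its own
    XML-based rule language rather than in Python.
    """
    root = ET.Element("findingaid")  # assumed wrapper element
    scope = None
    for index, text in enumerate(paragraphs):
        if index == 0:
            # Rule 1: treat the first line as the title.
            ET.SubElement(root, "title").text = text
        elif text.strip().lower() == "scope and content":
            # Rule 2, part one: the heading opens a scopecontent element.
            scope = ET.SubElement(root, "scopecontent")
        elif scope is not None:
            # Rule 2, part two: paragraphs below the heading become <p> children.
            ET.SubElement(scope, "p").text = text
        else:
            # No rule matched; leave the paragraph for manual assignment.
            ET.SubElement(root, "unassigned").text = text
    return root


sample = [
    "Papers of an Example Family",
    "Collection number 0000",
    "Scope and Content",
    "The collection consists of letters.",
    "It also includes photographs.",
]
print(ET.tostring(apply_rules(sample), encoding="unicode"))
```

Paragraphs that match no rule are kept in a placeholder element here, which mirrors the idea that whatever the rules cannot place is left for assignment by hand in the user interface.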

Speaker(s)

03:05 PM
Room: Lobby