Applying Automatic Coding to Research on Past Travel

Rimvydas Laužikas (Vilnius University, Lithuania)  

The recent mass digitisation of written historical sources, combined with optical character recognition and online availability, has created new opportunities and challenges for historical research. The main research problem is that travel accounts usually constitute only a small part of a given source and are unevenly distributed across documents. Given the volume of digitised documents and the number of languages in which they are published, studying even a single aspect of these sources, i.e., travel, therefore requires significant human and time resources.

The methodology presented in this paper is based on the application of the information paradigm and digital technology-based methods. It consists of two steps: (i) preparation of a text corpus using Optical Character Recognition (OCR) and (ii) collection and analysis of empirical data using a dictionary-based automatic coding method implemented with the MAXQDA software. The vocabulary used in the study is structured around six concepts (categories) related to past journeys: (i) the journey (general description), (ii) the road and its infrastructure (bridges, banks, etc.), (iii) means of transport, (iv) stopping places and places to stay overnight (towns, villages, taverns, post offices, etc.), (v) the people met on the way (inn-keepers, highwaymen, guides, etc.), and (vi) the food of the journey. Each concept is described by a set of key words and phrases. The methodology has been tested and found to address the research problem mentioned above.
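
The general idea of dictionary-based automatic coding can be illustrated with a minimal Python sketch. It is not the MAXQDA implementation used in the study; it only assumes that each category is a list of key words and phrases and that every occurrence in the OCR text is tagged with its category. The category names follow the six concepts above, while the keywords, the sample sentence, and the names `DICTIONARY`, `code_text`, and `count_by_category` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Illustrative dictionary. Category names mirror the six concepts in the text;
# the keywords are assumptions for this sketch, not the study's vocabulary.
DICTIONARY = {
    "journey": ["journey", "travelled", "set out"],
    "road_infrastructure": ["road", "bridge", "ferry"],
    "transport": ["carriage", "horse", "sledge"],
    "stopping_places": ["tavern", "inn", "post office", "village"],
    "people": ["inn-keeper", "highwayman", "guide"],
    "food": ["bread", "beer", "supper"],
}


def code_text(text, dictionary):
    """Return (category, keyword, start, end) for every keyword occurrence."""
    segments = []
    for category, keywords in dictionary.items():
        for keyword in keywords:
            # Hyphens count as part of a word here, so "inn" does not match
            # inside "inn-keeper". Matching is case-insensitive.
            pattern = re.compile(
                r"(?<![\w-])" + re.escape(keyword) + r"(?![\w-])",
                re.IGNORECASE,
            )
            for match in pattern.finditer(text):
                segments.append((category, keyword, match.start(), match.end()))
    # Sort by position so coded segments follow the order of the text.
    return sorted(segments, key=lambda segment: segment[2])


def count_by_category(segments):
    """Count coded segments per category."""
    return Counter(category for category, _, _, _ in segments)


# Hypothetical sample sentence standing in for a passage of OCR output.
SAMPLE = (
    "We set out at dawn; the bridge was broken, so the guide led our "
    "carriage to a tavern where the inn-keeper served bread and beer."
)

segments = code_text(SAMPLE, DICTIONARY)
counts = count_by_category(segments)
```

Because each coded segment keeps its character offsets, a reader can jump from a category count back to the exact passage in the source, which is what makes a short travel account findable inside a much longer document. Exact keyword matching of this kind is sensitive to OCR errors and spelling variants, so a real dictionary would likely need variant forms for each language.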