CULTUROMICS IN THE RUSSIAN NATIONAL CORPUS: THREE CENTURIES OF RUSSIAN ROADS


2015. № 3 (6), 605–640

National Research University Higher School of Economics

Abstract:

Culturomics, known as an innovative method of cultural research, studies cultural trends through quantitative analysis of the very large body of books digitized by Google. This paper proposes a medium data methodology, which relies on much smaller but cleaner data sets. Medium data supports statistical analysis as well as more elaborate linguistic preprocessing and richer mark-up schemes. The study, offered as an example of medium data research, examines the concept of road as reflected in lexical data from the 18th–20th centuries drawn from the Russian National Corpus. The data set comprises more than 15,800 examples in which the noun road occurs with an adjective that serves as a diagnostic context for polysemy disambiguation. Each example is assigned to one of eight time periods. The examples are divided into 19 semantic classes, and hierarchical clustering is used to reveal the most important trends and the similarities in the behavior of different classes. Changes in the resulting charts are related to social processes and to changes in cultural perception.
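
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure or the linkage criterion, so Euclidean distance between relative-frequency profiles and average linkage are assumed here. The class names and counts are invented placeholders, not values from the Russian National Corpus.

```python
from itertools import combinations
from math import dist

# Invented placeholder counts (NOT corpus data): each row is a hypothetical
# semantic class, each column one of eight consecutive time periods.
counts = {
    "class_A": [10, 12, 16, 20, 30, 44, 60, 80],
    "class_B": [5, 6, 8, 10, 15, 22, 30, 40],
    "class_C": [80, 60, 45, 30, 20, 15, 12, 10],
    "class_D": [40, 30, 22, 15, 10, 8, 6, 5],
}


def to_profile(row):
    """Scale counts to sum to 1, so only the shape of the trend matters."""
    total = sum(row)
    return [value / total for value in row]


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean Euclidean distance over all pairs drawn from the two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def agglomerate(profiles):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((left, right))
    return merges


profiles = {name: to_profile(row) for name, row in counts.items()}
merges = agglomerate(profiles)

for left, right in merges:
    print(" + ".join(left), "<->", " + ".join(right))
```

In this toy input, the two rising classes are merged with each other before either joins the two falling classes. Normalizing each row first means that classes are grouped by the shape of their trend over time and not by their overall frequency.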