THE CORPUS OF REGIONAL RUSSIAN-LANGUAGE NEWSPAPERS IN RUSSIA AND ABROAD


2015. № 3 (6), 163-193

Vinogradov Russian Language Institute of the Russian Academy of Sciences

Abstract:

The article describes the corpus of regional and foreign press, a new module within the Russian National Corpus (RNC). Unlike the existing newspaper corpus, which includes only materials from the national press, the new corpus focuses on texts from regional publications. The corpus currently consists of the following subcorpora: 1) newspapers of the Grodno region (7 publications, 1.9 million tokens in total), prepared jointly with specialists from the Grodno University; 2) the Russian regional press of the 2010s (13 newspapers, about 2 million tokens); 3) the Russian regional press of 1990-2000 (40 newspapers, 2.6 million tokens); 4) regional editions of the national newspaper "Komsomolskaya Pravda" (6.5 million tokens). The article describes our approach to selecting texts, the standards and software used for text processing, and the organization of the corpus interface, as well as the prospects for developing the corpus composition and search tools. The texts of the regional corpus are provided with morphological, semantic, and detailed metatextual annotation. The corpus is freely available at http://www.ruscorpora.ru/search-regional.html. As we see it, the main purpose of the corpus is to give researchers a tool for studying regional variation in the Russian language.