CORPORA OF THE RUSSIAN LANGUAGE


2015. № 3 (6), 20-64

Saint-Petersburg State University

Institute for Linguistic Studies

Abstract:

The paper describes corpora of the Russian language and the state of the art in Russian corpus linguistics. The main focus is on the Russian National Corpus (RNC), the corpus most widely used by linguists, both because it is the best known and because of the opportunities it offers. The author examines the subcorpora of the RNC, its semantic annotation, and its Charts service in comparison with Google Books Ngram Viewer. The article also presents a large number of other text corpora of Russian, among them the Helsinki Annotated Corpus (HANCO), the Leeds University corpora, the Sketch Engine corpora and others, as well as speech corpora of Russian, parallel corpora, diachronic corpora and specialized corpora. Corpora of the Russian language support corpus-based studies of both spoken and written language using synchronic and diachronic approaches, and allow us to study linguistic phenomena in their typological, sociological and culturological aspects.