A DEEPLY ANNOTATED CORPUS OF RUSSIAN TEXTS (SYNTAGRUS): CONTEMPORARY STATE OF AFFAIRS


2015, No. 3 (6), pp. 272–299

A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences


Abstract:

The paper discusses the main features, the principles underlying the creation, and the parameters of SynTagRus, a syntactically tagged corpus of Russian texts. In addition to the syntactic annotation of every sentence with a dependency tree, the corpus contains information on the arguments and values of lexical functions for the words occurring in each sentence, as well as data on the lexical meanings of words. The paper also considers a subcorpus of sentences containing diverse types of ellipsis and discusses possible uses of the corpus in research and applications.