2019. № 4 (22), 89-102

 University of Geneva / Institute of Informatics Problems FRC CSC RAS


 The paper analyses the formal structure of conjunctive tools (connectors) in Russian, units that behave as elements of a dynamic system both quantitatively and qualitatively. The author examines how the problem of annotating multiword connectors is addressed in corpus projects, including two annotated corpora of Russian: HANCO (the Helsinki Annotated Corpus) and the “Corpus Dictionary of Multiword Lexical Units” of the Russian National Corpus. A new corpus tool, the supracorpora database (SCDB) of connectors, takes as its annotation unit the discursive realization (DR), i.e. the form in which a connector is used in a given context. During DR annotation in the SCDB, elements of the multiword connector’s structure are attributed to cross-clusters. This approach not only records the ad hoc form of a given conjunctive tool but also shows how the components of multiword connectors can combine, thereby revealing systematic ties between them. These data can be used to describe the range of linguistic units available to the speaker, who is free to combine some of them to produce a DR. In addition, the various statistical data extractable from the SCDB help to establish the number of occurrences and the frequency of a given conjunctive tool in a text.