2022. № 2 (32), 63-80

Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences


The main issue addressed in this paper is the validity of so-called lexical-functional collocations (in terms of the Meaning–Text theory) accumulated during the development of the combinatorial dictionary of the ETAP system. The dictionary uses 158 different functional dependency types, but for this study we chose only collocations that satisfy the morphological pattern “noun governs adjective”. The theory of lexical functions postulates that such collocations can be described by one of 6 lexical functions: MAGN (‘very’, ‘high degree’, e.g. deep disappointment), BON (‘good’, ‘positive attitude’, e.g. exhaustive analysis), VER (X ‘that lives up to expectations’, e.g. true love) and their “antonyms” ANTIMAGN (‘low degree’, e.g. venial sin), ANTIBON (‘negative attitude’, e.g. doubtful achievement) and ANTIVER (X ‘that is against expectations’, e.g. wrong advice). We compared the subset consisting of the LF-collocations of these types with the corresponding part of the “Dictionary of Russian Idioms” by G. I. Kustova. This dictionary focuses on collocations with intensifiers that express the idea of high degree. Nouns with adjective modifiers make up more than half of the entries in the dictionary. In the next step, the selected LF-collocations and the morphosyntactically similar entries from Kustova’s dictionary were searched for in the syntactically annotated corpus of Russian (SynTagRus). Collocations found in the corpus were scored with four association measures: Pointwise Mutual Information (PMI), Student’s t-test, Dice’s coefficient, and the log-likelihood ratio.
Based on these data we came to the following conclusions: 1) the number of collocations that occur both in the list of LF-collocations extracted from the combinatorial dictionary and in Kustova’s dictionary is rather small (only 27% of LF-collocations of the MAGN type are among the entries of Kustova’s dictionary); 2) approximately the same share of LF-collocations and of idiomatic dictionary entries is attested in SynTagRus, about 20%; 3) the collocations attested in the corpus receive similar association measure estimates, regardless of which dictionary they come from; 4) Pointwise Mutual Information and the Dice coefficient give non-trivial estimates of word interdependence even on a corpus as small as SynTagRus; 5) a syntactically annotated corpus is a convenient resource for searching for LF-collocations and calculating their association measures, since LF-collocations are not linearly contiguous.
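
The four association measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions (PMI with a base-2 logarithm, the t-score approximation of Student's t-test, Dice's coefficient, and Dunning's log-likelihood ratio over a 2×2 contingency table); the exact formulations used in the study may differ. The function name `association_measures` and its parameters are hypothetical: `f_xy` is the number of noun–adjective dependency pairs for a given collocation, `f_x` and `f_y` are the numbers of such pairs containing the noun and the adjective respectively, and `n` is the total number of pairs of this type in the corpus.

```python
import math


def association_measures(f_xy, f_x, f_y, n):
    """Return PMI, t-score, Dice and log-likelihood ratio for one word pair.

    All counts are assumed to come from dependency pairs rather than from
    adjacent bigrams, so the two words need not be linearly contiguous.
    Assumes f_xy > 0 (the pair is attested at least once).
    """
    # Co-occurrence frequency expected if the two words were independent.
    expected = f_x * f_y / n

    pmi = math.log2(f_xy / expected)
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    dice = 2 * f_xy / (f_x + f_y)

    # Observed 2x2 contingency table, flattened row by row:
    # (x, y), (x, not y), (not x, y), (not x, not y).
    observed = [f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy]
    rows = [f_x, n - f_x]
    cols = [f_y, n - f_y]
    expected_cells = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]

    # Dunning's log-likelihood ratio; empty cells contribute nothing.
    llr = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected_cells) if o > 0
    )

    return {"pmi": pmi, "t_score": t_score, "dice": dice, "llr": llr}


# Hypothetical counts, chosen only for illustration.
scores = association_measures(f_xy=10, f_x=20, f_y=50, n=1000)
```

Taking the counts from dependency pairs rather than from adjacent word pairs reflects conclusion 5: the noun and its adjective may be separated by other words in the sentence.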