DEVELOPMENT OF A COMBINED RULE-BASED AND NEURAL NETWORK ACCENTUATOR FOR RUSSIAN POETIC TEXT


2022. № 3 (33), 181-190

HSE University

Abstract:

This article describes the development of a combined rule-based and neural network accentuator for marking up Russian poetic text. The tool was needed because existing annotators do not provide sufficient quality. Moreover, accentuating Russian text is a non-trivial task, since Russian has homographs and stress whose position varies from word to word. We first compare existing tools for marking stress in Russian poetic text, based on dictionary and neural network approaches: A. Polyakov’s rule-based annotator and the annotator developed by the group headed by E. Chernyak, which is based on recurrent artificial neural networks. The analysis compares numerical quality metrics (accuracy) and classifies the errors each tool makes. We then consider options for combining these two annotators to achieve better quality. The final combined accentuator first uses rules to mark the words whose stress is determined unambiguously, and then marks the remaining words using neural networks. In this way we achieved a higher quality of automatic stress marking of Russian poetic text.
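
The combination principle described above (rules first for unambiguous words, a neural model for the rest) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the annotators compared in the article: the toy dictionary `TOY_DICT`, the function names `rule_stress`, `combined_accentuate`, `mark` and `accuracy`, and in particular `stub_neural_stress`, which only stands in for a trained recurrent model and simply picks the last vowel.

```python
from typing import Callable, Dict, List, Optional

VOWELS = set("аеёиоуыэюя")

# Hypothetical toy dictionary: word -> possible stressed character indices (0-based).
TOY_DICT: Dict[str, List[int]] = {
    "мама": [1],
    "замок": [1, 3],  # homograph: two possible stress positions
}


def rule_stress(word: str, dictionary: Dict[str, List[int]]) -> Optional[int]:
    """Return the stressed character index if the rules decide it unambiguously."""
    lower = word.lower()
    vowel_positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
    if len(vowel_positions) == 1:
        return vowel_positions[0]  # a single vowel must carry the stress
    options = dictionary.get(lower, [])
    if len(options) == 1:
        return options[0]  # exactly one dictionary variant
    return None  # unknown word or homograph: leave it to the model


def stub_neural_stress(words: List[str], index: int) -> int:
    """Placeholder for a neural model (assumption): picks the last vowel."""
    lower = words[index].lower()
    return max(i for i, ch in enumerate(lower) if ch in VOWELS)


def combined_accentuate(
    words: List[str],
    dictionary: Dict[str, List[int]],
    neural_stress: Callable[[List[str], int], int],
) -> List[int]:
    """Apply the rules first; fall back to the model only where the rules abstain."""
    result = []
    for i, word in enumerate(words):
        pos = rule_stress(word, dictionary)
        if pos is None:
            pos = neural_stress(words, i)  # the model can see the whole line
        result.append(pos)
    return result


def mark(word: str, pos: int) -> str:
    """Insert a combining acute accent (U+0301) after the stressed vowel."""
    return word[: pos + 1] + "\u0301" + word[pos + 1 :]


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Share of words whose predicted stress position matches the gold one."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


line = ["мама", "мой", "замок"]
positions = combined_accentuate(line, TOY_DICT, stub_neural_stress)
marked = [mark(w, p) for w, p in zip(line, positions)]
```

In this sketch only the homograph "замок" reaches the fallback, because the dictionary lists two stress variants for it; the other two words are resolved by the rules alone. A real system would replace the stub with a trained model that uses the surrounding line as context.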