FROM THE EXPERIENCE OF CREATING AUTHOR’S TEXT CORPORA
Abstract:
The article describes the experience of the Laboratory of General and Computer Lexicology and Lexicography of the Faculty of Philology of Lomonosov Moscow State University in creating author’s text corpora. It discusses the optimal ways of organizing such corpora and substantiates the expediency of adhering to the full-text and lexicographic principles. By lexicographic organization we mean an organization of the corpus in which search and navigation are carried out through the user’s choice of a dictionary containing the type of information he or she needs. Using dictionaries of different types and their intersections, the user searches for the required units and their contexts of use and navigates to the relevant place in the text. The article also describes some aspects of the work on the dictionary and corpus of Chekhov’s fiction, in particular the possibility of thematic markup of the works. The main attention is paid to the ongoing work on the corpus of A. S. Pushkin’s texts. Some problems related to the semantization and semantic classification of lexical units are considered, as well as the additional segmentation of texts, which increases the informative value of the corpus. The main modes of working with the corpus available in the ISTOK system developed in the Laboratory, including the help mode, are described. The help mode allows the user to obtain various kinds of additional information while reading and while working with dictionary units. The sources of such information include the Dictionary of Pushkin’s Language, the Guide to Pushkin, and others. The description is illustrated with results obtained using the ISTOK system.