CORPUS “BALANCED ANNOTATED TEXT COLLECTION (TEXTOTEC)” (SAT): STUDYING THE SPECIFICITY OF RUSSIAN MONOLOGICAL SPEECH
Abstract:
The article presents one of the Russian speech corpora: a collection of monologic texts known as the “Balanced Annotated Text Collection (Textotec)” (SAT). The corpus has been assembled at St. Petersburg State University over more than 20 years, using N. V. Bogdanova-Beglarian’s original data collection methodology, which involves a fairly strict set of experimental procedures. SAT is designed for studying various types of spontaneous monologue (reading, retelling, image description, and a story on a given topic). It contains texts recorded from five professionally oriented groups of native speakers (medical doctors, lawyers, computer specialists, philologists teaching Russian as a foreign language, and philosophy teachers), several blocks of student speech (philologists and non-philologists), and four blocks of Russian speech by native speakers of other languages, marked by native-language interference: American, Chinese, Francophone, and Dutch speakers. In total, SAT contains about 700 texts and about 50 hours of sound recordings. The article describes this linguistic resource against the background of other Russian and foreign speech corpora, identifies the main research topics pursued on its material, and outlines prospects for further work.