CORPORA OF EARLY CHILDREN'S WRITTEN LANGUAGE AS A RESOURCE OF LINGUISTIC INFORMATION
Abstract:
The corpora of non-standard Russian speech created at the School of Linguistics of the National Research University Higher School of Economics are designed as a representative speech database that can be used to analyze language practices, to compare actual language use with prescriptive norms, and to study the dynamics of Russian speech. Several different corpora are being developed, representing the speech of foreign and heritage speakers, the speech of young speakers, Russian speech of the 19th century, and regional speech. One such resource is the Corpus of Early (Initial) Children’s Writing. The collection is provided with metadata and annotated for deviations from the standard. The corpus material makes it possible to analyze grammatical and spelling anomalies that can shed light on the structure of the language system. In particular, one can trace specific features of word usage, register atypical word combinations, and detect systematic agrammatisms. The data can be compared with those of the National Corpus of the Russian Language and with Internet usage, which can reveal how the speech system is formed in ontogeny. The formation of writing skills reflects the stages of phonetic writing, hypercorrection, and approximation to the adult norm. Free genres (summary, composition) make it possible to trace how the skills of choosing appropriate words and constructions develop and how syntactic structures and a coherent text are built. Errors show areas of difficult choice and give an idea of the competition between strategies in the search for adequate means of expression. The material also reveals tactics for simplifying the task, in which the writer relies on the principle of analogy (graphic, phonetic, grammatical, semantic) and on frequent, regular, formally transparent patterns.