Abstract: Universal Dependencies (UD) is an internationally coordinated annotation scheme for cross-linguistically comparable morphosyntactic annotation of corpora, which has been applied to more than 130 languages worldwide, including Slovenian. In this paper, we present the results of recent activities related to Slovenian UD annotation within the Development of Slovene in a Digital Environment project. During the project, we upgraded the existing infrastructure with reviewed and detailed documentation of the Slovenian UD annotation guidelines and produced four new datasets, manually annotated in accordance with the scheme. Specifically, we expanded the SSJ-UD treebank for written Slovenian with new sentences from the ssj500k and ELEXIS-WSD corpora, and created a new hidden UD treebank based on the SentiCoref corpus to be used on the SloBENCH evaluation platform. In addition, the SUK and Janes-tag reference training corpora, originally annotated using the language-specific JOS annotation scheme, have been semi-automatically converted to UD part-of-speech categories and morphological features. The new version of the reference SSJ-UD treebank, with more than 5,000 new sentences and double the original number of tokens, was used to train a new dependency parsing model in the CLASSLA-Stanza annotation tool. This paper gives an in-depth evaluation of the model in terms of overall parsing performance, relation-specific parsing performance, and the most common types of errors produced.
Field: Educational Sciences; Social, Human and Administrative Sciences
Journal Type: International