Collection and evaluation of lexical complexity data for Russian language using crowdsourcing
2022
Journal:  
Vestnik Rossijskogo Universiteta Družby Narodov: Seriâ Lingvistika
Author:  
Abstract:

Estimating word complexity with binary or continuous scores is a challenging task that has been studied for several domains and natural languages. This task is commonly referred to as Complex Word Identification (CWI) or Lexical Complexity Prediction (LCP). Correct evaluation of word complexity can be an important step in many Lexical Simplification pipelines. Earlier works have usually presented methodologies of lexical complexity estimation with several restrictions: they relied on hand-crafted features correlated with word complexity, performed feature engineering to describe target words with features such as the number of hypernyms, the count of consonants, and the Named Entity tag, and ran evaluations with carefully selected target audiences. More recent works have investigated transformer-based models that can also extract features from the surrounding context. However, the majority of papers have been devoted to pipelines for the English language, and few have carried them over to other languages such as German, French, and Spanish. In this paper we present a dataset of lexical complexity in context based on the Russian Synodal Bible, collected using a crowdsourcing platform. We describe a methodology for collecting the data using a 5-point Likert scale for annotation, present descriptive statistics, and compare the results with analogous work for the English language. We evaluate a linear regression model as a baseline for predicting word complexity from handcrafted features and from fastText and ELMo embeddings of target words. The result is a corpus consisting of 931 distinct words that are used in 3,364 different contexts.
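The abstract mentions handcrafted features (such as the count of consonants) and annotation on a 5-point Likert scale. The following is only a minimal sketch of how such inputs might be prepared for a regression baseline. The function names, the consonant set, the choice of features, and the rescaling of the mean rating to the range [0, 1] are all assumptions made for illustration, not the authors' confirmed procedure.

```python
from statistics import mean

# Assumption: Russian consonant letters; the signs "ь" and "ъ" are not counted.
RUSSIAN_CONSONANTS = set("бвгджзйклмнпрстфхцчшщ")


def handcrafted_features(word):
    """Return two illustrative handcrafted features for a target word."""
    lowered = word.lower()
    return {
        "length": len(lowered),
        "consonants": sum(ch in RUSSIAN_CONSONANTS for ch in lowered),
    }


def complexity_score(ratings):
    """Aggregate 1..5 Likert ratings into a continuous score in [0, 1].

    Assumption: the score is the mean rating, linearly rescaled so that
    1 maps to 0.0 and 5 maps to 1.0.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return (mean(ratings) - 1) / 4


# Hypothetical ratings from three annotators for one word in one context.
features = handcrafted_features("слово")
score = complexity_score([1, 3, 5])
```

Feature dictionaries like `features` could serve as the inputs, and scores like `score` as the targets, of a linear regression baseline of the kind the abstract describes.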

Citation Owners
Information: There are no citations to this publication.
Vestnik Rossijskogo Universiteta Družby Narodov: Seriâ Lingvistika

Field :   Social Sciences, Humanities, and Administrative Sciences

Journal Type :   International

Metrics
Article : 916
Cite : 2.194
2023 Impact : 0.173