Missing data cause various problems when processing and analysing real-world datasets. In this paper, we consider the structure of data from social network accounts and introduce imputation models, and ensembles of these models, designed for this domain. We have analysed the structure and strength of correlations between data from social network accounts and suggested an approach to imputation based on an extended matrix of attributes. We have justified a preliminary clustering step when processing missing data, which helps overcome the problem of a large number of unique values in the analysed variables. We have designed imputation models based on association rules, a random forest, a support vector machine, a neural network, and an EM algorithm, each using preliminary clustering and an extended matrix of attributes. We have compared the performance of these models with the most popular imputation method, “Most Common Value” (MCV), which is usually integrated into statistical packages. The results demonstrate that the MCV method is not well suited to data from social network accounts in terms of two evaluation criteria. Using the suggested models, we have developed ensembles of models for the imputation of nominal and numerical data types. We have shown that, in terms of the evaluation criteria considered, the ensembles handle missing values more effectively and more stably than the single models.

Author Biographies

Olesia Slabchenko, Kremenchuk Mykhailo Ostohradskyi National University, Pershotravneva str., 20, Kremenchuk, Ukraine, 39600
Computer and information systems department
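
For orientation, the MCV baseline named in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `mcv_impute` and `majority_vote`, the use of `None` as the missing-value marker, and the sample data are all assumptions made for this example. The abstract does not say how the ensembles combine their member models, so the majority vote shown here is only one plausible scheme for nominal data.

```python
from collections import Counter


def mcv_impute(values, missing=None):
    """Replace missing entries with the most common observed value (MCV)."""
    observed = [v for v in values if v is not missing]
    if not observed:
        # No observed values to learn from, so return the column unchanged.
        return list(values)
    mcv = Counter(observed).most_common(1)[0][0]
    return [mcv if v is missing else v for v in values]


def majority_vote(predictions):
    """Hypothetical ensemble step: pick the value most models agree on."""
    return Counter(predictions).most_common(1)[0][0]


# Illustrative nominal attribute with two missing entries (made-up data).
cities = ["Kyiv", None, "Lviv", "Kyiv", None]
filled = mcv_impute(cities)

# Three hypothetical models each propose a value for one missing cell.
vote = majority_vote(["Kyiv", "Lviv", "Lviv"])
```

Because MCV assigns the same value to every missing cell of an attribute, it ignores any relationship between attributes, which is consistent with the abstract's finding that it performs poorly on this kind of data.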
Field: Natural Sciences and Mathematics
Journal Type: International