Clustering methods, one of the most prominent topics in data mining, are the most intensively studied area of the field, and many clustering techniques and related methods exist. Some of the work in this area has been obtained by updating existing algorithms and evaluating their performance. The clustering method that attracts the most interest is K-Means. Because its initial centers are selected at random, the K-Means algorithm returns different cluster outputs each time it is run. As a result, the reliability of the results suffers and the number of iterations needed for accurate clustering increases. One of the methods that tries to eliminate this problem is K-Means++. In this study, the proposed method, which we call double k, was applied to a synthetic dataset. The double k method was observed to be more successful than K-Means and K-Means++ in finding the final cluster labels.
Clustering methods, which are among the most striking subjects of data mining, form the most intensive research area of this field, and many techniques and related methods have been developed for them. Some of the studies in this field have been obtained by updating previously available algorithms and evaluating their performance. The most widely studied clustering technique is the K-Means method. Each run of the K-Means algorithm returns different cluster outputs because the initial centers are selected at random. The reliability of the results is therefore adversely affected, and the number of iterations required for accurate clustering increases. One of the methods that tries to eliminate this problem is the K-Means++ method. In this study, the proposed method, which we call double k, was applied to a synthetic dataset. It was observed that the double k method is more successful than the K-Means and K-Means++ methods at finding the final cluster labels.
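The sensitivity of K-Means to its random initial centers, and the seeding used by K-Means++, can be illustrated with a minimal pure-Python sketch. It is not the double k method, whose details are not given here. The four-blob synthetic data, the function names (`make_blobs`, `init_random`, `init_plus_plus`, `kmeans`, `inertia_spread`), and all parameter values are assumptions chosen only for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def make_blobs(seed=0, per_cluster=30):
    """Hypothetical synthetic data: four Gaussian blobs in the plane."""
    rng = random.Random(seed)
    means = [(0, 0), (10, 0), (0, 10), (10, 10)]
    return [(rng.gauss(mx, 1.0), rng.gauss(my, 1.0))
            for mx, my in means for _ in range(per_cluster)]

def init_random(points, k, rng):
    """Plain K-Means start: k distinct points chosen uniformly at random."""
    return rng.sample(points, k)

def init_plus_plus(points, k, rng):
    """K-Means++ seeding: the first center is uniform; each later center is
    drawn with probability proportional to the squared distance to the
    nearest center chosen so far. Assumes at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, init, seed, max_iter=100):
    """Lloyd iterations from the given initializer; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = list(init(points, k, rng))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, inertia

def inertia_spread(points, k, init, seeds):
    """Smallest and largest within-cluster sum of squares over several seeds."""
    values = [kmeans(points, k, init, s)[1] for s in seeds]
    return min(values), max(values)

if __name__ == "__main__":
    data = make_blobs()
    for name, init in [("random", init_random), ("k-means++", init_plus_plus)]:
        print(name, inertia_spread(data, 4, init, range(20)))
```

Comparing the minimum and maximum inertia over several seeds gives a rough picture of how much the result depends on the starting centers. A wide gap for an initializer means individual runs can settle in noticeably different clusterings.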