PERFORMANCE AND COMPUTATIONAL SPEED ANALYSIS OF THE K-MEANS AND K-MEDOIDS ALGORITHMS IN TEXT CLUSTERING

Karno Nur Cahyo; Agus Subekti; Muhammad Haris

doi:10.51903/pixel.v15i2.931

Authors

Karno Nur Cahyo Universitas Nusa Mandiri
Agus Subekti Universitas Nusa Mandiri
Muhammad Haris Universitas Nusa Mandiri

Keywords:

clustering, text mining, thesis grouping, k-means, k-medoids, Davies Bouldin Index, t-SNE visualization, cluster determination performance, cluster determination computation time

Abstract

The large number of theses written by students at a university makes it difficult to identify the categories of thesis topics. Text mining can be used to group thesis documents into the clusters produced by a clustering algorithm. This study compares two clustering algorithms, K-Means and K-Medoids, to obtain an accurate evaluation of their clustering performance and computation time on thesis clustering, so that related topics can be grouped with better clustering accuracy. Clustering quality is evaluated with the Davies-Bouldin Index (DBI), a technique for assessing clustering results, and the data are split into training and testing sets by 10-fold cross-validation. With Term Occurrences as the term-weighting scheme and an N-Grams value of 2, K-Means achieves the better DBI value, -0.426, while K-Medoids obtains a DBI value of -1.631 under the same conditions (since DBI is reported here as a negative value, a value closer to zero is better). However, t-SNE visualization with the same supporting parameters points to a usable alternative: six clusters, with a DBI value of -1.110. In the computation-time test on clustering 50 thesis documents, K-Means takes 2.5 seconds on average, whereas K-Medoids takes 261.5 seconds, more than 100 times longer. The experiments were run on an Asus ZenBook UX425EA.312 with an 11th Gen Intel® Core™ i5-1135G7 processor @ 2.40 GHz, Intel® Iris® Xe Graphics, 8 GB of RAM, and a 512 GB SSD.
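The pipeline summarized in the abstract (term-occurrence weighting over n-grams, K-Means versus K-Medoids, Davies-Bouldin evaluation, and timing) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. An N-Grams value of 2 is assumed to mean unigrams plus word bigrams. K-Medoids is written as a PAM-style swap search from a random start. The helper names (`term_occurrence_vectors`, `kmeans`, `kmedoids`, `davies_bouldin`) and the toy titles are hypothetical. DBI is computed in its standard non-negative form (lower is better), and the negated value is printed only to mirror the sign convention of the reported results. Cross-validation and t-SNE are omitted.

```python
import math
import random
import time
from collections import Counter


def term_occurrence_vectors(docs, max_n=2):
    """Raw term-occurrence counts over word n-grams of length 1..max_n (assumed feature set)."""
    counters = []
    for doc in docs:
        tokens = doc.lower().split()
        grams = [
            "_".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)
        ]
        counters.append(Counter(grams))
    vocab = sorted(set().union(*counters))
    return [[float(c[t]) for t in vocab] for c in counters], vocab


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign to nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster becomes empty.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def kmedoids(points, k, seed=0):
    """PAM-style swap search from a random start (hypothetical stand-in for the study's K-Medoids)."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]  # full pairwise distance matrix

    def cost(meds):
        # Total distance from every point to its nearest medoid.
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids = random.Random(seed).sample(range(n), k)
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for idx in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:idx] + [o] + medoids[idx + 1:]
                c = cost(trial)
                if c < best:  # accept only strictly improving swaps, so the loop terminates
                    medoids, best, improved = trial, c, True
    return [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin Index (non-negative, lower is better); centroids are cluster means."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two non-empty clusters")
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    cent = {c: mean(members[c]) for c in clusters}
    scatter = {c: sum(math.dist(p, cent[c]) for p in members[c]) / len(members[c]) for c in clusters}
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cent[i], cent[j]) for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(clusters)


if __name__ == "__main__":
    # Hypothetical toy thesis titles, not the study's 50-document dataset.
    docs = [
        "expert system for plant disease diagnosis",
        "expert system for skin disease diagnosis",
        "k-means clustering of sales data",
        "k-means clustering of student data",
        "android app for food ordering",
        "android app for ticket ordering",
    ]
    X, _ = term_occurrence_vectors(docs, max_n=2)
    for name, algo in [("K-Means", kmeans), ("K-Medoids", kmedoids)]:
        start = time.perf_counter()
        labels = algo(X, k=3)
        elapsed = time.perf_counter() - start
        dbi = davies_bouldin(X, labels)
        # Negating mirrors the assumed sign convention of the reported DBI values.
        print(f"{name}: labels={labels} DBI={dbi:.3f} (negated {-dbi:.3f}) time={elapsed:.4f}s")
```

The K-Medoids sketch recomputes the total cost for every medoid/non-medoid swap, so each pass grows roughly with the square of the number of documents, while each K-Means iteration grows linearly. This is consistent with the direction of the timing gap reported in the abstract, but the sketch does not reproduce those measurements.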

References

[1] S. Lialiyah and R. Andrea, “Penerapan Algoritma K-Medoids Clustering Dalam Pembentukan Zona Cluster Vaksin Boster,” vol. 4, no. 1, pp. 124–129, 2022, doi: 10.47065/bits.v4i1.1617.
[2] S. Ramadhani, D. Azzahra, U. I. Negeri, and S. S. Kasim, “Comparison of K-Means and K-Medoids Algorithms in Text Mining based on Davies Bouldin Index Testing for Classification of Student ’s Thesis,” vol. x, no. x, pp. 24–33, 2022.
[3] F. Nur, M. Zarlis, and B. B. Nasution, “Penerapan Algoritma K-Means Pada Siswa Baru Sekolahmenengah Kejuruan Untuk Clustering Jurusan,” InfoTekJar (Jurnal Nas. Inform. dan Teknol. Jaringan), vol. 1, no. 2, pp. 100–105, 2017, doi: 10.30743/infotekjar.v1i2.70.
[4] M. Arifandi, A. Hermawan, and D. Avianto, “Implementasi Algoritma K-Medoids Untuk Clustering Wilayah Terinfeksi Kasus Covid19 Di Dki Jakarta,” JTT (Jurnal Teknol. Ter., vol. 7, no. September, pp. 120–128, 2021.
[5] B. Wira, A. E. Budianto, and A. S. Wiguna, “Implementasi Metode K-Medoids Clustering Untuk Mengetahui Pola Pemilihan Program Studi Mahasiwa Baru Tahun 2018 Di Universitas Kanjuruhan Malang,” RAINSTEK J. Terap. Sains Teknol., vol. 1, no. 3, pp. 53–68, 2019, doi: 10.21067/jtst.v1i3.3046.
[6] Y. Elda, S. Defit, Y. Yunus, and R. Syaljumairi, “Klasterisasi Penempatan Siswa yang Optimal untuk Meningkatkan Nilai Rata-Rata Kelas Menggunakan K-Means,” J. Inf. dan Teknol., vol. 3, pp. 103–108, 2021, doi: 10.37034/jidt.v3i3.130.
[7] C. Purnama, W. Witanti, and P. Nurul Sabrina, “Klasterisasi Penjualan Pakaian untuk Meningkatkan Strategi Penjualan Barang Menggunakan K-Means,” J. Inf. Technol., vol. 4, no. 1, pp. 35–38, 2022, doi: 10.47292/joint.v4i1.79.
[8] N. Nurahman, A. Purwanto, and S. Mulyanto, “Klasterisasi Sekolah Menggunakan Algoritma K-Means berdasarkan Fasilitas, Pendidik, dan Tenaga Pendidik,” MATRIK J. Manajemen, Tek. Inform. dan Rekayasa Komput., vol. 21, no. 2, pp. 337–350, 2022, doi: 10.30812/matrik.v21i2.1411.
[9] S. U. Tarigan, M. Yetri, and Saniman, “Klasterisasi Data Penanganan Dan Pelayanan Kesehatan Masyarakat,” Jurnam Sist. Inf. TGD, vol. 1, p. 14, 2022.
[10] Noviyanto and P. Ekasari, “Algoritma K-Means Untuk Klasterisasi Jabatan Fungsional Dosen Pada Perguruan Tinggi Swasta Di Lingkungan LLDikti Wilayah III,” Paradigma, vol. 24, no. 1, pp. 103–107, 2022.
[11] A. Wahyu and Rushendra, “Klasterisasi Dampak Bencana Gempa Bumi Menggunakan Algoritma K-Means di Pulau Jawa,” J. Edukasi dan Penelit. Inform., vol. 8, no. 1, pp. 175–179, 2022.
[12] H. N. Putra, A. Wisandra, and Fransiska, “Penerapan Algoritma K-Means untuk Klasterisasi Data Obat Pasien Rawat Jalan Berdasarkan 3 Penyakit Terbanyak Di Rumah Sakit M. Natsir Solok,” Ensikolediaku J., vol. 4, no. 3, pp. 304–312, 2022.
[13] M. R. Nugroho, I. E. Hendrawan, and Purwantoro, “Penerapan Algoritma K-Means Untuk Klasterisasi Data Obat Pada Rumah Sakit ASRI,” J. Nuansa Inform., vol. 16, pp. 125–133, 2022.
[14] S. A. Rahmah and J. Antares, “Klasterisasi Seleksi Mahasiswa Calon Penerima Beasiswa Yayasan Menggunakan K-Means Clustering,” I N F O R M a T I K a, vol. 13, no. 2, p. 25, 2022, doi: 10.36723/juri.v13i2.282.
[15] M. A. Hairudin, Y. Wabula, and Hazriani, “Rekomendasi Strategi Sosialisasi Program Studi Melalui Jalur Undangan Menggunakan Algoritma ID3 dan K-Means,” J. Inf. Technol. Comput. Eng., vol. 01, pp. 14–18, 2022.
[16] R. Chairunnisa and P. P. Adikara, “Analisis Sentimen terhadap Karyawan Dirumahkan pada Media Sosial Twitter menggunakan Fitur N-Gram dan Pembobotan Augmented TF – IDF Probability dengan K-Nearest Neighbour,” vol. 6, no. 4, pp. 1960–1965, 2022.
[17] A. T. Ni’mah and A. Z. Arifin, “Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis,” Rekayasa, vol. 13, no. 2, pp. 172–180, 2020, doi: 10.21107/rekayasa.v13i2.6412.