IMPLEMENTATION OF DEEP LEARNING USING HYBRID SENTENCE-TRANSFORMERS AND K-MEANS FOR JOURNAL COMPARISON
Abstract
This study addresses the challenge of identifying semantic relatedness between scientific journal articles by developing a deep-learning-based classification system. The system takes an unsupervised learning approach, combining a Sentence-Transformers model with K-Means clustering to produce semantic similarity scores and categorical labels. Abstracts are extracted from journal PDFs and processed to determine their similarity level, expressed as one of four predefined categories. The optimal number of clusters, k = 4, was determined using the Elbow Method, the Silhouette Score, and the Davies-Bouldin Index. The system is implemented as a web-based application that lets users upload two PDF files, compare them semantically, and receive both a similarity score and an AI-generated narrative explanation. Functional testing showed that all core features performed as expected. The system substantially reduces the time required to assess the relatedness between journal articles, offering an efficient tool for navigating academic research.
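The abstract does not spell out how a similarity score becomes one of the four categories, so the following is only a minimal sketch of one plausible reading, not the authors' code. It assumes the score is the cosine similarity between the embedding vectors of two abstracts (in practice these would come from the `encode` method of a Sentence-Transformers `SentenceTransformer` model; small hand-written vectors stand in here), and that K-Means is run on a collection of such scores, with k chosen by comparing inertia (for the Elbow Method), the Silhouette Score, and the Davies-Bouldin Index. The function names, the quantile-based initialisation, the sample scores, and the category names are all hypothetical. The sketch uses only the Python standard library; a production version would more likely rely on scikit-learn's `KMeans`, `silhouette_score`, and `davies_bouldin_score`.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors (e.g. two abstract embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _assign(values, centroids):
    """Index of the nearest centroid for every score (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c])) for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's K-Means on scalar scores; returns (centroids, labels, inertia).

    Hypothetical deterministic initialisation: centroids start at evenly spaced
    quantiles of the distinct values (needs at least k distinct values).
    """
    uniq = sorted(set(values))
    centroids = [uniq[int((i + 0.5) * len(uniq) / k)] for i in range(k)]
    for _ in range(max_iter):
        labels = _assign(values, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = [
            mean([v for v, lab in zip(values, labels) if lab == c] or [centroids[c]])
            for c in range(k)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = _assign(values, centroids)
    inertia = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return centroids, labels, inertia


def silhouette(values, labels):
    """Mean silhouette coefficient (higher is better); needs at least 2 clusters."""
    clusters = set(labels)
    scores = []
    for i, v in enumerate(values):
        same = [abs(v - w) for j, w in enumerate(values) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: scored 0, the usual convention
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(abs(v - w) for w, lab in zip(values, labels) if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def davies_bouldin(values, labels, centroids):
    """Davies-Bouldin index (lower is better); needs at least 2 non-empty clusters."""
    clusters = sorted(set(labels))
    spread = {  # mean distance of each cluster's members to its centroid
        c: mean(abs(v - centroids[c]) for v, lab in zip(values, labels) if lab == c)
        for c in clusters
    }
    return mean(
        max(
            (spread[i] + spread[j]) / max(abs(centroids[i] - centroids[j]), 1e-12)
            for j in clusters if j != i
        )
        for i in clusters
    )


def evaluate_k(scores, k_range=range(2, 7)):
    """Inertia (for an elbow plot), silhouette and DBI for each candidate k."""
    results = {}
    for k in k_range:
        centroids, labels, inertia = kmeans_1d(scores, k)
        results[k] = (inertia, silhouette(scores, labels), davies_bouldin(scores, labels, centroids))
    return results


def categorize(score, centroids, names):
    """Label a score by its nearest centroid, categories ordered from lowest centroid up."""
    ranked = sorted(centroids)
    nearest = min(range(len(ranked)), key=lambda r: abs(score - ranked[r]))
    return names[nearest]


# Hypothetical similarity scores for 16 pairs of abstracts (made up for illustration).
scores = [0.08, 0.10, 0.12, 0.14, 0.38, 0.40, 0.42, 0.44,
          0.68, 0.70, 0.72, 0.74, 0.93, 0.95, 0.97, 0.99]
results = evaluate_k(scores)
for k, (inertia, sil, dbi) in results.items():
    print(f"k={k}  inertia={inertia:.4f}  silhouette={sil:.3f}  DBI={dbi:.3f}")

best_by_silhouette = max(results, key=lambda k: results[k][1])  # higher is better
best_by_dbi = min(results, key=lambda k: results[k][2])         # lower is better
centroids, _, _ = kmeans_1d(scores, best_by_silhouette)
NAMES = ["very low", "low", "high", "very high"]  # hypothetical category labels
print(categorize(0.71, centroids, NAMES))
```

On these made-up scores, which form four well-separated groups, both the silhouette and the Davies-Bouldin criteria favour k = 4. On real data the three criteria need not agree, and the elbow in the inertia values is usually judged from a plot rather than computed.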

This work is licensed under a Creative Commons Attribution 4.0 International License.