Optimisasi Whisper Speech-to-Text Bahasa Indonesia dengan Hybrid Cloud dan Multi-Engine (Optimizing Whisper Speech-to-Text for Indonesian with Hybrid Cloud and Multi-Engine)

Authors

  • Ikhwan Alfath Nurul Fathony, Universitas Tidar
  • Affix Mareta, Universitas Tidar
  • Beta Estri Adiana, Universitas Tidar
  • Olivia Wardhani, Universitas Tidar
  • Dimas Ardiansyah Halim, Universitas Tidar

DOI:

https://doi.org/10.51903/5n2d3s08

Keywords:

Automatic Speech Recognition, Whisper, Indonesian language, Hybrid cloud, Audio preprocessing, Multi-engine architecture

Abstract

Automatic Speech Recognition (ASR) for the Indonesian language suffers from a high Word Error Rate (WER), especially when pre-trained models are used without fine-tuning. This study develops an optimized ASR system on a hybrid cloud architecture that combines the Faster-Whisper large-v3 engine with advanced audio preprocessing. The system is distributed: Google Colab (Tesla T4, 15 GB VRAM) acts as the GPU server, and an Ubuntu 22.04 LTS machine (8 cores, 32 GB RAM) acts as the client. Evaluation used five Indonesian audio samples covering formal news, informal conversation, and long-duration recordings. The system processed 80% of the samples successfully, with WER ranging from 27.69% (formal news) to 645.16% (informal conversation). Resource utilization was modest, at 21.3% GPU usage and 35.4% RAM usage. Processing time was stable for normal-sized files, but large files (>50 MB) timed out. The results indicate that a hybrid cloud architecture is feasible for distributed Indonesian ASR processing, although several areas still need optimization before production deployment.
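
A WER above 100%, such as the 645.16% reported for informal conversation, is possible because WER divides the number of word-level substitutions, deletions, and insertions by the length of the reference transcript, so a hypothesis with many extra words can accumulate more errors than the reference has words. The sketch below illustrates this with a plain word-level edit distance. The function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration; they are not the study's evaluation code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: text is lowercased and split on whitespace, which
    may differ from the normalization used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences, not taken from the study's audio samples.
exact = word_error_rate("selamat pagi", "selamat pagi")
one_substitution = word_error_rate("selamat pagi indonesia", "selamat malam indonesia")
many_insertions = word_error_rate("selamat pagi", "selamat pagi semua orang di sini")
print(f"{exact:.2%} {one_substitution:.2%} {many_insertions:.2%}")
```

In the last call the two-word reference is matched, but the hypothesis adds four extra words, so the error count (4) is twice the reference length and the WER is 200%. Repetitive or hallucinated output on short informal utterances can inflate WER in the same way.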

Published

2025-07-16

How to Cite

[1]
“Optimisasi Whisper Speech-to-Text Bahasa Indonesia dengan Hybrid Cloud dan Multi-Engine”, ELKOM, vol. 18, no. 1, pp. 60–72, Jul. 2025, doi: 10.51903/5n2d3s08.