IAES Nawala: Natural language processing technology, a case study of the Indonesian language

Greetings, fellow Nawala readers! We hope you are all in good health.

This is IAES Nawala from the Institute of Advanced Engineering and Science. Today we are sharing news about natural language processing (NLP). NLP is a branch of artificial intelligence concerned with the interaction between computers and human language. It focuses on enabling computers to process human language in a way that resembles how humans understand it in context. Sentence segmentation, which splits textual data into individual sentences, is an important part of NLP. Petrus et al. (2023) propose a sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI) that applies a set of rules, works on Indonesian text, and can be adapted for English. The rules decide whether a punctuation mark such as a period, question mark, or exclamation point actually marks a sentence boundary. More detailed results are described in the following article.

An adaptable sentence segmentation based on Indonesian rules

Johannes Petrus, Ermatita Ermatita, Sukemi Sukemi, Erwin Erwin
Sentence segmentation that breaks textual data strings into individual sentences is an important phase in natural language processing (NLP). Each word in the string that is added a punctuation mark such as a period, question mark, or exclamation point, becomes the location for splitting the string. Humans can easily see the punctuation and split the string into sentences, but not machines. Basically, the three punctuation marks also perform other functions so that the sentence segmentation process must really be able to detect whether a word marked with punctuation is a sentence boundary or not. This research proposes a sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI) or Indonesian language sentence segmentation by applying a set of rules and can be used in Indonesian texts and can be adapted for English. There are 34 rules built with a combination of 27 fairly complete features that contribute to this research. The experimental results for the Indonesian text show that the SKBI is able to achieve an F1-Score of 96.89% and 97.07% for English. Both need to be improved but now better than previous research.
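To make the idea of rule-based boundary detection concrete, here is a minimal Python sketch. It is not the SKBI system and does not reproduce its 34 rules or 27 features; the abbreviation list and the three rules below are assumptions chosen purely for illustration.

```python
import re

# Hypothetical abbreviation list for illustration; NOT the SKBI rule set.
ABBREVIATIONS = {"dr.", "prof.", "dkk.", "no.", "jl.", "mr.", "mrs."}


def split_sentences(text):
    """Split text at '.', '?' or '!' unless a simple rule vetoes the boundary."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith((".", "?", "!")):
            continue
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        # Rule 1 (assumed): a known abbreviation does not end a sentence.
        if token.lower() in ABBREVIATIONS:
            continue
        # Rule 2 (assumed): a single capital letter plus period is an initial.
        if re.fullmatch(r"[A-Z]\.", token):
            continue
        # Rule 3 (assumed): the next token must start with an uppercase
        # letter, a digit, or a quote for this to count as a boundary.
        if next_token is not None and not re.match(r"[A-Z0-9\"']", next_token):
            continue
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


example = "Dr. Budi tinggal di Jl. Merdeka No. 5. Apakah dia dokter? Ya!"
result = split_sentences(example)
```

With these toy rules, the periods after "Dr.", "Jl." and "No." are skipped, so the example is split into three sentences. A real system needs far more rules to handle cases such as a sentence that genuinely ends with an abbreviation.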

Hayaty et al. (2023) study NLP applied to posts on X (formerly Twitter). They examine the impact of pre-trained global vector (GloVe) word embedding on the accuracy of classifying Indonesian text as hate speech or non-hate speech. The study finds that pre-trained GloVe (Indonesian text) combined with single- and multi-layer long short-term memory (LSTM) classifiers is more resistant to overfitting than the pre-trainable embedding it was compared against for hate speech detection. Accuracy is 81.5% with a single-layer LSTM and 80.9% with a double-layer LSTM.

Hate speech detection on Indonesian text using word embedding method-global vector

Mardhiya Hayaty, Arif Dwi Laksito, Sumarni Adi
Hate speech is defined as communication directed toward a specific individual or group that involves hatred or anger and a language with solid arguments leading to someone’s opinion can cause social conflict. It has a lot of potential for individuals to communicate their thoughts on an online platform because the number of Internet users globally, including in Indonesia, is continually rising. This study aims to observe the impact of pre-trained global vector (GloVe) word embedding on accuracy in the classification of hate speech and non-hate speech. The use of pre-trained GloVe (Indonesian text) and single and multi-layer long short-term memory (LSTM) classifiers has performance that is resistant to overfitting compared to pre-trainable embedding for hatespeech detection. The accuracy value is 81.5% on a single layer and 80.9% on a double-layer LSTM. The following job is to provide pre-trained with formal and non-formal language corpus; pre-processing to overcome non-formal words is very challenging.
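As a rough illustration of what using a pre-trained GloVe embedding involves, the Python sketch below parses vectors in GloVe's plain-text format (a word followed by its space-separated components) and builds an embedding matrix of the kind commonly used to initialise the embedding layer of an LSTM classifier. The three-dimensional vectors, the vocabulary, and the function names are invented for this example; the authors' actual corpus, dimensions, and training setup are not described here.

```python
import io


def load_glove(handle):
    """Parse GloVe text format: one word per line, then its vector components."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i + 1 holds the vector of vocab[i]; row 0 is reserved for padding.

    Words missing from the pre-trained vectors get an all-zero row.
    """
    matrix = [[0.0] * dim]
    for word in vocab:
        matrix.append(list(vectors.get(word, [0.0] * dim)))
    return matrix


# Toy 3-dimensional vectors, invented for illustration only.
TOY_GLOVE = io.StringIO(
    "benci 0.9 -0.2 0.1\n"
    "suka -0.8 0.3 0.0\n"
    "kamu 0.1 0.1 0.5\n"
)

vectors = load_glove(TOY_GLOVE)
vocab = ["aku", "benci", "kamu"]  # hypothetical task vocabulary
matrix = build_embedding_matrix(vocab, vectors, dim=3)

# Map tokens to row indices; tokens outside the vocabulary share index 0
# with padding, which is a simplification made for brevity.
word_to_index = {word: i + 1 for i, word in enumerate(vocab)}
encoded = [word_to_index.get(tok, 0) for tok in "aku benci kamu".split()]
```

Here "aku" is absent from the toy vectors, so its row stays at zero. Out-of-vocabulary informal words of this kind are one reason the abstract calls pre-processing of non-formal language challenging.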

Meanwhile, Garini et al. (2023) measure the service quality of a mobile application in Indonesia from online customer reviews using NLP. The methods used are sentiment analysis and topic modeling. They analyze 20,452 reviews of the “myIndiHome” application from the Google Play Store and Apple App Store. The results show that positive reviews center on application features, products/services, and the application interface, while negative reviews center on availability, feature reliability, processing speed, bugs, and application reliability. This approach can help in understanding customer feedback and improving the quality of self-service mobile applications in Indonesia's telecommunication sector.

Using machine learning to improve a telco self-service mobile application in Indonesia

Jwalita Galuh Garini, Achmad Nizar Hidayanto, Agri Fina
The use of mobile applications extends to the telecommunication sector, mainly due to COVID-19. Failure to provide it can cause dissatisfaction and result in the removal of the mobile application. Moreover, this leads to lost service opportunities, so paying attention to the mobile application’s quality is essential. There has yet to be a study on measuring the service quality of a self-service mobile application in the telecommunication sector using online customer reviews. This study uses sentiment analysis and topic modeling to determine the service quality of a self-service mobile application in the telecommunication sector from reviews on Google Play Store and Apple App Store. This study uses myIndiHome as a case study. The total data obtained from both platforms are 20,452 reviews. Sentiment analysis was performed using Naïve Bayes, support vector machine, and logistic regression, while topic modeling was performed using latent dirichlet allocation. The results show that logistic regression performs better than support vector machine and Naïve Bayes. Meanwhile, topic modeling shows that the positive review data has three topics, including application features, products/services, and application interfaces. Moreover, the negative review data has five topics, including application availability, application feature reliability, application processing speed, bugs, and application reliability.
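Naïve Bayes is one of the three sentiment classifiers named in the abstract. The Python sketch below shows a minimal multinomial Naïve Bayes with Laplace smoothing, written from scratch. The four training reviews are invented for illustration, and the class and variable names are assumptions; this is not the authors' pipeline, data, or pre-processing.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior of the class.
            score = math.log(count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                # Add-one smoothing so unseen words never give log(0).
                numerator = self.word_counts[label][tok] + 1
                denominator = total_words + len(self.vocab)
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy reviews, invented for illustration only.
train_docs = [
    "aplikasi bagus dan cepat",
    "fitur lengkap sangat membantu",
    "aplikasi lambat banyak bug",
    "sering error tidak bisa login",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction_a = model.predict("aplikasi cepat dan bagus")
prediction_b = model.predict("banyak bug dan error")
```

On this toy data the first review is labelled positive and the second negative. With real reviews, tokenisation, normalisation of informal spelling, and a held-out evaluation set would all matter far more than the classifier itself.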

The articles above are only a small part of the research on natural language processing. For more information, readers can visit the IAES International Journal of Artificial Intelligence (IJ-AI) page and read the articles for FREE via the following link: https://ijai.iaescore.com/.

Editor: I. Busthomi
