Research Article Open Access

Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models

Ghizlane Bourahouat¹, Manar Abourezq¹ and Najima Daoudi¹
  • ¹ ITQAN Team, LyRICA Laboratory, School of Information Science, Morocco

Abstract

This study addresses sentiment analysis, a core task in natural language processing, with a particular focus on Arabic and especially dialectal Arabic, which remains understudied because of its inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, using BERT models pre-trained on Arabic, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are combined with machine learning and deep learning algorithms, including SVM and CNN, with additional fine-tuning of the pre-trained model. We also examine the impact of data imbalance by evaluating the models on three distinct datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. The proposed approach reaches an accuracy of 96% with the QARIB model, even on imbalanced data. The novelty of this research lies in applying pre-trained Arabic BERT models to Moroccan sentiment analysis and in exploring their combined use with CNN and SVM algorithms. Our findings show that BERT-based models used on their own yield better results than the same models used in conjunction with CNN or SVM. A comparative analysis with state-of-the-art approaches highlights the method's effectiveness and provides insights that contribute to sentiment analysis for Arabic dialects.
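
The under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which under-sampling procedure was used, so this example assumes plain random under-sampling down to the size of the smallest class. The function name `undersample`, the `seed` parameter, and the toy data are hypothetical and exist only for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, seed=0):
    """Randomly drop examples so every label keeps as many as the rarest label.

    `samples` is a list of (text, label) pairs. This is an assumed procedure,
    not necessarily the one used in the study.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    if not by_label:
        return []

    # Size of the smallest class decides how many examples each class keeps.
    n_min = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 6 positive and 2 negative examples.
toy = [(f"text_pos_{i}", "positive") for i in range(6)]
toy += [(f"text_neg_{i}", "negative") for i in range(2)]

balanced = undersample(toy)
label_counts = dict(Counter(label for _, label in balanced))
print(label_counts)
```

Balancing this way discards majority-class examples, which is one reason a study might also compare against the untouched unbalanced set and against a set balanced by adding data.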

Journal of Computer Science
Volume 20 No. 2, 2024, 157-167

DOI: https://doi.org/10.3844/jcssp.2024.157.167

Submitted On: 15 October 2023 Published On: 27 December 2023

How to Cite: Bourahouat, G., Abourezq, M. & Daoudi, N. (2024). Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models. Journal of Computer Science, 20(2), 157-167. https://doi.org/10.3844/jcssp.2024.157.167

Keywords

  • ANLP
  • Embedding
  • Arabic
  • Transformer
  • Sentiment Analysis
  • CNN and SVM