Arabic Fake News Detection Across Generational Text Representations: From Traditional Models to Transformer-Based Methodologies

Noor M. Alkudah

doi:10.3844/jcssp.2026.1313.1329

Research Article Open Access

Noor M. Alkudah¹

¹ Department of Computer Science, Faculty of Information Technology, The World Islamic Sciences and Education University, Amman, Jordan

Abstract

The rapid proliferation of fake news on Arabic social media has amplified societal and political risks, yet research on automatic detection in Arabic remains limited by scarce datasets, morphological complexity, and underexplored preprocessing strategies. This study presents a comprehensive benchmark for Arabic fake news detection that unifies seven Machine Learning (ML) algorithms, three Deep Learning (DL) models, and a transformer-based approach (AraBERT) under consistent experimental conditions. A hybrid balanced dataset of 4,838 tweets was constructed from ArCOV19-Rumors, AraCOVID19-MFH, and NLP4IF-2021. Three levels of preprocessing were systematically evaluated: primitive cleaning and tokenization, Named Entity Recognition (NER), and NER with stemming. The results show a clear generational progression in text representation: TF-IDF provides strong lexical baselines, AraVec yields moderate gains through static embeddings, AraBERT embeddings yield substantial improvements through contextualization, and fine-tuned AraBERT achieves the best results (Accuracy/F1 ≈ 0.95). A comparative analysis shows that SVM is the best-performing ML algorithm, Bi-LSTM is the best-performing DL model, and contextual embeddings have a large effect across all model families. Preprocessing strategies affect model families differently: stemming helps ML models but hurts DL models, whereas NER consistently helps both. This study provides solid baselines, methodological insights, and a generational perspective on Arabic text representations, establishing a foundation for future research on combating misinformation in Arabic NLP.

Journal of Computer Science

Volume 22 No. 4, 2026, 1313-1329

DOI: https://doi.org/10.3844/jcssp.2026.1313.1329

Submitted On: 21 August 2025 Published On: 17 April 2026

How to Cite: Alkudah, N. M. (2026). Arabic Fake News Detection Across Generational Text Representations: From Traditional Models to Transformer-Based Methodologies. Journal of Computer Science, 22(4), 1313-1329. https://doi.org/10.3844/jcssp.2026.1313.1329

Copyright: © 2026 Noor M. Alkudah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Arabic Fake News
Generational Benchmarking
Hybrid Datasets
NER
Stemming
AraBERT