Enhancing Sentiment Analysis for Malayalam With mBERT: A Profoundly Transparent and Accurate Approach Using LIME
- 1 Department of Futures Studies, University of Kerala, Thiruvananthapuram, India
- 2 KSMDB College, Sasthamcotta, Kollam, India
- 3 ICFOSS, Thiruvananthapuram, India
- 4 Department of Computer Science, P. M. Govt. College, Chalakudy, India
Abstract
Overlapping sentiment boundaries, intensifiers, and intricate morphological structures make sentiment analysis in Malayalam particularly difficult, and traditional machine learning techniques struggle to produce consistent results. This paper presents an explainable sentiment analysis framework that fine-tunes a Multilingual Bidirectional Encoder Representations from Transformers (mBERT) model on a novel constituency-level dataset, manually curated and annotated under two labeling schemes: five-class (very positive, positive, neutral, negative, and very negative) and three-class (positive, neutral, and negative). In contrast to previous research that focuses solely on accuracy, our method incorporates Local Interpretable Model-Agnostic Explanations (LIME) to identify the linguistic cues that most strongly influence sentiment prediction in Malayalam, including intensifiers, negations, and context-dependent modifiers. Despite the inherent linguistic complexity, the proposed model performed consistently across both settings, achieving 61.78% precision for three-class classification and 61.47% for five-class classification. More significantly, the LIME-based interpretability analysis highlights how Malayalam-specific features affect classification results, providing a clear and linguistically grounded standard for low-resource sentiment analysis. To the best of our knowledge, this is the first explainable, transformer-based sentiment analysis framework for Malayalam that combines BERT with LIME and is underpinned by a constituency-level curated dataset. The work also lays the groundwork for further studies on interpretable deep learning in underrepresented languages and sets a new standard for low-resource NLP in terms of both performance and explainability.
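The LIME step described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline and uses neither the `lime` package nor mBERT: `toy_positive_probability` is a hypothetical stand-in for a fine-tuned classifier, English tokens stand in for Malayalam text, and the sample count, kernel width, and ridge penalty are illustrative assumptions. The sketch masks tokens at random, queries the classifier on each perturbed sentence, and fits a locality-weighted linear surrogate whose coefficients act as per-token importance scores.

```python
import math
import random


def toy_positive_probability(text):
    """Hypothetical stand-in for a fine-tuned classifier: returns P(positive)."""
    tokens = text.split()
    score = sum({"good": 1.0, "bad": -1.0}.get(tok, 0.0) for tok in tokens)
    if "very" in tokens:  # intensifier strengthens the polarity
        score *= 2.0
    if "not" in tokens:  # negation flips the polarity
        score = -score
    return 1.0 / (1.0 + math.exp(-score))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lime_style_weights(text, predict, num_samples=500, kernel_width=0.75,
                       ridge=1e-3, seed=0):
    """Return (token, weight) pairs from a locally weighted linear surrogate."""
    rng = random.Random(seed)
    tokens = text.split()
    n = len(tokens)
    d = n + 1  # one coefficient per token plus an intercept
    gram = [[0.0] * d for _ in range(d)]
    moment = [0.0] * d
    for sample in range(num_samples):
        # The first sample is the unperturbed sentence; the rest drop tokens at random.
        mask = [1] * n if sample == 0 else [rng.randint(0, 1) for _ in range(n)]
        perturbed = " ".join(tok for tok, keep in zip(tokens, mask) if keep)
        target = predict(perturbed)
        # Cosine distance between the mask and the all-ones vector is 1 - sqrt(kept / n).
        distance = 1.0 - math.sqrt(sum(mask) / n)
        weight = math.exp(-(distance ** 2) / (kernel_width ** 2))
        row = mask + [1]
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(d):
            if row[i]:
                moment[i] += weight * target
                for j in range(d):
                    gram[i][j] += weight * row[i] * row[j]
    for i in range(n):  # ridge penalty on token coefficients, not on the intercept
        gram[i][i] += ridge
    beta = solve_linear_system(gram, moment)
    return list(zip(tokens, beta[:n]))


if __name__ == "__main__":
    sentence = "the film is not very good"
    for token, weight in lime_style_weights(sentence, toy_positive_probability):
        print(f"{token:>5s} {weight:+.3f}")
```

Because the surrogate is linear, it captures only the average effect of each token across perturbations. Interaction effects such as negation therefore surface mainly as a negative weight on the negator itself, which is one reason to read such explanations alongside the original sentence.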
DOI: https://doi.org/10.3844/jcssp.2026.1666.1678
Copyright: © 2026 Anitha R., K. S. Anil Kumar, Rajeev R. R., Ansil Shafee, Manju G. and Reshmi L. B. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Sentiment Analysis
- Malayalam
- BERT
- Explainable AI
- LIME
- NLP