A Hybridized BERT-Based Approach for Crime News Collection and Classification from Online Newspapers
- 1 Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia
Abstract
Crime news analysis is crucial for understanding criminal activity, enhancing public safety, and informing policy decisions. However, the exponential growth and unstructured nature of online news articles make efficient and accurate information extraction difficult. This study aims to improve the efficiency and accuracy of crime news data collection and classification using Natural Language Processing (NLP) techniques and pre-trained language models. We propose a hybridized approach that combines topic modelling, an external knowledge base, and a BERT-based pre-trained model fine-tuned on crime-related content. Our experiments show that this method outperforms existing models, achieving a new state-of-the-art result with a 0.58% increase in accuracy for crime news classification. These findings underscore the practical applicability of the approach to real-world efforts to improve public safety and crime awareness.
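To make the shape of such a hybrid pipeline concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hand-written term set (`CRIME_TERMS`) as a stand-in for the external knowledge base, a simple token-overlap score with an arbitrary threshold as the filtering step, and a generic callable in place of the fine-tuned BERT classifier. The topic-modelling stage is omitted entirely, and every name in the sketch is hypothetical.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for an external knowledge base of crime terms.
# The paper's actual knowledge base is not described in this section.
CRIME_TERMS = {"robbery", "murder", "theft", "fraud", "assault", "arrested", "police"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def kb_score(text: str, terms: Iterable[str] = CRIME_TERMS) -> float:
    """Return the fraction of tokens that appear in the knowledge base."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    term_set = set(terms)
    return sum(token in term_set for token in tokens) / len(tokens)


def filter_candidates(articles: Iterable[str], threshold: float = 0.05) -> List[str]:
    """Keep articles whose knowledge-base score reaches the threshold.

    The threshold value is an arbitrary assumption for illustration.
    """
    return [article for article in articles if kb_score(article) >= threshold]


def classify(
    articles: Iterable[str], model: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Filter the articles, then label the survivors with `model`.

    `model` is a placeholder for a fine-tuned BERT classifier; any
    function mapping article text to a label can be plugged in.
    """
    return [(article, model(article)) for article in filter_candidates(articles)]
```

In this sketch the cheap knowledge-base filter runs before the classifier, so that only likely crime articles reach the more expensive model. Whether the paper orders its stages this way is not stated in the abstract.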
DOI: https://doi.org/10.3844/jcssp.2025.2000.2015
Copyright: © 2025 Ashour Ali, Shahrul Azman Mohd Noah, Lailatul Qadri Zakaria and Saeed Amer Al Ameri. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- BERT
- Crime News Classification
- Natural Language Processing
- Web Scraping
- Topic Modeling
- Knowledge Bases
- Deep Learning
- Text Classification
- Data Filtering
- Online News