A Deep Learning Approach for Telugu Domain Identification with Multichannel LSTM-CNN
- 1 Electronics and Communication Engineering Department, Sathyabama Institute of Science and Technology, Chennai, India
- 2 Department of Electronics and Communication Engineering, Aditya University, Surampalem, India
- 3 Department of Electronics and Communication Engineering, Vignan's Institute of Information Technology, Visakhapatnam, India
- 4 Department of Electronics and Communication Engineering, Vasavi College of Engineering, Hyderabad, Telangana, India
- 5 Department of Electronics and Communication Engineering, Vignan's Institute of Engineering for Women, Visakhapatnam, India
Abstract
The rapid growth of textual data has brought a wide range of applications in information retrieval and natural language processing (NLP) to prominence. Reliable extraction of information from text depends on recognizing its thematic content, which is crucial for tasks such as document summarization, information extraction, question answering, machine translation, and sentiment analysis. The challenge is harder for regional languages such as Telugu, whose distinctive linguistic features demand specialized approaches. In this work, we propose a Telugu technical domain identification model based on a Multichannel Long Short-Term Memory Convolutional Neural Network (LSTM-CNN) architecture. The approach combines the sequence modeling capability of LSTM with the local feature extraction of CNN to identify domains in Telugu text. The model was evaluated in the ICON Shared Challenge "TechDOfication 2020," achieving an F1 score of 90.01% on the validation set and 69.90% on the test set. The results indicate an improvement over conventional models and show that multichannel deep learning techniques are effective for domain identification in Telugu. The proposed model is a step toward better NLP applications for regional languages and offers a scalable solution to the growing demand for accurate thematic classification of technical-domain text.
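The abstract reports results as F1 scores. As a reminder of what that metric measures, the following is a minimal sketch of per-class and macro-averaged F1 for a multi-class labeling task. The averaging scheme (macro), the function names, and the domain labels are assumptions made for illustration; the abstract does not state how the shared task averaged its scores.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from gold and predicted label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_label, pred_label in zip(y_true, y_pred):
        if gold_label == pred_label:
            tp[gold_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[gold_label] += 1  # gold class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical domain labels, for illustration only (not data from the paper).
gold = ["cse", "phy", "cse", "bio", "phy", "bio"]
pred = ["cse", "phy", "phy", "bio", "phy", "cse"]
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Macro averaging weights every domain equally regardless of how many documents it contains. A micro or weighted average would give different numbers on imbalanced data, so the choice of scheme matters when comparing reported scores.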
DOI: https://doi.org/10.3844/jcssp.2025.2181.2190
Copyright: © 2025 Buddha Hari Kumar, Chitra Perumal, Inakoti Ramesh Raja, Chukka Ramesh Babu, Srinivas Rao Gorre and Santosh Tripurana. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Natural Language Processing (NLP)
- Multichannel LSTM-CNN
- Long Short-Term Memory (LSTM)
- Text Summarization
- Multilingual Text Processing
- F1 Score