A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network
- 1 Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
- 2 Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
- 3 Department of Computer Science, Regional Center for Teaching and Training Professions, Meknes, Morocco
- 4 Euromed Center of Research, Euromed Polytechnic School, Euromed University, FEZ, Morocco
Abstract
In this study, we present a Multi-Split Cross-Strategy (MSC-Strategy) designed to leverage synthetic tabular data generated by a Conditional Generative Adversarial Network (CGAN). We investigate whether synthetic data, compared with real-world data, can improve the predictive results of machine learning models. First, we develop a CGAN architecture tailored to generating synthetic tabular data and train it on a comprehensive real-world dataset. Second, we validate the generated data to ensure its statistical fidelity to the distribution of the real data. Finally, we select a subset of the generated data and apply our strategy to create a new training set that combines the real training set with the chosen subset. To validate our approach, we employ six diverse regression models, including Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), XGB Regressor (XGB), and Support Vector Regressor (SVR). Each model is trained and tested using four training sets: real data, generated data, combined data (the real training set plus the generated data), and the data formed by our MSC-Strategy. Our findings indicate that the training set formed by the MSC-Strategy yields better predictive performance than real-world data or generated data alone, highlighting its ability to enhance the predictions of machine learning models using only a subset of the generated data.
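The comparison across training sets described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy one-feature dataset, the small KNN regressor, the noisy stand-in for CGAN output, the choice to score every model on held-out real data, and the random subset selection. In particular, the random subset is not the MSC-Strategy's selection rule, which the abstract does not specify.

```python
import random


def knn_predict(train, x, k=3):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, test, k=3):
    """Mean squared error of a KNN regressor fit on `train`, scored on `test`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)


rng = random.Random(0)


def sample(n, noise):
    """Draw n (x, y) pairs from y = 2x + 1 plus Gaussian noise (toy data)."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        rows.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)))
    return rows


real_train = sample(60, noise=1.0)
real_test = sample(30, noise=1.0)  # assumption: evaluation uses held-out real data

# Stand-in for CGAN output: same relationship, slightly noisier (assumption).
generated = sample(200, noise=1.5)

# Hypothetical selection: a random subset, NOT the paper's MSC selection rule.
generated_subset = rng.sample(generated, 40)

training_sets = {
    "real": real_train,
    "generated": generated,
    "combined": real_train + generated,
    "real + subset": real_train + generated_subset,
}

# Fit on each training set in turn and score on the same real test set.
scores = {name: mse(rows, real_test) for name, rows in training_sets.items()}
for name, score in scores.items():
    print(f"{name:>14}: MSE = {score:.3f}")
```

Scoring every variant on the same real test set keeps the numbers comparable; which training set scores best on this toy data says nothing about the results reported in the study.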
DOI: https://doi.org/10.3844/jcssp.2024.700.707
Copyright: © 2024 Abdelfattah Abassi, Brahim Bakkas, Mostapha El Jai, Ahmed Arid and Hussain Benazza. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Conditional Generative Adversarial Networks
- Tabular Data Generation
- Machine Learning