Data Streams Curation for Better Machine Learning Functionality and Result to Serve IoT and other Applications: A Survey
- 1 Princess Sumaya University for Technology (PSUT), Jordan
- 2 INTRASOFT MIDDLE EAST, Jordan
Abstract
Data curation on data streams is effective for operating big data analytics and reducing its costs. Preparing for analytics requires curating the heterogeneous data sets available in big data clusters, and this process becomes harder when curation must be conducted on data in motion in order to arrive at actionable insights and valuable analytics in real time, including further machine learning analysis and processing. In this paper, we identify and survey the issues and challenges across different areas related to big data. We also investigate the most common techniques and methods followed in implementations, including stream curation, the machine learning algorithms used in such implementations, and the feature engineering techniques that can be considered a curation pre-processing paradigm for data stream analytics. Furthermore, our paper shows the different application areas where data curation plays a critical role. Finally, we map the techniques and methods related to the data curation field to emphasize its critical role in business, retail, culture, arts, health, medicine, social media, wireless sensor networks, Natural Language Processing (NLP) and automated Feature Engineering (FE). In addition, we identify the issues and challenges in different areas, including IoT and media stream curation, to help scholars in this field.
DOI: https://doi.org/10.3844/jcssp.2019.1572.1584
Copyright: © 2019 Haya Salah, Islam Al-Omari, Jaber Alwidian, Rashed Al-Hamadin and Tariq Tawalbeh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Data Curation
- Data Streaming
- Data Ingestion
- Big Data
- Machine Learning