DQSOps: Data Quality Scoring Operations Framework for Data-Driven Applications
- Bayram, Firas (author)
- Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
- Ahmed, Bestoun S., 1982- (author)
- Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
- Hallin, Erik (author)
- Uddeholms AB, Sweden
- Engman, Anton (author)
- Uddeholms AB, Sweden
- Association for Computing Machinery (ACM), 2023
- English.
In: EASE '23: Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. Association for Computing Machinery (ACM). ISBN 9798400700446, pp. 32-41
- Related links:
- https://doi.org/10.1...
- https://kau.diva-por... (primary) (Raw object)
- https://urn.kb.se/re...
Abstract
- Data quality assessment has become a prominent component in the successful execution of complex data-driven artificial intelligence (AI) software systems. In practice, real-world applications generate huge volumes of data at high speed. These data streams require analysis and preprocessing before being permanently stored or used in a learning task. Therefore, significant attention has been paid to the systematic management and construction of high-quality datasets. Nevertheless, managing voluminous and high-velocity data streams is usually performed manually (i.e., offline), making it an impractical strategy in production environments. To address this challenge, DataOps has emerged to achieve life-cycle automation of data processes using DevOps principles. However, determining the data quality based on a fitness scale constitutes a complex task within the framework of DataOps. This paper presents a novel Data Quality Scoring Operations (DQSOps) framework that yields a quality score for production data in DataOps workflows. The framework incorporates two scoring approaches: an ML prediction-based approach that predicts the data quality score, and a standard-based approach that periodically produces the ground-truth scores by assessing several data quality dimensions. We deploy the DQSOps framework in a real-world industrial use case. The results show that DQSOps achieves significant computational speedup rates compared to the conventional approach of data quality scoring while maintaining high prediction performance.
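The two scoring paths named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dimension functions (`completeness`, `validity`), the valid range, the unweighted averaging, the `HybridScorer` class, and the `period` parameter are all hypothetical, and the "predictor" is a naive running mean standing in for the ML model described in the paper.

```python
from statistics import mean


def completeness(window):
    """Fraction of cells that are not missing (None counts as missing)."""
    cells = [v for row in window for v in row]
    if not cells:
        return 0.0
    return sum(v is not None for v in cells) / len(cells)


def validity(window, low, high):
    """Fraction of present values inside the assumed valid range [low, high]."""
    present = [v for row in window for v in row if v is not None]
    if not present:
        return 0.0
    return sum(low <= v <= high for v in present) / len(present)


def standard_score(window, low=0.0, high=100.0):
    """Ground-truth path: unweighted mean of the dimension scores (assumed)."""
    return mean([completeness(window), validity(window, low, high)])


class HybridScorer:
    """Score every `period`-th window with the standard path, predict the rest."""

    def __init__(self, period=3):
        self.period = period
        self.seen = 0
        self.history = []  # ground-truth scores computed so far

    def score(self, window):
        self.seen += 1
        if (self.seen - 1) % self.period == 0:
            # Periodic, more expensive ground-truth computation.
            truth = standard_score(window)
            self.history.append(truth)
            return truth, "standard"
        # Placeholder for the ML model: mean of past ground-truth scores.
        return mean(self.history), "predicted"
```

In this sketch the first window always takes the standard path, so the placeholder predictor never runs without at least one ground-truth score. A real deployment would replace the running mean with a trained regression model fed by features of the incoming window.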
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Software Engineering (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)
Keywords
- Data reduction
- Quality control
- Automated data
- Automated data scoring
- Data assessment
- Data quality
- Data quality dimensions
- Data stream
- Data-driven applications
- DataOps
- Mutation testing
- Real-world
- Life cycle
- Computer Science
Publication and content type
- ref (subject category)
- kon (subject category)