DQSOps : Data Quality Scoring Operations Framework for Data-Driven Applications
- Bayram, Firas (author)
- Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
- Ahmed, Bestoun S., 1982- (author)
- Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
- Hallin, Erik (author)
- Uddeholms AB, Sweden
- Engman, Anton (author)
- Uddeholms AB, Sweden
- Association for Computing Machinery (ACM), 2023
- English.
In: EASE '23: Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering. Association for Computing Machinery (ACM). ISBN 9798400700446, pp. 32-41
- Related links:
- https://doi.org/10.1...
- https://kau.diva-por... (primary) (Raw object)
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
- Data quality assessment has become a prominent component in the successful execution of complex data-driven artificial intelligence (AI) software systems. In practice, real-world applications generate huge volumes of data at high speed. These data streams require analysis and preprocessing before being permanently stored or used in a learning task. Therefore, significant attention has been paid to the systematic management and construction of high-quality datasets. Nevertheless, managing voluminous and high-velocity data streams is usually performed manually (i.e., offline), which makes it an impractical strategy in production environments. To address this challenge, DataOps has emerged to achieve life-cycle automation of data processes using DevOps principles. However, determining the data quality based on a fitness scale constitutes a complex task within the framework of DataOps. This paper presents a novel Data Quality Scoring Operations (DQSOps) framework that yields a quality score for production data in DataOps workflows. The framework incorporates two scoring approaches: an ML prediction-based approach that predicts the data quality score, and a standard-based approach that periodically produces the ground-truth scores by assessing several data quality dimensions. We deploy the DQSOps framework in a real-world industrial use case. The results show that DQSOps achieves significant computational speedup compared to the conventional approach of data quality scoring while maintaining high prediction performance.
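The interplay of the two scoring approaches described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two dimensions (`completeness`, `uniqueness`), their equal weighting, the `refresh_every` parameter, and especially `PersistencePredictor`, which is a naive stand-in that repeats the last ground-truth score where the framework would use a trained ML model.

```python
from typing import List, Optional, Sequence, Tuple

Record = dict  # one record: column name -> value (None marks a missing value)


def completeness(window: Sequence[Record]) -> float:
    """Fraction of non-missing cells in the window."""
    cells = [v for rec in window for v in rec.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def uniqueness(window: Sequence[Record]) -> float:
    """Fraction of distinct records in the window (values must be hashable)."""
    if not window:
        return 1.0
    distinct = {tuple(sorted(rec.items(), key=lambda kv: kv[0])) for rec in window}
    return len(distinct) / len(window)


def standard_score(window: Sequence[Record]) -> float:
    """Standard-based (ground-truth) score: equal-weight mean of the
    dimension scores. The choice of dimensions and weights is an assumption."""
    return (completeness(window) + uniqueness(window)) / 2


class PersistencePredictor:
    """Hypothetical placeholder for the ML predictor: it simply returns the
    most recent ground-truth score it was given."""

    def __init__(self) -> None:
        self.last: Optional[float] = None

    def update(self, score: float) -> None:
        self.last = score

    def predict(self, window: Sequence[Record]) -> Optional[float]:
        return self.last


def score_stream(
    windows: Sequence[Sequence[Record]], refresh_every: int = 3
) -> List[Tuple[str, Optional[float]]]:
    """Score each window, computing the expensive ground truth only on every
    `refresh_every`-th window and using the cheap prediction in between."""
    predictor = PersistencePredictor()
    results: List[Tuple[str, Optional[float]]] = []
    for i, window in enumerate(windows):
        if i % refresh_every == 0:
            score = standard_score(window)
            predictor.update(score)  # ground truth refreshes the predictor
            results.append(("standard", score))
        else:
            results.append(("predicted", predictor.predict(window)))
    return results
```

The sketch only shows where a speedup of this kind could come from: the full dimension-by-dimension assessment runs periodically, and the windows in between receive a cheaper predicted score.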
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Software Engineering (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)
Keywords
- Data reduction
- Quality control
- Automated data
- Automated data scoring
- Data assessment
- Data quality
- Data quality dimensions
- Data stream
- Data-driven applications
- DataOps
- Mutation testing
- Real-world
- Life cycle
- Computer Science
Publication and Content Type
- ref (peer-reviewed)
- kon (conference paper)