SwePub

id:"swepub:oai:DiVA.org:bth-22433"
 


An empirical study on the effectiveness of data resampling approaches for cross‐project software defect prediction

Bennin, Kwabena Ebo (author)
Wageningen Univ & Res, NLD
Tahir, Amjed (author)
Massey University, NZL
MacDonell, Stephen G. (author)
University of Otago, NZL
Börstler, Jürgen, 1960- (author)
Blekinge Tekniska Högskola, Institutionen för programvaruteknik, Blekinge Institute of Technology, Karlskrona, Sweden
2021-11-28
2022
English.
In: IET Software. - : John Wiley & Sons. - 1751-8806 .- 1751-8814. ; 16:2, pp. 185-199
  • Journal article (peer-reviewed)
Abstract
  • Cross‐project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with defect‐prediction datasets is class imbalance, that is, highly skewed datasets where non‐buggy modules dominate the buggy modules. In the past, data resampling approaches have been applied to within‐project defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderline‐SMOTE, Random Oversampling and ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links and One‐Sided Selection) is investigated, and results are compared to approaches without data resampling. The authors examined six defect prediction models on 34 datasets extracted from the PROMISE repository. The authors' results show that there is a significant positive effect of data resampling on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches for improved recall (pd) and g‐measure prediction performance. However, if the goal is to improve precision and reduce false alarm (pf), then data resampling approaches should be avoided.
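
The performance measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = buggy, 0 = non-buggy) and the definition of g-measure commonly used in defect prediction, the harmonic mean of pd and 1 - pf; the record itself does not give the formulas, and the function name `defect_metrics` is hypothetical.

```python
def defect_metrics(actual, predicted):
    """Compute recall (pd), false alarm rate (pf), precision and g-measure.

    Assumes labels are 1 (buggy) or 0 (non-buggy). The g-measure formula
    below is the commonly used harmonic mean of pd and (1 - pf); it is an
    assumption, not taken from the catalogue record.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # buggy, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # buggy, missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # clean, passed

    pd = tp / (tp + fn) if (tp + fn) else 0.0         # recall
    pf = fp / (fp + tn) if (fp + tn) else 0.0         # false alarm rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = pd + (1.0 - pf)
    g = 2 * pd * (1.0 - pf) / denom if denom else 0.0
    return {"pd": pd, "pf": pf, "precision": precision, "g": g}


if __name__ == "__main__":
    actual = [1, 1, 0, 0, 0, 0]
    predicted = [1, 0, 1, 0, 0, 0]
    print(defect_metrics(actual, predicted))
```

Under these definitions, a resampling step that makes a model flag more modules as buggy tends to raise pd and pf together, which is consistent with the trade-off between recall and precision that the abstract reports.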

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Software Engineering (hsv//eng)

Keywords

Defect prediction
software metrics
software quality
Software Engineering
Computer Science

Publication and content type

ref (subject category)
art (subject category)
