SwePub

Evaluating Surprise Adequacy for Deep Learning System Testing

Kim, Jinhan (author)
Korea Advanced Institute of Science and Technology (KAIST)
Feldt, Robert, 1972 (author)
Chalmers University of Technology
Yoo, Shin (author)
Korea Advanced Institute of Science and Technology (KAIST)
2023-03-29
2023
English.
In: ACM Transactions on Software Engineering and Methodology. Association for Computing Machinery (ACM). ISSN 1049-331X, E-ISSN 1557-7392. 32:2
  • Journal article (peer-reviewed)
Abstract
  • The rapid adoption of Deep Learning (DL) systems in safety-critical domains such as medical imaging and autonomous driving urgently calls for ways to test their correctness and robustness. Borrowing from the concept of test adequacy in traditional software testing, existing work on testing of DL systems initially investigated them from a structural point of view, leading to a number of coverage metrics. Our lack of understanding of the internal mechanisms of Deep Neural Networks (DNNs), however, means that coverage metrics defined on the Boolean dichotomy of coverage are hard to interpret and understand intuitively. We propose the degree of out-of-distribution-ness of a given input as its adequacy for testing: the more surprising a given input is to the DNN under test, the more likely the system will show unexpected behavior for that input. We develop the concept of surprise into a test adequacy criterion, called Surprise Adequacy (SA). Intuitively, SA measures the difference between the behavior of the DNN for the given input and its behavior for the training data. We posit that a good test input should be sufficiently, but not overtly, surprising compared to the training dataset. This article evaluates SA using a range of DL systems, from simple image classifiers to autonomous driving car platforms, as well as both small and large data benchmarks ranging from MNIST to ImageNet. The results show that the SA value of an input can be a reliable predictor of the correctness of the model's behavior. We also show that SA can be used to detect adversarial examples, and that it can be efficiently computed against large training datasets such as ImageNet using sampling.
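The abstract describes Surprise Adequacy only at a conceptual level. As a rough illustration of the idea, the sketch below computes a distance-based surprise score from hidden-layer activation traces; the function name, the nearest-neighbor normalization, and the use of NumPy are illustrative assumptions rather than the article's authoritative definition.

```python
# Illustrative sketch (not the article's exact formulation): a distance-based
# surprise score computed from activation traces of one hidden layer.
import numpy as np

def distance_based_surprise(x_act, train_acts, train_labels, predicted_class):
    """Surprise of a test input's activation trace relative to the training set.

    x_act:           (d,) activations of the test input at the chosen layer
    train_acts:      (n, d) activation traces of the training inputs
    train_labels:    (n,) class label of each training input
    predicted_class: class the DNN predicts for the test input
    """
    same = train_acts[train_labels == predicted_class]
    other = train_acts[train_labels != predicted_class]

    # Distance to the closest training trace of the predicted class ...
    d_same = np.linalg.norm(same - x_act, axis=1)
    nearest = same[np.argmin(d_same)]

    # ... normalized by that neighbor's distance to the other classes, so inputs
    # far from their own class but near a class boundary score as more surprising.
    d_other = np.linalg.norm(other - nearest, axis=1).min()
    return d_same.min() / d_other
```

Under this reading, a high score flags inputs whose activations lie far from anything the model saw for the predicted class, which is the kind of input the article proposes to prioritize for testing.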

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Software Engineering (hsv//eng)
NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)
NATURAL SCIENCES -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)

Keywords

Test adequacy
deep learning systems

Publication and content type

art (subject category)
ref (subject category)

