Characterizing Deep-Learning I/O Workloads in TensorFlow
- Chien, Steven W. D. (author)
- Markidis, Stefano (author)
  - KTH, Beräkningsvetenskap och beräkningsteknik (CST)
- Sishtla, Chaitanya Prasad (author)
  - KTH, Beräkningsvetenskap och beräkningsteknik (CST)
- Santos, Luis (author)
- Herman, Pawel (author)
  - KTH, Beräkningsvetenskap och beräkningsteknik (CST)
- Narasimhamurthy, Sai (author)
- Laure, Erwin (author)
  - KTH, Parallelldatorcentrum, PDC
- Institute of Electrical and Electronics Engineers (IEEE), 2018
- English.
Published in: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis. Institute of Electrical and Electronics Engineers (IEEE), pp. 54-63
- Related links:
  - https://urn.kb.se/re...
  - https://doi.org/10.1...
Abstract
- The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast, small-capacity storage and asynchronously copy the checkpoints to a slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to the slower storage on our benchmark environment.
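
The burst-buffer idea described in the abstract (write the checkpoint to a fast tier, then copy it to a slower tier in the background) can be illustrated with a minimal sketch that uses only the Python standard library. The class name `BurstBufferCheckpointer`, its methods, and the two directories are hypothetical and are not taken from the paper; the authors' implementation targets TensorFlow checkpoints and may differ substantially.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Hypothetical helper: blocking write to a fast tier, background copy to a slow tier."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        self.slow_dir.mkdir(parents=True, exist_ok=True)
        self._pending = []  # background copy threads that have not been joined yet

    def save(self, name, payload):
        """Write `payload` (bytes) to the fast tier, then start an asynchronous copy."""
        fast_path = self.fast_dir / name
        fast_path.write_bytes(payload)
        # The caller resumes as soon as the fast write is done;
        # the copy to the slow tier runs in a separate thread.
        worker = threading.Thread(
            target=shutil.copy2, args=(fast_path, self.slow_dir / name)
        )
        worker.start()
        self._pending.append(worker)
        return fast_path

    def wait(self):
        """Block until every outstanding copy has reached the slow tier."""
        for worker in self._pending:
            worker.join()
        self._pending.clear()


if __name__ == "__main__":
    # Two temporary directories stand in for the fast and slow storage tiers.
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        ckpt = BurstBufferCheckpointer(fast, slow)
        ckpt.save("step_100.ckpt", b"placeholder for serialized model state")
        # Training would continue here while the copy proceeds.
        ckpt.wait()
        print(sorted(p.name for p in Path(slow).iterdir()))
```

In this sketch the training loop only waits for the write to the fast tier; `wait()` would be called before exiting or before reclaiming space on the fast tier. Any speedup depends on the relative bandwidth of the two tiers and is not implied by the code.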
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Engineering (hsv//eng)
Keywords
- Parallel I/O
- Input Pipeline
- Deep Learning
- TensorFlow
Publication and content type
- ref (subject category)
- kon (subject category)