SwePub

QSpark : Distributed Execution of Batch & Streaming Analytics in Spark Platform

HoseinyFarahabady, M. Reza (author)
The University of Sydney, AUS
Taheri, Javid (author)
Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
Zomaya, Albert Y. (author)
The University of Sydney, AUS
Tari, Zahir (author)
RMIT University, AUS
IEEE, 2021
English.
In: 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA). IEEE. ISBN 9781665495509.
  • Conference paper (peer-reviewed)
Abstract
  • A significant portion of research work in the past decade has been devoted to developing resource allocation and task scheduling solutions for large-scale data processing platforms. Such algorithms are designed to facilitate the deployment of data analytic applications across either conventional cluster computing systems or modern virtualized data centers. The main reason for this large research effort is that even a slight improvement in the performance of such platforms can bring considerable monetary savings for vendors, especially for modern data processing engines designed solely to perform high-throughput and/or low-latency computations over massive-scale batch or streaming data. A question yet to be answered in this context is how to design an effective resource allocation solution that prevents low resource utilization while meeting the enforced performance level (such as the 99th latency percentile) in circumstances where contention among applications for the capacity of shared resources is a non-negligible performance-limiting factor. This paper proposes a resource controller system, called QSpark, to cope with the problems of (i) low performance (i.e., resource utilization in the batch mode and p-99 response time in the streaming mode), and (ii) shared-resource interference among collocated applications in a multi-tenant modern Spark platform. The proposed solution leverages a set of controlling mechanisms for dynamically partitioning the allocation of computing resources so that it can fulfill the QoS requirements of latency-critical data processing applications while enhancing the throughput of all worker nodes without reaching their saturation points. Through extensive experiments on our in-house Spark cluster, we compared the achieved performance of the proposed solution against the default Spark resource allocation policy for a variety of Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning (DL) applications. Experimental results show the effectiveness of the proposed solution: it reduces the p-99 latency of high-priority applications by 32% during burst traffic periods (in both batch and stream modes), while enhancing the QoS satisfaction level by 65% for applications with the highest priority (compared with the results of the default Spark resource allocation strategy).
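
The abstract describes QSpark's controller only at a high level. As a rough, hedged illustration of the dynamic resource-partitioning idea it mentions (shifting shared CPU capacity toward latency-critical tenants when their p-99 latency target is violated), the minimal Python sketch below rebalances a fixed core pool among collocated tenants. This is not the paper's algorithm; the tenant names, latency targets, step size, and metric source are assumptions invented for the example.

# Hypothetical sketch, not the QSpark algorithm: a single feedback step that
# repartitions a fixed pool of CPU cores among collocated tenants so that
# latency-critical tenants recover their p-99 target at the expense of
# best-effort tenants. All names and numbers below are assumptions.
import numpy as np

TOTAL_CORES = 64   # assumed capacity of the shared cluster
MIN_SHARE = 2      # assumed floor so no tenant is fully starved

def p99(latencies_ms):
    # 99th percentile of one tenant's recently observed task latencies
    return float(np.percentile(latencies_ms, 99))

def rebalance(shares, window, targets_ms, step=2):
    # shares:     tenant -> currently allocated cores
    # window:     tenant -> list of recent task latencies in ms
    # targets_ms: tenant -> p-99 latency target in ms (None = best effort)
    new = dict(shares)
    violators = [t for t, tgt in targets_ms.items()
                 if tgt is not None and p99(window[t]) > tgt]
    donors = [t for t, tgt in targets_ms.items()
              if tgt is None or p99(window[t]) < 0.8 * tgt]
    for v in violators:
        for d in donors:
            # move 'step' cores from the first donor that can spare them
            if new.get(d, 0) - step >= MIN_SHARE:
                new[d] -= step
                new[v] = new.get(v, 0) + step
                break
    assert sum(new.values()) <= TOTAL_CORES
    return new

# Example: the high-priority streaming tenant misses its 200 ms target,
# so two cores are shifted away from the best-effort batch tenant.
shares  = {"stream-hi": 16, "batch-lo": 48}
window  = {"stream-hi": [180, 240, 260, 210], "batch-lo": [900, 1100]}
targets = {"stream-hi": 200.0, "batch-lo": None}
print(rebalance(shares, window, targets))   # {'stream-hi': 18, 'batch-lo': 46}

In a real deployment the latency window would come from Spark's task metrics and the new shares would be applied through the cluster manager; here they are plain Python values so the control step can be run standalone.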

Subject headings

NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keywords

Streaming Big Data Processing
Quality of Service (QoS)
Iterative Computation
Dynamic resource allocation
Computer Science
Datavetenskap (Computer Science)

Publication and content type

ref (peer-reviewed)
kon (conference paper)


 