SwePub

QSpark : Distributed Execution of Batch & Streaming Analytics in Spark Platform

HoseinyFarahabady, M. Reza (author)
The University of Sydney, AUS
Taheri, Javid (author)
Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013)
Zomaya, Albert Y. (author)
The University of Sydney, AUS
Tari, Zahir (author)
RMIT University, AUS
IEEE, 2021
English.
In: 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA). IEEE. ISBN 9781665495509.
  • Conference paper (peer-reviewed)
Abstract
  • A significant portion of research work in the past decade has been devoted to developing resource allocation and task scheduling solutions for large-scale data processing platforms. Such algorithms are designed to facilitate the deployment of data analytic applications across either conventional cluster computing systems or modern virtualized data centers. The main reason for this large research effort is that even a slight improvement in the performance of such platforms can bring considerable monetary savings for vendors, especially for modern data processing engines designed solely to perform high-throughput and/or low-latency computations over massive-scale batch or streaming data. A question yet to be answered in this context is how to design an effective resource allocation solution that prevents low resource utilization while meeting the enforced performance level (such as the 99th latency percentile) in circumstances where contention among applications for the capacity of shared resources is a non-negligible performance-limiting factor. This paper proposes a resource controller system, called QSpark, to cope with the problems of (i) low performance (i.e., resource utilization in the batch mode and p-99 response time in the streaming mode), and (ii) shared-resource interference among collocated applications in a multi-tenant modern Spark platform. The proposed solution leverages a set of controlling mechanisms for dynamically partitioning the allocation of computing resources so that it can fulfill the QoS requirements of latency-critical data processing applications while enhancing the throughput of all worker nodes without reaching their saturation points. Through extensive experiments on our in-house Spark cluster, we compared the achieved performance of the proposed solution against the default Spark resource allocation policy for a variety of Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning (DL) applications. Experimental results show the effectiveness of the proposed solution: it reduces the p-99 latency of high-priority applications by 32% during burst traffic periods (in both batch and stream modes), while enhancing the QoS satisfaction level by 65% for applications with the highest priority (compared with the results of the default Spark resource allocation strategy).
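
The abstract describes QSpark's controller only at a high level. As a rough, hedged illustration of the dynamic resource-partitioning idea it mentions (shifting shared CPU capacity toward latency-critical tenants when their p-99 latency target is violated), the minimal Python sketch below rebalances a fixed core pool among collocated tenants. This is not the paper's algorithm; the tenant names, latency targets, step size, and metric source are assumptions invented for the example.

# Hypothetical sketch, not the QSpark algorithm: a single feedback step that
# repartitions a fixed pool of CPU cores among collocated tenants so that
# latency-critical tenants recover their p-99 target at the expense of
# best-effort tenants. All names and numbers below are assumptions.
import numpy as np

TOTAL_CORES = 64   # assumed capacity of the shared cluster
MIN_SHARE = 2      # assumed floor so no tenant is fully starved

def p99(latencies_ms):
    # 99th percentile of one tenant's recently observed task latencies
    return float(np.percentile(latencies_ms, 99))

def rebalance(shares, window, targets_ms, step=2):
    # shares:     tenant -> currently allocated cores
    # window:     tenant -> list of recent task latencies in ms
    # targets_ms: tenant -> p-99 latency target in ms (None = best effort)
    new = dict(shares)
    violators = [t for t, tgt in targets_ms.items()
                 if tgt is not None and p99(window[t]) > tgt]
    donors = [t for t, tgt in targets_ms.items()
              if tgt is None or p99(window[t]) < 0.8 * tgt]
    for v in violators:
        for d in donors:
            # move 'step' cores from the first donor that can spare them
            if new.get(d, 0) - step >= MIN_SHARE:
                new[d] -= step
                new[v] = new.get(v, 0) + step
                break
    assert sum(new.values()) <= TOTAL_CORES
    return new

# Example: the high-priority streaming tenant misses its 200 ms target,
# so two cores are shifted away from the best-effort batch tenant.
shares  = {"stream-hi": 16, "batch-lo": 48}
window  = {"stream-hi": [180, 240, 260, 210], "batch-lo": [900, 1100]}
targets = {"stream-hi": 200.0, "batch-lo": None}
print(rebalance(shares, window, targets))   # {'stream-hi': 18, 'batch-lo': 46}

In a real deployment the latency window would come from Spark's task metrics and the new shares would be applied through the cluster manager; here they are plain Python values so the control step can be run standalone.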

Subject headings

NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keywords

Streaming Big Data Processing
Quality of Service (QoS)
Iterative Computation
Dynamic resource allocation
Computer Science
Datavetenskap (Computer Science)

Publication and content type

ref (peer-reviewed)
kon (conference paper)


 