SwePub

Scalable Parallelization of Expensive Continuous Queries over Massive Data Streams

Zeitler, Erik, 1975- (author)
Uppsala University, Division of Computing Science, Computing Science, UDBL
Risch, Tore, Professor (thesis supervisor)
Uppsala University, Division of Computing Science
Rundensteiner, Elke (opponent)
Worcester Polytechnic Institute
ISBN 9789155480950
Uppsala : Acta Universitatis Upsaliensis, 2011
English, 35 pp.
Series: Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, 1651-6214 ; 836
  • Doctoral thesis (other academic/artistic)
Abstract
  • Numerous applications in, for example, science, engineering, and financial analysis increasingly require online analysis over streaming data. These data streams often arrive at such a high rate that saving them to disk is neither desirable nor feasible. Therefore, search and analysis must be performed directly over the data in motion. Such online search and analysis can be expressed as continuous queries (CQs) defined over the streams. The result of a CQ is itself a stream, which is continuously updated as new data appears in the queried stream(s). In many cases, the applications require non-trivial analysis, leading to CQs involving expensive processing. To provide scalability of such expensive CQs over high-volume streams, the execution of the CQs must be parallelized.

    To investigate different approaches to parallel execution of CQs, a parallel data stream management system called SCSQ was implemented for this thesis. Data and queries from space physics and traffic management applications are used in the evaluations, as well as synthetic data and the standard data stream benchmark, the Linear Road Benchmark. Declarative parallelization functions are introduced into the query language of SCSQ, allowing the user to specify customized parallelization. In particular, declarative stream splitting functions are introduced, which split a stream into parallel sub-streams, over which expensive CQ operators are continuously executed in parallel.

    Naïvely implemented, stream splitting becomes a bottleneck if the input streams are of high volume, if the CQ operators are massively parallelized, or if the stream splitting conditions are expensive. To eliminate this bottleneck, different approaches are investigated to automatically generate parallel execution plans for stream splitting functions. This thesis shows that by parallelizing the stream splitting itself, expensive CQs can be processed at stream rates close to network speed. Furthermore, it is demonstrated how parallelized stream splitting allows orders of magnitude higher stream rates than any previously published results for the Linear Road Benchmark.

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keywords

Computer Science with specialization in Database Technology

Publication and content type

vet (subject category)
dok (subject category)
