SwePub


Search results for "WFRF:(Risch Tore Professor)"


  • Results 1-10 of 16
1.
  • Mahmood, Khalid (author)
  • Scalable Data Management for Internet of Things
  • 2021
  • Doctoral thesis (other academic/artistic)
    • Internet of Things (IoT) applications often involve considerable numbers of sensors that produce large volumes of data. In this context, efficient data management could enable automatic decision making based on analytics of data from sensors on equipment. However, these sensors are often geographically distributed and generate data in diverse formats, in the form of high-rate sensor streams. The combination of these properties poses significant challenges for existing database management systems (DBMSs) in providing scalable data storage and analytics. This thesis addresses the problem of providing efficient data management for distributed IoT applications using DBMS technologies. Initially, we developed a prototype system, Fused LOg database Query Processor (FLOQ), which enables general query processing over collections of relational databases that are deployed locally on distributed sites to store sensor measurement logs. Although FLOQ provides efficient query execution when scaling the number of distributed databases, it exhibits complexity and scalability issues for large IoT applications with heterogeneous data. The limitations of FLOQ are primarily inherent in its use of relational database back-ends for storing sensor logs. Using a relational database to store large-scale IoT data raises several challenges. Loading massive logs produced at high rates is not fast enough because of its strong consistency mechanisms. Furthermore, it can constitute a single point of failure that limits availability, and its inflexible schemas make heterogeneity difficult to manage. In contrast, distributed NoSQL data stores can provide scalable storage of heterogeneous data through data partitioning, replication, and high availability, at the cost of strong consistency. 
To understand the suitability of NoSQL databases, this thesis also investigates to what degree NoSQL DBMSs provide scalable storage and analytics for IoT applications by comparing a variety of state-of-the-art relational and NoSQL databases on real-world industrial IoT data. The experimental evaluations reveal that distributed NoSQL data stores can provide scalability; however, advanced data analytics is difficult to support because of their limited query processing capabilities. Furthermore, data management for distributed IoT applications often requires seamless integration of a real-time edge analytics platform, a distributed storage manager, effective data integration, and query processing techniques for handling heterogeneity. Therefore, to provide a holistic data management solution, this thesis developed the Extended Query Processing (EQP) system, which enables advanced analytics, supporting both edge and offline analytics for large-scale IoT applications. Together, these contributions enable efficient data management of large-scale heterogeneous IoT applications and support advanced analytics.
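FLOQ is described as enabling general query processing over a collection of relational databases deployed locally on distributed sites. A minimal scatter-gather sketch of that idea follows, assuming in-memory SQLite databases as stand-ins for the site-local log databases; the `measurements` table, its columns, and the helper names are hypothetical and do not describe FLOQ's actual design.

```python
import sqlite3
from typing import List, Sequence, Tuple

Row = Tuple[str, int, float]

def make_site_db(rows: Sequence[Row]) -> sqlite3.Connection:
    """Create an in-memory database standing in for one site's sensor log (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (sensor TEXT, ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    return conn

def query_all_sites(sites: Sequence[sqlite3.Connection], sql: str, params: tuple = ()) -> List[tuple]:
    """Run the same query on every site and concatenate the partial results."""
    results: List[tuple] = []
    for conn in sites:
        results.extend(conn.execute(sql, params).fetchall())
    return results

sites = [
    make_site_db([("s1", 1, 0.5), ("s1", 2, 0.9)]),
    make_site_db([("s2", 1, 1.7), ("s2", 2, 0.2)]),
]
# Selection over all sites: every measurement above a threshold.
high = query_all_sites(sites, "SELECT sensor, ts, value FROM measurements WHERE value > ?", (0.8,))
print(sorted(high))  # [('s1', 2, 0.9), ('s2', 1, 1.7)]
```

A plain union like this only works for queries whose results can simply be concatenated, such as selections; an aggregate such as `AVG` would instead need per-site partial results (sum and count) combined centrally.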
2.
  • Andrejev, Andrej, 1980- (author)
  • Semantic Web Queries over Scientific Data
  • 2016
  • Doctoral thesis (other academic/artistic)
    • Semantic Web and Linked Open Data provide a potential platform for interoperability of scientific data, offering a flexible model for providing machine-readable and queryable metadata. However, RDF and SPARQL have gained limited adoption within the scientific community, mainly due to the lack of support for managing massive numeric data, along with certain other important features, such as extensibility with user-defined functions, query modularity, and integration with existing environments and workflows. We present the design, implementation and evaluation of Scientific SPARQL, a language for querying data and metadata combined, represented using the RDF graph model extended with numeric multidimensional arrays as node values: RDF with Arrays. The techniques used to store RDF with Arrays in a scalable way and to process Scientific SPARQL queries and updates are implemented in our prototype software, the Scientific SPARQL Database Manager (SSDM), and its integrations with data storage systems and computational frameworks. This includes scalable storage solutions for numeric multidimensional arrays and an efficient implementation of array operations. The arrays can be physically stored in a variety of external storage systems, including files, relational databases, and specialized array data stores, using our Array Storage Extensibility Interface. Whenever possible, SSDM accumulates array operations and accesses array contents lazily. In scientific applications, numeric computations are often used for filtering or post-processing the retrieved data, which can be expressed in a functional way. Scientific SPARQL allows expressing common query sub-tasks with functions defined as parameterized queries. This becomes especially useful together with functional language abstractions such as lexical closures and second-order functions, e.g. array mappers. Existing computational libraries can be interfaced and invoked from Scientific SPARQL queries as foreign functions. 
Cost estimates and alternative evaluation directions may be specified, aiding the construction of better execution plans. Costly array processing, e.g. filtering and aggregation, is thus performed on the server, reducing the amount of communication. Furthermore, commonly supported operations are delegated to the array storage back-ends, according to their capabilities. Both the expressivity and the performance of Scientific SPARQL are evaluated on a real-world example, and further performance tests are run using our mini-benchmark for array queries.
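The SSDM abstract notes that array operations are accumulated and array contents are accessed lazily. The sketch below illustrates the general idea with a hypothetical `LazyArray` class: slicing and element-wise `map` calls only build up index arithmetic and a list of pending functions, and the storage back-end (modeled by an assumed `fetch(i)` callback) is read only for the elements actually requested. It illustrates lazy evaluation in general, not SSDM's implementation.

```python
from typing import Callable, List, Optional

class LazyArray:
    """A 1-D array proxy that defers element access and accumulates operations (illustrative only)."""

    def __init__(self, fetch: Callable[[int], float], length: int,
                 ops: Optional[List[Callable[[float], float]]] = None, offset: int = 0, step: int = 1):
        self._fetch = fetch      # reads one element from the (hypothetical) storage back-end
        self._len = length
        self._ops = ops or []    # pending element-wise functions, applied only on access
        self._offset = offset
        self._step = step

    def map(self, f: Callable[[float], float]) -> "LazyArray":
        # No data is read here; f is just appended to the pending operations.
        return LazyArray(self._fetch, self._len, self._ops + [f], self._offset, self._step)

    def slice(self, start: int, stop: int, step: int = 1) -> "LazyArray":
        # Only index arithmetic is composed; assumes 0 <= start and step > 0.
        stop = min(stop, self._len)
        n = len(range(start, stop, step))
        return LazyArray(self._fetch, n, self._ops, self._offset + start * self._step, self._step * step)

    def __len__(self) -> int:
        return self._len

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self._len:
            raise IndexError(i)
        value = self._fetch(self._offset + i * self._step)  # the only place storage is touched
        for f in self._ops:
            value = f(value)
        return value

    def to_list(self) -> List[float]:
        return [self[i] for i in range(self._len)]

reads: List[int] = []
storage = list(range(100))

def fetch(i: int) -> float:
    reads.append(i)  # record every back-end access
    return float(storage[i])

a = LazyArray(fetch, len(storage))
b = a.slice(10, 20, 2).map(lambda x: x * 2)  # nothing is read yet
print(len(reads))   # 0
print(b.to_list())  # [20.0, 24.0, 28.0, 32.0, 36.0]
print(reads)        # [10, 12, 14, 16, 18]
```

Because the slice and the doubling are only recorded, the back-end is touched exactly once per requested element.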
3.
  • André-Jönsson, Henrik, 1968- (author)
  • Indexing strategies for time series data
  • 2002
  • Doctoral thesis (other academic/artistic)
    • Traditionally, databases have stored textual data and have been used to store administrative information. The computers used, and more specifically the available storage, have been neither large enough nor fast enough to allow databases to be used for more technical applications. In recent years these two bottlenecks have started to disappear, and there is increasing interest in using databases to store non-textual data such as sensor measurements or other types of process-related data. In a database, a sequence of sensor measurements can be represented as a time series. The database can then be queried to find, for instance, subsequences, extrema, or the points in time at which the time series had a specific value. To make this search efficient, indexing methods are required. Finding appropriate indexing methods is the focus of this thesis. There are two major problems with existing time series indexing strategies: the size of the index structures and the lack of general, application-independent indexing strategies. These problems have been thoroughly researched and solved for text indexing. We have examined the extent to which text indexing methods can be used for indexing time series. A method for transforming time series into text sequences has been investigated, followed by an investigation of how text indexing methods can be applied to these text sequences. We have examined two well-known text indexing methods, signature files and the B-tree, and studied how they can be modified to index time series. We have also developed two new index structures, the signature tree and the paged trie. For each index structure we have constructed cost and size models, resulting in comparisons between the different approaches. Our tests indicate that the indexing method we have developed, together with the B-tree structure, produces good results. 
It is possible to search for and find subsequences of very large time series efficiently. The thesis also discusses which future issues must be investigated for these techniques to be usable in a control system that relies on time series indexing to identify control modes.
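This thesis transforms time series into text sequences so that text indexing methods can be reused. The exact transformation is not given in the abstract, so the sketch below assumes a simple one: each step of the series becomes `u` (up), `d` (down) or `s` (steady, within a tolerance). To keep it short, it indexes the resulting text with a plain q-gram inverted index rather than the signature files, B-trees, signature trees, or paged tries studied in the thesis; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def to_symbols(series: Sequence[float], tol: float = 0.1) -> str:
    """Encode each step as 'u' (up), 'd' (down) or 's' (steady within tol)."""
    out = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        out.append("u" if delta > tol else "d" if delta < -tol else "s")
    return "".join(out)

def build_qgram_index(text: str, q: int) -> Dict[str, List[int]]:
    """Map every substring of length q to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index

def find_pattern(text: str, index: Dict[str, List[int]], q: int, pattern: str) -> List[int]:
    """Look up candidates via the pattern's first q-gram, then verify each candidate."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q symbols long")
    candidates = index.get(pattern[:q], [])
    return [i for i in candidates if text.startswith(pattern, i)]

series = [1.0, 2.0, 3.0, 3.05, 2.0, 1.0, 2.0, 3.0, 2.0]
text = to_symbols(series)
idx = build_qgram_index(text, 2)
print(text)                                # uusdduud
print(find_pattern(text, idx, 2, "dduu"))  # [3]: two falls then two rises, starting at step 3
```

The index only narrows the candidate positions; the final `startswith` check is what guarantees exact matches.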
4.
  • Fomkin, Ruslan, 1978- (author)
  • Optimization and Execution of Complex Scientific Queries
  • 2009
  • Doctoral thesis (other academic/artistic)
    • Large volumes of data produced and shared within scientific communities are analyzed by many researchers to investigate different scientific theories. Currently the analyses are implemented in traditional programming languages such as C++. This is inefficient for research productivity, since such programs are difficult to write, understand, and modify. Furthermore, programs should scale over large data volumes and analysis complexity, which further complicates code development. This thesis investigates the use of database technologies to implement scientific applications in which data are complex objects describing measurements of independent events, and the analyses are selections of events made by applying conjunctions of complex numerical filters on each object separately. An example of such an application is the analysis of collision events produced by the ATLAS experiment for the presence of Higgs bosons. For an efficient implementation of such an ATLAS application, a new data stream management system, SQISLE, is developed. In SQISLE, queries are specified over complex objects that are efficiently streamed from sources through the query engine. This streaming approach is compared with the conventional approach of loading events into a database before querying. Since the queries implementing scientific analyses are large and complex, novel techniques are developed for efficient query processing. To obtain efficient plans for such queries, SQISLE implements runtime query optimization strategies, which during query execution collect runtime statistics for a query, reoptimize the query using the collected statistics, and dynamically switch optimization strategies. The cost-based optimization utilizes a novel cost model for aggregate functions over nested subqueries. To alleviate estimation errors in large queries, the query fragments are decomposed into conjunctions of subqueries over which runtime statistics are measured. 
Performance is further improved by query transformation, view materialization, and partial evaluation. ATLAS queries in SQISLE using these query processing techniques perform close to or better than hard-coded C++ implementations of the same analyses. Scientific data are often stored in Grids, which manage both storage and computational resources. This thesis includes a framework, POQSEC, that utilizes Grid resources to scale scientific queries over large data volumes by parallelizing the queries and shipping the data management system itself, e.g. SQISLE, to Grid computational nodes for parallel query execution.
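SQISLE's analyses are conjunctions of expensive numerical filters over each event, reoptimized at runtime from collected statistics. A small sketch of one such idea follows: filters are periodically reordered by the classic rank `cost / (1 - pass_rate)` for independent predicates, where the pass rate is measured during execution. The per-filter costs, the event fields `pt` and `eta`, and the reordering interval are assumptions for illustration; the thesis's cost model for aggregates over nested subqueries is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Filter:
    name: str
    predicate: Callable[[dict], bool]
    cost: float       # assumed per-call cost estimate (hypothetical units)
    seen: int = 0     # runtime statistics collected during execution
    passed: int = 0

    def pass_rate(self) -> float:
        # Laplace-smoothed, so an unseen filter starts at 0.5 and the rate never reaches 1.
        return (self.passed + 1) / (self.seen + 2)

    def rank(self) -> float:
        # Cheap, highly selective filters get the lowest rank and run first.
        return self.cost / (1.0 - self.pass_rate())

def adaptive_select(events: Iterable[dict], filters: List[Filter],
                    reoptimize_every: int = 100) -> Tuple[List[dict], List[Filter]]:
    """Apply a conjunction of filters, periodically reordering them from runtime statistics."""
    order = list(filters)
    selected = []
    for n, event in enumerate(events, start=1):
        ok = True
        for f in order:
            f.seen += 1
            if f.predicate(event):
                f.passed += 1
            else:
                ok = False
                break  # short-circuit: later filters are not evaluated for this event
        if ok:
            selected.append(event)
        if n % reoptimize_every == 0:
            order.sort(key=Filter.rank)  # reoptimize using the statistics gathered so far
    return selected, order

events = [{"pt": pt, "eta": eta} for pt in range(100) for eta in (-3, 0, 3)]
filters = [
    Filter("expensive_pt", lambda e: e["pt"] > 10, cost=50.0),
    Filter("cheap_eta", lambda e: abs(e["eta"]) < 2.5, cost=1.0),
]
selected, final_order = adaptive_select(events, filters, reoptimize_every=30)
print(len(selected))                    # 89
print([f.name for f in final_order])    # ['cheap_eta', 'expensive_pt']
```

Statistics for a filter are only collected on events that reach it, so the measured pass rates are conditional on the current order; a real optimizer has to account for that bias.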
5.
  • Ivanova, Milena, 1967- (author)
  • Scalable Scientific Stream Query Processing
  • 2005
  • Doctoral thesis (other academic/artistic)
    • Scientific applications require processing of high-volume on-line streams of numerical data from instruments and simulations. In order to extract information and detect interesting patterns in these streams, scientists need to perform on-line analyses including advanced and often expensive numerical computations. We present an extensible data stream management system, GSDM (Grid Stream Data Manager), that supports scalable and flexible continuous queries (CQs) on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are specified as high-level data flow distribution templates. A built-in template library provides several common distribution patterns from which complex distribution patterns are constructed. Using a generic template, we define two customizable partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides parallel execution of expensive query functions by reducing the size of stream data units using application-dependent functions as parameters. By contrast, window distribute provides customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high-volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribute is better otherwise. Profile-based optimization automatically generates optimized plans for a class of expensive query functions. We further investigate requirements for GSDM in Grid environments. GSDM is a fully functional system for parallel processing of continuous stream queries. It includes components such as a continuous query engine based on a data-driven data flow paradigm, a compiler of CQ specifications into distributed execution plans, stream interfaces, and communication primitives. 
Our experiments with real scientific streams on a shared-nothing architecture show the importance of both efficient processing and communication for efficient and scalable distributed stream processing.
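GSDM's two partitioning strategies differ in what is sent to the parallel workers. The sketch below shows only the data partitioning step: window distribute routes whole windows round-robin, while window split cuts each window into smaller parts using an application-supplied function. The function names and the contiguous split are hypothetical; a real window split also needs a matching application-specific combine step to merge the partial results.

```python
from typing import Callable, Dict, List, Sequence

Window = List[float]

def window_distribute(windows: Sequence[Window], n_workers: int) -> Dict[int, List[Window]]:
    """Route entire windows to workers round-robin; each window keeps its full size."""
    plan: Dict[int, List[Window]] = {w: [] for w in range(n_workers)}
    for i, win in enumerate(windows):
        plan[i % n_workers].append(win)
    return plan

def window_split(windows: Sequence[Window], n_workers: int,
                 split: Callable[[Window, int], List[Window]]) -> Dict[int, List[Window]]:
    """Cut every window into n_workers smaller parts with an application-supplied function."""
    plan: Dict[int, List[Window]] = {w: [] for w in range(n_workers)}
    for win in windows:
        for w, part in enumerate(split(win, n_workers)):
            plan[w].append(part)
    return plan

def contiguous_split(win: Window, n: int) -> List[Window]:
    """Hypothetical split function: n contiguous chunks of near-equal length."""
    k, r = divmod(len(win), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(win[start:end])
        start = end
    return parts

windows = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(window_distribute(windows, 2))
# {0: [[1.0, 2.0, 3.0, 4.0]], 1: [[5.0, 6.0, 7.0, 8.0]]}
print(window_split(windows, 2, contiguous_split))
# {0: [[1.0, 2.0], [5.0, 6.0]], 1: [[3.0, 4.0], [7.0, 8.0]]}
```

With window split every worker receives part of every window, so each expensive call handles less data; with window distribute each worker receives fewer, but full-size, windows.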
6.
  • Katchaounov, Timour, 1969- (author)
  • Query Processing for Peer Mediator Databases
  • 2003
  • Doctoral thesis (other academic/artistic)
    • The ability to physically interconnect many distributed, autonomous and heterogeneous software systems on a large scale presents new opportunities for sharing and reusing existing information and computational services, and for creating new ones. However, finding and combining information in many such systems is a challenge even for the most advanced computer users. To address this challenge, mediator systems logically integrate many sources to hide their heterogeneity and distribution and give the users the illusion of a single coherent system. Many new areas, such as scientific collaboration, require cooperation between many autonomous groups willing to share their knowledge. These areas require that the data integration process can be distributed among many autonomous parties, so that large integration solutions can be constructed from smaller ones. For this we propose a decentralized mediation architecture, peer mediator systems (PMS), based on the peer-to-peer (P2P) paradigm. In a PMS, reuse of human effort is achieved through logical composability of the mediators in terms of other mediators and sources, by defining mediator views in terms of views in other mediators and sources. Our thesis is that logical composability in a P2P mediation architecture is an important requirement and that composable mediators can be implemented efficiently through query processing techniques. In order to compute answers to queries in a PMS, logical mediator compositions must be translated into query execution plans, where mediators and sources cooperate to compute query answers. The focus of this dissertation is on query processing methods that realize composability in a PMS architecture efficiently and in a way that scales over the number of mediators. 
Our contributions consist of an investigation of the interfaces and capabilities for peer mediators, and the design, implementation and experimental study of several query processing techniques that realize composability in an efficient and scalable way.
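In a peer mediator system, mediator views are defined in terms of views in other mediators and sources. The sketch below models that composability with hypothetical `Source` and `Mediator` classes whose views are plain Python functions; the relation names and data are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple

class Peer:
    """Anything that can answer a named relation: a source or a mediator."""
    def get(self, name: str) -> List[Row]:
        raise NotImplementedError

class Source(Peer):
    """A data source exporting stored tables."""
    def __init__(self, tables: Dict[str, List[Row]]):
        self.tables = tables

    def get(self, name: str) -> List[Row]:
        return list(self.tables[name])

class Mediator(Peer):
    """Defines views in terms of relations exported by other peers."""
    def __init__(self) -> None:
        self.views: Dict[str, Callable[[], List[Row]]] = {}

    def define_view(self, name: str, body: Callable[[], List[Row]]) -> None:
        self.views[name] = body

    def get(self, name: str) -> List[Row]:
        return self.views[name]()  # evaluated on demand against the underlying peers

lab1 = Source({"genes": [("g1", "lab1"), ("g2", "lab1")]})
lab2 = Source({"genes": [("g3", "lab2")]})
annot = Source({"function": [("g1", "kinase"), ("g3", "receptor")]})

m_union = Mediator()
m_union.define_view("all_genes", lambda: lab1.get("genes") + lab2.get("genes"))

m_top = Mediator()  # composed from another mediator, not directly from lab1/lab2
m_top.define_view(
    "annotated",
    lambda: [(g, lab, f) for (g, lab) in m_union.get("all_genes")
             for (g2, f) in annot.get("function") if g == g2],
)
print(m_top.get("annotated"))  # [('g1', 'lab1', 'kinase'), ('g3', 'lab2', 'receptor')]
```

This naive evaluation recomputes every view on each call and ships whole relations between peers, a cost that efficient query processing for composed mediators would need to avoid, for example by translating the composition into a single execution plan.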
7.
  • Melander, Lars (author)
  • Integrating Visual Data Flow Programming with Data Stream Management
  • 2016
  • Doctoral thesis (other academic/artistic)
    • Data stream management and data flow programming have many things in common. In both cases one wants to transfer possibly infinite sequences of data items from one place to another while performing transformations on the data. This thesis focuses on integrating a visual programming language with a data stream management system (DSMS) to support the construction, configuration, and visualization of data stream applications. In this approach, analyses of data streams are expressed as continuous queries (CQs) that emit data in real time. The LabVIEW visual programming platform has been adapted to support easy specification of continuous visualization of CQ results. LabVIEW has been integrated with the DSMS SVALI through a stream-oriented client-server API. Query programming is declarative, and it is desirable to make stream visualization declarative as well, in order to raise the abstraction level and make programming more intuitive. This has been achieved by adding a set of visual data flow components (VDFCs) to LabVIEW, based on the LabVIEW actor framework. With actor-based data flows, visualization of data stream output becomes more manageable, avoiding the procedural control structures used in conventional LabVIEW programming while still utilizing the comprehensive built-in LabVIEW visualization tools. The VDFCs are part of the Visual Data stream Monitor (VisDM), a client-server based platform for handling real-time data stream applications and visualizing stream output. VDFCs are based on a data flow framework constructed from the actor framework, and are divided into producers, operators, consumers, and controls. They allow a user to set up the interface environment, customize the visualization, and convert the streaming data to a format suitable for visualization. Furthermore, it is shown how LabVIEW can be used to graphically define interfaces to data streams and dynamically load them in SVALI through a general wrapper handler. 
As an illustration, an interface has been defined in LabVIEW for accessing data streams from a digital 3D antenna. VisDM has successfully been tested in two real-world applications, one at Sandvik Coromant and one at the Ångström Laboratory, Uppsala University. In the first case, VisDM was deployed as a portable system to provide direct visualization of machining data streams; here the data streams can differ in many ways, as can the visualization tasks. In the second case, the data streams are homogeneous and high-rate, and the query operations are much more computationally demanding. In both applications, data is visualized in real time, and VisDM sustains update frequencies high enough to process and visualize the streaming data without interruption. What makes VisDM unique is its combination of a powerful and versatile DSMS with visually programmed, fully customizable visualization, while keeping both completely extensible.
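VisDM's visual data flow components are divided into producers, operators, consumers, and controls. LabVIEW programs are graphical, so the sketch below only mirrors the producer/operator/consumer roles with Python generators; the names and the scrolling-chart consumer are hypothetical, and controls are omitted.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator

def producer(values: Iterable[float]) -> Iterator[float]:
    """Stands in for a stream source, e.g. the output of a continuous query."""
    for v in values:
        yield v

def map_operator(stream: Iterable[float], f: Callable[[float], float]) -> Iterator[float]:
    """Transforms each element, e.g. converting it to a format suitable for plotting."""
    for v in stream:
        yield f(v)

class ChartConsumer:
    """Keeps only the latest `capacity` points, like a scrolling strip chart."""
    def __init__(self, capacity: int):
        self.points: Deque[float] = deque(maxlen=capacity)

    def consume(self, stream: Iterable[float]) -> None:
        for v in stream:
            self.points.append(v)  # older points fall off the left end automatically

chart = ChartConsumer(capacity=3)
chart.consume(map_operator(producer([1.0, 2.0, 3.0, 4.0, 5.0]), lambda x: x * 10))
print(list(chart.points))  # [30.0, 40.0, 50.0]
```

Because generators are pulled one element at a time, the pipeline never materializes the whole stream, which matters when the stream is unbounded.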
8.
  • Petrini, Johan, 1975- (author)
  • Querying RDF Schema Views of Relational Databases
  • 2008
  • Doctoral thesis (other academic/artistic)
    • The amount of data found on the web today and its lack of semantics make it increasingly hard to retrieve a particular piece of information. With the Resource Description Framework (RDF), every piece of information can be annotated with properties describing its semantics. The higher-level language RDF Schema (RDFS) is defined in terms of RDF and provides means to describe classes of RDF resources and properties defined over these classes. Queries over RDFS data can be specified using the standard query language SPARQL. Since most of the world's information still resides in relational databases, it is worth investigating how to view and query their contents as views defined in terms of RDFS meta-data descriptions. However, processing queries to general RDFS views over relational databases is challenging, since the queries and view definitions are complex and the amount of data is often huge. A system, Semantic Web Abridged Relational Databases (SWARD), is developed to enable efficient processing of SPARQL queries to RDFS views of data in existing relational databases. The RDFS views, called universal property views (UPVs), are generated automatically from a minimum of user input. A UPV is a general RDFS view of a relational database representing both its schema and data. Special attention is devoted to making the UPV represent as much of the relational database semantics as possible, including foreign and composite keys. A general query reduction algorithm for queries over complex views such as UPVs, called PARtial evaluation of Queries (PARQ), has been developed. The reduction algorithm is based on the program transformation technique partial evaluation. For UPVs, the PARQ algorithm is shown to reduce queries elegantly and dramatically before regular cost-based optimization by a back-end relational DBMS. The results are verified by performance measurements of SPARQL queries to a commercial relational database.
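A UPV exposes both the schema and the data of a relational database as RDF. As a rough sketch of what such a view contains, the function below turns table rows into triples: one subject URI per row (composite keys are joined into the URI), an `rdf:type` triple for the table's class, one property per column, and foreign keys as links to the referenced row. The URI scheme and function names are assumptions, similar in spirit to the W3C Direct Mapping of relational data to RDF rather than SWARD's actual UPV definition; only single-column foreign keys are handled.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def row_uri(base: str, table: str, key_values: Sequence) -> str:
    # Composite keys are encoded by joining all key values (hypothetical URI scheme).
    return f"{base}{table}/" + "/".join(str(v) for v in key_values)

def table_to_triples(base: str, table: str, columns: Sequence[str], key: Sequence[str],
                     rows: Sequence[Sequence],
                     foreign_keys: Optional[Dict[str, str]] = None) -> List[Triple]:
    """Expose a relational table as RDF triples: one subject per row, one property per column."""
    foreign_keys = foreign_keys or {}
    key_idx = [list(columns).index(k) for k in key]
    triples: List[Triple] = []
    for row in rows:
        subject = row_uri(base, table, [row[i] for i in key_idx])
        triples.append((subject, RDF_TYPE, f"{base}{table}"))  # the row is an instance of the table class
        for col, value in zip(columns, row):
            prop = f"{base}{table}#{col}"
            if col in foreign_keys:
                # A foreign key becomes a link to the referenced row's URI instead of a literal.
                triples.append((subject, prop, row_uri(base, foreign_keys[col], [value])))
            else:
                triples.append((subject, prop, str(value)))
    return triples

base = "http://example.org/db/"
triples = table_to_triples(base, "emp", ["id", "name", "dept"], ["id"],
                           [(1, "Ada", 10)], foreign_keys={"dept": "dept"})
for t in triples:
    print(t)
```

Because such a view is generic, queries over it are large and redundant, which is why a reduction step like PARQ is applied before the relational optimizer sees them.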
9.
  • Sabesan, Manivasakan, 1974- (author)
  • Querying Data Providing Web Services
  • 2010
  • Doctoral thesis (other academic/artistic)
    • Web services are often used for search computing, where data is retrieved from servers providing information of different kinds. Such data providing web services return a set of objects for a given set of parameters without any side effects. There is a need to enable general and scalable search capabilities over data from data providing web services, which is the topic of this thesis. The Web Service MEDiator (WSMED) system automatically provides relational views of any data providing web service operations by reading the WSDL documents describing them. These views can be queried with SQL. Without any knowledge of the costs of executing specific web service operations, the WSMED query processor automatically and adaptively finds an optimized parallel execution plan calling the queried data providing web services. For scalable execution of queries to data providing web services, an algebra operator, PAP, adaptively parallelizes calls to web service operations in execution plans until no significant performance improvement is measured, based on monitoring the flow from web service operations without any cost knowledge or extensive memory usage. To comply with the Everything as a Service (XaaS) paradigm, WSMED itself is implemented as a web service that provides web service operations to query and combine data from data providing web services. A web-based demonstration of the WSMED web service supports general SQL queries to any data providing web service operations from a browser. WSMED assumes that all queried data sources are available as web services. To turn any data providing system into a data providing web service, WSMED includes a subsystem, the web service generator, which generates and deploys the web service operations to access a data source. The WSMED web service itself is generated by the web service generator.
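PAP is described as increasing the parallelism of web service calls until no significant performance improvement is measured. The sketch below captures that stopping rule with a hypothetical `adapt_parallelism` function: it doubles the degree of parallelism while measured throughput improves by at least `min_gain`. Doubling, the 10% threshold, and the simulated throughput model are assumptions; in practice `measure` would time real concurrent calls, and PAP's actual plan structure is not reproduced here.

```python
from typing import Callable

def adapt_parallelism(measure: Callable[[int], float], start: int = 1,
                      max_degree: int = 64, min_gain: float = 0.10) -> int:
    """Increase the number of concurrent calls while throughput keeps improving.

    `measure(d)` returns observed throughput (calls/second) with d parallel calls.
    Growth stops as soon as the relative gain drops below `min_gain`.
    """
    degree = start
    best = measure(degree)
    while degree < max_degree:
        candidate = min(degree * 2, max_degree)
        throughput = measure(candidate)
        if throughput < best * (1.0 + min_gain):
            break  # no significant improvement: keep the current degree
        degree, best = candidate, throughput
    return degree

def simulated_throughput(d: int) -> float:
    # Hypothetical service: 0.5 s per call, server saturates at 40 calls/s.
    return min(d / 0.5, 40.0)

print(adapt_parallelism(simulated_throughput))  # 32
```

With the simulated service saturating at 40 calls/s, doubling overshoots to 32 parallel calls even though 20 would suffice; a finer search step trades more probing for a tighter degree.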
10.
  • Stefanova, Silvia, 1972- (author)
  • Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations
  • 2013
  • Doctoral thesis (other academic/artistic)
    • This thesis addresses how Semantic Web representations, in particular RDF, can enable flexible and scalable preservation, recreation, and querying of databases. An approach has been developed for selective, scalable long-term archival of relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The archival of user-specified parts of an RDB is specified using an extension of SPARQL, A-SPARQL. SAQ automatically generates an RDF view of the RDB, the RD-view. The result of an archival query is RDF triples stored in: i) a data archive file containing the preserved RDB content, and ii) a schema archive file containing sufficient meta-data to reconstruct the archived database. To achieve scalable data preservation and recreation, SAQ uses special query rewriting optimizations for the archival queries. It was experimentally shown that they improve query execution and archival time compared with naïve processing. The performance of SAQ was compared with that of other systems supporting SPARQL queries to views of existing RDBs. When an archived RDB is to be recreated, the reloader module of SAQ first reads the schema archive file and executes a schema reconstruction algorithm to automatically construct the RDB schema. The RDB thus created is populated by reading the data archive and converting the read data into relational attribute values. For scalable recreation of RDF-archived data we have developed the Triple Bulk Load (TBL) approach, where the relational data is reconstructed using the bulk load facility of the RDBMS. Our experiments show that the TBL approach is substantially faster than the naïve Insert Attribute Value (IAV) approach, despite the added sorting and post-processing. To view and query semi-structured Topic Maps data as RDF, the prototype system TM-Viewer was implemented. 
A declarative RDF view of Topic Maps, the TM-view, is automatically generated by TM-Viewer using a conceptual schema developed for the Topic Maps data model. To achieve efficient processing of SPARQL queries to the TM-view, query rewrite transformations were developed and evaluated. It was shown that they significantly improve query execution time.
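The difference between the Triple Bulk Load (TBL) and Insert Attribute Value (IAV) approaches can be sketched with SQLite. TBL sorts the archived triples by subject so that all attribute values of a row are adjacent, assembles complete rows, and inserts them in one batch; IAV issues statements per attribute value. The simplified triple format (row key, column, value), the function names, and the use of `executemany` in place of a real RDBMS bulk loader are assumptions.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter
from typing import List, Sequence, Tuple

Triple = Tuple[int, str, object]  # (row key, column name, value): simplified, hypothetical format

def triple_bulk_load(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                     key_col: str, triples: List[Triple]) -> None:
    """Sort triples by subject, assemble complete rows, then insert them in one batch."""
    rows = []
    # Sorting brings all attribute values of the same row next to each other.
    for subject, group in groupby(sorted(triples, key=itemgetter(0)), key=itemgetter(0)):
        values = {col: val for _, col, val in group}
        values[key_col] = subject
        rows.append(tuple(values.get(c) for c in columns))
    placeholders = ", ".join("?" for _ in columns)
    # Table/column names are interpolated: this assumes trusted schema metadata.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)

def insert_attribute_values(conn: sqlite3.Connection, table: str, key_col: str,
                            triples: List[Triple]) -> None:
    """Naive alternative: create the row if needed, then update one attribute per statement."""
    for subject, col, val in triples:
        conn.execute(f"INSERT OR IGNORE INTO {table} ({key_col}) VALUES (?)", (subject,))
        conn.execute(f"UPDATE {table} SET {col} = ? WHERE {key_col} = ?", (val, subject))

triples = [(2, "name", "Bo"), (1, "name", "Ada"), (1, "dept", 10), (2, "dept", 20)]

def load(method: str) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept INTEGER)")
    if method == "tbl":
        triple_bulk_load(conn, "emp", ["id", "name", "dept"], "id", triples)
    else:
        insert_attribute_values(conn, "emp", "id", triples)
    return conn.execute("SELECT id, name, dept FROM emp ORDER BY id").fetchall()

print(load("tbl"))  # [(1, 'Ada', 10), (2, 'Bo', 20)]
print(load("iav"))  # [(1, 'Ada', 10), (2, 'Bo', 20)]
```

Both loaders reconstruct the same rows; TBL pays for an up-front sort but issues one batched insert, whereas IAV issues two statements for every attribute value.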