SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "L773:2150 8097 "

Sökning: L773:2150 8097

  • Resultat 1-9 av 9
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Abbas, Zainab, 1991-, et al. (författare)
  • Streaming Graph Partitioning: An Experimental Study
  • 2018
  • Ingår i: Proceedings of the VLDB Endowment. - : ACM Digital Library. - 2150-8097. ; 11:11, s. 1590-1603
  • Tidskriftsartikel (refereegranskat)abstract
    • Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet, little effort has been put into investigating the characteristics of online graph partitioning strategies.In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives and costs. Furthermore, we employ an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results showcase that model-dependent online partitioning techniques such as low-cut algorithms offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit higher partitioning costs. Otherwise, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads which is beneficial when executing data-parallel single-pass graph algorithms.
  •  
2.
  • Ahmetaj, Shqiponja, et al. (författare)
  • Magic Shapes for SHACL Validation
  • 2022
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; 15:10, s. 2284-2296
  • Tidskriftsartikel (refereegranskat)abstract
    • A key prerequisite for the successful adoption of the Shapes Constraint Language (SHACL)—the W3C standardized constraint language for RDF graphs—is the availability of automated tools that efficiently validate targeted constraints (known as shapes graphs) over possibly very large RDF graphs. There are already significant efforts to produce optimized engines for SHACL validation, but they focus on restricted fragments of SHACL. For unrestricted SHACL, that is SHACL with unrestricted recursion and negation, there is no validator beyond a proof-of-concept prototype, and existing techniques are inherently incompatible with the goal-driven approaches being pursued by existing validators. Instead they require a global computation on the entire data graph that is not only computationally very costly, but also brittle, and can easily result in validation failures due to conflicts that are irrelevant to the validation targets. To address these challenges, we present a ‘magic’ transformation— based on Magic Sets as known from Logic Programming—that transforms a SHACL shapes graph S into a new shapes graph S′ whose validation considers only the relevant neighbourhood of the targeted nodes. The new S′ is equivalent to S whenever there are no conflicts between the constraints and the data, and in case the validation of S fails due to conflicts that are irrelevant to the target, S′ may still admit a lazy, target-oriented validation. We implement the algorithm and run preliminary experiments, showing our approach can be a stepping stone towards validators for full SHACL, and that it can significantly improve the performance of the only prototype validator that currently supports full recursion and negation.
  •  
3.
  • Bux, M., et al. (författare)
  • SAASFEE : Scalable scientific workflow execution engine
  • 2015
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; 8:12, s. 1892-1895, s. 1892-1903
  • Tidskriftsartikel (refereegranskat)abstract
    • Across many fields of science, primary data sets like sensor read-outs, time series, and genomic sequences are analyzed by complex chains of specialized tools and scripts exchanging intermediate results in domain-specific file formats. Scientific work ow management systems (SWfMSs) support the development and execution of these tool chains by providing work ow specification languages, graphical editors, fault-tolerant execution engines, etc. However, many SWfMSs are not prepared to handle large data sets because of inadequate support for distributed computing. On the other hand, most SWfMSs that do support distributed computing only allow static task execution orders. We present SAASFEE, a SWfMS which runs arbitrarily complex work ows on Hadoop YARN. Work ows are specified in Cuneiform, a functional work ow language focusing on parallelization and easy integration of existing software. Cuneiform work ows are executed on Hi-WAY, a higher-level scheduler for running work ows on YARN. Distinct features of SAASFEE are the ability to execute iterative work ows, an adaptive task scheduler, re-executable provenance traces, and compatibility to selected other work ow systems. In the demonstration, we present all components of SAASFEE using real-life work ows from the field of genomics.
  •  
4.
  • Carbone, Paris, et al. (författare)
  • State Management in Apache Flink : Consistent Stateful Distributed Stream Processing
  • 2017
  • Ingår i: Proceedings of the VLDB Endowment. - : ACM Digital Library. - 2150-8097. ; 10, s. 1718-1729
  • Tidskriftsartikel (refereegranskat)abstract
    • Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies on how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open source, scalable stream processor.We present Flink’s core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink’s backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large deployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.
  •  
5.
  • Kalavri, Vasiliki, et al. (författare)
  • The shortest path is not always a straight line : Leveraging semimetricity in graph analysis
  • 2016
  • Ingår i: Proceedings of the VLDB Endowment. - : ACM Press. - 2150-8097. ; 9:9, s. 672-683
  • Tidskriftsartikel (refereegranskat)abstract
    • In this paper, we leverage the concept of the metric backbone to improve the effciency of large-scale graph analytics. The metric backbone is the minimum subgraph that preserves the shortest paths of a weighted graph. We use the metric backbone in place of the original graph to compute vari- ous graph metrics exactly or with good approximation. By computing on a smaller graph, we improve the performance of graph analytics applications on two different systems, a batch graph processing system and a graph database. Further, we provide an algorithm for the computation of the metric backbone on large graphs. While one can com- pute the metric backbone by solving the all-pairs-shortest- paths problem, this approach incurs prohibitive time and space complexity for big graphs. Instead, we propose a heuristic that makes computing the metric backbone prac- tical even for large graphs. Additionally, we analyze several real datasets of different sizes and domains and we show that we can approximate the metric backbone by removing only first-order semi-metric edges; edges for which a shorter two-hop path exists. We provide a distributed implementation of our algorithm and apply it in large scale scenarios. We evaluate our algo- rithm using a variety of real graphs, including a Facebook social network subgraph of ~50 billion edges. We measure the impact of using the metric backbone on runtime per- formance in two graph management systems. We achieve query speedups of up to 6.7x in the Neo4j commercial graph database and job speedups of up to 6x in the Giraph graph processing system.
  •  
6.
  • Kotsifakos, Alexios, et al. (författare)
  • Hum-a-song : a subsequence matching with gaps-range-tolerances query-by-humming system
  • 2012
  • Ingår i: Proceedings of the VLDB Endowment. - New York : Association for Computing Machinery (ACM). - 2150-8097. ; 5:12, s. 1930-1933
  • Tidskriftsartikel (refereegranskat)abstract
    • We present "Hum-a-song", a system built for music retrieval, and particularly for the Query-By-Humming (QBH) application. According to QBH, the user is able to hum a part of a song that she recalls and would like to learn what this song is, or find other songs similar to it in a large music repository. We present a simple yet efficient approach that maps the problem to time series subsequence matching. The query and the database songs are represented as 2-dimensional time series conveying information about the pitch and the duration of the notes. Then, since the query is a short sequence and we want to find its best match that may start and end anywhere in the database, subsequence matching methods are suitable for this task. In this demo, we present a system that employs and exposes to the user a variety of state-of-the-art dynamic programming methods, including a newly proposed efficient method named SMBGT that is robust to noise and considers all intrinsic problems in QBH; it allows variable tolerance levels when matching elements, where tolerances are defined as functions of the compared sequences, gaps in both the query and target sequences, and bounds the matching length and (optionally) the minimum number of matched elements. Our system is intended to become open source, which is to the best of our knowledge the first non-commercial effort trying to solve QBH with a variety of methods, and that also approaches the problem from the time series perspective.
  •  
7.
  • Merchant, A., et al. (författare)
  • Succinct Graph Representations as Distance Oracles : An Experimental Evaluation
  • 2022
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; , s. 2297-2306
  • Konferensbidrag (refereegranskat)abstract
    • Distance oracles answer shortest-path queries between any pair of nodes in a graph. They are often built using succinct graph representations such as spanners, sketches, and compressors to minimize oracle size and query answering latency. Node embeddings, in particular, offer graph representations that place adjacent nodes nearby each other in a low-rank space. However, their use in the design of distance oracles has not been suffciently studied. In this paper, we empirically compare exact distance oracles constructed based on a variety of node embeddings and other succinct representations. We evaluate twelve such oracles along three measures of efficiency: construction time, memory requirements, and query-processing time over fourteen real datasets and four synthetic graphs. We show that distances between embedding vectors are excellent estimators of graph distances when graphs are well-structured, but less so for more unstructured graphs. Overall, our findings suggest that exact oracles based on embeddings can be constructed faster than multi-dimensional scaling (MDS) but slower than compressed adjacency indexes, require less memory than landmark oracles but more than sparsifiers or indexes, can answer queries faster than indexes but slower than MDS, and are exact more often with a smaller additive error than spanners (that have multiplicative error) while not being lossless like adjacency lists. Finally, while the exactness of such oracles is infeasible to maintain for huge graphs even under large amounts of resources, we empirically demonstrate that approximate oracles based on GOSH embeddings can efficiently scale to graphs of 100M+ nodes with only small additive errors in distance estimations. 
  •  
8.
  • Palyvos-Giannas, Dimitrios, 1991, et al. (författare)
  • Ananke: A Streaming Framework for Live Forward Provenance
  • 2020
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; 14:3, s. 391-403
  • Tidskriftsartikel (refereegranskat)abstract
    • Data streaming enables online monitoring of large and continuous event streams in Cyber-Physical Systems (CPSs). In such scenarios, fine-grained backward provenance tools can connect streaming query results to the source data producing them, allowing analysts to study the dependency/causality of CPS events. While CPS monitoring commonly produces many events, backward provenance does not help prioritize event inspection since it does not specify if an event’s provenance could still contribute to future results. To cover this gap, we introduce Ananke, a framework to extend any fine-grained backward provenance tool and deliver a live bi-partite graph of fine-grained forward provenance. With Ananke, analysts can prioritize the analysis of provenance data based on whether such data is still potentially being processed by the monitoring queries. We prove our solution is correct, discuss multiple implementations, including one leveraging streaming APIs for parallel analysis, and show Ananke results in small overheads, close to those of existing tools for fine-grained backward provenance.
  •  
9.
  • Palyvos-Giannas, Dimitrios, 1991, et al. (författare)
  • Erebus: Explaining the Outputs of Data Streaming Queries
  • 2022
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; 16:2, s. 230-242
  • Konferensbidrag (refereegranskat)abstract
    • In data streaming, why-provenance can explain why a given outcome is observed but offers no help in understanding why an expected outcome is missing. Explaining missing answers has been addressed in DBMSs, but these solutions are not directly applicable to the streaming setting, because of the extra challenges posed by limited storage and by the unbounded nature of data streams. With our framework, Erebus, we tackle the unaddressed challenges behind explaining missing answers in streaming applications. Erebus allows users to define expectations about the results of a query, verifying at runtime if such expectations hold, and also providing explanations when expected and observed outcomes diverge (missing answers). To the best of our knowledge, Erebus is the first such solution in data streaming. Our thorough evaluation on real data shows that Erebus can explain the (missing) answers with small overheads, both in low-and higher-end devices, even when large portions of the processed data are part of such explanations.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-9 av 9

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy