SwePub

Search results for "L773:0885 7458 OR L773:1573 7640"

  • Results 1-10 of 19
1.
  • Berman, F., et al. (authors)
  • New grid scheduling and rescheduling methods in the GrADS Project
  • 2005
  • Published in: International Journal of Parallel Programming. Springer Science and Business Media LLC. ISSN 0885-7458, 1573-7640; 33:2-3, pp. 209-229
  • Journal article (peer-reviewed)
    • The goal of the Grid Application Development Software (GrADS) Project is to provide programming tools and an execution environment to ease program development for the Grid. This paper presents recent extensions to the GrADS software framework: a new approach to scheduling workflow computations, applied to a 3-D image reconstruction application; a simple stop/migrate/restart approach to rescheduling Grid applications, applied to a QR factorization benchmark; and a process-swapping approach to rescheduling, applied to an N-body simulation. Experiments validating these methods were carried out on both the GrADS MacroGrid (a small but functional Grid) and the MicroGrid (a controlled emulation of the Grid).
2.
  • Birath, Bjorn, et al. (authors)
  • High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns
  • 2024
  • Published in: International Journal of Parallel Programming. Springer/Plenum Publishers. ISSN 0885-7458, 1573-7640.
  • Journal article (peer-reviewed)
    • As a result of frequency and power limitations, multi-core processors and accelerators are becoming more and more prevalent in today's systems. To fully utilize such systems, heterogeneous parallel programming is needed, but this introduces new complexities into development. High-level frameworks such as SkePU have been introduced to help alleviate these complexities. SkePU is a skeleton programming framework based on a set of programming constructs implementing computational parallel patterns, while presenting a sequential interface to the programmer. Using the various skeleton backends, SkePU programs can execute, without source code modification, on multiple types of hardware such as CPUs, GPUs, and clusters. This paper presents the design and implementation of a new backend for SkePU, adding support for FPGAs. We also evaluate the effect of FPGA-specific optimizations in the new backend and compare it with the existing GPU backend, where the devices used are of similar vintage and price point. For simple examples, we find that the FPGA backend's performance is similar to that of the existing GPU backend, while it falls behind on more complex tasks. Finally, some shortcomings of the backend are highlighted and discussed, along with potential solutions.
3.
  • Dastgeer, Usman, et al. (authors)
  • Smart Containers and Skeleton Programming for GPU-Based Systems
  • 2016
  • Published in: International Journal of Parallel Programming. Springer/Plenum Publishers. ISSN 0885-7458, 1573-7640; 44:3, pp. 506-530
  • Journal article (peer-reviewed)
    • In this paper, we discuss the role, design and implementation of smart containers in the SkePU skeleton library for GPU-based systems. These containers provide an interface similar to C++ STL containers but internally perform runtime optimization of data transfers and runtime memory management for their operand data on the different memory units. We discuss how these containers can help in achieving asynchronous execution for skeleton calls while providing implicit synchronization capabilities in a data-consistent manner. Furthermore, we discuss the limitations of the original, already optimizing memory management mechanism implemented in SkePU containers, and propose and implement a new mechanism that provides stronger data consistency and improves performance by reducing communication and memory allocations. With several applications, we show that our new mechanism can achieve significantly (up to 33.4 times) better performance than the initial mechanism for page-locked memory on a multi-GPU based system.
4.
  • Ernstsson, August, 1992-, et al. (authors)
  • A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems
  • 2022
  • Published in: International Journal of Parallel Programming. Springer/Plenum. ISSN 0885-7458, 1573-7640; 50, pp. 319-340
  • Journal article (peer-reviewed)
    • SkePU is a pattern-based high-level programming model for transparent program execution on heterogeneous parallel computing systems. A key feature of SkePU is that, in general, the selection of the execution platform for a skeleton-based function call need not be determined statically. On single-node systems, SkePU can select among CPU, multithreaded CPU, and single- or multi-GPU execution. Many scientific applications use pseudo-random number generators (PRNGs) as part of the computation. In the interest of correctness and debugging, deterministic parallel execution is a desirable property, which, however, requires a deterministically parallelized pseudo-random number generator. We present the API and implementation of a deterministic, portable parallel PRNG extension to SkePU that is scalable by design and exhibits the same behavior regardless of where, and on how many resources, it is executed. We evaluate it with four probabilistic applications and show that the PRNG enables scalability on both multi-core CPU and GPU resources, and hence supports the universal portability of SkePU code even in the presence of PRNG calls, while source code complexity is reduced.
5.
  • Ernstsson, August, 1992-, et al. (authors)
  • Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems
  • 2023
  • Published in: International Journal of Parallel Programming. Springer/Plenum. ISSN 0885-7458, 1573-7640; 51, pp. 61-82
  • Journal article (peer-reviewed)
    • We analyze the performance portability of the skeleton-based, single-source, multi-backend high-level programming framework SkePU across multiple different CPU-GPU heterogeneous systems. In doing so, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in lower-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly achieves efficiency values above 80%, and that, particularly on CPU systems, SkePU can outperform hand-written code.
6.
  • Ernstsson, August, et al. (authors)
  • SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems
  • 2018
  • Published in: International Journal of Parallel Programming. Springer/Plenum Publishers. ISSN 0885-7458, 1573-7640; 46:1, pp. 62-80
  • Journal article (peer-reviewed)
    • In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new flexible and type-safe interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use cases and applications through increased flexibility compared with SkePU 1, and how programming errors can be caught earlier and more easily thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.
7.
  • Ernstsson, August, et al. (authors)
  • SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters
  • 2021
  • Published in: International Journal of Parallel Programming. Springer Nature. ISSN 0885-7458, 1573-7640; 49:6, pp. 846-866
  • Journal article (peer-reviewed)
    • We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit, e.g., custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the memory consistency model of the smart data containers for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and they especially target the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.
8.
  • Gou, C., et al. (authors)
  • Addressing GPU on-chip shared memory bank conflicts using elastic pipeline
  • 2013
  • Published in: International Journal of Parallel Programming. Springer Science and Business Media LLC. ISSN 0885-7458, 1573-7640; 41:3, pp. 400-429
  • Journal article (peer-reviewed)
    • One of the major problems with GPU on-chip shared memory is bank conflicts. Our analysis shows that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth nor by the shared memory latency (as long as it stays constant), but rather by the varying latencies caused by memory bank conflicts. These result in conflicts at the writeback stage of the in-order pipeline and cause pipeline stalls, thus degrading system throughput. Based on this observation, we investigate and propose a novel Elastic Pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed Elastic Pipeline, together with the co-designed bank-conflict-aware warp scheduling, reduces pipeline stalls by up to 64.0% (42.3% on average) and improves overall performance by up to 20.7% (13.3% on average) for representative benchmarks, at trivial hardware overhead. © 2012 The Author(s).
9.
  • Hu, X., et al. (authors)
  • A Configurable Hardware Architecture for Runtime Application of Network Calculus
  • 2021
  • Published in: International Journal of Parallel Programming. Springer Nature. ISSN 0885-7458, 1573-7640; 49:5, pp. 745-760
  • Journal article (peer-reviewed)
    • Network Calculus has been a foundational theory for analyzing and ensuring Quality-of-Service (QoS) in a variety of networks, including Networks on Chip (NoCs). To fulfill the dynamic QoS requirements of applications, runtime application of network calculus is essential. However, the primitive operations in network calculus, such as the arrival curve, min-plus convolution and min-plus deconvolution, are very time-consuming when calculated in software because of the large volume and long latency of computation. For the first time, we propose a configurable hardware architecture to enable runtime application of network calculus. It employs a unified pipeline that can be dynamically configured to efficiently calculate the arrival curve, min-plus convolution, and min-plus deconvolution at runtime. We have implemented and synthesized this hardware architecture on a Xilinx FPGA platform to quantify its performance and resource consumption. Furthermore, we have built a prototype NoC system incorporating this hardware for dynamic flow regulation to effectively achieve QoS at runtime.
10.
  •  
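Entry 9 names min-plus convolution and min-plus deconvolution as primitive network-calculus operations. As a rough illustration only, and not the hardware pipeline described in that paper, the Python sketch below evaluates both operations in software on curves sampled at integer time steps. It uses the standard definitions (f ⊗ g)(t) = min over 0 <= s <= t of f(t - s) + g(s), and (f ⊘ g)(t) = sup over u >= 0 of f(t + u) - g(u). The function names, the integer sampling, the token-bucket and rate-latency helper curves, and the truncation of the deconvolution's supremum to the sampled horizon are all assumptions made for this example.

```python
def min_plus_convolution(f, g):
    """(f ⊗ g)(t) = min over 0 <= s <= t of f(t - s) + g(s), for sampled t."""
    n = min(len(f), len(g))
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(n)]


def min_plus_deconvolution(f, g):
    """(f ⊘ g)(t) = max over u >= 0 of f(t + u) - g(u).

    Assumption: the supremum is truncated to the sampled horizon
    (t + u < n), so values near the end of the horizon may be underestimated.
    """
    n = min(len(f), len(g))
    return [max(f[t + u] - g[u] for u in range(n - t)) for t in range(n)]


def token_bucket(burst, rate, horizon):
    """Hypothetical helper: arrival curve alpha(t) = burst + rate*t for t > 0, alpha(0) = 0."""
    return [0] + [burst + rate * t for t in range(1, horizon)]


def rate_latency(rate, latency, horizon):
    """Hypothetical helper: service curve beta(t) = rate * max(0, t - latency)."""
    return [rate * max(0, t - latency) for t in range(horizon)]


if __name__ == "__main__":
    alpha = token_bucket(burst=5, rate=1, horizon=6)
    beta = rate_latency(rate=2, latency=2, horizon=6)
    print("alpha ⊗ beta:", min_plus_convolution(alpha, beta))
    print("alpha ⊘ beta:", min_plus_deconvolution(alpha, beta))
```

The nested loops make each operation quadratic in the number of samples, which gives a feel for why the abstract describes software evaluation as time-consuming at runtime. Because of the truncation, the deconvolution result should only be trusted for time steps well inside the sampled horizon.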