SwePub
Search the SwePub database


Hit list for search "L773:1094 3420 OR L773:1741 2846"


  • Result 1-18 of 18
1.
  • Aldinucci, Marco, et al. (author)
  • Preface
  • 2017
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 31:3, s. 179-180
  • Journal article (peer-reviewed)
  •  
2.
  • Bauer, Pavol, et al. (author)
  • Fast event-based epidemiological simulations on national scales
  • 2016
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 30, s. 438-453
  • Journal article (peer-reviewed). Abstract:
    • We present a computational modeling framework for data-driven simulations and analysis of infectious disease spread in large populations. For the purpose of efficient simulations, we devise a parallel solution algorithm targeting multi-socket shared-memory architectures. The model integrates infectious dynamics as continuous-time Markov chains and available data such as animal movements or aging are incorporated as externally defined events. To bring out parallelism and accelerate the computations, we decompose the spatial domain and optimize cross-boundary communication using dependency-aware task scheduling. Using registered livestock data at a high spatiotemporal resolution, we demonstrate that our approach not only is resilient to varying model configurations but also scales on all physical cores at realistic workloads. Finally, we show that these very features enable the solution of inverse problems on national scales.
  •  
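As an illustration of the continuous-time Markov chain formulation described in the abstract above, here is a minimal Gillespie-style simulation of a single SIR compartment. This is a sketch only: the rates and population size are invented illustration values, and the paper's framework additionally handles spatial decomposition, external data events, and parallel scheduling.

```python
import random

def sir_ssa(s, i, r, beta, gamma, t_end, rng):
    """Simulate one SIR compartment as a continuous-time Markov chain
    using Gillespie's direct method."""
    t, n = 0.0, s + i + r
    while t < t_end and i > 0:
        rate_inf = beta * s * i / n   # infection event: S + I -> 2I
        rate_rec = gamma * i          # recovery event:  I -> R
        total = rate_inf + rate_rec
        t += rng.expovariate(total)   # exponential waiting time to next event
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
    return s, i, r

print(sir_ssa(990, 10, 0, beta=0.3, gamma=0.1, t_end=100.0,
              rng=random.Random(42)))
```

A full model would run many such compartments in parallel, which is where the paper's dependency-aware task scheduling comes in.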
3.
  • Berman, F., et al. (author)
  • The GrADS project : Software support for high-level grid application development
  • 2001
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 15:4, s. 327-344
  • Journal article (peer-reviewed). Abstract:
    • Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way-as a computational as well as an information resource. As described in the recent book The Grid: Blueprint for a New Computing Infrastructure, this Grid will connect the nation's computers, databases, instruments, and people in a seamless web of computing and distributed intelligence, which can be used in an on demand fashion as a problem-solving resource in many fields of human endeavor-and, in particular, science and engineering. The availability of grid resources will give rise to dramatically new classes of applications, in which computing resources are no longer localized but, rather, distributed, heterogeneous, and dynamic; computation is increasingly sophisticated and multidisciplinary; and computation is integrated into our daily lives and, hence, subject to stricter time constraints than at present. The impact of these new applications will be pervasive, ranging from new systems for scientific inquiry, through computing support for crisis management, to the use of ambient computing to enhance personal mobile computing environments. To realize this vision, significant scientific and technical obstacles must be overcome. Principal among these is usability. The goal of the Grid Application Development Software (GrADS) project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. To that end, the project is exploring the scientific and technical problems that must be solved to make it easier for ordinary scientific users to develop, execute, and tune applications on the Grid. 
In this paper, the authors describe the vision and strategies underlying the GrADS project, including the base software architecture for grid execution and performance monitoring, strategies and tools for construction of applications from libraries of grid-aware components, and development of innovative new science and engineering applications that can exploit these new technologies to run effectively in grid environments.
  •  
4.
  • Commer, Michael, et al. (author)
  • Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units
  • 2012
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 26:4, s. 378-385
  • Journal article (peer-reviewed). Abstract:
    • Many geo-scientific applications involve boundary value problems arising in simulating electrostatic and electromagnetic fields for geophysical prospecting and subsurface imaging of electrical resistivity. Modeling complex geological media with three-dimensional finite-difference grids gives rise to large sparse linear systems of equations. For such systems, we have implemented three common iterative Krylov solution methods on graphics processing units and compared their performance with parallel host-based versions. The benchmarks show that the device efficiency improves with increasing grid sizes. Limitations are currently given by the device memory resources.
  •  
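The abstract above benchmarks common iterative Krylov solvers on GPUs. For reference, here is a minimal host-side sketch of one such method, the conjugate gradient, applied to a small 1D finite-difference Poisson system (dense NumPy for brevity; the paper works with large sparse three-dimensional grids on the device):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal Krylov solver: CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate next direction
        rs = rs_new
    return x

# 1D Poisson finite-difference system: tridiagonal, SPD
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))
```

The dominant cost per iteration is the matrix-vector product, which is the operation GPU implementations such as the paper's focus on.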
5.
  • Ekeberg, Tomas, et al. (author)
  • Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters
  • 2015
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 29, s. 233-243
  • Journal article (peer-reviewed). Abstract:
    • The classical method of determining the atomic structure of complex molecules by analyzing diffraction patterns is currently undergoing drastic developments. Modern techniques for producing extremely bright and coherent X-ray lasers allow a beam of streaming particles to be intercepted and hit by an ultrashort high-energy X-ray beam. Through machine learning methods the data thus collected can be transformed into a three-dimensional volumetric intensity map of the particle itself. The computational complexity associated with this problem is very high such that clusters of data parallel accelerators are required. We have implemented a distributed and highly efficient algorithm for the inversion of large collections of diffraction patterns targeting clusters of hundreds of GPUs. With the expected enormous amount of diffraction data to be produced in the foreseeable future, this is the required scale to approach real-time processing of data at the beam site. Using both real and synthetic data we look at the scaling properties of the application and discuss the overall computational viability of this exciting and novel imaging technique.
  •  
6.
  • Fritzson, Dag, et al. (author)
  • Rolling Bearing Simulation on MIMD Computers
  • 1997
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 11:4, s. 299-313
  • Journal article (peer-reviewed). Abstract:
    • Rolling bearing simulations are very computationally intensive. Serial simulations may take weeks to execute, and there is a need to use the potential of parallel computing. The specific structure of the rolling bearing problem is used to develop suitable scheduling strategies. The authors discuss the system of stiff ordinary differential equations arising from a bearing model and show how to numerically solve these ordinary differential equations on parallel computers. Benchmarking results are presented for two test cases on three platforms.
  •  
7.
  •  
8.
  • Iakymchuk, Roman, et al. (author)
  • General framework for re-assuring numerical reliability in parallel Krylov solvers : a case of bi-conjugate gradient stabilized methods
  • 2024
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 38:1, s. 17-33
  • Journal article (peer-reviewed). Abstract:
    • Parallel implementations of Krylov subspace methods often help to accelerate the procedure of finding an approximate solution of a linear system. However, such parallelization coupled with asynchronous and out-of-order execution often makes more visible the non-associativity impact in floating-point operations. These problems are even amplified when communication-hiding pipelined algorithms are used to improve the parallelization of Krylov subspace methods. Introducing reproducibility in the implementations avoids these problems by getting more robust and correct solutions. This paper proposes a general framework for deriving reproducible and accurate variants of Krylov subspace methods. The proposed algorithmic strategies are reinforced by programmability suggestions to assure deterministic and accurate executions. The framework is illustrated on the preconditioned BiCGStab method and its pipelined modification, which in fact is a distinctive method from the Krylov subspace family, for the solution of non-symmetric linear systems with message-passing. Finally, we verify the numerical behavior of the two reproducible variants of BiCGStab on a set of matrices from the SuiteSparse Matrix Collection and a 3D Poisson’s equation.
  •  
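The floating-point non-associativity that motivates the abstract above is easy to demonstrate, and compensated (Kahan) summation is one classic building block for more accurate, order-robust reductions. This is a generic illustration, not the paper's actual reproducibility framework:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error
    lost in each addition into the next step."""
    s, c = 0.0, 0.0
    for v in values:
        y = v - c           # apply the stored correction
        t = s + y           # low-order bits of y may be rounded away here...
        c = (t - s) - y     # ...recover them as the new correction
        s = t
    return s

# Floating-point addition is not associative, so summation order matters:
vals = [1.0] + [1e-16] * 1_000_000
print(sum(vals))           # big-first order drops every tiny term
print(sum(sorted(vals)))   # small-first order keeps them
print(kahan_sum(vals))     # compensation keeps them regardless of order
```

In a message-passing Krylov solver the reduction order depends on the number of ranks and their scheduling, which is exactly why uncompensated dot products give run-to-run differences.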
9.
  • Iakymchuk, Roman, et al. (author)
  • Hierarchical approach for deriving a reproducible unblocked LU factorization
  • 2019
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 33:5, s. 791-803
  • Journal article (peer-reviewed). Abstract:
    • We propose a reproducible variant of the unblocked LU factorization for graphics processor units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we draw a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can be eventually integrated in a high performance and stable algorithm for the (blocked) LU factorization.
  •  
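For orientation, here is a plain NumPy sketch of the unblocked, right-looking LU factorization with partial pivoting that the abstract above builds on. This shows only the algorithm's structure; the paper's contribution is making the underlying Level-1/2 BLAS kernels correctly rounded and reproducible on GPUs.

```python
import numpy as np

def lu_unblocked(A):
    """Unblocked right-looking LU with partial pivoting: P A = L U.
    L (unit lower) and U are stored in-place in a copy of A;
    p records the row permutation."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.arange(n)
    for k in range(n - 1):
        # Partial pivoting: bring the largest |entry| of column k up
        m = k + np.argmax(np.abs(A[k:, k]))
        if m != k:
            A[[k, m]] = A[[m, k]]
            p[[k, m]] = p[[m, k]]
        A[k+1:, k] /= A[k, k]                               # multipliers (column of L)
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # rank-1 trailing update
    return A, p

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
LU, p = lu_unblocked(M)
L = np.tril(LU, -1) + np.eye(6)
U = np.triu(LU)
print(np.linalg.norm(L @ U - M[p]))   # residual near machine precision
```

Each step is a scaling and a matrix(-vector) update, which is why reproducible Level-1/2 kernels suffice for a reproducible unblocked factorization.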
10.
  •  
11.
  • Karp, Martin, 1996-, et al. (author)
  • Large-scale direct numerical simulations of turbulence using GPUs and modern Fortran
  • 2023
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; , s. 109434202311586-
  • Journal article (peer-reviewed). Abstract:
    • We present our approach to making direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world’s first direct numerical simulation of the flow around a Flettner rotor at Re = 30,000 and its interaction with a turbulent boundary layer. We present a performance comparison between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency based on readings from on-chip energy sensors.
  •  
12.
  • Kronbichler, Martin, et al. (author)
  • A fast massively parallel two-phase flow solver for microfluidic chip simulation
  • 2018
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 32, s. 266-287
  • Journal article (peer-reviewed). Abstract:
    • This work presents a parallel finite element solver of incompressible two-phase flow targeting large-scale simulations of three-dimensional dynamics in high-throughput microfluidic separation devices. The method relies on a conservative level set formulation for representing the fluid-fluid interface and uses adaptive mesh refinement on forests of octrees. An implicit time stepping with efficient block solvers for the incompressible Navier–Stokes equations discretized with Taylor–Hood and augmented Taylor–Hood finite elements is presented. A matrix-free implementation is used that reduces the solution time for the Navier–Stokes system by a factor of approximately three compared to the best matrix-based algorithms. Scalability of the chosen algorithms up to 32,768 cores and a billion degrees of freedom is shown.
  •  
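The matrix-free operator evaluation credited above with roughly a three-fold speedup can be illustrated on a toy 1D finite-difference Laplacian: the operator is applied on the fly and the sparse matrix is never assembled. (Illustration only; the paper's operators are high-order finite elements in 3D.)

```python
import numpy as np

def laplacian_apply(u):
    """Matrix-free application of the 1D FD Laplacian (Dirichlet ends):
    computes A @ u without ever storing A."""
    v = 2.0 * u
    v[:-1] -= u[1:]   # superdiagonal contribution
    v[1:] -= u[:-1]   # subdiagonal contribution
    return v

# Check against the explicitly assembled matrix
n = 100
u = np.random.default_rng(2).standard_normal(n)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
print(np.allclose(laplacian_apply(u), A @ u))
```

Avoiding the stored matrix trades memory traffic for recomputation, which tends to win on modern hardware where bandwidth, not arithmetic, is the bottleneck.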
13.
  • Kurzak, Jakub, et al. (author)
  • Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics
  • 2008
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 22:2, s. 219-230
  • Journal article (peer-reviewed). Abstract:
    • The fast multipole method (FMM) is an efficient algorithm for calculating electrostatic interactions in molecular simulations and a promising alternative to Ewald summation methods. Translation of multipole expansion in spherical harmonics is the most important operation of the fast multipole method and the fast Fourier transform (FFT) acceleration of this operation is among the fastest methods of improving its performance. The technique relies on highly optimized implementation of fast Fourier transform routines for the desired expansion sizes, which need to incorporate the knowledge of symmetries and zero elements in the input arrays. Here a method is presented for automatic generation of such, highly optimized, routines.
  •  
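The idea of exploiting symmetries and zeros in FFT inputs, described in the abstract above, can be seen in a much simpler setting in NumPy's real-input FFT, which computes only the non-redundant half of a conjugate-symmetric spectrum. This is an analogy, not the paper's generated routines:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)     # real-valued input

full = np.fft.fft(x)            # 16 complex outputs
half = np.fft.rfft(x)           # 9 outputs; conjugate symmetry makes
                                # the remaining 7 redundant

# The full spectrum is recoverable from the stored half:
recovered = np.concatenate([half, np.conj(half[-2:0:-1])])
print(np.allclose(full, recovered))
```

Code generators like the one described can bake such structural knowledge (known zeros, known symmetries) directly into specialized routines for each expansion size.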
14.
  • Markidis, Stefano, et al. (author)
  • OpenACC acceleration of the Nek5000 spectral element code
  • 2015
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 29:3, s. 311-319
  • Journal article (peer-reviewed). Abstract:
    • We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code has been used as the base and enhanced by OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. To port NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version leads to a 43 Gflop performance on a single node. In addition, we ported and optimized NekBone to parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.
  •  
15.
  • Memeti, Suejb, et al. (author)
  • A machine learning approach for accelerating DNA sequence analysis
  • 2018
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 32:3, s. 363-379
  • Journal article (peer-reviewed). Abstract:
    • DNA sequence analysis is a data- and compute-intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic worksharing we use a supervised machine learning approach, which predicts the performance of DNA sequence analysis on the host and device and accordingly maps fractions of the DNA sequence to the host and device. We evaluate our approach empirically using real-world DNA segments for human and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
  •  
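A toy version of the supervised worksharing idea in the abstract above: fit a performance model per device from measured samples, then pick the split so the predicted host and device times balance. All numbers and the simple linear model are invented for illustration; the paper trains its own predictors on real measurements.

```python
import numpy as np

# Hypothetical training measurements: runtime vs fraction of the
# sequence processed on each side (illustration values only).
sizes = np.array([0.2, 0.4, 0.6, 0.8, 1.0])    # fraction of the sequence
t_host = np.array([1.1, 2.0, 3.1, 4.0, 5.1])   # seconds on the host CPUs
t_dev = np.array([0.5, 0.9, 1.4, 1.8, 2.3])    # seconds on the device

# Supervised model: least-squares linear fit, time ~ a * size + b
a_h, b_h = np.polyfit(sizes, t_host, 1)
a_d, b_d = np.polyfit(sizes, t_dev, 1)

# Choose the host fraction f so the predicted times balance:
#   a_h * f + b_h = a_d * (1 - f) + b_d
f = (a_d + b_d - b_h) / (a_h + a_d)
print(f"host fraction: {f:.3f}, device fraction: {1 - f:.3f}")
```

Balancing the predicted finish times is what makes heterogeneous worksharing pay off: neither side sits idle waiting for the other.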
16.
  • Mirkovic, D., et al. (author)
  • Automatic performance tuning for fast fourier transforms
  • 2004
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 18:1, s. 47-64
  • Journal article (peer-reviewed). Abstract:
    • In this paper we discuss architecture-specific performance tuning for fast Fourier transforms (FFTs) implemented in the UHFFT library. The UHFFT library is an adaptive and portable software library for FFTs developed by the authors. We present the optimization methods used at different levels, starting with the algorithm selection used for the library code generation and ending with the actual implementation and specification of the appropriate compiler optimization options. We report on the performance results for several modern microprocessor architectures.
  •  
17.
  • Otten, Matthew, et al. (author)
  • An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication
  • 2016
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 30:3, s. 320-334
  • Journal article (peer-reviewed). Abstract:
    • We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather-scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5x speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems.
  •  
18.
  • Simmendinger, Christian, et al. (author)
  • Interoperability strategies for GASPI and MPI in large-scale scientific applications
  • 2019
  • In: The international journal of high performance computing applications. - : SAGE Publications. - 1094-3420 .- 1741-2846. ; 33:3, s. 554-568
  • Journal article (peer-reviewed). Abstract:
    • One of the main hurdles of partitioned global address space (PGAS) approaches is the dominance of message passing interface (MPI), which as a de facto standard appears in the code basis of many applications. To take advantage of the PGAS APIs like global address space programming interface (GASPI) without a major change in the code basis, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication/performance crucial parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.
  •  