SwePub

Result list for search "WFRF:(Van Loan Charles)"

  • Results 1-4 of 4
1.
  •  
2.
  • Karlsson, Lars, 1982- (author)
  • Scheduling of parallel matrix computations and data layout conversion for HPC and Multi-Core Architectures
  • 2011
  • Doctoral thesis (other academic/artistic). Abstract:
    • Dense linear algebra represents fundamental building blocks in many computational science and engineering applications. The dense linear algebra algorithms must be numerically stable, robust, and reliable in order to be usable as black-box solvers by expert as well as non-expert users. The algorithms also need to scale and run efficiently on massively parallel computers with multi-core nodes. Developing high-performance algorithms for dense matrix computations is a challenging task, especially since the widespread adoption of multi-core architectures. Cache reuse is an even more critical issue on multi-core processors than on uni-core processors due to their larger computational power and more complex memory hierarchies. Blocked matrix storage formats, in which blocks of the matrix are stored contiguously, and blocked algorithms, in which the algorithms exhibit large amounts of cache reuse, remain key techniques in the effort to approach the theoretical peak performance. In Paper I, we present a packed and distributed Cholesky factorization algorithm based on a new blocked and packed matrix storage format. High performance node computations are obtained as a result of the blocked storage format, and the use of look-ahead leads to improved parallel efficiency. In Paper II and Paper III, we study the problem of in-place matrix transposition in general and in-place matrix storage format conversion in particular. We present and evaluate new high-performance parallel algorithms for in-place conversion between the standard column-major and row-major formats and the four standard blocked matrix storage formats. Another critical issue, besides cache reuse, is that of efficient scheduling of computational tasks. Many weakly scalable parallel algorithms are efficient only when the problem size per processor is relatively large. 
A current research trend focuses on developing parallel algorithms which are more strongly scalable and hence more efficient also for smaller problems. In Paper IV, we present a framework for dynamic node-scheduling of two-sided matrix computations and demonstrate that by using priority-based scheduling one can obtain an efficient scheduling of a QR sweep. In Paper V and Paper VI, we present a blocked implementation of two-stage Hessenberg reduction targeting multi-core architectures. The main contributions of Paper V are in the blocking and scheduling of the second stage. Specifically, we show that the concept of look-ahead can be applied also to this two-sided factorization, and we propose an adaptive load-balancing technique that allows us to schedule the operations effectively.
  •  
3.
  • Kågström, Bo, et al. (authors)
  • Algorithm 784: GEMM-based level 3 BLAS: Portability and Optimization Issues
  • 1998
  • In: ACM Transactions on Mathematical Software, 24:3, pp. 303-316
  • Journal article (peer-reviewed). Abstract:
    • This companion article discusses portability and optimization issues of the GEMM-based level 3 BLAS model implementations and the performance evaluation benchmark. All software comes in all four data types (single- and double-precision, real and complex) and is designed to be easy to implement and use on different platforms. Each of the GEMM-based routines has a few machine-dependent parameters that specify internal block sizes, cache characteristics, and branch points for alternative code sections. These parameters provide means for adjustment to the characteristics of a memory hierarchy.
  •  
4.
  • Kågström, Bo, et al. (authors)
  • GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark
  • 1998
  • In: ACM Transactions on Mathematical Software, 24:3, pp. 268-302
  • Journal article (peer-reviewed). Abstract:
    • The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
  •  
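
The abstracts of entries 3 and 4 describe expressing the other level 3 BLAS operations through GEMM by suitable partitioning, with a tunable internal block size. The following is a minimal illustrative sketch of that idea for a triangular matrix multiply, written in plain Python rather than the authors' Fortran 77. The function names (`gemm`, `small_trmm`, `trmm_lower_via_gemm`) and the parameter `nb` are hypothetical and are not taken from the published software; the sketch shows only the partitioning, not any performance tuning.

```python
def gemm(A, B, C, rows, inner, cols):
    """General update C[rows, cols] += A[rows, inner] * B[inner, cols].

    Stands in for the one highly optimized kernel the approach relies on.
    """
    for i in rows:
        for j in cols:
            s = 0.0
            for k in inner:
                s += A[i][k] * B[k][j]
            C[i][j] += s


def small_trmm(A, B, C, block, cols):
    """Diagonal block update: C[block, cols] += tril(A[block, block]) * B[block, cols].

    This is the small amount of non-GEMM work left after partitioning.
    """
    for i in block:
        for j in cols:
            s = 0.0
            for k in block:
                if k <= i:  # use only the lower triangle of the diagonal block
                    s += A[i][k] * B[k][j]
            C[i][j] += s


def trmm_lower_via_gemm(A, B, nb=2):
    """Return tril(A) * B for an n x n matrix A and an n x m matrix B.

    nb is a hypothetical block-size parameter (assumption for illustration).
    """
    n, m = len(A), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    cols = range(m)
    for start in range(0, n, nb):
        block = range(start, min(start + nb, n))
        # Everything left of the diagonal block is a full rectangle: use GEMM.
        if start > 0:
            gemm(A, B, C, block, range(0, start), cols)
        # Only the nb x nb diagonal block needs triangular handling.
        small_trmm(A, B, C, block, cols)
    return C


if __name__ == "__main__":
    A = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
    B = [[1, 2], [3, 4], [5, 6]]
    print(trmm_lower_via_gemm(A, B, nb=2))
```

As the block size shrinks relative to the matrix order, a larger share of the arithmetic falls inside `gemm`, which is why a single well-optimized GEMM can carry most of the performance in this scheme.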