SwePub
Search the SwePub database

Results list for the search "WFRF:(Dongarra J.)"

  • Results 1-6 of 6
1.
  • Berman, F., et al. (author)
  • New grid scheduling and rescheduling methods in the GrADS Project
  • 2005
  • In: International Journal of Parallel Programming. Springer Science and Business Media LLC. ISSN 0885-7458, E-ISSN 1573-7640. 33(2-3), pp. 209-229
  • Journal article (peer-reviewed). Abstract:
    • The goal of the Grid Application Development Software (GrADS) Project is to provide programming tools and an execution environment to ease program development for the Grid. This paper presents recent extensions to the GrADS software framework: a new approach to scheduling workflow computations, applied to a 3-D image reconstruction application; a simple stop/migrate/restart approach to rescheduling Grid applications, applied to a QR factorization benchmark; and a process-swapping approach to rescheduling, applied to an N-body simulation. Experiments validating these methods were carried out on both the GrADS MacroGrid (a small but functional Grid) and the MicroGrid (a controlled emulation of the Grid).
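The stop/migrate/restart rescheduling described in the abstract above amounts to periodically checkpointing enough application state that a job can be stopped on one resource and resumed on another. The sketch below shows that general pattern only; it is not the GrADS implementation, and every name in it (the checkpoint path, the placeholder iteration) is an illustrative assumption.

```python
# Minimal sketch of a generic stop/migrate/restart pattern; NOT the GrADS
# code. The checkpoint file name and the placeholder "work" are assumptions.
import pickle

CHECKPOINT = "solver_state.pkl"  # hypothetical checkpoint location

def save_checkpoint(step, data):
    # Persist enough state that the job can resume on another machine
    # after the rescheduler stops it and moves this file.
    with open(CHECKPOINT, "wb") as f:
        pickle.dump({"step": step, "data": data}, f)

def load_checkpoint(n):
    # Resume from the last checkpoint if one exists, else start fresh.
    try:
        with open(CHECKPOINT, "rb") as f:
            state = pickle.load(f)
        return state["step"], state["data"]
    except FileNotFoundError:
        return 0, [0.0] * n

def run(n=1000, total_steps=100, checkpoint_every=10):
    step, data = load_checkpoint(n)
    while step < total_steps:
        data = [x + 1.0 for x in data]   # stand-in for one real iteration
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(step, data)  # safe point for stop/migrate/restart
    return data
```

If the process is killed at any point, rerunning run() on whichever machine holds the checkpoint file continues from the last saved step rather than from scratch.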
2.
  • Berman, F., et al. (author)
  • The GrADS project: Software support for high-level grid application development
  • 2001
  • In: The International Journal of High Performance Computing Applications. SAGE Publications. ISSN 1094-3420, E-ISSN 1741-2846. 15(4), pp. 327-344
  • Journal article (peer-reviewed). Abstract:
    • Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way: as a computational as well as an information resource. As described in the recent book The Grid: Blueprint for a New Computing Infrastructure, this Grid will connect the nation's computers, databases, instruments, and people in a seamless web of computing and distributed intelligence, which can be used in an on-demand fashion as a problem-solving resource in many fields of human endeavor, in particular science and engineering. The availability of grid resources will give rise to dramatically new classes of applications, in which computing resources are no longer localized but, rather, distributed, heterogeneous, and dynamic; computation is increasingly sophisticated and multidisciplinary; and computation is integrated into our daily lives and, hence, subject to stricter time constraints than at present. The impact of these new applications will be pervasive, ranging from new systems for scientific inquiry, through computing support for crisis management, to the use of ambient computing to enhance personal mobile computing environments. To realize this vision, significant scientific and technical obstacles must be overcome. Principal among these is usability. The goal of the Grid Application Development Software (GrADS) project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. To that end, the project is exploring the scientific and technical problems that must be solved to make it easier for ordinary scientific users to develop, execute, and tune applications on the Grid. In this paper, the authors describe the vision and strategies underlying the GrADS project, including the base software architecture for grid execution and performance monitoring, strategies and tools for construction of applications from libraries of grid-aware components, and development of innovative new science and engineering applications that can exploit these new technologies to run effectively in grid environments.
3.
  • Cooper, K., et al. (author)
  • New Grid Scheduling and Rescheduling Methods in the GrADS Project
  • 2004
  • Conference paper (peer-reviewed). Abstract:
    • Summary form only given. The goal of the Grid Application Development Software (GrADS) project is to provide programming tools and an execution environment to ease program development for the grid. We present recent extensions to the GrADS software framework: (1) a new approach to scheduling workflow computations, applied to a 3D image reconstruction application; (2) a simple stop/migrate/restart approach to rescheduling grid applications, applied to a QR factorization benchmark; and (3) a process-swapping approach to rescheduling, applied to an N-body simulation. Experiments validating these methods were carried out on both the GrADS MacroGrid (a small but functional grid) and the MicroGrid (a controlled emulation of the grid), and the results were demonstrated at the SC2003 conference.
4.
  • Kennedy, K., et al. (author)
  • Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries
  • 2001
  • In: Journal of Parallel and Distributed Computing. Elsevier BV. ISSN 0743-7315, E-ISSN 1096-0848. 61(12), pp. 1803-1826
  • Journal article (peer-reviewed). Abstract:
    • As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap: the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.
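The strategy in the abstract above (analyze a library offline, then generate a compiler that treats library operations as optimizable primitives) can be illustrated with a toy rewriter. The sketch below is a deliberately simplified illustration under invented names and has no connection to the actual TeleGen infrastructure: it encodes two "precomputed" algebraic facts about a hypothetical library and applies them to call trees.

```python
# Toy illustration of the telescoping-languages idea: knowledge about a
# library is gathered "offline" and baked into a small compiler pass that
# recognizes library calls as primitives and rewrites them. Purely a sketch
# under assumed names; not the real TeleGen code.

# Expressions are nested tuples: ("operation", arg, ...).
REWRITES = [
    # transpose(transpose(x)) -> x
    (lambda e: e[0] == "transpose" and len(e) == 2
               and isinstance(e[1], tuple) and e[1][0] == "transpose"
               and len(e[1]) == 2,
     lambda e: e[1][1]),
    # scale(a, scale(b, x)) -> scale(a*b, x)
    (lambda e: e[0] == "scale" and len(e) == 3
               and isinstance(e[2], tuple) and e[2][0] == "scale"
               and len(e[2]) == 3,
     lambda e: ("scale", e[1] * e[2][1], e[2][2])),
]

def optimize(expr):
    """Bottom-up pass: optimize subtrees, then apply precomputed library rules."""
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0],) + tuple(optimize(a) for a in expr[1:])
    for matches, rewrite in REWRITES:
        if matches(expr):
            return optimize(rewrite(expr))
    return expr

print(optimize(("transpose", ("transpose", "A"))))    # -> A
print(optimize(("scale", 2.0, ("scale", 3.0, "x"))))  # -> ('scale', 6.0, 'x')
```

The point mirrors the paper's argument: the expensive library analysis happens once, ahead of time, so applying the resulting rules at compile time stays cheap.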
5.
  • Gustavson, Fred G., et al. (author)
  • Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
  • 2013
  • In: ACM Transactions on Mathematical Software. Association for Computing Machinery (ACM). ISSN 0098-3500, E-ISSN 1557-7295. 39(2), pp. 9-
  • Journal article (peer-reviewed). Abstract:
    • Four routines called DPOTF3i, i = a, b, c, d, are presented. DPOTF3i are a novel type of Level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and the LAPACK routine DPOTRF. The performance of the DPOTF3i routines is still increasing at block sizes where the performance of LAPACK's Level-2 routine DPOTF2 starts decreasing. This is our main result, and it implies, due to the use of a larger block size nb, that DGEMM, DSYRK, and DTRSM performance also increases. The four DPOTF3i routines use simple register blocking; since different platforms have different numbers of registers, the four routines use different register-blocking sizes. BPF is introduced. LAPACK routines for POTRF and PPTRF using BPF instead of full and packed format are shown to be trivial modifications of the LAPACK POTRF source codes; we call these codes BPTRF. There are two variants of BPF: lower and upper. Upper BPF is "identical" to Square Block Packed Format (SBPF), which "LAPACK" implementations on multicore processors use. Lower BPF is less efficient than upper BPF, but vector in-place transposition converts lower BPF to upper BPF very efficiently. Corroborating performance results for DPOTF3i versus DPOTF2 on a variety of common platforms are given for n approximately equal to nb, as well as results for large n comparing DBPTRF versus DPOTRF.
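For context on the abstract above: in a blocked (right-looking) Cholesky factorization of the DPOTRF kind, an unblocked kernel (the role played by DPOTF2, or by the DPOTF3i routines presented here) factors each nb x nb diagonal block, and Level-3 BLAS calls (DTRSM, DSYRK) do the remaining work; a faster diagonal kernel permits a larger nb, which shifts more of the total work into the Level-3 updates. The NumPy/SciPy sketch below shows that blocked structure only; it is an illustration, not the paper's Fortran routines.

```python
# Sketch of a right-looking blocked Cholesky in the style of LAPACK's DPOTRF.
import numpy as np
from scipy.linalg import solve_triangular

def chol_blocked(A, nb=64):
    """Blocked Cholesky: returns lower-triangular L with A = L @ L.T."""
    n = A.shape[0]
    L = np.tril(A).astype(float)          # work on the lower triangle
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1) unblocked factorization of the nb x nb diagonal block
        #    (the role of DPOTF2, or of the DPOTF3i kernels above)
        L[k:e, k:e] = np.linalg.cholesky(L[k:e, k:e])
        if e < n:
            # 2) panel solve X @ L_kk.T = B (DTRSM's role in DPOTRF)
            L[e:n, k:e] = solve_triangular(L[k:e, k:e], L[e:n, k:e].T,
                                           lower=True).T
            # 3) symmetric rank-nb update of the trailing matrix (DSYRK's
            #    role); a larger nb shifts more work into this step
            L[e:n, e:n] -= L[e:n, k:e] @ L[e:n, k:e].T
    return np.tril(L)

# quick correctness check on a random SPD matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((300, 300))
A = M @ M.T + 300 * np.eye(300)
assert np.allclose(chol_blocked(A, nb=64), np.linalg.cholesky(A))
```

The Level-2 versus Level-3 distinction the paper measures is invisible here (NumPy dispatches to BLAS internally), but the loop structure is the same one DPOTRF uses.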
6.
  • Gustavson, Fred G., et al. (author)
  • Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution, and Inversion
  • 2010
  • In: ACM Transactions on Mathematical Software. Association for Computing Machinery (ACM). ISSN 0098-3500, E-ISSN 1557-7295. 37(2), pp. 1-21
  • Journal article (peer-reviewed). Abstract:
    • We describe a new data format for storing triangular, symmetric, and Hermitian matrices called Rectangular Full Packed Format (RFPF). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance, as there is no Level 3 packed BLAS. We combine the good features of packed and full storage using RFPF to obtain high performance by using Level 3 BLAS, since RFPF is a standard full-format representation. Also, RFPF requires exactly the same minimal storage as the packed format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full-format routine and two calls to Level 3 BLAS routines, which means no new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution, and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. Performance of LAPACK full routines using RFPF versus LAPACK full routines using the standard format is about the same for both serial and SMP parallel processing, while using half the storage. Using vendor LAPACK full routines with RFPF is between one and 43 times faster than vendor and/or reference packed routines for serial processing, and between one and 97 times faster for SMP parallel processing.
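To make the storage trick in the abstract above concrete, the sketch below packs the lower triangle of an order-n matrix (n even) into a dense (n+1) x (n/2) array: the leading lower trapezoid is stored as-is and the trailing triangle is transposed into the otherwise-unused corner, so all n(n+1)/2 entries fit with no waste and the result is an ordinary full-format array. This illustrates the layout idea for one of the eight RFPF cases only; the index conventions of the actual LAPACK RFP routines may differ in detail.

```python
# Illustrative packing/unpacking for one RFPF-style layout (lower triangle,
# even n). A sketch of the idea only; not a drop-in for LAPACK's RFP codes.
import numpy as np

def to_rfpf_lower_even(A):
    """Pack the lower triangle of an n x n matrix (n even) into (n+1) x (n/2)."""
    n = A.shape[0]
    assert n % 2 == 0, "this sketch handles even n only"
    k = n // 2
    R = np.zeros((n + 1, k))
    for j in range(k):                      # leading lower trapezoid, as-is
        R[j + 1 : n + 1, j] = A[j:n, j]
    for i in range(k, n):                   # trailing triangle, transposed
        R[0 : i - k + 1, i - k] = A[i, k : i + 1]
    return R                                # (n+1)*k = n(n+1)/2 entries, no waste

def from_rfpf_lower_even(R):
    """Recover the lower triangle from the packed rectangle."""
    n = R.shape[0] - 1
    k = n // 2
    A = np.zeros((n, n))
    for j in range(k):
        A[j:n, j] = R[j + 1 : n + 1, j]
    for i in range(k, n):
        A[i, k : i + 1] = R[0 : i - k + 1, i - k]
    return A

# round-trip check on a random symmetric matrix
rng = np.random.default_rng(1)
S = rng.standard_normal((6, 6))
S = S + S.T
assert np.allclose(from_rfpf_lower_even(to_rfpf_lower_even(S)), np.tril(S))
```

Because the packed rectangle is itself a standard full-format array, Level 3 BLAS can operate on its pieces directly, which is the source of the paper's performance results.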