SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Kjelgaard Mikkelsen Carl Christian 1976 ) "

Sökning: WFRF:(Kjelgaard Mikkelsen Carl Christian 1976 )

  • Resultat 1-26 av 26
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Adlerborn, Björn, et al. (författare)
  • Towards Highly Parallel and Compute-Bound Computation of Eigenvectors of Matrices in Schur Form
  • 2017
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • In this paper we discuss the problem of computing eigenvectors for matrices in Schur form using parallel computing. We develop a new parallel algorithm and report on the performance of our MPI based implementation. We have also implemented a new parallel algorithm for scaling during the backsubstitution phase. We have increased the arithmetic intensity by interleaving the compution of several eigenvectors and by merging the backward substitution and the back-transformation of the eigenvector computation.
  •  
2.
  • Karlsson, Lars, 1982-, et al. (författare)
  • Improving Perfect Parallelism
  • 2014
  • Ingår i: Parallel Processing and Applied Mathematics. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783642552243 - 9783642552236 ; , s. 76-85
  • Konferensbidrag (refereegranskat)abstract
    • We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.
  •  
3.
  •  
4.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints
  • 2016
  • Ingår i: Parallel Processing and Applied Mathematics, PPAM 2015, Part I. - Cham : Springer International Publishing Switzerland. - 9783319321493 - 9783319321486 ; , s. 160-171
  • Konferensbidrag (refereegranskat)abstract
    • Molecular dynamics is used to study the time evolution of systems of atoms. It is common to constrain bond lengths in order to increase the time step of the simulation. Here we accelerate Newton's method for solving the constraint equations for a system consisting of many identical small molecules. Starting with a modular and generic base code using a sequential data layout, we apply three different optimization techniques. The compiled code approach is used to generate subroutines equivalent to a single step of Newton's method for a user specified molecule. Differing from the generic subroutines, these specific routines contain no loops and no indirect addressing. Interleaving the data describing different molecules generates vectorizable loops. Finally, we apply task fusion. The simultaneous application of all three techniques increases the speed of the base code by a factor of 15 for single precision calculations.
  •  
5.
  • Kjelgaard Mikkelsen, Carl Christian, 1976- (författare)
  • Any positive residual history is possible for the Arnoldi method for Lyapunov matrix equations
  • 2010
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • In this paper we consider the Lyapunov equation AX+XA^T+bb^T = 0, where A is negative definite n by n matrix and b in R^n. The Arnoldi method is an iterative algorithm which can be used to compute an approximate solution. However, the convergence can be very slow and in this paper we show how to explicitly construct a Lyapunov equation with a given residual curve. The matrix A can be chosen as symmetric negative definite and it is possible to arbitrarily specify the elements on the diagonal of the Cholesky factor of -A. If the symmetry is dropped, then it is possible to arbitrarily specify A+A^T, while retaining the residual curve.
  •  
6.
  • Kjelgaard Mikkelsen, Carl Christian, 1976- (författare)
  • Any positive residual history is possible for the EKSM for Lyapunov matrix equations
  • 2010
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • Let A in be an n by n matrix and let B be an n by p matrix and consider the Lyapunov matrix equation AX+XA^T+BB^T=0. If A+A^T < 0, then the extended Krylov subspace method (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show that any positive residual history is possible for the EKSM for Lyapunov matrix equations. In addition, we show how to systematically construct linear time invariant systems for which it is impractical to approximate the action of the product of the system Gramians using the EKSM. This is a property of the underlying Lyapunov matrix equations, rather than a defect of the algorithm.
  •  
7.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Blocked Algorithms for Robust Solution of Triangular Linear Systems
  • 2018
  • Ingår i: Parallel Processing and Applied Mathematics. - Cham : Springer. - 9783319780238 - 9783319780245 ; , s. 68-78
  • Konferensbidrag (refereegranskat)abstract
    • We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix and b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.
  •  
8.
  •  
9.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • How accurate does Newton have to be?
  • 2023
  • Ingår i: Parallel processing and applied mathematics. - Switzerland : Springer Nature. - 9783031304415 - 9783031304422 ; , s. 3-15
  • Bokkapitel (refereegranskat)abstract
    • We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. We briefly discuss implications for parallel solvers.
  •  
10.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems
  • 2012
  • Ingår i: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I. - Berlin, Heidelberg : Springer. - 9783642314636 ; , s. 80-91
  • Konferensbidrag (refereegranskat)abstract
    • The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller’s analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high order approximations of the Laplace operator.
  •  
11.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Newton's method revisited : how accurate do we have to be?
  • 2024
  • Ingår i: Concurrency and Computation. - : John Wiley & Sons. - 1532-0626 .- 1532-0634. ; 36:10
  • Tidskriftsartikel (refereegranskat)abstract
    • We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic using three different techniques. We derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. In particular, we apply both a symmetric variant and Forsgren's variant of the simplified Newton method. This work has implications for the implementation of quasi-Newton methods regardless of the scale of the calculation or the machine.
  •  
12.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils
  • 2020
  • Ingår i: Parallel Processing and Applied Mathematics. - Cham : Springer. - 9783030432287 - 9783030432294 ; , s. 58-69
  • Konferensbidrag (refereegranskat)abstract
    • In this paper we consider the problem of computing generalized eigenvectors of a matrix pencil in real Schur form. In exact arithmetic, this problem can be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers xtgevc in LAPACK prevent overflow by dynamically scaling the eigenvectors.These subroutines are scalar and sequential codes which compute theeigenvectors one by one. In this paper, we discuss how to derive robust algorithms which are blocked and parallel. The new StarNEig librarycontains a robust task-parallel solver Zazamoukh which runs on top of StarPU. Our numerical experiments show that Zazamoukh achieves a super-linear speedup compared with dtgevc for sufficiently large matrices.
  •  
13.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Parallel robust solution of triangular linear systems
  • 2019
  • Ingår i: Concurrency and Computation. - : John Wiley & Sons. - 1532-0626 .- 1532-0634. ; 31:19
  • Tidskriftsartikel (refereegranskat)abstract
    • Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.
  •  
14.
  • Kjelgaard Mikkelsen, Carl Christian, 1976- (författare)
  • Retracing the residual curve of a Lyapunov equation solver
  • 2011
  • Ingår i: BIT Numerical Mathematics. - : Springer. - 0006-3835 .- 1572-9125. ; 51:4, s. 959-975
  • Tidskriftsartikel (refereegranskat)abstract
    • Let A ∈ Rn×n and let B ∈ Rn×p and consider the Lyapunov matrix equation AX + XAT + BBT = 0. If A + AT < 0, then the extended Krylov subspacemethod (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show how to construct a symmetric negative definite matrix A and a column vector B, for which the EKSM generates a predetermined residual curve.
  •  
15.
  • Kjelgaard Mikkelsen, Carl Christian, 1976-, et al. (författare)
  • Robust Solution of Triangular Linear Systems
  • 2017
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx=\alpha b can be computed without exceeding the overflow threshold $\Omega$. Here T is a non-singular upper triangular matrix and b is a single vector. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK and our main contribution is to simplify and extend the analysis. We explain why it is impractical to parallelize these algorithms. We then derive a robust block algorithm for solving triangular linear systems. Any run-time system such as StarPU which can run a task based backward substitution algorithm can also execute our robust block algorithm in parallel. It is simply a matter of replacing standard kernels with our robust kernels.
  •  
16.
  • Kjelgaard Mikkelsen, Carl Christian, 1976- (författare)
  • The explicit Spike algorithm : Iterative solution of the reduced system
  • 2012
  • Ingår i: High-performance scientific computing. - London : Springer. - 9781447124368 - 9781447124375 ; , s. 147-156
  • Bokkapitel (refereegranskat)abstract
    • The explicit Spike algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.
  •  
17.
  •  
18.
  •  
19.
  • López-Villellas, Lorién, et al. (författare)
  • Accurate and efficient constrained molecular dynamics of polymers using Newton's method and special purpose code
  • 2023
  • Ingår i: Computer Physics Communications. - : Elsevier. - 0010-4655 .- 1879-2944. ; 288
  • Tidskriftsartikel (refereegranskat)abstract
    • In molecular dynamics simulations we can often increase the time step by imposing constraints on bond lengths and bond angles. This allows us to extend the length of the time interval and therefore the range of physical phenomena that we can afford to simulate. We examine the existing algorithms and software for solving nonlinear constraint equations in parallel and we explain why it is necessary to advance the state-of-the-art. We present ILVES-PC, a new algorithm for imposing bond constraints on proteins accurately and efficiently. It solves the same system of differential algebraic equations as the celebrated SHAKE algorithm, but ILVES-PC solves the nonlinear constraint equations using Newton’s method rather than the nonlinear Gauss-Seidel method. Moreover, ILVES-PC solves the necessary linear systems using a specialized linear solver that exploits the structure of the protein. ILVES-PC can rapidly solve constraint equations as accurately as the hardware will allow. The run-time of ILVES-PC is proportional to the number of constraints. We have integrated ILVES-PC into GROMACS and simulated proteins of different sizes. Compared with SHAKE, we have achieved speedups of up to 4.9× in single-threaded executions and up to 76× in shared-memory multi-threaded executions. Moreover, ILVES-PC is more accurate than P-LINCS algorithm. Our work is a proof-of-concept of the utility of software designed specifically for the simulation of polymers.
  •  
20.
  •  
21.
  • Myllykoski, Mirko, 1988-, et al. (författare)
  • Introduction to StarNEig : A Task-based Library for Solving Nonsymmetric Eigenvalue Problems
  • 2020
  • Ingår i: Parallel Processing and Applied Mathematics. - Cham : Springer. - 9783030432287 - 9783030432294 ; , s. 70-81
  • Konferensbidrag (refereegranskat)abstract
    • Abstract. In this paper, we present the StarNEig library for solvingdense nonsymmetric (generalized) eigenvalue problems. The library isbuilt on top of the StarPU runtime system and targets both shared anddistributed memory machines. Some components of the library supportGPUs. The library is currently in an early beta state and only real arith-metic is supported. Support for complex data types is planned for afuture release. This paper is aimed at potential users of the library. Wedescribe the design choices and capabilities of the library, and contrastthem to existing software such as ScaLAPACK. StarNEig implements aScaLAPACK compatibility layer that should make it easy for new usersto transition to StarNEig. We demonstrate the performance of the librarywith a small set of computational experiments.
  •  
22.
  • Myllykoski, Mirko, 1988-, et al. (författare)
  • Task-Based Parallel Algorithms for Eigenvalue Reordering of Matrices in Real Schur Forms
  • 2017
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • We develop a task-based parallel algorithm for reordering eigenvalues of matrices in real Schur form. We describe how we implemented the algorithm using StarPU runtime system and report on experiments performed on a shared memory machine. Compared with ScaLAPACK we achieve average speedup of 3. We have strong and weak scaling efficiencies which are well above 50%. We are able to achieve more than 50% of the peak flop rate for all but the smallest matrices. The idle time and the overhead is negligible except for the smallest matrices. The next step is to reconfigure and further develop the code so that it can be applied to matrix pairs in generalized Schur forms and run efficiently on distributed memory machines.
  •  
23.
  • Myllykoski, Mirko, 1988-, et al. (författare)
  • Task‐based, GPU‐accelerated and robust library for solving dense nonsymmetric eigenvalue problems
  • 2021
  • Ingår i: Concurrency and Computation. - : John Wiley & Sons. - 1532-0626 .- 1532-0634. ; 33:11
  • Tidskriftsartikel (refereegranskat)abstract
    • In this paper, we present the StarNEig library for solving dense nonsymmetric standard and generalized eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library have support for GPU acceleration. The library currently applies to real matrices with real and complex eigenvalues and all calculations are done using real arithmetic. Support for complex matrices is planned for a future release. This paper is aimed at potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as LAPACK and ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer which should assist new users in the transition to StarNEig. We demonstrate the performance of the library with a sample of computational experiments.
  •  
24.
  • Schwarz, Angelika Beatrix, 1989- (författare)
  • Improving the efficiency of eigenvector-related computations
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • An effective strategy in dense linear algebra is the design of algorithms as tiled algorithms. Tiled algorithms that express the bulk of the computation as matrix-matrix operations (level-3 BLAS) have proven successful in achieving high performance on cache-based architectures. At the same time, tiled algorithms interoperate with dynamic data-driven execution models such as task parallelism and promise good parallel scalability.This thesis applies the concept of tiled algorithms and task-centric execution to algorithms related to the computation of eigenvectors for the dense, non-symmetric eigenvalue problem. First, a standard algorithm for computing eigenvectors from the Schur form is recast such that all computational steps are rich in matrix-matrix operations. Second, inverse iteration on the Hessenberg matrix as an alternative approach to computing eigenvectors is addressed. An existing algorithm is revised to express the computationally most expensive step with matrix-matrix operations. Third, a task-parallel, tiled triangular Sylvester equation solver is amended to solve a larger class of problems. All algorithms have an enhanced performance, which is demonstrated through numerical experiments.
  •  
25.
  • Schwarz, Angelika Beatrix, 1989-, et al. (författare)
  • Robust parallel eigenvector computation for the non-symmetric eigenvalue problem
  • 2020
  • Ingår i: Parallel Computing. - : Elsevier. - 0167-8191 .- 1872-7336. ; 100
  • Tidskriftsartikel (refereegranskat)abstract
    • A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
  •  
26.
  • Schwarz, Angelika Beatrix, et al. (författare)
  • Robust Task-Parallel Solution of the Triangular Sylvester Equation
  • 2020
  • Ingår i: Parallel Processing and Applied Mathematics. - Cham : Springer. - 9783030432287 - 9783030432294 ; , s. 82-92
  • Konferensbidrag (refereegranskat)abstract
    • The Bartels-Stewart algorithm is a standard approach to solving the dense Sylvester equation. It reduces the problem to the solution of the triangular Sylvester equation. The triangular Sylvester equation is solved with a variant of backward substitution. Backward substitution is prone to overflow. Overflow can be avoided by dynamic scaling of the solution matrix. An algorithm which prevents overflow is said to be robust. The standard library LAPACK contains the robust scalar sequential solver dtrsyl. This paper derives a robust, level-3 BLAS-based task-parallel solver. By adding overflow protection, our robust solver closes the gap between problems solvable by LAPACK and problems solvable by existing non-robust task-parallel solvers. We demonstrate that our robust solver achieves a performance similar to non-robust solvers.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-26 av 26

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy