SwePub

Search results for "WFRF:(Rantakokko Jarmo)"

  • Results 1-38 of 38

1.
2.
  • Johansson, Henrik, 1975- (author)
  • Autonomic Management of Partitioners for SAMR Grid Hierarchies
  • 2009
  • Doctoral thesis (other academic/artistic)
    • Parallel structured adaptive mesh refinement (SAMR) methods decrease the execution time and memory usage of partial differential equation solvers by adaptively assigning computational resources to regions with large solution errors. These methods result in a dynamic grid hierarchy. To get good parallel performance, the grid hierarchy is frequently re-partitioned and distributed over the processors. Ideally, the partitioner should minimize all performance-inhibiting factors, such as load imbalance, communication volume, synchronization delays, and data migration. No single partitioner performs well for all hierarchies and parallel computers. Because the partitioning conditions change during run-time, dynamically selecting a partitioner is non-trivial. In this thesis, we present the Meta-Partitioner: a partitioning framework that autonomously selects, configures, invokes, and evaluates partitioning algorithms during run-time. For the implementation, we use component-based software engineering. We predict the performance of the candidate partitioning algorithms from historical performance data for grid hierarchies similar to the current hierarchy. We focus the partitioning effort on the most performance-inhibiting factors: the load imbalance and the synchronization delays. At each re-partitioning, a user-specified number of partitioning algorithms is selected and invoked. The performance of each resulting partitioning is evaluated during run-time, and the best one is selected. The performance of the selected partitioning algorithms was compared both to the average performance of 768 algorithms and to the global minimum at each re-partitioning. The results showed substantial improvements for both the load imbalance and the synchronization delays. Compared to the average partitioning, the load imbalance was decreased by 28.2% and the synchronization delays by 21.5%. Compared to the global optimum, the load imbalance was only 11.5% higher and the synchronization delays 13.6% higher. Often, the Meta-Partitioner selected the best algorithm among all candidate algorithms.
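The select-invoke-evaluate loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Meta-Partitioner's implementation: the `HistoryRecord` type, the feature vectors, the Euclidean similarity measure, the equally weighted cost, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class HistoryRecord:
    """Hypothetical record: how one partitioner performed on one past grid hierarchy."""
    features: Tuple[float, ...]  # assumed numeric description of the hierarchy
    partitioner: str
    load_imbalance: float
    sync_delay: float


def cost(load_imbalance: float, sync_delay: float) -> float:
    # Assumption: the two targeted factors are weighted equally.
    return load_imbalance + sync_delay


def predict(history: List[HistoryRecord], features: Sequence[float],
            neighbors: int = 3) -> Dict[str, float]:
    """Predict each partitioner's cost from its records on the most similar past hierarchies."""
    by_name: Dict[str, List[HistoryRecord]] = {}
    for rec in history:
        by_name.setdefault(rec.partitioner, []).append(rec)
    predicted = {}
    for name, recs in by_name.items():
        nearest = sorted(recs, key=lambda r: math.dist(r.features, features))[:neighbors]
        predicted[name] = sum(cost(r.load_imbalance, r.sync_delay) for r in nearest) / len(nearest)
    return predicted


def select_and_run(history: List[HistoryRecord], features: Sequence[float],
                   partitioners: Dict[str, Callable[[], Tuple[float, float]]], k: int):
    """Invoke only the k most promising partitioners and keep the lowest measured cost.

    Each callable stands in for partitioning the current hierarchy and measuring
    (load_imbalance, sync_delay) at run-time.
    """
    predicted = predict(history, features)
    candidates = sorted(predicted, key=predicted.get)[:k]
    measured = {name: cost(*partitioners[name]()) for name in candidates}
    best = min(measured, key=measured.get)
    return best, measured
```

With `k` smaller than the number of candidates, a partitioner whose history predicts poorly is never run, even if it would have been the best choice at this step; this is one way a selection framework can fall short of the per-step global optimum.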
3.
  • Johansson, Henrik (author)
  • Performance characterization and evaluation of parallel PDE solvers
  • 2006
  • Licentiate thesis (other academic/artistic)
    • Computer simulations that solve partial differential equations (PDEs) are common in many fields of science and engineering. To decrease the execution time of the simulations, the PDEs can be solved on parallel computers. For efficient parallel implementations, the characteristics of both the hardware and the PDE solver must be taken into account. In this thesis, we explore two ways to increase the efficiency of parallel PDE solvers. First, we use full-system simulation of a parallel computer to get detailed knowledge about cache memory usage for three parallel PDE solvers. The results reveal cases of poor cache memory locality. This insight can be used to improve the performance of the PDE solvers. Second, we study the adaptive mesh refinement (AMR) partitioning problem. Using AMR, computational resources are dynamically concentrated in areas that need high accuracy. Because of the dynamic resource allocation, the workload must repeatedly be partitioned and distributed over the processors. We perform two comprehensive characterizations of partitioning algorithms for AMR on structured grids. For an efficient parallel AMR implementation, the partitioning algorithm must be dynamically selected at run-time with regard to both the application and computer state. We demonstrate the viability of dynamic algorithm selection and present performance data that show the benefits of using a large number of complementary partitioning algorithms. Finally, we discuss how our characterizations can be used in an algorithm selection framework.
4.
5.
6.
  • Lindblad, Erik, et al. (author)
  • Implicit-explicit Runge-Kutta methods for stiff combustion problems
  • 2009
  • In: Shock Waves, Vol 1, Proceedings. Berlin, Heidelberg: Springer-Verlag Berlin, pp. 299-304
  • Conference paper (peer-reviewed)
    • New high-order implicit-explicit Runge-Kutta methods have been developed and implemented in a finite-volume code to solve the Navier-Stokes equations for reacting gas mixtures. When only the stiff chemistry is treated implicitly, the linear systems in each Newton iteration are simple and can be solved directly. Numerical simulations of deflagration-to-detonation transition (DDT) show the potential of the new time integration for computational combustion.
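The core idea of treating only the stiff part implicitly can be illustrated with the simplest IMEX scheme, first-order IMEX Euler: forward Euler for the non-stiff term and backward Euler for the stiff term. This is a hypothetical scalar model problem, y' = -k*y + g(t, y), where a large rate constant k stands in for stiff chemistry; it is not one of the high-order Runge-Kutta methods from the paper. Because the stiff term is linear here, the implicit solve reduces to a division, whereas a real reacting-flow code generally needs a nonlinear (for example Newton) solve at each implicit stage.

```python
def imex_euler(y0: float, k: float, g, t0: float, t_end: float, steps: int) -> float:
    """Integrate y' = -k*y + g(t, y): g is treated explicitly, -k*y implicitly."""
    h = (t_end - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        # y_new = y + h*g(t, y) - h*k*y_new, solved for y_new (linear in y_new).
        y = (y + h * g(t, y)) / (1.0 + h * k)
        t += h
    return y


def explicit_euler(y0: float, k: float, g, t0: float, t_end: float, steps: int) -> float:
    """Fully explicit reference; unstable once h*k > 2."""
    h = (t_end - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y = y + h * (-k * y + g(t, y))
        t += h
    return y
```

With k = 1e4 and 100 steps on [0, 1] (so h*k = 100), the IMEX variant settles at the quasi-steady value g/k, while the explicit variant amplifies the solution by a factor of about 99 per step and diverges.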
7.
8.
  • Ljungkvist, Karl, 1986- (author)
  • Finite Element Computations on Multicore and Graphics Processors
  • 2017
  • Doctoral thesis (other academic/artistic)
    • In this thesis, techniques for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for improving the performance of computations using the finite element method. One of the main difficulties in finite-element computations is how to perform the assembly of the system matrix efficiently in parallel, due to its complicated memory access pattern. The challenge lies in the fact that many entries of the matrix are updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype multicore processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating-point data. Secondly, we study a matrix-free approach to finite-element computation, which avoids the matrix assembly. In addition to removing the need to store the system matrix, matrix-free methods are attractive because their low memory footprint better matches the architecture of modern processors, where memory bandwidth is scarce and compute power is abundant. Motivated by this, we consider matrix-free implementations of high-order finite-element methods for execution on graphics processors, which have seen a revolutionary increase in usage for numerical computations in recent years due to their more efficient architecture. In the implementation, we exploit sum-factorization techniques for efficient evaluation of matrix-vector products, mesh coloring and atomic updates for concurrent updates, and a geometric multigrid algorithm for efficient preconditioning of iterative solvers. Our performance studies show that on the GPU, a matrix-free approach is the method of choice for elements of order two and higher, yielding both significantly faster execution and the ability to solve considerably larger problems. Compared to corresponding CPU implementations executed on comparable multicore processors, the GPU implementation is about twice as fast, suggesting that graphics processors are about twice as power efficient as multicores for computations of this kind.
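Sum factorization exploits the tensor-product structure of high-order basis functions. The generic identity behind it is sketched below for a hypothetical 2D case with small dense 1D matrices A and B: the operator with entries A[i][k]*B[j][l] can be applied to a coefficient array U as two successive 1D contractions, replacing O(n^4) work with O(n^3). This is an illustration of the technique only, not code from the thesis.

```python
from typing import List

Matrix = List[List[float]]


def apply_naive(A: Matrix, B: Matrix, U: Matrix) -> Matrix:
    """Direct evaluation of V[i][j] = sum_{k,l} A[i][k] * B[j][l] * U[k][l]: O(n^4) work."""
    n = len(U)
    return [[sum(A[i][k] * B[j][l] * U[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


def apply_sum_factorized(A: Matrix, B: Matrix, U: Matrix) -> Matrix:
    """Same result via two 1D sweeps of O(n^3) work each."""
    n = len(U)
    # Sweep 1: contract the second index of U with B, giving T[k][j].
    T = [[sum(B[j][l] * U[k][l] for l in range(n)) for j in range(n)] for k in range(n)]
    # Sweep 2: contract the first index with A.
    return [[sum(A[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
```

In d dimensions the same idea applies one 1D sweep per direction, which is what keeps matrix-free evaluation affordable as the polynomial degree grows.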
9.
  • Ljungkvist, Karl (author)
  • Techniques for finite element methods on modern processors
  • 2015
  • Licentiate thesis (other academic/artistic)
    • In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods. One of the greatest challenges in finite-element computation is how to perform the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating-point data. Furthermore, we study a matrix-free approach to finite-element computation, which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types and allows for the solution of considerably larger problems. This suggests that the matrix-free method can speed up the execution of large, realistic simulations.
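Mesh coloring avoids write conflicts during concurrent assembly by giving the same color only to elements that share no node; all elements of one color can then be processed in parallel without atomics. A minimal greedy-coloring sketch follows, assuming elements are given as tuples of node indices; the abstract does not say which coloring algorithm the thesis uses, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def color_elements(elements: Sequence[Sequence[int]]) -> List[int]:
    """Greedily give each element the smallest color not used by any element sharing a node."""
    node_to_elements: Dict[int, List[int]] = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements[n].append(e)

    colors = [-1] * len(elements)  # -1 marks "not colored yet"
    for e, nodes in enumerate(elements):
        taken = {colors[other] for n in nodes for other in node_to_elements[n]
                 if colors[other] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[e] = c
    return colors


def conflict_free(elements: Sequence[Sequence[int]], colors: List[int]) -> bool:
    """Check that no two elements with the same color share a node."""
    for a in range(len(elements)):
        for b in range(a + 1, len(elements)):
            if colors[a] == colors[b] and set(elements[a]) & set(elements[b]):
                return False
    return True
```

Each color becomes one parallel phase with a synchronization point between phases; the alternative named in the abstract lets all threads update shared entries directly using atomic additions.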
10.
  • Löf, Henrik, et al. (author)
  • Algorithmic Optimizations of a Conjugate Gradient Solver on Shared Memory Architectures
  • 2004
  • Report (other academic/artistic)
    • OpenMP is an architecture-independent language for programming in the shared memory model. OpenMP is designed to be simple and powerful in terms of programming abstractions. Unfortunately, the architecture-independent abstractions sometimes come at the price of low parallel performance. This is especially true for applications with unstructured data access patterns running on distributed shared memory (DSM) systems. Here, proper data distribution and algorithmic optimizations play a vital role for performance. In this article, we investigate ways of improving the performance of an industrial-class conjugate gradient (CG) solver, implemented in OpenMP and running on two types of shared memory systems. We have evaluated bandwidth minimization, graph partitioning, and reformulations of the original algorithm that reduce global barriers. Through a detailed analysis of barrier time and memory system performance, we found that bandwidth minimization is the most important optimization, reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, the performance is significantly improved by the algorithmic optimizations, leaving the system-dependent global reduction operations as a bottleneck.
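For reference, the unpreconditioned conjugate gradient iteration at the core of such a solver is sketched below in plain Python with a hypothetical small dense matrix; it is a textbook version, not the industrial solver from the report. In a shared-memory parallel version, the two dot products per iteration are the global reductions (with their implied synchronization) mentioned in the abstract, while the matrix-vector product is where data distribution and bandwidth minimization matter.

```python
import math
from typing import List

Vector = List[float]


def matvec(A: List[Vector], x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(A: List[Vector], b: Vector, tol: float = 1e-10,
                       max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)        # residual b - A x for x = 0
    p = list(r)        # first search direction
    rr = dot(r, r)     # reduction
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)   # reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)        # reduction
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges in two iterations to approximately `[1/11, 7/11]`.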
11.
12.
  • Löf, Henrik, 1974- (author)
  • Iterative and Adaptive PDE Solvers for Shared Memory Architectures
  • 2006
  • Doctoral thesis (other academic/artistic)
    • Scientific computing is used frequently in an increasing number of disciplines to accelerate scientific discovery. Many such computing problems involve the numerical solution of partial differential equations (PDEs). In this thesis, we explore and develop methodology for high-performance implementations of PDE solvers for shared-memory multiprocessor architectures. We consider three realistic PDE settings: solution of the Maxwell equations in 3D using an unstructured grid and the method of conjugate gradients, solution of the Poisson equation in 3D using a geometric multigrid method, and solution of an advection equation in 2D using structured adaptive mesh refinement. We apply software optimization techniques to increase both parallel efficiency and the degree of data locality. In our evaluation, we use several different shared-memory architectures, ranging from symmetric multiprocessors and distributed shared-memory architectures to chip multiprocessors. For distributed shared-memory systems, we explore methods of data distribution to increase the amount of geographical locality. We evaluate automatic and transparent page migration based on runtime sampling, user-initiated page migration using a directive with affinity-on-next-touch semantics, and algorithmic optimizations for page-placement policies. Our results show that page migration increases the amount of geographical locality and that the parallel overhead related to page migration can be amortized over the iterations needed to reach convergence. This is especially true for the affinity-on-next-touch methodology, whereby page migration can be initiated at an early stage in the algorithms. We also develop and explore methodology for other forms of data locality and conclude that the effect on performance is significant and that this effect will increase for future shared-memory architectures. Our overall conclusion is that, if the involved locality issues are addressed, the shared-memory programming model provides an efficient and productive environment for solving many important PDE problems.
13.
  • Löf, Henrik (author)
  • Parallelizing the Method of Conjugate Gradients for Shared Memory Architectures
  • 2004
  • Licentiate thesis (other academic/artistic)
    • Solving partial differential equations (PDEs) is an important problem in many fields of science and engineering. For most real-world problems modeled by PDEs, we can only approximate the solution using numerical methods. Many of these numerical methods result in very large systems of linear equations. A common way of solving these systems is to use an iterative solver such as the method of conjugate gradients. Furthermore, due to the size of these systems, we often need parallel computers to solve them in a reasonable amount of time. Shared memory architectures represent a class of parallel computer systems commonly used both in commercial applications and in scientific computing. To provide cost-efficient computing solutions, shared memory architectures come in a large variety of configurations and sizes. From a programming point of view, we do not want to spend a lot of effort optimizing an application for a specific computer architecture. We want to find methods and principles of optimizing our programs that are generally applicable to a large class of architectures. In this thesis, we investigate how to implement the method of conjugate gradients efficiently on shared memory architectures. We seek algorithmic optimizations that result in efficient programs for a variety of architectures. To study this problem, we have implemented the method of conjugate gradients using OpenMP and measured the runtime performance of this solver on a variety of both uniform and non-uniform shared memory architectures. The input data used in the experiments come from a finite-element discretization of the Maxwell equations in three dimensions for a fighter-jet geometry. Our results show that, for all architectures studied, optimizations targeting the memory hierarchy yielded the largest performance increase. Improving the load balance by balancing the arithmetic work and minimizing the number of global barriers proved to be of lesser importance. Overall, bandwidth minimization of the iteration matrix proved to be the most efficient optimization. On non-uniform architectures, proper data distribution proved to be very important. In our experiments, we used page migration to improve the data distribution during runtime. Our results indicate that page migration can be very efficient if the migration cost can be kept low. Furthermore, we believe that page migration can be introduced in a portable way into OpenMP in the form of a directive with affinity-on-next-touch semantics.
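Bandwidth minimization renumbers the unknowns so that the nonzeros of the iteration matrix cluster near the diagonal, which improves cache reuse in the matrix-vector product. The abstract does not name the reordering algorithm used; a common choice is reverse Cuthill-McKee, sketched here for a hypothetical adjacency-list graph. This simplified variant starts each connected component from a minimum-degree vertex rather than a pseudo-peripheral one.

```python
from collections import deque
from typing import Dict, List


def reverse_cuthill_mckee(adj: Dict[int, List[int]]) -> List[int]:
    """Breadth-first ordering, visiting neighbors by increasing degree, then reversed."""
    visited = set()
    order: List[int] = []
    # Handle every connected component, starting each from a lowest-degree vertex.
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    return order[::-1]


def bandwidth(adj: Dict[int, List[int]], order: List[int]) -> int:
    """Largest |position(v) - position(w)| over all edges (v, w) under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)
```

On a path graph whose vertices were labeled in scrambled order, the reordering recovers the path order and reduces the bandwidth to 1.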
14.
  • Naps, Thomas, et al. (author)
  • Evaluating the educational impact of visualization
  • 2003
  • In: SIGCSE Bulletin inroads. Association for Computing Machinery (ACM). ISSN 0097-8418. 35:4, pp. 124-136
  • Journal article (peer-reviewed)
    • The educational impact of visualization depends not only on how well students learn when they use it, but also on how widely it is used by instructors. Instructors believe that visualization helps students learn. The integration of visualization techniques in classroom instruction, however, has fallen far short of its potential. This paper considers this disconnect, identifying its cause in a failure to understand the needs of a key member in the hierarchy of stakeholders, namely the instructor. We describe these needs and offer guidelines for both the effective deployment of visualizations and the evaluation of instructor satisfaction. We then consider different forms of evaluation and the impact of student learning styles on learner outcomes.
15.
  • Nordén, Markus, et al. (author)
  • Dynamic data migration for structured AMR solvers
  • 2007
  • In: International Journal of Parallel Programming. Springer Science and Business Media LLC. ISSN 0885-7458, 1573-7640. 35, pp. 477-491
  • Journal article (peer-reviewed)
16.
17.
  • Nordén, Markus, et al. (author)
  • Geographical locality and dynamic data migration for OpenMP implementations of adaptive PDE solvers
  • 2006
  • Report (other academic/artistic)
    • On cc-NUMA multiprocessors, the non-uniformity of main memory latencies motivates the co-location of threads and data. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement. The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Because the runtime adaptation continually changes the memory access pattern, achieving a high degree of geographical locality is challenging. The main conclusions of the study are: (1) geographical locality is very important for the performance of the solver, (2) the performance can be improved significantly using dynamic page migration of misplaced data, (3) a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs exhibiting dynamically changing memory access patterns, and (4) the overhead of such migration is low compared to the total execution time.
18.
  • Olsson, Peter, et al. (author)
  • Software tools for parallel CFD on composite grids
  • 1996
  • In: Parallel Computational Fluid Dynamics. Amsterdam, The Netherlands: Elsevier Science. ISBN 0444823220, pp. 725-732
  • Conference paper (peer-reviewed)
19.
20.
21.
22.
23.
24.
  • Rantakokko, Jarmo (author)
  • Case-centered learning of scientific computing
  • 2006
  • Report (other academic/artistic)
    • Traditionally, courses in scientific computing teach a large number of methods and algorithms for different mathematical problems. The algorithms are applied to simplified problems rather than to real applications. As a result, students cannot see the main thread: they focus only on the details of the methods and miss the overall picture. They cannot put what they learn into perspective, and their motivation to study diminishes. In this paper, we suggest a case-centered approach to learning scientific computing. We use a real-life case, weather prediction, as a starting point for learning. The case is analyzed and discussed in class. To follow up the discussions, the students are assigned learning tasks in scientific computing derived from the case analysis. The real-life application connects the different topics in scientific computing and motivates the students. The response from the students has been positive, and the case has increased their understanding of what scientific computing is and what it can be used for.
25.
26.
27.
  • Rantakokko, Jarmo, 1966- (author)
  • Data Partitioning Methods and Parallel Block-Oriented PDE Solvers
  • 1998
  • Doctoral thesis (other academic/artistic)
    • Data partitioning methods for block-structured problems in scientific computing have been studied. The applications are variational data assimilation in meteorology, ocean modeling, and airflow simulation with multiblock grids. Parallel computers offer, in a cost-efficient way, the computational power and memory needed for these kinds of applications. An appropriate data partitioning is then necessary to achieve high parallel efficiency and utilization of the parallel computer. In the meteorological and oceanographic applications, the problem of an irregular workload distribution is treated. In meteorological data assimilation, weather observations are merged with the dynamical flow model to compute an initial state of the atmosphere at a given time. Here, the observations are irregularly distributed in space and time. In ocean modeling, the workload varies due to sea depth and ice conditions. New data partitioning methods have been developed for these applications. The new methods are better adapted to the problems and thus give higher efficiency than previous data partitioning methods. In the multiblock applications, an additional difficulty is the irregular data dependencies. The blocks in a multiblock grid are usually of different sizes and irregularly coupled. This makes the data partitioning non-trivial. New methods have been developed, and different strategies have been investigated, both experimentally and theoretically, using a compressible Navier-Stokes solver as a model problem. The behavior of the different strategies depends strongly on the number of subgrids and their sizes as well as on the number of processors. Moreover, software tools for block-oriented PDE solvers have been constructed. The tools are written in Fortran 90 with an object-oriented design and support explicit finite difference methods and multiblock grids. Programs using the tools run on parallel computers and utilize the proposed data partitioning methods.
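To make the irregular-workload problem concrete, the sketch below splits a 1D sequence of per-column work estimates (for example, work varying with sea depth or observation density) into P contiguous parts of roughly equal total weight using prefix sums. This greedy prefix-sum split only illustrates the problem; it is not one of the partitioning methods developed in the thesis, and the function name and weights are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def partition_contiguous(weights: Sequence[float], parts: int) -> List[int]:
    """Return cut indices so that part p covers weights[cuts[p]:cuts[p + 1]].

    Assumes non-negative weights with a positive total. Each interior cut is placed
    just after the element where the running total first reaches p/parts of the work.
    A single very heavy element can leave some parts empty.
    """
    prefix = list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    i = 0
    for p in range(1, parts):
        target = total * p / parts
        while i < len(prefix) and prefix[i] < target:
            i += 1
        cuts.append(min(i + 1, len(weights)))
    cuts.append(len(weights))
    return cuts
```

For uniform weights this reduces to equal-sized blocks, while for `[4, 1, 1, 1, 1, 4]` split two ways it places the cut after the third column, giving each part a load of 6.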
28.
  • Rantakokko, Jarmo (author)
  • Interactive learning of algorithms
  • 2004
  • Report (other academic/artistic)
    • Visualization is believed to be an effective technique for learning and understanding algorithms in traditional computer science. In this paper, we focus on parallel computing and algorithms. An inherent difficulty with parallel programming is that it requires synchronization and coordination of concurrent activities. We want to use visualization to help students understand how the processors work together in an algorithm and how they interact through communication. To convey this, we have used two different visualization techniques: computer animations and role plays. Because the students can see the processors running simultaneously, the visualizations illustrate important concepts such as processor load balance, serialization bottlenecks, synchronization, and communication. The results show that both animations and role plays are better than the textbook for learning and understanding algorithms.
29.
30.
31.
32.
33.
34.
35.
  • Steensland, Johan, 1963- (author)
  • Efficient Partitioning of Dynamic Structured Grid Hierarchies
  • 2002
  • Doctoral thesis (other academic/artistic)
    • This thesis aims at decreasing the execution time of large-scale structured adaptive mesh refinement (SAMR) applications executing on general parallel computers. The key contributions are (1) a conceptual, dynamically adaptive meta-partitioner able to select and configure an appropriate partitioning technique based on application and computer state, (2) a characterization of new and existing domain-based partitioners, enabling a mapping from application and computer state onto appropriate partitioners, (3) sketched scalable solutions, expressed in terms of natural regions and expert algorithms, to the problem of efficiently partitioning large-scale SAMR applications with deep grid hierarchies executing on general parallel computers, and (4) a software partitioning tool, Nature+Fable, implementing the greater part of these sketched scalable solutions and engineered as part of the meta-partitioner. In both academia and industry, computer simulations of physical phenomena are becoming increasingly popular, as they constitute an important complement to real-life testing. In many cases, such simulations are based on solving partial differential equations by numerical methods. Adaptive methods are crucial for efficiently utilizing computer resources such as memory and CPU. But even with adaptation, the simulations are computationally demanding and yield huge data sets. Thus, parallelization is a necessity, demanding the next level of wise resource utilization: the partitioning of data. Adaptation causes the workload to change dynamically, calling for dynamic (re-)partitioning to maintain efficient resource utilization. The primary motivation for the present work is twofold, viz. (1) no single partitioning technique performs best for all applications and computer systems, and (2) no established partitioning technique copes efficiently with large-scale SAMR applications with deep grid hierarchies executing on general parallel computers. The conclusions are that the execution time of large-scale SAMR applications can be decreased by the meta-partitioner, and that our proposed scalable solutions exhibit promising properties and behave as expected or better. Consequently, this thesis takes a step towards decreasing the execution times of large-scale SAMR applications.
36.
37.
  • Thuné, Michael, et al. (author)
  • Object-oriented modeling of parallel PDE solvers
  • 2001
  • In: The Architecture of Scientific Software. Norwell, MA: Kluwer Academic Publishers. ISBN 0792373391, pp. 159-174
  • Conference paper (peer-reviewed)
38.
Publication type
conference paper (13)
journal article (10)
doctoral thesis (5)
report (4)
licentiate thesis (4)
book chapter (2)
Content type
peer-reviewed (25)
other academic/artistic (13)
Author/editor
Rantakokko, Jarmo (30)
Löf, Henrik (6)
Thuné, Michael (5)
Nordén, Markus (5)
Rantakokko, Jarmo, D ... (5)
Holmgren, Sverker (4)
Olsson, Peter (2)
Lötstedt, Per (2)
Åhlander, Krister (2)
Hagersten, Erik, Pro ... (2)
Thuné, Michael, Prof ... (2)
Liberman, Michael A. (2)
Otto, Kurt (2)
Lindblad, Erik (2)
Rantakokko, Jarmo, D ... (2)
Johansson, Henrik (1)
Edelvik, Fredrik (1)
Valiev, Damir (1)
Muller, B (1)
Larsson, Elisabeth (1)
Lindskog, Magnus (1)
von Sydow, Lina (1)
Malmi, Lauri (1)
Gustafsson, Nils (1)
Korhonen, Ari (1)
Mossberg, Eva (1)
Fleischer, Rudolf (1)
Cooper, Stephen (1)
Holmgren, Sverker, D ... (1)
Anderson, Jay (1)
Valiev, Damir M. (1)
Pålsson, Stefan (1)
Berre, Loik (1)
Huang, Xiang-Yu (1)
Navascues, Beatriz (1)
Thorsteinsson, Sigur ... (1)
Wallin, Dan (1)
Müller, Bernhard (1)
Steensland, Johan (1)
Johansson, Henrik, 1 ... (1)
Parashar, Manish, Pr ... (1)
Funkquist, Lennart (1)
Ljungkvist, Karl (1)
Mogensen, Kristian S ... (1)
Yang, Xiaohua (1)
Andræ, Ulf (1)
Ljungberg, Malin (1)
Ljungkvist, Karl, 19 ... (1)
Cai, Xing, Professor (1)
Löf, Henrik, 1974- (1)
Institution
Uppsala universitet (38)
Kungliga Tekniska Högskolan (1)
Stockholms universitet (1)
Karlstads universitet (1)
Language
English (38)
Research subject (UKÄ/SCB)
Natural sciences (38)
Social sciences (5)
Engineering and technology (1)
