Search results for "WFRF:(Markidis Stefano)"

Results 1-10 of 199

No.	Reference
1.	Afzal, Ayesha, et al. (authors). Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. 2023. In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers. Springer Nature, pp. 155-170. Conference paper (peer-reviewed). Abstract: This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
2.	Afzal, Ayesha, et al. (authors). Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives. 2023. In: Future generations computer systems. Elsevier BV. ISSN 0167-739X, 1872-7115. 148, pp. 472-487. Journal article (peer-reviewed). Abstract: Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI communication in memory-bound parallel programs on multicore clusters and how it can be facilitated. For instance, slowing down MPI processes by deliberate injection of delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful in spotting certain patterns in parallel execution that will easily go unnoticed with traditional tracing tools. We investigate five different microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers (D3Q19 and SPEChpc D2Q37), and the LULESH and HPCG proxy applications.
3.	Aguilar, Xavier, et al. (authors). A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. 2021. In: 2021 IEEE International Conference On Cluster Computing (CLUSTER 2021). Institute of Electrical and Electronics Engineers (IEEE), pp. 692-697. Conference paper (peer-reviewed). Abstract: We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep-Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) to solve the two-stream instability test. We verify that the DL-based MLP PIC method produces the correct results using the two-stream instability: the DL-based PIC provides the expected growth rate of the two-stream instability. The DL-based PIC does not conserve the total energy and momentum. However, the DL-based PIC method is stable against the cold-beam instability, which affects traditional PIC methods. This work shows that integrating DL technologies into traditional computational methods is a viable approach for developing next-generation PIC algorithms.
4.	Akhmetova, Dana, et al. (authors). Interoperability of GASPI and MPI in large scale scientific applications. 2018. In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017. Cham: Springer Verlag. ISBN 9783319780535, pp. 277-287. Conference paper (peer-reviewed). Abstract: One of the main hurdles to a broad distribution of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change in the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by providing our study and preliminary performance results regarding interoperating GASPI and MPI on the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.
5.	Akhmetova, D., et al. (authors). On the application task granularity and the interplay with the scheduling overhead in many-core shared memory systems. 2015. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC. IEEE. ISBN 9781467365987, pp. 428-437. Conference paper (peer-reviewed). Abstract: Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also induce a large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may end up imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel task algorithm that analyzes an application directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2x10^4 and 10x10^4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
6.	Al Ahad, Muhammed Abdullah, et al. (authors). Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows. 2018. In: PROCEEDINGS OF PAW-ATM18. IEEE, pp. 1-10. Conference paper (peer-reviewed). Abstract: Collective operations are commonly used in various parts of scientific applications. Especially in strong scaling scenarios, collective operations can negatively impact the overall application performance: while the load per rank here decreases with increasing core counts, time spent in e.g. barrier operations will increase logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploring benefits of this extension, we deliver high performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations clearly achieve 2x-4x performance improvements compared to the best performing MPI implementations for various data distributions.
7.	Andersson, Måns, et al. (authors). A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms. 2023. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. New York, NY, USA: Association for Computing Machinery (ACM). Conference paper (peer-reviewed). Abstract: With the emergence of new computer architectures, portability and performance-portability become significant concerns for developing HPC applications. This work reports our experience and lessons learned using DaCe to create and optimize batched Discrete Fourier Transform (DFT) calculations on different single-node computer systems. The batched DFT calculation is an essential component in FFT algorithms and is widely used in computer science, numerical analysis, and signal processing. We implement the batched DFT with three complex-value array data layouts and compare them with the native complex type implementation. We use DaCe, which relies on Stateful DataFlow multiGraphs (SDFG) as an intermediate representation (IR) that can be optimized through transforms and from which code is generated for different architectures. We present several performance results showcasing the potential of DaCe for expressing HPC applications on different computer systems.
8.	Andersson, Måns, et al. (authors). Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems. 2024. In: 2024 SIAM Conference on Parallel Processing for Scientific Computing, PP 2024. Society for Industrial and Applied Mathematics Publications, pp. 39-51. Conference paper (peer-reviewed). Abstract: This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems presented in previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.
9.	Andersson, Måns, et al. (authors). Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software. 2023. In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer Nature, pp. 333-345. Conference paper (peer-reviewed). Abstract: GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact the GROMACS performance. Finally, we discuss performance opportunities with a particular interest in developing GROMACS for the FFT calculations.
10.	Andersson, Måns (author). Leveraging Intermediate Representations for High-Performance Portable Discrete Fourier Transform Frameworks: with Application to Molecular Dynamics. 2023. Licentiate thesis (other academic/artistic). Abstract: The Discrete Fourier Transform (DFT) and its improved formulations, the Fast Fourier Transforms (FFTs), are vital for scientists and engineers in a range of domains from signal processing to the solution of partial differential equations. A growing trend in Scientific Computing is heterogeneous computing, where accelerators are used instead of or together with CPUs. This has led to problems for developers in unifying portability, performance, and productivity. This thesis first motivates the work by showing the importance of efficient DFT calculations, then describes the DFT algorithm and a formulation based on matrix factorizations, which has been developed to formulate FFT algorithms and express their parallelism to exploit modern computer architectures, such as accelerators. The first paper is a motivating study of the breakdown of the performance and scalability of the high-performance Molecular Dynamics code GROMACS, where DFT calculations are a main performance bottleneck. In particular, the long-range interactions are solved with the Particle-Mesh Ewald algorithm, which uses a three-dimensional Fast Fourier Transform. The two following papers present two approaches to leverage factorization with the help of two different frameworks using Intermediate Representation and compiler technology, for the development of fast and portable code. The second paper presents a front-end and a pipeline for code generation in a domain-specific language based on Multi-Level Intermediate Representation (MLIR) for developing Fast Fourier Transform libraries. The last paper investigates and optimizes an implementation of an important kernel within the matrix-factorization framework: the batched DFT. It is implemented with data-centric programming and a data-centric intermediate representation called Stateful Dataflow multi-graphs (SDFG). The paper evaluates strategies for complex-valued data layout for performance and portability, and we show that there is a trade-off between portability and maintainability in using the native complex data type and that an SDFG-level abstraction could be beneficial for developing higher-level applications.

Refine results

Publication type: journal article (92); conference paper (88); other publication (7); doctoral thesis (5); licentiate thesis (3); research review (2); report (1); book chapter (1)

Content type: peer-reviewed (178); other academic/artistic (21)

Author/editor: Markidis, Stefano (196); Laure, Erwin (55); Lapenta, G. (40); Peng, Ivy Bo (33); Podobas, Artur (26); Schlatter, Philipp (20); Jansson, Niclas, 198 ... (16); Chien, Wei Der (14); Innocenti, M. E. (12); Liu, Felix (10); Kestor, G. (9); Olshevsky, Vyachesla ... (9); Deca, J. (8); Iakymchuk, Roman (8); Gioiosa, R. (8); Goldman, M. V. (8); Toth, Gabor (8); Newman, D (7); Andersson, Måns (7); Fischer, Paul (7); Fredriksson, Albin (7); Gong, Jing (7); Newman, D. L. (7); Chen, Yuxi (7); Khotyaintsev, Yuri V ... (6); Vaivads, Andris (6); Goldman, M (6); Aguilar, Xavier (5); Schliephake, Michael (5); Akhmetova, Dana (5); Henri, P. (5); Araújo De Medeiros, ... (5); Wahlgren, Jacob (5); Henri, Pierre (5); Olshevsky, Viachesla ... (5); Cazzola, E. (5); Chien, Steven Wei De ... (5); Sishtla, Chaitanya P ... (5); Chien, Steven W. D. (5); Peng, I. B. (5); Jansson, Niclas (4); Semenov, V. (4); Khotyaintsev, Yu. V. (4); Svedin, Martin (4); Peplinski, Adam (4); Massaro, Daniele (4); Beck, A. (4); Rahn, Mirko (4); Herman, Pawel, 1979- (4); Gombosi, Tamas I. (4)

University: Kungliga Tekniska Högskolan (199); Uppsala universitet (26); Umeå universitet (1)

Language: English (198); Swedish (1)

Research subject (UKÄ/SCB): Natural sciences (169); Engineering and technology (42); Medical and health sciences (3)
