SwePub
Result list for search "WFRF:(Markidis Stefano)"

  • Results 1-50 of 204
1.
  • Afzal, Ayesha, et al. (author)
  • Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
  • 2023
  • In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers. - : Springer Nature. ; , pp. 155-170
  • Conference paper (peer-reviewed)
    • This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
2.
  • Afzal, Ayesha, et al. (author)
  • Making applications faster by asynchronous execution : Slowing down processes or relaxing MPI collectives
  • 2023
  • In: Future generations computer systems. - : Elsevier BV. - 0167-739X .- 1872-7115. ; 148, pp. 472-487
  • Journal article (peer-reviewed)
    • Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI communication in memory-bound parallel programs on multicore clusters and how it can be facilitated. For instance, slowing down MPI processes by deliberate injection of delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful in spotting certain patterns in parallel execution that will easily go unnoticed with traditional tracing tools. We investigate five different microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers (D3Q19 and SPEChpc D2Q37), and the LULESH and HPCG proxy applications.
3.
  • Aguilar, Xavier, et al. (author)
  • A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations
  • 2021
  • In: 2021 IEEE International Conference On Cluster Computing (CLUSTER 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 692-697
  • Conference paper (peer-reviewed)
    • We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep-Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) to solve the two-stream instability test. We verify that the DL-based MLP PIC method produces the correct results using the two-stream instability: the DL-based PIC provides the expected growth rate of the two-stream instability. The DL-based PIC does not conserve the total energy and momentum. However, the DL-based PIC method is stable against the cold-beam instability, which affects traditional PIC methods. This work shows that integrating DL technologies into traditional computational methods is a viable approach for developing next-generation PIC algorithms.
4.
  • Akhmetova, Dana, et al. (author)
  • Interoperability of GASPI and MPI in large scale scientific applications
  • 2018
  • In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017. - Cham : Springer Verlag. - 9783319780535 ; , pp. 277-287
  • Conference paper (peer-reviewed)
    • One of the main hurdles of a broad distribution of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of the PGAS APIs like GASPI without a major change in the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by providing our study and preliminary performance results regarding interoperating GASPI and MPI on the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.
5.
  • Akhmetova, D., et al. (author)
  • On the application task granularity and the interplay with the scheduling overhead in many-core shared memory systems
  • 2015
  • In: Proceedings - IEEE International Conference on Cluster Computing, ICCC. - : IEEE. - 9781467365987 ; , pp. 428-437
  • Conference paper (peer-reviewed)
    • Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also induce large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may end up imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel task algorithm that analyzes an application directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2x10^4 and 10x10^4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
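The granularity trade-off described in the abstract above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the emulator or the task-aggregation algorithm from the paper: the function name `parallel_efficiency`, the fixed per-task overhead, and the wave-by-wave scheduling are hypothetical simplifications chosen for illustration.

```python
def parallel_efficiency(total_work, granularity, overhead, cores):
    """Efficiency of a toy wave-based schedule (hypothetical model).

    total_work  -- total useful work, in cycles
    granularity -- useful cycles per task
    overhead    -- assumed fixed scheduling cost per task, in cycles
    cores       -- number of worker cores
    """
    n_tasks = -(-total_work // granularity)  # ceiling division
    waves = -(-n_tasks // cores)             # tasks run in waves of `cores`
    makespan = waves * (granularity + overhead)
    return total_work / (cores * makespan)


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for g in (10**3, 10**4, 10**5, 10**6):
        eff = parallel_efficiency(10**8, g, overhead=1000, cores=1024)
        print(f"granularity {g:>8} cycles -> efficiency {eff:.2f}")
```

In this toy model, very fine tasks lose time to per-task overhead and very coarse tasks leave cores idle, so efficiency peaks at an intermediate granularity. The specific numbers depend entirely on the assumed inputs.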
6.
  • Al Ahad, Muhammed Abdullah, et al. (author)
  • Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows
  • 2018
  • In: PROCEEDINGS OF PAW-ATM18. - : IEEE. ; , pp. 1-10
  • Conference paper (peer-reviewed)
    • Collective operations are commonly used in various parts of scientific applications. Especially in strong-scaling scenarios, collective operations can negatively impact overall application performance: while the load per rank decreases with increasing core counts, time spent in, e.g., barrier operations will increase logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploring benefits of this extension, we deliver high-performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations clearly achieve 2x-4x performance improvements compared to the best performing MPI implementations for various data distributions.
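The strong-scaling argument in the abstract above (per-rank load shrinks while barrier time grows logarithmically) can be sketched as follows. The sketch assumes a dissemination-style barrier, which needs ceil(log2(p)) communication rounds for p ranks; it is not the GASPI-based algorithm from the paper, and `step_time` with its `round_latency` parameter is a hypothetical cost model.

```python
def barrier_rounds(ranks):
    """Communication rounds of a dissemination-style barrier: ceil(log2(ranks))."""
    if ranks < 1:
        raise ValueError("ranks must be >= 1")
    return (ranks - 1).bit_length()


def step_time(total_work, ranks, round_latency):
    """Toy strong-scaling model (assumed): returns (compute, sync) time per step.

    Compute time shrinks as 1/ranks, while synchronization time grows
    with the number of barrier rounds.
    """
    compute = total_work / ranks
    sync = barrier_rounds(ranks) * round_latency
    return compute, sync


if __name__ == "__main__":
    # Illustrative inputs only.
    for p in (2, 32, 1024):
        compute, sync = step_time(1024.0, p, round_latency=2.0)
        print(f"{p:>5} ranks: compute {compute:8.2f}, barrier {sync:5.1f}")
```

Under these assumptions the synchronization share of each step grows with the rank count even though the absolute barrier cost grows only logarithmically.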
7.
  • Andersson, Måns, et al. (author)
  • A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms
  • 2023
  • In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. - New York, NY, USA : Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed)
    • With the emergence of new computer architectures, portability and performance-portability become significant concerns for developing HPC applications. This work reports our experience and lessons learned using DaCe to create and optimize batched Discrete Fourier Transform (DFT) calculations on different single node computer systems. The batched DFT calculation is an essential component in FFT algorithms and is widely used in computer science, numerical analysis, and signal processing. We implement the batched DFT with three complex-value array data layouts and compare them with the native complex type implementation. We use DaCe, which relies on Stateful DataFlow multiGraphs (SDFG) as an intermediate representation (IR) which can be optimized through transforms and then generates code for different architectures. We present several performance results showcasing the potential of DaCe for expressing HPC applications on different computer systems.
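To make the idea of alternative complex-value data layouts concrete, here is a minimal sketch of a batched DFT written twice: once with Python's native complex type and once with a split layout that stores real and imaginary parts in separate arrays. The abstract above does not say which three layouts the paper uses, so the split layout is an assumption, and the naive O(n^2) transform and the function names `dft_native` and `dft_split` are illustrative only; this is not DaCe code.

```python
import cmath
import math


def dft_native(batch):
    """Naive O(n^2) DFT of each row, using the native complex type."""
    out = []
    for x in batch:
        n = len(x)
        out.append([
            sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)
        ])
    return out


def dft_split(re, im):
    """Same transform on a split layout: separate real and imaginary arrays."""
    out_re, out_im = [], []
    for xr, xi in zip(re, im):
        n = len(xr)
        row_re, row_im = [], []
        for k in range(n):
            acc_re = acc_im = 0.0
            for j in range(n):
                angle = -2.0 * math.pi * j * k / n
                c, s = math.cos(angle), math.sin(angle)
                # (xr + i*xi) * (c + i*s), expanded into real arithmetic
                acc_re += xr[j] * c - xi[j] * s
                acc_im += xr[j] * s + xi[j] * c
            row_re.append(acc_re)
            row_im.append(acc_im)
        out_re.append(row_re)
        out_im.append(row_im)
    return out_re, out_im
```

Both functions compute the same values; only the memory layout of the complex data differs, which is the kind of choice whose performance and portability impact the paper investigates.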
8.
  • Andersson, Måns, et al. (author)
  • Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems
  • 2024
  • In: 2024 SIAM Conference on Parallel Processing for Scientific Computing, PP 2024. - : Society for Industrial and Applied Mathematics Publications. ; , pp. 39-51
  • Conference paper (peer-reviewed)
    • This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.
9.
  • Andersson, Måns, et al. (author)
  • Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software
  • 2023
  • In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. - : Springer Nature. ; , pp. 333-345
  • Conference paper (peer-reviewed)
    • GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact the GROMACS performance. Finally, we discuss performance opportunities with a particular interest in developing GROMACS for the FFT calculations.
10.
  • Andersson, Måns (author)
  • Leveraging Intermediate Representations for High-Performance Portable Discrete Fourier Transform Frameworks : with Application to Molecular Dynamics
  • 2023
  • Licentiate thesis (other academic/artistic)
    • The Discrete Fourier Transform (DFT) and its improved formulations, the Fast Fourier Transforms (FFTs), are vital for scientists and engineers in a range of domains from signal processing to the solution of partial differential equations. A growing trend in Scientific Computing is heterogeneous computing, where accelerators are used instead of or together with CPUs. This has led to problems for developers in unifying portability, performance, and productivity. This thesis first motivates this work by showing the importance of having efficient DFT calculations, describes the DFT algorithm and a formulation based on matrix-factorizations which has been developed to formulate FFT algorithms and express their parallelism to exploit modern computer architectures, such as accelerators. The first paper is a motivating study of the breakdown of the performance and scalability of the high-performance Molecular Dynamics code GROMACS where DFT calculations are a main performance bottleneck. In particular, the long-range interactions are solved with the Particle-Mesh Ewald algorithm which uses a three-dimensional Fast Fourier Transform. The two following papers present two approaches to leverage factorization with the help of two different frameworks using Intermediate Representation and compiler technology, for the development of fast and portable code. The second paper presents a front-end and a pipeline for code generation in a domain-specific language based on Multi-Level Intermediate Representation (MLIR) for developing Fast Fourier Transform libraries. The last paper investigates and optimizes an implementation of an important kernel within the matrix-factorization framework: the batched DFT. It is implemented with data-centric programming and a data-centric intermediate representation called Stateful Dataflow multi-graphs (SDFG). The paper evaluates strategies for complex-valued data layout for performance and portability and we show that there is a trade-off between portability and maintainability in using the native complex data type and that an SDFG-level abstraction could be beneficial for developing higher-level applications.
11.
  • Araújo De Medeiros, Daniel (author)
  • Emerging Paradigms in the Convergence of Cloud and High-Performance Computing
  • 2023
  • Licentiate thesis (other academic/artistic)
    • Traditional HPC scientific workloads are tightly coupled, while emerging scientific workflows exhibit even more complex patterns, consisting of multiple characteristically different stages that may be IO-intensive, compute-intensive, or memory-intensive. New high-performance computer systems are evolving to adapt to these new requirements and are motivated by the need for performance and efficiency in resource usage. On the other hand, cloud workloads are loosely coupled, and their systems have matured technologies under different constraints from HPC. In this thesis, the use of cloud technologies designed for loosely coupled dynamic and elastic workloads is explored, repurposed, and examined in the landscape of HPC in three major parts. The first part deals with the deployment of HPC workloads in cloud-native environments through the use of containers and analyses the feasibility and trade-offs of elastic scaling. The second part relates to the use of workflow management systems in HPC workflows; in particular, a molecular docking workflow executed through Airflow is discussed. Finally, object storage systems, a cost-effective and scalable solution widely used in the cloud, and their usage in HPC applications through MPI I/O are discussed in the third part of this thesis.
12.
  • Araújo De Medeiros, Daniel, et al. (author)
  • LibCOS : Enabling Converged HPC and Cloud Data Stores with MPI
  • 2023
  • In: Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2023. - New York, NY, USA : Association for Computing Machinery (ACM). ; , pp. 106-116
  • Conference paper (peer-reviewed)
    • Recently, federated HPC and cloud resources are becoming increasingly strategic for providing diversified and geographically available computing resources. However, accessing data stores across HPC and cloud storage systems is challenging. Many cloud providers use object storage systems to support their clients in storing and retrieving data over the internet. One popular method is REST APIs atop the HTTP protocol, with Amazon's S3 APIs being supported by most vendors. In contrast, HPC systems are contained within their networks and tend to use parallel file systems with POSIX-like interfaces. This work addresses the challenge of diverse data stores on HPC and cloud systems by providing native object storage support through the unified MPI I/O interface in HPC applications. In particular, we provide a prototype library called LibCOS that transparently enables MPI applications running on HPC systems to access object storage on remote cloud systems. We evaluated LibCOS on a Ceph object storage system and a traditional HPC system. In addition, we conducted performance characterization of core S3 operations that enable individual and collective MPI I/O. Our evaluation in HACC, IOR, and BigSort shows that enabling diverse data stores on HPC and Cloud storage is feasible and can be transparently achieved through the widely adopted MPI I/O. Also, we show that a native object storage system like Ceph could improve the scalability of I/O operations in parallel applications.
13.
  • Atzori, Marco, et al. (author)
  • In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst
  • 2021
  • Report (other academic/artistic)
    • In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with ParaView Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess in-situ visualization's impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load-balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulation.
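For readers unfamiliar with the metric, relative parallel efficiency in a strong-scaling test is commonly computed against a reference run at a smaller core count. The sketch below assumes that common definition; the abstract above does not state the exact formula or the reference core count used, and the timings in the usage example are hypothetical, not values from the report.

```python
def relative_efficiency(t_ref, cores_ref, t, cores):
    """Relative strong-scaling efficiency with respect to a reference run.

    A value of 1.0 means the run time shrank in exact proportion to the
    increase in core count; lower values indicate lost scalability.
    """
    if min(t_ref, t) <= 0 or min(cores_ref, cores) <= 0:
        raise ValueError("times and core counts must be positive")
    return (t_ref * cores_ref) / (t * cores)


if __name__ == "__main__":
    # Hypothetical timings: 64 s on 32 cores versus 4 s on 2,048 cores.
    print(relative_efficiency(64.0, 32, 4.0, 2048))
```

With these made-up inputs the run on 2,048 cores is 16 times faster on 64 times as many cores, so the relative efficiency is one quarter.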
14.
  • Atzori, Marco, 1992-, et al. (author)
  • In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst
  • 2022
  • In: Journal of Supercomputing. - : Springer. - 0920-8542 .- 1573-0484. ; 78:3, pp. 3605-3620
  • Journal article (peer-reviewed)
    • In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for Paraview Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH’s Beskow Cray XC40 supercomputer and assess in situ visualization’s impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.
15.
  • Beck, A., et al. (author)
  • Multi-level multi-domain algorithm implementation for two-dimensional multiscale particle in cell simulations
  • 2014
  • In: Journal of Computational Physics. - : Elsevier BV. - 0021-9991 .- 1090-2716. ; 271, pp. 430-443
  • Journal article (peer-reviewed)
    • There are a number of modeling challenges posed by space weather simulations. Most of them arise from the multiscale and multiphysics aspects of the problem. The multiple scales dramatically increase the requirements, in terms of computational resources, because of the need of performing large scale simulations with the proper small-scales resolution. Lately, several suggestions have been made to overcome this difficulty by using various refinement methods which consist in splitting the domain into regions of different resolutions separated by well defined interfaces. The multiphysics issues are generally treated in a similar way: interfaces separate the regions where different equations are solved. This paper presents an innovative approach based on the coexistence of several levels of description, which differ by their resolutions or, potentially, by their physics. Instead of interacting through interfaces, these levels are entirely simulated and are interlocked over the complete extension of the overlap area. This scheme has been applied to a parallelized, two-dimensional, Implicit Moment Method Particle in Cell code in order to investigate its multiscale description capabilities. Simulations of magnetic reconnection and plasma expansion in vacuum are presented and possible implementation options for this scheme on very large systems are also discussed.
16.
  • Bragone, Federica (author)
  • Physics-Informed Neural Networks and Machine Learning Algorithms for Sustainability Advancements in Power Systems Components
  • 2023
  • Licentiate thesis (other academic/artistic)
    • A power system consists of several critical components necessary for providing electricity from the producers to the consumers. Monitoring the lifetime of power system components becomes vital since they are subjected to electrical currents and high temperatures, which affect their ageing. Estimating the component's ageing rate close to the end of its lifetime is the motivation behind our project. Knowing the ageing rate and life expectancy, we can possibly better utilize and re-utilize existing power components and their parts. In return, we could achieve better material utilization, reduce costs, and improve sustainability designs, contributing to the circular industry development of power system components. Monitoring the thermal distribution and the degradation of the insulation materials informs the estimation of the components' health state. Moreover, further study of the employed paper material of their insulation system can lead to a deeper understanding of its thermal characterization and a possible consequent improvement. Our study aims to create a model that couples the physical equations that govern the deterioration of the insulation systems of power components with modern machine learning algorithms. As the data is limited and complex in the field of components' ageing, Physics-Informed Neural Networks (PINNs) can help to overcome the problem. PINNs exploit the prior knowledge stored in partial differential equations (PDEs) or ordinary differential equations (ODEs) modelling the involved systems. This prior knowledge becomes a regularization agent, constraining the space of available solutions and consequently reducing the training data needed. This thesis is divided into two parts: the first focuses on the insulation system of power transformers, and the second is an exploration of the paper material concentrating on cellulose nanofibrils (CNFs) classification. The first part includes modelling the thermal distribution and the degradation of the cellulose inside the power transformer. The deterioration of one of the two systems can lead to severe consequences for the other. Both abilities of PINNs to approximate the solution of the equations and to find the parameters that best describe the data are explored. The second part could be conceived as a standalone; however, it leads to a further understanding of the paper material. Several CNFs materials and concentrations are presented, and this thesis proposes a basic unsupervised learning using clustering algorithms like k-means and Gaussian Mixture Models (GMMs) for their classification.
17.
  • Bragone, Federica, et al. (author)
  • Unsupervised Learning Analysis of Flow-Induced Birefringence in Nanocellulose: Differentiating Materials and Concentrations
  • Other publication (other academic/artistic)
    • Cellulose nanofibrils (CNFs) can be used as building blocks for future sustainable materials including strong and stiff filaments. The goal of this paper is to introduce a data analysis of flow-induced birefringence experiments by means of unsupervised learning techniques. By reducing the dimensionality of the data with Principal Component Analysis (PCA) we are able to exploit information for the different cellulose materials at several concentrations and compare them to each other. Our approach aims at classifying the CNF materials at different concentrations by applying unsupervised machine learning algorithms, like k-means and Gaussian Mixture Models (GMMs). Finally, we analyze the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the first principal component, detecting seasonality in lower concentrations. The focus is given to the initial relaxation of birefringence after the flow is stopped to have a better understanding of the Brownian dynamics for the given materials and concentrations. Our method can be used to distinguish the different materials at specific concentrations and could help to identify possible advantages and drawbacks of one material over the other.
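The autocorrelation analysis mentioned in the abstract above can be sketched with the standard sample ACF estimator. This is a generic illustration, not the authors' analysis pipeline: the function name `acf` and the synthetic period-4 signal are assumptions, and in the paper the input would be the first principal component of the birefringence data.

```python
def acf(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return cov / var


if __name__ == "__main__":
    # Synthetic signal with period 4, used only to illustrate seasonality.
    signal = [0.0, 1.0, 0.0, -1.0] * 10
    for lag in range(6):
        print(lag, round(acf(signal, lag), 3))
```

A seasonal component shows up as strong positive autocorrelation at multiples of the period and, for this signal, strong negative autocorrelation at half the period.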
18.
  • Brown, Nick, et al. (author)
  • Utilising urgent computing to tackle the spread of mosquito-borne diseases
  • 2021
  • In: Proceedings of Urgenthpc 2021. - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 36-44
  • Conference paper (peer-reviewed)
    • It is estimated that around 80% of the world's population live in areas susceptible to at least one major vector-borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, the outbreaks of such diseases are becoming more common and widespread, with much of this driven in recent years by socio-demographic and climatic factors. These trends are causing significant worry to global health organisations, including the CDC and WHO, and so an important question is the role that technology can play in addressing them. In this work we describe the integration of an epidemiology model, which simulates the spread of mosquito-borne diseases, with the VESTEC urgent computing ecosystem. The intention of this work is to empower human health professionals to exploit this model and more easily explore the progression of mosquito-borne diseases. Traditionally in the domain of the few research scientists, by leveraging state of the art visualisation and analytics techniques, all supported by running the computational workloads on HPC machines in a seamless fashion, we demonstrate the significant advantages that such an integration can provide. Furthermore, we demonstrate the benefits of using an ecosystem such as VESTEC, which provides a framework for urgent computing, in supporting the easy adoption of these technologies by the epidemiologists and disaster response professionals more widely.
19.
  • Brown, Nick, et al. (author)
  • Workflows to Driving High-Performance Interactive Supercomputing for Urgent Decision Making
  • 2022
  • In: High Performance Computing, Isc High Performance 2022 International Workshops. - Cham : Springer Nature. ; , pp. 233-244
  • Conference paper (peer-reviewed)
    • Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows from both the perspective of marshalling and control of urgent workloads, and at the individual HPC machine level. Ultimately requiring two workflow systems, by using a space weather prediction urgent use case, we explore the benefit that these two workflow systems provide, especially when one exploits the flexibility enabled by them interoperating.
20.
  •  
21.
  • Cazzola, E., et al. (author)
  • On the electron agyrotropy during rapid asymmetric magnetic island coalescence in presence of a guide field
  • 2016
  • In: Geophysical Research Letters. - : Blackwell Publishing. - 0094-8276 .- 1944-8007. ; 43:15, pp. 7840-7849
  • Journal article (peer-reviewed)
    • We present an analysis of the properties of the electron velocity distribution during island coalescence in asymmetric reconnection with and without guide field. In a previous study, three main domains were identified, in the case without guide field, as X, D, and M regions featuring different reconnection evolutions. These regions are also identified here in the case with guide field. We study the departure from isotropic and gyrotropic behavior by means of different robust detection algorithms proposed in the literature. While in the case without guide field these metrics show an overall agreement, when the guide field is present, a discrepancy in the agyrotropy within some relevant regions is observed, such as at the separatrices and inside magnetic islands. Moreover, in light of the new observations from the Multiscale MagnetoSpheric mission, an analysis of the electron velocity phase-space in these domains is presented.
22.
  • Cazzola, E., et al. (author)
  • On the electron dynamics during island coalescence in asymmetric magnetic reconnection
  • 2015
  • In: Physics of Plasmas. - : American Institute of Physics (AIP). - 1070-664X .- 1089-7674. ; 22:9
  • Journal article (peer-reviewed)
    • We present an analysis of the electron dynamics during rapid island merging in asymmetric magnetic reconnection. We consider a doubly periodic system with two asymmetric transitions. The upper layer is an asymmetric Harris sheet of finite width perturbed initially to promote a single reconnection site. The lower layer is a tangential discontinuity that promotes the formation of many X-points, separated by rapidly merging islands. Across both layers, the magnetic field and the density have a strong jump, but the pressure is held constant. Our analysis focuses on the consequences of electron energization during island coalescence. We focus first on the parallel and perpendicular components of the electron temperature to establish the presence of possible anisotropies and non-gyrotropies. Thanks to the direct comparison between the two different layers simulated, we can distinguish three main types of behavior characteristic of three different regions of interest. The first type represents the regions where traditional asymmetric reconnections take place without involving island merging. The second type of regions instead shows reconnection events between two merging islands. Finally, the third type identifies the regions between two diverging islands, where the typical signature of reconnection is not observed. Electrons in these latter regions additionally show a flat-top distribution resulting from the saturation of a two-stream instability generated by the two interacting electron beams from the two nearest reconnection points. Finally, the analysis of agyrotropy shows the presence of a distinct double structure laying all over the lower side facing the higher magnetic field region. This structure becomes quadrupolar in the proximity of the regions of the third type. The distinguishing features found for the three types of regions investigated provide clear indicators to the recently launched Magnetospheric Multiscale NASA mission for investigating magnetopause reconnection involving multiple islands.
23.
  • Cazzola, E., et al. (författare)
  • On the ions acceleration via collisionless magnetic reconnection in laboratory plasmas
  • 2016
  • Ingår i: Physics of Plasmas. - : American Institute of Physics (AIP). - 1070-664X .- 1089-7674. ; 23:11
  • Tidskriftsartikel (refereegranskat)abstract
    • This work presents an analysis of the ion outflow from magnetic reconnection throughout fully kinetic simulations with typical laboratory plasma values. A symmetric initial configuration for the density and magnetic field is considered across the current sheet. After analyzing the behavior of a set of nine simulations with a reduced mass ratio and with a permuted value of three initial electron temperatures and magnetic field intensity, the best ion acceleration scenario is further studied with a realistic mass ratio in terms of the ion dynamics and energy budget. Interestingly, a series of shock wave structures are observed in the outflow, resembling the shock discontinuities found in recent magnetohydrodynamic simulations. An analysis of the ion outflow at several distances from the reconnection point is presented, in light of possible laboratory applications. The analysis suggests that magnetic reconnection could be used as a tool for plasma acceleration, with applications ranging from electric propulsion to production of ion thermal beams.
  •  
24.
  • Chen, Yuxi, et al. (författare)
  • Global Three-Dimensional Simulation of Earth's Dayside Reconnection Using a Two-Way Coupled Magnetohydrodynamics With Embedded Particle-in-Cell Model : Initial Results
  • 2017
  • Ingår i: Journal of Geophysical Research - Space Physics. - : AMER GEOPHYSICAL UNION. - 2169-9380 .- 2169-9402. ; 122:10, s. 10318-10335
  • Tidskriftsartikel (refereegranskat)abstract
    • We perform a three-dimensional (3-D) global simulation of Earth's magnetosphere with kinetic reconnection physics to study the flux transfer events (FTEs) and dayside magnetic reconnection with the recently developed magnetohydrodynamics with embedded particle-in-cell model. During the 1 h long simulation, the FTEs are generated quasi-periodically near the subsolar point and move toward the poles. We find that the magnetic field signature of FTEs at their early formation stage is similar to a "crater FTE," which is characterized by a magnetic field strength dip at the FTE center. After the FTE core field grows to a significant value, it becomes an FTE with typical flux rope structure. When an FTE moves across the cusp, reconnection between the FTE field lines and the cusp field lines can dissipate the FTE. The kinetic features are also captured by our model. A crescent electron phase space distribution is found near the reconnection site. A similar distribution is found for ions at the location where the Larmor electric field appears. The lower hybrid drift instability (LHDI) along the current sheet direction also arises at the interface of magnetosheath and magnetosphere plasma. The LHDI electric field is about 8 mV/m, and its dominant wavelength relative to the electron gyroradius agrees reasonably with Magnetospheric Multiscale (MMS) observations.
  •  
25.
  • Chen, Yuxi, et al. (författare)
  • Magnetohydrodynamic With Embedded Particle-In-Cell Simulation of the Geospace Environment Modeling Dayside Kinetic Processes Challenge Event
  • 2020
  • Ingår i: Earth and Space Science. - : American Geophysical Union (AGU). - 2333-5084. ; 7:11
  • Tidskriftsartikel (refereegranskat)abstract
    • We use the magnetohydrodynamic (MHD) with embedded particle-in-cell model (MHD-EPIC) to study the Geospace Environment Modeling (GEM) dayside kinetic processes challenge event at 01:50-03:00 UT on 18 November 2015, when the magnetosphere was driven by a steady southward interplanetary magnetic field (IMF). In the MHD-EPIC simulation, the dayside magnetopause is covered by a PIC code so that the dayside reconnection is properly handled. We compare the magnetic fields and the plasma profiles of the magnetopause crossing with the MMS3 spacecraft observations. Most variables match the observations well in the magnetosphere, in the magnetosheath, and also during the current sheet crossing. The MHD-EPIC simulation produces flux ropes, and we demonstrate that some magnetic field and plasma features observed by the MMS3 spacecraft can be reproduced by a flux rope crossing event. We use an algorithm to automatically identify the reconnection sites from the simulation results. It turns out that there are usually multiple X-lines at the magnetopause. By tracing the locations of the X-lines, we find that the typical moving speed of the X-line endpoints is about 70 km/s, which is higher than but still comparable with the ground-based observations.
  •  
26.
  • Chen, Y., et al. (författare)
  • Studying Dawn-Dusk Asymmetries of Mercury's Magnetotail Using MHD-EPIC Simulations
  • 2019
  • Ingår i: Journal of Geophysical Research - Space Physics. - : Blackwell Publishing Ltd. - 2169-9380 .- 2169-9402. ; 124:11, s. 8954-8973
  • Tidskriftsartikel (refereegranskat)abstract
    • MESSENGER has observed a lot of dawn-dusk asymmetries in Mercury's magnetotail, such as the asymmetries of the cross-tail current sheet thickness and the occurrence of flux ropes, dipolarization events, and energetic electron injections. In order to obtain a global picture of Mercury's magnetotail dynamics and the relationship between these asymmetries, we perform global simulations with the magnetohydrodynamics with embedded particle-in-cell (MHD-EPIC) model, where Mercury's magnetotail region is covered by a PIC code. Our simulations show that the dawnside current sheet is thicker, the plasma density is larger, and the electron pressure is higher than the duskside. Under a strong interplanetary magnetic field driver, the simulated reconnection sites prefer the dawnside. We also found the dipolarization events and the planetward electron jets are moving dawnward while they are moving toward the planet, so that almost all dipolarization events and high-speed plasma flows concentrate in the dawn sector. The simulation results are consistent with MESSENGER observations.
  •  
27.
  • Chien, Steven Wei Der, et al. (författare)
  • An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems
  • 2018
  • Ingår i: Proceedings of the 5th International Conference on Exascale Applications and Software. - : The University of Edinburgh. - 9780992661533 ; , s. 34-
  • Konferensbidrag (refereegranskat)abstract
    • Computationally intensive applications, such as pattern recognition and natural language processing, are increasingly popular on HPC systems. Many of these applications use deep-learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a Renaissance of the artificial neural network. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices that flow between operations are called Tensors. For this reason, computation problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operation and data to devices such as GPU and CPU on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen and cuBLAS. Inter-node communication can be through TCP and RDMA. This work attempts to evaluate the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device and a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary result with distributed matrix multiplication shows that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC.
  •  
28.
  • Chien, Steven W. D., et al. (författare)
  • Characterizing Deep-Learning I/O Workloads in TensorFlow
  • 2018
  • Ingår i: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis. - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 54-63
  • Konferensbidrag (refereegranskat)abstract
    • The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during the training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on accelerator and input pipeline on CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small capacity storage and copy asynchronously the checkpoints to a slower large capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
  •  
29.
  • Chien, Steven W.D., et al. (författare)
  • Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications
  • 2023
  • Ingår i: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023. - : Association for Computing Machinery (ACM). ; , s. 172-173
  • Konferensbidrag (refereegranskat)abstract
    • Cloud providers began to provide managed services to attract scientific applications, which have been traditionally executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications, the MPI library, or support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application, Neko, by 13-40%.
  •  
30.
  • Chien, Steven Wei Der, et al. (författare)
  • TensorFlow Doing HPC : An Evaluation of TensorFlow Performance in HPC Applications
  • 2019
  • Ingår i: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 509-518
  • Konferensbidrag (refereegranskat)abstract
    • TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it in fact aims at supporting the development of a much broader range of application kinds that are outside the ML domain and can possibly include HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this lack by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential of emerging also as an HPC programming framework for heterogeneous supercomputers.
  •  
31.
  • Chien, Steven W. D., et al. (författare)
  • Tf-Darshan : Understanding Fine-grained I/O Performance in Machine Learning Workloads
  • 2020
  • Ingår i: Proceedings - IEEE International Conference on Cluster Computing, ICCC. - : Institute of Electrical and Electronics Engineers Inc.. ; , s. 359-370
  • Konferensbidrag (refereegranskat)abstract
    • Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend TensorFlow Profiler and introduce tf-Darshan, both a profiler and tracer, that performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement a runtime attachment without using a system preload. We can extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow profiler. We visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. At the same time, we do not alter Darshan's existing implementation. We illustrate tf-Darshan by performing two case studies on ImageNet image and Malware classification. We show that by guiding optimization using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast tier storage. We also show that Darshan has the potential of being used as a runtime library for profiling and providing information for future optimization.
  •  
32.
  • Chien, Wei Der, et al. (författare)
  • Exploring Scientific Application Performance Using Large Scale Object Storage
  • 2018
  • Ingår i: High Performance Computing. - Cham : Springer International Publishing. ; , s. 117-130
  • Konferensbidrag (refereegranskat)abstract
    • One of the major performance and scalability bottlenecks in large scientific applications is parallel reading and writing to supercomputer I/O systems. The usage of parallel file systems and the consistency requirements of POSIX, which all the traditional HPC parallel I/O interfaces adhere to, pose limitations to the scalability of scientific applications. Object storage is a widely used storage technology in cloud computing and is more frequently proposed for HPC workloads to address and improve the current scalability and performance of I/O in scientific applications. While object storage is a promising technology, it is still unclear how scientific applications will use object storage and what the main performance benefits will be. This work addresses these questions by emulating an object storage used by a traditional scientific application and evaluating potential performance benefits. We show that scientific applications can benefit from the usage of object storage on large scales.
  •  
33.
  • Chien, Wei Der (författare)
  • Large-scale I/O Models for Traditional and Emerging HPC Workloads on Next-Generation HPC Storage Systems
  • 2022
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • The ability to create value from large-scale data is now an essential part of research and driving technological development everywhere from everyday technology to life-saving medical applications. In almost all scientific fields that require handling large-scale data, such as weather forecast, physics simulation, and computational biology, supercomputers (HPC systems) have emerged as an essential tool for implementing and solving problems. While the computational speed of supercomputers has grown rapidly, the methods for handling large-scale data I/O (reading and writing data) at a high pace have not evolved as much. POSIX-based Parallel File Systems (PFS) and programming interfaces such as MPI-IO remain the norm of I/O workflow in HPC. At the same time, new applications, such as big data, and Machine Learning (ML) have emerged as a new class of widely deployed HPC applications. While all these applications require the ingestion and output of a large amount of data, they have very different usage patterns, giving a different set of requirements. Apart from that, new I/O technologies on HPC such as fast burst buffers and object stores are increasingly available. It currently lacks a novel method to fully exploit them in HPC applications. In this thesis, we evaluate modern storage infrastructures, the I/O programming model landscape, and characterize how HPC applications can take advantage of these I/O models to tackle bottlenecks. In particular, we look into object storage, a promising technology that has the potential of replacing existing I/O subsystems for large-scale data storage. Firstly, we mimic the object storage semantic and create an emulator on top of existing parallel file systems to project the performance improvement that can be expected on a real object store for HPC applications. Secondly, we develop a programming model that supports numerical data storage for scientific applications. The set of interfaces captures the need from parallel applications that use domain decomposition. Finally, we evaluate how the interfaces can be used by scientific applications. More specifically, we show for the first time, how our programming interface can be used to leverage Seagate's Motr object-store. Aside from that, we also showcase how this approach can enable the use of modern node-local hierarchical storage architectures. Aside from advancement on I/O infrastructure, the wide deployment of modern ML workloads introduces unique challenges to HPC and its I/O systems. We first understand the challenges by focusing on a state-of-the-art Deep-Learning (DL) framework called TensorFlow, which is widely used in cloud platforms. We evaluate how data ingestion in TensorFlow differs from traditional HPC applications to understand the challenges. While TensorFlow focuses on DL applications, there are alternative learning methods that pose different sets of challenges. To complement our understanding, we also propose a framework called StreamBrain, which implements a brain-like learning algorithm called the Bayesian Confidence Propagation Neural Network (BCPNN). We find that these alternative methods can potentially impose an even bigger challenge to conventional learning (such as those present in TensorFlow). To explain the I/O behavior of DL training, we perform a series of measurements and profiling on TensorFlow using monitoring tools. However, we find that existing methods are insufficient to derive a fine-grained I/O characteristic on these modern frameworks due to a lack of application-level coupling. To tackle this challenge, we propose a system called tf-Darshan that combines traditional HPC I/O monitoring and an ML workload profiling to enable a fine-grained I/O performance evaluation. Our findings show that the lack of co-design between modern frameworks and the HPC I/O subsystem leads to inefficient I/O (e.g. very small and random reads). They also fail to coordinate I/O requests in an efficient way in a parallel environment. With tf-Darshan, we showcase how knowledge derived from such measurements can be used to explain and improve I/O performance. Some examples include selective data staging to fast storage, and future auto-tuning on I/O parameters. The methods proposed in this thesis are evaluated on a variety of HPC systems, workstations, and prototype systems with different I/O and compute architectures. Different HPC applications are used to validate the approaches. The experiments show that our approaches can enable a good characterization of I/O performance, and our proposed programming model illustrates how applications can use next-generation storage systems.
  •  
34.
  • Chien, Wei Der, et al. (författare)
  • NoaSci : A Numerical Object Array Library for I/O of Scientific Applications on Object Storage
  • 2022
  • Ingår i: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • The strong consistency and stateful workflow are seen as the major factors for limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite a wide deployment on the cloud, its adoption in HPC remains low. We argue one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g. HDF5, binary), and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate’s Motr object store through NoaSci. We evaluate NoaSci’s preliminary performance using the iPIC3D space weather application and position against existing I/O methods.
  •  
35.
  • Chien, Wei Der, et al. (författare)
  • Performance evaluation of advanced features in CUDA unified memory
  • 2019
  • Ingår i: Proceedings of MCHPC 2019. - : Institute of Electrical and Electronics Engineers Inc.. - 9781728160078 ; , s. 50-57
  • Konferensbidrag (refereegranskat)abstract
    • CUDA Unified Memory improves the GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to evaluate both in-memory and oversubscription performance. The results show that memory advises on the Intel-Volta/Pascal-PCIe platform bring negligible improvement for in-memory executions. However, when GPU memory is oversubscribed by about 50%, using memory advises results in up to 25% performance improvement compared to the basic CUDA Unified Memory. In contrast, the Power9-Volta-NVLink platform can substantially benefit from memory advises, achieving up to 34% performance gain for in-memory executions. However, when GPU memory is oversubscribed on this platform, using memory advises increases GPU page faults and results in considerable performance loss. The CUDA prefetch also shows different performance impact on the two platforms. It improves performance by up to 50% on the Intel-Volta/Pascal-PCI-E platform but brings little benefit to the Power9-Volta-NVLink platform.
  •  
36.
  • Chien, Wei Der, et al. (författare)
  • Posit NPB : Assessing the precision improvement in HPC scientific applications
  • 2020
  • Ingår i: Lecture Notes in Computer Science. - Cham : Springer. ; , s. 301-310
  • Konferensbidrag (refereegranskat)abstract
    • Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it could enable higher precision than IEEE formats using the same number of bits. In this work, we first explore the feasibility of Posit encoding in representative HPC applications by providing a 32-bit Posit NAS Parallel Benchmark (NPB) suite. Then, we evaluate the accuracy improvement in different HPC kernels compared to the IEEE 754 format. Our results indicate that using Posit encoding achieves optimized precision, ranging from 0.6 to 1.4 decimal digits, for all tested kernels and proxy-applications. Also, we quantify the overhead of the current software implementation of Posit encoding as 4×–19× that of IEEE 754 hardware implementation. Our study highlights the potential of hardware implementations of Posit to benefit a broad range of HPC applications.
  •  
37.
  • Chien, Wei Der, et al. (författare)
  • SputniPIC : An implicit particle-in-cell code for multi-GPU systems
  • 2020
  • Ingår i: Proceedings - Symposium on Computer Architecture and High Performance Computing. - : IEEE Computer Society. ; , s. 149-156
  • Konferensbidrag (refereegranskat)abstract
    • Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes require new algorithm design and implementation for exploiting such accelerated platforms. In this work, we design and optimize a three-dimensional implicit PIC code, called sputniPIC, to run on a general multi-GPU compute node. We introduce a particle decomposition data layout, in contrast to domain decomposition on CPU-based implementations, to use particle batches for overlapping communication and computation on GPUs. sputniPIC also natively supports different precision representations to achieve speed up on hardware that supports reduced precision. We validate sputniPIC through the well-known GEM challenge and provide performance analysis. We test sputniPIC on three multi-GPU platforms and report a 200-800x performance improvement with respect to the sputniPIC CPU OpenMP version performance. We show that reduced precision could further improve performance by 45% to 80% on the three platforms. Because of these performance improvements, on a single node with multiple GPUs, sputniPIC enables large-scale three-dimensional PIC simulations that were only possible using clusters.
  •  
38.
  • Coti, Camille, et al. (författare)
  • Integration of Modern HPC Performance Tools in Vlasiator for Exascale Analysis and Optimization
  • 2024
  • Ingår i: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 27-31, San Francisco, California, USA.. - 9798350364606
  • Konferensbidrag (refereegranskat)abstract
    • Key to the success of developing high-performance applications for present and future heterogeneous supercomputers will be the systematic use of measurement and analysis to understand factors that affect delivered performance in the context of parallelization strategy, heterogeneous programming methodology, data partitioning, and scalable algorithm design. The evolving complexity of future exascale platforms makes it unrealistic for application teams to implement their own tools. Similarly, it is naive to expect available robust performance tools to work effectively out-of-the-box, without integration and specialization with respect to application-specific requirements and knowledge. Vlasiator is a powerful massively parallel code for accurate magnetospheric and solar wind plasma simulations. It is being ported to the LUMI HPC system for advanced modeling of the Earth’s magnetosphere and surrounding solar wind. Building on a preexisting Vlasiator performance API called Phiprof, our work significantly advances the performance measurement and analysis capabilities offered to Vlasiator using the TAU, APEX, and IPM tools. The results presented show in-depth characterization of node-level CPU/GPU and MPI communications performance. We highlight the integration of high-level Phiprof events with detailed performance data to expose opportunities for performance tuning. Our results provide important insights to optimize Vlasiator for the upcoming exascale machines.
  •  
39.
  • Daldorff, Lars K. S., et al. (författare)
  • Two-way coupling of a global Hall magnetohydrodynamics model with a local implicit particle-in-cell model
  • 2014
  • Ingår i: Journal of Computational Physics. - : Elsevier BV. - 0021-9991 .- 1090-2716. ; 268, s. 236-254
  • Tidskriftsartikel (refereegranskat)abstract
    • Computational models based on a fluid description of the plasma, such as magnetohydrodynamic (MHD) and extended magnetohydrodynamic (XMHD) codes are highly efficient, but they miss the kinetic effects due to the assumptions of small gyro radius, charge neutrality, and Maxwellian thermal velocity distribution. Kinetic codes can properly take into account the kinetic effects, but they are orders of magnitude more expensive than the fluid codes due to the increased degrees of freedom. If the fluid description is acceptable in a large fraction of the computational domain, it makes sense to confine the kinetic model to the regions where kinetic effects are important. This coupled approach can be much more efficient than a pure kinetic model. The speed up is approximately the volume ratio of the full domain relative to the kinetic regions assuming that the kinetic code uses a uniform grid. This idea has been advocated by [1] but their coupling was limited to one dimension and they employed drastically different grid resolutions in the fluid and kinetic models. We describe a fully two-dimensional two-way coupling of a Hall MHD model BATS-R-US with an implicit Particle-in-Cell (PIC) model iPIC3D. The coupling can be performed with identical grid resolutions and time steps. We call this coupled computational plasma model MHD-EPIC (MHD with Embedded PIC regions). Our verification tests show that MHD-EPIC works accurately and robustly. We show a two-dimensional magnetosphere simulation as an illustration of the potential future applications of MHD-EPIC.
  •  
40.
  • Deca, J., et al. (author)
  • Electromagnetic Particle-in-Cell Simulations of the Solar Wind Interaction with Lunar Magnetic Anomalies
  • 2014
  • In: Physical Review Letters. - 0031-9007 .- 1079-7114. ; 112:15, p. 151102-
  • Journal article (peer-reviewed)
    • We present the first three-dimensional fully kinetic and electromagnetic simulations of the solar wind interaction with lunar crustal magnetic anomalies (LMAs). Using the implicit particle-in-cell code IPIC3D, we confirm that LMAs may indeed be strong enough to stand off the solar wind from directly impacting the lunar surface forming a mini-magnetosphere, as suggested by spacecraft observations and theory. In contrast to earlier magnetohydrodynamics and hybrid simulations, the fully kinetic nature of IPIC3D allows us to investigate the space charge effects and in particular the electron dynamics dominating the near-surface lunar plasma environment. We describe for the first time the interaction of a dipole model centered just below the lunar surface under plasma conditions such that only the electron population is magnetized. The fully kinetic treatment identifies electromagnetic modes that alter the magnetic field at scales determined by the electron physics. Driven by strong pressure anisotropies, the mini-magnetosphere is unstable over time, leading to only temporal shielding of the surface underneath. Future human exploration as well as lunar science in general therefore hinges on a better understanding of LMAs.
  •  
41.
  • Deca, Jan, et al. (author)
  • Electron and Ion Dynamics of the Solar Wind Interaction with a Weakly Outgassing Comet
  • 2017
  • In: Physical Review Letters. - : American Physical Society. - 0031-9007 .- 1079-7114. ; 118:20
  • Journal article (peer-reviewed)
    • Using a 3D fully kinetic approach, we disentangle and explain the ion and electron dynamics of the solar wind interaction with a weakly outgassing comet. We show that, to first order, the dynamical interaction is representative of a four-fluid coupled system. We self-consistently simulate and identify the origin of the warm and suprathermal electron distributions observed by ESA's Rosetta mission to comet 67P/Churyumov-Gerasimenko and conclude that a detailed kinetic treatment of the electron dynamics is critical to fully capture the complex physics of mass-loading plasmas.
  •  
42.
  • Deca, Jan, et al. (author)
  • General mechanism and dynamics of the solar wind interaction with lunar magnetic anomalies from 3-D particle-in-cell simulations
  • 2015
  • In: Journal of Geophysical Research - Space Physics. - 2169-9380 .- 2169-9402. ; 120:8, pp. 6443-6463
  • Journal article (peer-reviewed)
    • We present a general model of the solar wind interaction with a dipolar lunar crustal magnetic anomaly (LMA) using three-dimensional full-kinetic and electromagnetic simulations. We confirm that LMAs may indeed be strong enough to stand off the solar wind from directly impacting the lunar surface, forming a so-called minimagnetosphere, as suggested by spacecraft observations and theory. We show that the LMA configuration is driven by electron motion because its scale size is small with respect to the gyroradius of the solar wind ions. We identify a population of back-streaming ions, the deflection of magnetized electrons via the $E \times B$ drift motion, and the subsequent formation of a halo region of elevated density around the dipole source. Finally, it is shown that the presence and efficiency of the processes are heavily impacted by the upstream plasma conditions and, in turn, influence the overall structure and evolution of the LMA system. Understanding the detailed physics of the solar wind interaction with LMAs, including magnetic shielding, particle dynamics, and surface charging, is vital to evaluate its implications for lunar exploration.
  •  
43.
  • Deca, J., et al. (author)
  • Spacecraft charging analysis with the implicit particle-in-cell code iPic3D
  • 2013
  • In: Physics of Plasmas. - : AIP Publishing. - 1070-664X .- 1089-7674. ; 20:10, p. 102902-
  • Journal article (peer-reviewed)
    • We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter makes it possible to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation results with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.
  •  
44.
  • Deca, Jan, et al. (author)
  • Three-dimensional full-kinetic simulation of the solar wind interaction with a vertical dipolar lunar magnetic anomaly
  • 2016
  • In: Geophysical Research Letters. - 0094-8276 .- 1944-8007. ; 43:9, pp. 4136-4144
  • Journal article (peer-reviewed)
    • A detailed understanding of the solar wind interaction with lunar magnetic anomalies (LMAs) is essential to identify its implications for lunar exploration and to enhance our physical understanding of the particle dynamics in a magnetized plasma. We present the first three-dimensional full-kinetic electromagnetic simulation case study of the solar wind interaction with a vertical dipole, resembling a medium-size LMA. In contrast to a horizontal dipole, we show that a vertical dipole twists its field lines and cannot form a minimagnetosphere. Instead, it creates a ring-shaped weathering pattern and reflects up to 21% (four times more than in the horizontal case) of the incoming solar wind ions electrostatically through the normal electric field formed above the electron shielding region surrounding the cusp. This work delivers a vital piece to fully comprehend and interpret lunar observations, as we find the amount of reflected ions to be a tracer for the underlying field structure.
  •  
45.
  • Divin, Andrey, et al. (author)
  • A Fully Kinetic Perspective of Electron Acceleration around a Weakly Outgassing Comet
  • 2020
  • In: Astrophysical Journal Letters. - : IOP Publishing Ltd. - 2041-8205 .- 2041-8213. ; 889:2
  • Journal article (peer-reviewed)
    • The cometary mission Rosetta has shown the presence of higher-than-expected suprathermal electron fluxes. In this study, using 3D fully kinetic electromagnetic simulations of the interaction of the solar wind with a comet, we constrain the kinetic mechanism that is responsible for the bulk electron energization that creates the suprathermal distribution from the warm background of solar wind electrons. We identify and characterize the magnetic field-aligned ambipolar electric field that ensures quasi-neutrality and traps warm electrons. Solar wind electrons are accelerated to energies as high as 50-70 eV close to the comet nucleus without the need for wave-particle or turbulent heating mechanisms. We find that the accelerating potential controls the parallel electron temperature, total density, and (to a lesser degree) the perpendicular electron temperature and the magnetic field magnitude. Our self-consistent approach enables us to better understand the underlying plasma processes that govern the near-comet plasma environment.
  •  
46.
  • Divin, A., et al. (author)
  • A new model for the electron pressure nongyrotropy in the outer electron diffusion region
  • 2016
  • In: Geophysical Research Letters. - : American Geophysical Union. - 0094-8276 .- 1944-8007. ; 43:20, pp. 10565-10573
  • Journal article (peer-reviewed)
    • We present a new model to describe the electron pressure nongyrotropy inside the electron diffusion region (EDR) in an antiparallel magnetic reconnection scenario. A combination of particle-in-cell simulations and analytical estimates is used to identify such a component of the electron pressure tensor in the rotated coordinates, which is nearly invariant along the outflow direction between the X line and the electron remagnetization points in the outer EDR. It is shown that the EDR two-scale structure (inner and outer parts) is formed due to superposition of the nongyrotropic meandering electron population and gyrotropic electron population with large anisotropy parallel to the magnetic field upstream of the EDR. Inside the inner EDR the influence of the pressure anisotropy can largely be ignored. In the outer EDR, a thin electron layer with electron flow speed exceeding the $E \times B$ drift velocity is supported by large-momentum flux produced by the electron pressure anisotropy upstream of the EDR. We find that this fast electron exhaust flow with $|V_e \times B| > |E|$ is in fact a constituent part of the EDR, a finding which will steer the interpretation of the Magnetospheric Multiscale Mission (MMS) data.
  •  
47.
  • Divin, A., et al. (author)
  • Evolution of the lower hybrid drift instability at reconnection jet front
  • 2015
  • In: Journal of Geophysical Research - Space Physics. - 2169-9380 .- 2169-9402. ; 120:4, pp. 2675-2690
  • Journal article (peer-reviewed)
    • We investigate current-driven modes developing at jet fronts during collisionless reconnection. Initial evolution of the reconnection is simulated using a conventional 2-D setup starting from the Harris equilibrium. Three-dimensional PIC calculations are implemented at later stages, when fronts are fully formed. Intense currents and enhanced wave activity are generated at the fronts because of the interaction of the fast flow plasma and denser ambient current sheet plasma. The study reveals that the lower hybrid drift instability develops quickly in the 3-D simulation. The instability produces strong localized perpendicular electric fields, which are several times larger than the convective electric field at the front, in agreement with Time History of Events and Macroscale Interactions during Substorms (THEMIS) observations. The instability generates waves, which escape the front edge and propagate into the undisturbed plasma ahead of the front. The parallel electron pressure is substantially larger in the 3-D simulation compared to that of the 2-D. In a time $\sim \Omega_{ci}^{-1}$, the instability forms a layer, which contains a mixture of the jet plasma and current sheet plasma. The results confirm that the lower hybrid drift instability is important for the front evolution and electron energization.
  •  
48.
  • Divin, A., et al. (author)
  • Inner and outer electron diffusion region of antiparallel collisionless reconnection : Density dependence
  • 2019
  • In: Physics of Plasmas. - : American Institute of Physics. - 1070-664X .- 1089-7674. ; 26:10
  • Journal article (peer-reviewed)
    • We study the inflow density dependence of substructures within the electron diffusion region (EDR) of collisionless symmetric magnetic reconnection. We perform a set of 2.5D particle-in-cell simulations which start from a Harris current layer with a uniform background density $n_b$. A scan of $n_b$ ranging from $0.02\,n_0$ to $2\,n_0$ of the peak current layer density ($n_0$) is studied keeping other plasma parameters the same. Various quantities measuring reconnection rate, EDR spatial scales, and characteristic velocities are introduced. We analyze EDR properties during the quasisteady stage when the EDR length measures saturate. Consistent with past kinetic simulations, electrons are heated parallel to the B field in the inflow region. The presence of the strong parallel anisotropy acts twofold: (1) electron pressure anisotropy drift becomes important at the EDR upstream edge in addition to the $E \times B$ drift speed and (2) the pressure anisotropy term $-\nabla \cdot P_e/(ne)$ modifies the force balance there. We find that the width of the EDR demagnetization region and EDR current are proportional to the electron inertial length $\sim d_e$ and $\sim d_e n_b^{0.22}$, respectively. Magnetic reconnection is fast with a rate of $\sim 0.1$ but depends weakly on density as $\sim n_b^{-1/8}$. Such reconnection rate proxies as EDR geometrical aspect or the inflow-to-outflow electron velocity ratio are shown to have different density trends, making electric field the only reliable measure of the reconnection rate. Published under license by AIP Publishing.
  •  
49.
  • Divin, A., et al. (author)
  • Numerical simulations of separatrix instabilities in collisionless magnetic reconnection
  • 2012
  • In: Physics of Plasmas. - : AIP Publishing. - 1070-664X .- 1089-7674. ; 19:4, p. 042110-
  • Journal article (peer-reviewed)
    • Electron scale dynamics of magnetic reconnection separatrix jets is studied in this paper. Instabilities developing in directions both parallel and perpendicular to the magnetic field are investigated. Implicit particle-in-cell simulations with realistic electron-to-ion mass ratio are complemented by a set of small scale high resolution runs having the separatrix force balance as the initial condition. A special numerical procedure is developed to introduce the force balance into the small scale runs. Simulations show the development of streaming instabilities and consequent formation of electron holes in the parallel direction. A new electron jet instability develops in the perpendicular direction. The instability is closely related to the electron MHD Kelvin-Helmholtz mode and is destabilized by a flow, perpendicular to magnetic field at the separatrix. Tearing instability of the separatrix electron jet is modulated strongly by the electron MHD Kelvin-Helmholtz mode.
  •  
50.
  • Divin, A., et al. (author)
  • Scaling of the inner electron diffusion region in collisionless magnetic reconnection
  • 2012
  • In: Journal of Geophysical Research. - 0148-0227 .- 2156-2202. ; 117, p. A06217-
  • Journal article (peer-reviewed)
    • The Sweet-Parker analysis of the inner electron diffusion region of collisionless magnetic reconnection is presented. The study includes charged-particle motion near the X-line and an appropriate approximation of the off-diagonal term for the electron pressure tensor. The obtained scaling shows that the width of the inner electron diffusion region is equal to the electron inertial length, and that electrons are accelerated up to the electron Alfven velocity in the X-line direction. The estimated effective plasma conductivity is based on the electron gyrofrequency rather than the binary collision frequency, and gives the extreme (minimal) value of the plasma conductivity similar to Bohm diffusion. The scaling properties are verified by means of Particle-in-Cell simulations. An ad hoc parameter needs to be introduced to the scaling relations in order to better match the theory and simulations.
  •  
Type of publication
conference paper (93)
journal article (92)
other publication (7)
doctoral thesis (5)
licentiate thesis (3)
research review (2)
report (1)
book chapter (1)
Type of content
peer-reviewed (182)
other academic/artistic (22)
Author/editor
Markidis, Stefano (201)
Laure, Erwin (58)
Lapenta, G. (40)
Peng, Ivy Bo (35)
Podobas, Artur (26)
Schlatter, Philipp (20)
Divin, Andrey (19)
Lapenta, Giovanni (18)
Jansson, Niclas, 198 ... (16)
Chien, Wei Der (14)
Innocenti, M. E. (12)
Karp, Martin, 1996- (12)
Liu, Felix (10)
Kestor, G. (9)
Olshevsky, Vyachesla ... (9)
Divin, A. (9)
Rivas-Gomez, Sergio (9)
Deca, J. (8)
Iakymchuk, Roman (8)
Gioiosa, R. (8)
Goldman, M. V. (8)
Toth, Gabor (8)
Williams, Jeremy J. (8)
Hart, Alistair (8)
Newman, D (7)
Andersson, Måns (7)
Fischer, Paul (7)
Fredriksson, Albin (7)
Gong, Jing (7)
Newman, D. L. (7)
Chen, Yuxi (7)
Khotyaintsev, Yuri V ... (6)
Vaivads, Andris (6)
Goldman, M (6)
Araújo De Medeiros, ... (6)
Narasimhamurthy, Sai (6)
Deca, Jan (6)
Aguilar, Xavier (5)
Schliephake, Michael (5)
Akhmetova, Dana (5)
Henri, P. (5)
Wahlgren, Jacob (5)
Henri, Pierre (5)
Olshevsky, Viachesla ... (5)
Cazzola, E. (5)
Chien, Steven Wei De ... (5)
Sishtla, Chaitanya P ... (5)
Chien, Steven W. D. (5)
Peng, I. B. (5)
Costea, Stefan (5)
Higher education institution
Kungliga Tekniska Högskolan (204)
Uppsala universitet (26)
Umeå universitet (1)
Language
English (203)
Swedish (1)
Research subject (UKÄ/SCB)
Natural sciences (174)
Engineering and technology (46)
Medical and health sciences (3)
