SwePub
Result list for search "WFRF:(Alekseenko Andrey)"

  • Results 1-5 of 5
1.
  • Alekseenko, Andrey, 1990-, et al. (author)
  • Comparing the Performance of SYCL Runtimes for Molecular Dynamics Applications
  • 2023
  • In: International Workshop on OpenCL (IWOCL ’23). - : ACM Digital Library. - 9798400707452
  • Conference paper (peer-reviewed)
  • Abstract
    • SYCL is a cross-platform, royalty-free standard for programming a wide range of hardware accelerators. It is a powerful and convenient way to write standard C++17 code that can take full advantage of available devices. There are already multiple SYCL implementations targeting a wide range of platforms, from embedded to HPC clusters. Since several implementations can target the same hardware, application developers and users must know how to choose the most fitting runtime for their needs. In this talk, we will compare the runtime performance of two major SYCL runtimes targeting GPUs, oneAPI DPC++ and Open SYCL [3], to the native implementations for the purposes of GROMACS, a high-performance molecular dynamics engine. Molecular dynamics (MD) applications were one of the earliest adopters of GPU acceleration, with force calculations being an obvious target for offloading. MD is an iterative algorithm where, in its most basic form, on each step, forces acting between particles are computed, and then the equations of motion are integrated. As the computational power of the GPUs grew, the strong scaling problem became apparent: the biophysical systems modeled with molecular dynamics typically have fixed sizes, and the goal is to perform more time steps, each taking less than a millisecond of wall time. This places high demands on the underlying GPU framework, requiring it to efficiently schedule multiple small tasks with minimal overhead, so that CPU and GPU work can overlap for large systems and the GPU stays occupied for smaller systems. Another requirement is the ability of application developers to have control over the scheduling to optimize for external dependencies, such as MPI communication. GROMACS is a widely-used MD engine, supporting a wide range of hardware and software platforms, from laptops to the largest supercomputers [1]. Portability and performance across multiple architectures have always been one of the primary goals of the project, necessary to keep the code not only efficient but also maintainable. The initial support for NVIDIA accelerators, using CUDA, was added to GROMACS in 2010. Since then, heterogeneous parallelization has been a major target for performance optimization, not limited to NVIDIA devices but later adding support for GPUs of other vendors, as well as Xeon Phi accelerators. GROMACS initially adopted SYCL in its 2021 release to replace its previous GPU portability layer, OpenCL [2]. In further releases, the number of offloading modes supported by the SYCL backend steadily increased. As of GROMACS 2023, SYCL support in GROMACS achieved near feature parity with CUDA while allowing the use of a single code to target the GPUs of all three major vendors with minimal specialization. While this clearly supports the portability promise of modern SYCL implementations, the performance of such portable code remains an open question, especially given the strict requirements of MD algorithms. In this talk, we compare the performance of GROMACS across a wide range of system sizes when using oneAPI DPC++ and Open SYCL runtimes on high-performance NVIDIA, AMD, and Intel GPUs. Besides the analysis of individual kernel performance, we focus on the runtime overhead and the efficiency of task scheduling when compared to a highly optimized implementation using the native frameworks and discuss the possible sources of suboptimal performance and the amount of vendor-specific code branches, such as intrinsics or workarounds for compiler bugs, required to achieve the optimal performance.
2.
  • Alekseenko, Andrey, et al. (author)
  • Experiences with Adding SYCL Support to GROMACS
  • 2021
  • In: IWOCL'21. - New York, NY, USA : Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed)
  • Abstract
    • GROMACS is an open-source, high-performance molecular dynamics (MD) package primarily used for biomolecular simulations, accounting for 5% of HPC utilization worldwide. Due to the extreme computing needs of MD, significant efforts are invested in improving the performance and scalability of simulations. Target hardware ranges from supercomputers to laptops of individual researchers and volunteers of distributed computing projects such as Folding@Home. The code has been designed both for portability and performance by explicitly adapting algorithms to SIMD and data-parallel processors. A SIMD intrinsic abstraction layer provides high CPU performance. Explicit GPU acceleration has long used CUDA to target NVIDIA devices and OpenCL for AMD/Intel devices. In this talk, we discuss the experiences and challenges of adding support for the SYCL platform into the established GROMACS codebase and share experiences and considerations in porting and optimization. While OpenCL offers the benefits of using the same code to target different hardware, it suffers from several drawbacks that add significant development friction. Its separate-source model leads to code duplication and makes changes complicated. The need to use C99 for kernels, while the rest of the codebase uses C++17, exacerbates these issues. Another problem is that OpenCL, while supported by most GPU vendors, is never the main framework and thus is not getting the primary support or tuning efforts. SYCL alleviates many of these issues, employing a single-source model based on the modern C++ standard. In addition to being the primary platform for Intel GPUs, the possibility to target AMD and NVIDIA GPUs through other implementations (e.g., hipSYCL) might make it possible to reduce the number of separate GPU ports that have to be maintained. Some design differences from OpenCL, such as flow directed acyclic graphs (DAGs) instead of in-order queues, made it necessary to reconsider GROMACS's task scheduling approach and architectural choices in the GPU backend. Additionally, supporting multiple GPU platforms presents a challenge of balancing performance (low-level and hardware-specific code) and maintainability (more generalization and code reuse). We will discuss the limitations of the existing codebase and interoperability layers with regard to adding the new platform; the compute performance and latency comparisons; code quality considerations; and the issues we encountered with the SYCL implementations tested. Finally, we will discuss our goals for the next release cycle for the SYCL backend and the overall architecture of GPU acceleration code in GROMACS.
3.
  • Alekseenko, Andrey, 1990-, et al. (author)
  • GROMACS on AMD GPU-Based HPC Platforms : Using SYCL for Performance and Portability
  • 2024
  • In: CUG2024 Proceedings.
  • Conference paper (peer-reviewed)
  • Abstract
    • GROMACS is a widely-used molecular dynamics software package with a focus on performance, portability, and maintainability across a broad range of platforms. Thanks to its early algorithmic redesign and flexible heterogeneous parallelization, GROMACS has successfully harnessed GPU accelerators for more than a decade. With the diversification of accelerator platforms in HPC and no obvious choice for a well-suited multi-vendor programming model, the GROMACS project found itself at a crossroads. The performance and portability requirements, as well as a strong preference for a standards-based programming model, motivated our choice to use SYCL for production on both new HPC GPU platforms: AMD and Intel. Since the GROMACS 2022 release, the SYCL backend has been the primary means to target AMD GPUs in preparation for exascale HPC architectures like LUMI and Frontier. SYCL is a cross-platform, royalty-free, C++17-based standard for programming hardware accelerators, from embedded to HPC. It allows using the same code to target GPUs from all three major vendors with minimal specialization, which offers major portability benefits. While SYCL implementations build on native compilers and runtimes, whether such an approach is performant is not immediately evident. Biomolecular simulations have challenging performance characteristics: latency sensitivity, the need for strong scaling, and typical iteration times as short as hundreds of microseconds. Hence, obtaining good performance across the range of problem sizes and scaling regimes is particularly challenging. Here, we share the results of our work on readying GROMACS for AMD GPU platforms using SYCL, and demonstrate performance on Cray EX235a machines with MI250X accelerators. Our findings illustrate that portability is possible without major performance compromises. We provide a detailed analysis of node-level kernel and runtime performance with the aim of sharing best practices with the HPC community on using SYCL as a performance-portable GPU framework.
4.
  • Alekseenko, Zhanna, et al. (author)
  • Robust derivation of transplantable dopamine neurons from human pluripotent stem cells by timed retinoic acid delivery
  • 2022
  • In: Nature Communications. - : Springer Science and Business Media LLC. - 2041-1723. ; 13
  • Journal article (peer-reviewed)
  • Abstract
    • Stem cell therapies for Parkinson’s disease (PD) have entered first-in-human clinical trials using a set of technically related methods to produce mesencephalic dopamine (mDA) neurons from human pluripotent stem cells (hPSCs). Here, we outline an approach for high-yield derivation of mDA neurons that principally differs from alternative technologies by utilizing retinoic acid (RA) signaling, instead of WNT and FGF8 signaling, to specify mesencephalic fate. Unlike most morphogen signals, where precise concentration determines cell fate, it is the duration of RA exposure that is the key-parameter for mesencephalic specification. This concentration-insensitive patterning approach provides robustness and reduces the need for protocol-adjustments between hPSC-lines. RA-specified progenitors promptly differentiate into functional mDA neurons in vitro, and successfully engraft and relieve motor deficits after transplantation in a rat PD model. Our study provides a potential alternative route for cell therapy and disease modelling that due to its robustness could be particularly expedient when use of autologous- or immunologically matched cells is considered.
5.
  • Nystedt, Björn, et al. (author)
  • The Norway spruce genome sequence and conifer genome evolution
  • 2013
  • In: Nature. - : Nature Publishing Group. - 0028-0836 .- 1476-4687. ; 497:7451, pp. 579-584
  • Journal article (peer-reviewed)
  • Abstract
    • Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.