SwePub
Search the SwePub database


Hit list for the search "WFRF:(Fechner Martin)"

  • Results 1-6 of 6
1.
  • de Jong, R. S., et al. (authors)
  • 4MOST : Project overview and information for the First Call for Proposals
  • 2019
  • In: The Messenger. European Southern Observatory. ISSN 0722-6691; 175, pp. 3-11
  • Journal article (other academic/artistic). Abstract:
    • We introduce the 4-metre Multi-Object Spectroscopic Telescope (4MOST), a new high-multiplex, wide-field spectroscopic survey facility under development for the four-metre-class Visible and Infrared Survey Telescope for Astronomy (VISTA) at Paranal. Its key specifications are: a large field of view (FoV) of 4.2 square degrees and a high multiplex capability, with 1624 fibres feeding two low-resolution spectrographs (R = λ/Δλ ~ 6500), and 812 fibres transferring light to the high-resolution spectrograph (R ~ 20 000). After a description of the instrument and its expected performance, a short overview is given of its operational scheme and planned 4MOST Consortium science; these aspects are covered in more detail in other articles in this edition of The Messenger. Finally, the processes, schedules, and policies concerning the selection of ESO Community Surveys are presented, commencing with a singular opportunity to submit Letters of Intent for Public Surveys during the first five years of 4MOST operations.
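The resolving powers quoted in the record above follow the definition R = \lambda/\Delta\lambda. As a quick worked example (the wavelength of 650 nm is chosen here purely for illustration and does not appear in the record), the smallest resolvable wavelength difference is roughly

    \Delta\lambda = \frac{\lambda}{R} \approx \frac{650~\mathrm{nm}}{6500} \approx 0.10~\mathrm{nm} \quad \text{(low-resolution spectrographs)}

    \Delta\lambda \approx \frac{650~\mathrm{nm}}{20\,000} \approx 0.03~\mathrm{nm} \quad \text{(high-resolution spectrograph)}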
2.
  • Kutzner, Carsten, et al. (authors)
  • Best bang for your buck : GPU nodes for GROMACS biomolecular simulations
  • 2015
  • In: Journal of Computational Chemistry. Wiley. ISSN 0192-8651, 1096-987X; 36:26, pp. 1990-2008
  • Journal article (peer-reviewed). Abstract:
    • The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime.
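To make the lifetime-cost argument in the abstract above concrete, here is a back-of-the-envelope estimate with purely hypothetical numbers (none of these figures come from the article): a node drawing 400 W that runs continuously for four years at a combined electricity-and-cooling price of 0.25 EUR/kWh accumulates

    E_{\mathrm{lifetime}} = 0.4~\mathrm{kW} \times (4 \times 8760~\mathrm{h}) \times 0.25~\mathrm{EUR/kWh} \approx 3500~\mathrm{EUR},

which can match or exceed the purchase price of a consumer-GPU node. A natural figure of merit, consistent with the abstract's framing, is then

    \mathrm{cost~efficiency} = \frac{\mathrm{trajectory~produced~over~the~node's~lifetime~(ns)}}{C_{\mathrm{hardware}} + E_{\mathrm{lifetime}}}.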
3.
  • Kutzner, Carsten, et al. (authors)
  • More bang for your buck : Improved use of GPU nodes for GROMACS 2018
  • 2019
  • In: Journal of Computational Chemistry. Wiley. ISSN 0192-8651, 1096-987X; 40:27, pp. 2418-2431
  • Journal article (peer-reviewed). Abstract:
    • We identify hardware that is optimal to produce molecular dynamics (MD) trajectories on Linux compute clusters with the GROMACS 2018 simulation package. Therefore, we benchmark the GROMACS performance on a diverse set of compute nodes and relate it to the costs of the nodes, which may include their lifetime costs for energy and cooling. In agreement with our earlier investigation using GROMACS 4.6 on hardware of 2014, the performance to price ratio of consumer GPU nodes is considerably higher than that of CPU nodes. However, with GROMACS 2018, the optimal CPU to GPU processing power balance has shifted even more toward the GPU. Hence, nodes optimized for GROMACS 2018 and later versions enable a significantly higher performance to price ratio than nodes optimized for older GROMACS versions. Moreover, the shift toward GPU processing allows to cheaply upgrade old nodes with recent GPUs, yielding essentially the same performance as comparable brand-new hardware.
4.
  • Kutzner, Carsten, et al. (authors)
  • Software news and update : Speeding up parallel GROMACS on high-latency networks
  • 2007
  • In: Journal of Computational Chemistry. Wiley. ISSN 0192-8651, 1096-987X; 28:12, pp. 2075-2084
  • Journal article (peer-reviewed). Abstract:
    • We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and what prerequisites are necessary for decent scaling even on such clusters with only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. Because of the high single-node performance of GROMACS, however, on the widely used Ethernet switched clusters, the scaling typically breaks down when more than two computer nodes are involved, limiting the absolute speedup that can be gained to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is here identified to be the all-to-all communication which is required every time step. During such an all-to-all communication step, a huge amount of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements. For 16 CPUs, e.g., a speedup of 11 has been achieved. However, for more nodes this mechanism also fails. Having optimized an all-to-all routine, which sends the data in an ordered fashion, we show that it is possible to completely prevent packet loss for any number of multi-CPU nodes. Thus, the GROMACS scaling dramatically improves, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that for optimum all-to-all performance it is essential how the nodes are connected to the switch's ports. This is also demonstrated for the example of the Car-Parrinello MD code.
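The abstract above (and the identical abstract of entry 6) attributes the improved scaling to an all-to-all routine that sends data in an ordered fashion. The sketch below illustrates one common ordered scheme, a ring-shifted pairwise exchange, in C with MPI; it is a generic illustration under that assumption, not the routine actually used in the paper, and all function and buffer names are invented for this sketch.

/* Hedged sketch of an "ordered" all-to-all: at step s, rank r sends its
 * block for rank (r + s) mod P and receives the block from rank
 * (r - s + P) mod P.  Each node has only one message in flight per step,
 * so the switch is never hit by all nodes sending at once. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* sendbuf and recvbuf each hold P contiguous blocks of `blocksize` bytes,
 * one block per peer rank (the same layout MPI_Alltoall expects). */
static void ordered_alltoall(const char *sendbuf, char *recvbuf,
                             int blocksize, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    for (int s = 0; s < nprocs; s++) {
        int dest = (rank + s) % nprocs;            /* who we send to   */
        int src  = (rank - s + nprocs) % nprocs;   /* who we hear from */

        /* Step 0 is just the exchange with ourselves; MPI_Sendrecv
         * handles the self-exchange case. */
        MPI_Sendrecv(sendbuf + (size_t)dest * blocksize, blocksize, MPI_BYTE,
                     dest, 0,
                     recvbuf + (size_t)src * blocksize, blocksize, MPI_BYTE,
                     src, 0, comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int blocksize = 1 << 20;   /* 1 MiB per peer, illustrative only */
    char *sendbuf = malloc((size_t)nprocs * blocksize);
    char *recvbuf = malloc((size_t)nprocs * blocksize);
    memset(sendbuf, rank & 0xff, (size_t)nprocs * blocksize);

    ordered_alltoall(sendbuf, recvbuf, blocksize, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Because every rank has at most one send and one receive outstanding per step, the network never sees the burst of P*(P-1) simultaneous messages that the abstract identifies as the cause of TCP packet loss on flow-control-free Ethernet switches.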
5.
  •  
6.
  • Kutzner, Carsten, et al. (authors)
  • Speeding up parallel GROMACS on high-latency networks
  • 2007
  • In: Journal of Computational Chemistry. Wiley. ISSN 0192-8651, 1096-987X; 28:12, pp. 2075-2084
  • Journal article (peer-reviewed). Abstract:
    • We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and what prerequisites are necessary for decent scaling even on such clusters with only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. Because of the high single-node performance of GROMACS, however, on the widely used Ethernet switched clusters, the scaling typically breaks down when more than two computer nodes are involved, limiting the absolute speedup that can be gained to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is here identified to be the all-to-all communication which is required every time step. During such an all-to-all communication step, a huge amount of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements. For 16 CPUs, e.g., a speedup of 11 has been achieved. However, for more nodes this mechanism also fails. Having optimized an all-to-all routine, which sends the data in an ordered fashion, we show that it is possible to completely prevent packet loss for any number of multi-CPU nodes. Thus, the GROMACS scaling dramatically improves, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that for optimum all-to-all performance it is essential how the nodes are connected to the switch's ports. This is also demonstrated for the example of the Car-Parrinello MD code.