Field name | Indicators | Metadata |
---|---|---|
000 | | 03613naa a2200409 4500 |
001 | | oai:DiVA.org:kth-82625 |
003 | | SwePub |
008 | | 120212s2007 \| \|\|\|\|\|\|\|\|\|\|\|000 \|\|eng\| |
024 | 7 | a https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-82625 2 URI |
024 | 7 | a https://doi.org/10.1002/jcc.20703 2 DOI |
040 | | a (SwePub)kth |
041 | | a eng b eng |
042 | | 9 SwePub |
072 | 7 | a ref 2 swepub-contenttype |
072 | 7 | a art 2 swepub-publicationtype |
100 | 1 | a Kutzner, Carsten u Max-Planck Institut Göttingen 4 aut |
245 | 1 0 | a Software news and update : b Speeding up parallel GROMACS on high-latency networks |
264 | | c 2007 |
264 | 1 | b Wiley, c 2007 |
338 | | a print 2 rdacarrier |
500 | | a QC 20120302 |
520 | | a We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and what prerequisites are necessary for decent scaling even on such clusters with only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. Because of the high single-node performance of GROMACS, however, on the widely used Ethernet switched clusters, the scaling typically breaks down when more than two computer nodes are involved, limiting the absolute speedup that can be gained to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is here identified to be the all-to-all communication which is required every time step. During such an all-to-all communication step, a huge number of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements. For 16 CPUs, e.g., a speedup of 11 has been achieved. However, for more nodes this mechanism also fails. Having optimized an all-to-all routine, which sends the data in an ordered fashion, we show that it is possible to completely prevent packet loss for any number of multi-CPU nodes. Thus, the GROMACS scaling dramatically improves, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that for optimum all-to-all performance it is essential how the nodes are connected to the switch's ports. This is also demonstrated for the example of the Car-Parrinello MD code. |
650 | 7 | a NATURVETENSKAP x Kemi x Teoretisk kemi 0 (SwePub)10407 2 hsv//swe |
650 | 7 | a NATURAL SCIENCES x Chemical Sciences x Theoretical Chemistry 0 (SwePub)10407 2 hsv//eng |
650 | 7 | a NATURVETENSKAP x Data- och informationsvetenskap x Programvaruteknik 0 (SwePub)10205 2 hsv//swe |
650 | 7 | a NATURAL SCIENCES x Computer and Information Sciences x Software Engineering 0 (SwePub)10205 2 hsv//eng |
700 | 1 | a van der Spoel, David u Uppsala University 4 aut |
700 | 1 | a Fechner, Martin u Max-Planck Institut Göttingen 4 aut |
700 | 1 | a Lindahl, Erik, d 1972- u Stockholm University 4 aut 0 (Swepub:kth)u1u9f2s7 |
700 | 1 | a Schmitt, Udo W u Max-Planck Institut Göttingen 4 aut |
700 | 1 | a de Groot, Bert L u Max-Planck Institut Göttingen 4 aut |
700 | 1 | a Grubmüller, Helmut u Max-Planck Institut Göttingen 4 aut |
710 | 2 | a Max-Planck Institut Göttingen b Uppsala University 4 org |
773 | 0 | t Journal of Computational Chemistry d : Wiley g 28:12, s. 2075-84 q 28:12<2075-84 x 0192-8651 x 1096-987X |
856 | 4 | u https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/jcc.20703 |
856 | 4 8 | u https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-82625 |
856 | 4 8 | u https://doi.org/10.1002/jcc.20703 |