SwePub

LIBRIS format handbook (information about MARC21)
Field  Ind  Metadata
000         03974naa a2200565 4500
001         oai:DiVA.org:uu-16712
003         SwePub
008         080603s2007 | |||||||||||000 ||eng|
024         $a https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-16712 $2 URI
024         $a https://doi.org/10.1002/jcc.20703 $2 DOI
040         $a (SwePub)uu
041         $a eng $b eng
042         $9 SwePub
072      7  $a ref $2 swepub-contenttype
072      7  $a art $2 swepub-publicationtype
100         $a Kutzner, Carsten $4 aut
245    1 0  $a Speeding up parallel GROMACS on high-latency networks
264         $c 2007
264      1  $b Wiley, $c 2007
338         $a print $2 rdacarrier
520         $a We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and what prerequisites are necessary for decent scaling even on such clusters with only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. Because of the high single-node performance of GROMACS, however, on the widely used Ethernet switched clusters, the scaling typically breaks down when more than two computer nodes are involved, limiting the absolute speedup that can be gained to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is here identified to be the all-to-all communication which is required every time step. During such an all-to-all communication step, a huge amount of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements. For 16 CPUs, e.g., a speedup of 11 has been achieved. However, for more nodes this mechanism also fails. Having optimized an all-to-all routine, which sends the data in an ordered fashion, we show that it is possible to completely prevent packet loss for any number of multi-CPU nodes. Thus, the GROMACS scaling dramatically improves, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that for optimum all-to-all performance it is essential how the nodes are connected to the switch's ports. This is also demonstrated for the example of the Car-Parrinello MD code.
650      7  $a NATURVETENSKAP $x Kemi $0 (SwePub)104 $2 hsv//swe
650      7  $a NATURAL SCIENCES $x Chemical Sciences $0 (SwePub)104 $2 hsv//eng
650      7  $a NATURVETENSKAP $x Biologi $0 (SwePub)106 $2 hsv//swe
650      7  $a NATURAL SCIENCES $x Biological Sciences $0 (SwePub)106 $2 hsv//eng
650      7  $a NATURVETENSKAP $x Data- och informationsvetenskap $0 (SwePub)102 $2 hsv//swe
650      7  $a NATURAL SCIENCES $x Computer and Information Sciences $0 (SwePub)102 $2 hsv//eng
653         $a GROMACS parallel molecular dynamics
653         $a Car-Parrinello MD
653         $a Ethernet flow control
653         $a MPI_Alltoall
653         $a network congestion
653         $a Chemistry
653         $a Kemi
653         $a Biology
653         $a Biologi
653         $a Information technology
653         $a Informationsteknik
700         $a van der Spoel, David $u Uppsala universitet,Institutionen för cell- och molekylärbiologi,Van der Spoel $4 aut $0 (Swepub:uu)davivand
700         $a Fechner, Martin $4 aut
700         $a Lindahl, Erik $4 aut
700         $a Schmitt, Udo W. $4 aut
700         $a de Groot, Bert L. $4 aut
700         $a Grubmüller, Helmut $4 aut
710         $a Uppsala universitet $b Institutionen för cell- och molekylärbiologi $4 org
773         $t Journal of Computational Chemistry $d : Wiley $g 28:12, s. 2075-2084 $q 28:12<2075-2084 $x 0192-8651 $x 1096-987X
856         $u http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=17405124&dopt=Citation
856         $u https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/jcc.20703
856    4 8  $u https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-16712
856    4 8  $u https://doi.org/10.1002/jcc.20703
