SwePub
  • Amer, Abdelhalim, Argonne National Laboratory (author)

Scaling FMM with data-driven OpenMP tasks on multicore architectures

  • Article/chapter, English, 2016

Publisher, publication year, extent ...

  • 2016-09-21
  • Cham: Springer International Publishing, 2016

Identifiers

  • LIBRIS-ID: oai:research.chalmers.se:6d92e626-60e6-447d-b51f-3e38f84a36e6
  • DOI: https://doi.org/10.1007/978-3-319-45550-1_12
  • URI: https://research.chalmers.se/publication/246282

Supplementary language information

  • Language: English
  • Summary in: English

Included in sub-database

Classification

  • Subject category: art swepub-publicationtype
  • Subject category: ref swepub-contenttype

Notes

  • Poor scalability on parallel architectures can be attributed to several factors, among which idle times, data movement, and runtime overhead are predominant. Conventional parallel loops and nested parallelism have proved successful for regular computational patterns. For more complex and irregular cases, however, these methods often perform poorly because they consider only a subset of these costs. Although data-driven methods are gaining popularity for efficiently utilizing computational cores, their data movement and runtime costs can be prohibitive for highly dynamic and irregular algorithms, such as fast multipole methods (FMMs). Furthermore, loop tiling, a technique that promotes data locality and has been successful for regular parallel methods, has received little attention in the context of dynamic and irregular parallelism. We present a method to exploit loop tiling in data-driven parallel methods. Here, we specify a methodology to spawn work units characterized by a high data locality potential. Work units operate on tiled computational patterns and serve as building blocks in an OpenMP task-based data-driven execution. In particular, by adjusting the work unit granularity, idle times and runtime overheads are also taken into account. We apply this method to a popular FMM implementation and show that, with careful tuning, the new method outperforms existing parallel-loop and user-level thread-based implementations by up to fourfold on 48 cores.
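
The general idea of tiled work units driven by OpenMP task dependences can be illustrated with a small sketch. This is not the authors' code and not an FMM: it is a toy direct-summation kernel, and the names `tile_kernel`, `tiled_potential`, `direct_potential`, and the `tile` parameter are all hypothetical. It assumes a compiler with OpenMP 4.0 or later task dependences; without OpenMP enabled, the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Toy kernel (an assumption for illustration): accumulate q[j] / |x[i] - x[j]|
   from one source tile [s0, s1) into one target tile [t0, t1).
   Both tiles are short contiguous ranges, which is what gives the
   work unit its data locality. Positions are assumed to be distinct. */
static void tile_kernel(const double *x, const double *q, double *phi,
                        size_t t0, size_t t1, size_t s0, size_t s1)
{
    for (size_t i = t0; i < t1; ++i) {
        double acc = 0.0;
        for (size_t j = s0; j < s1; ++j) {
            if (i == j) continue;               /* skip self-interaction */
            double d = x[i] - x[j];
            acc += q[j] / (d < 0.0 ? -d : d);
        }
        phi[i] += acc;
    }
}

/* Spawn one task per (target tile, source tile) pair.
   The depend clause serializes tasks that update the same target tile;
   tasks on different target tiles may run concurrently.
   The tile size sets the work unit granularity. */
void tiled_potential(const double *x, const double *q, double *phi,
                     size_t n, size_t tile)
{
    assert(tile > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t t0 = 0; t0 < n; t0 += tile) {
            size_t t1 = (t0 + tile < n) ? t0 + tile : n;
            for (size_t s0 = 0; s0 < n; s0 += tile) {
                size_t s1 = (s0 + tile < n) ? s0 + tile : n;
                #pragma omp task depend(inout: phi[t0]) firstprivate(t0, t1, s0, s1)
                tile_kernel(x, q, phi, t0, t1, s0, s1);
            }
        }
    } /* implicit barrier: all tasks have completed here */
}

/* Untiled serial reference, for checking the tiled version. */
void direct_potential(const double *x, const double *q, double *phi, size_t n)
{
    tile_kernel(x, q, phi, 0, n, 0, n);
}
```

In this sketch a small `tile` creates many short tasks (more parallelism, more runtime overhead), while a large `tile` creates few long tasks (less overhead, more potential idle time), which is the trade-off the abstract refers to when it mentions adjusting work unit granularity. Calling `tiled_potential` and `direct_potential` on the same zero-initialized output arrays should give results that agree up to floating-point rounding.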

Subject headings and genre terms

Added entries (persons, institutions, conferences, titles ...)

  • Matsuoka, S., Tokyo Institute of Technology (author)
  • Pericas, Miquel, 1979, Chalmers tekniska högskola, Chalmers University of Technology (Swepub:cth) miquelp (author)
  • Maruyama, Naoya, RIKEN (author)
  • Taura, K., University of Tokyo, Japan (author)
  • Yokota, Rio, Tokyo Institute of Technology (author)
  • Balaji, Pavan, Argonne National Laboratory (author)
  • Argonne National Laboratory; Tokyo Institute of Technology (creator_code:org_t)

Related titles

  • Part of: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Cham: Springer International Publishing. 9903 LNCS, pp. 156-170. ISSN 1611-3349, 0302-9743
