SwePub

Result list for search "WFRF:(Taura K.)"

  • Results 1-6 of 6
1.
  •  
2.
  • Amer, Abdelhalim, et al. (author)
  • Scaling FMM with data-driven OpenMP tasks on multicore architectures
  • 2016
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 9903 LNCS, pp. 156-170
  • Journal article (peer-reviewed). Abstract:
    • Poor scalability on parallel architectures can be attributed to several factors, among which idle times, data movement, and runtime overhead are predominant. Conventional parallel loops and nested parallelism have proved successful for regular computational patterns. For more complex and irregular cases, however, these methods often perform poorly because they consider only a subset of these costs. Although data-driven methods are gaining popularity for efficiently utilizing computational cores, their data movement and runtime costs can be prohibitive for highly dynamic and irregular algorithms, such as fast multipole methods (FMMs). Furthermore, loop tiling, a technique that promotes data locality and has been successful for regular parallel methods, has received little attention in the context of dynamic and irregular parallelism. We present a method to exploit loop tiling in data-driven parallel methods. Here, we specify a methodology to spawn work units characterized by a high data locality potential. Work units operate on tiled computational patterns and serve as building blocks in an OpenMP task-based data-driven execution. In particular, by adjusting the work unit granularity, idle times and runtime overheads are also taken into account. We apply this method to a popular FMM implementation and show that, with careful tuning, the new method outperforms existing parallel-loop and user-level thread-based implementations by up to fourfold on 48 cores.
  •  
3.
  • Huynh, An, et al. (author)
  • DAGViz: A DAG Visualization Tool for Analyzing Task Parallel Program Traces
  • 2015
  • In: Proceedings of the 2nd Workshop on Visual Performance Analysis (VPA). - New York, NY, USA : ACM. - 9781450340137
  • Conference paper (peer-reviewed). Abstract:
    • Copyright 2015 ACM. In task-based parallel programming, programmers can expose the logical parallelism of their programs by creating fine-grained tasks at arbitrary places in their code. All other burdens in the parallel execution of these tasks, such as thread management, task scheduling, and load balancing, are handled automatically by runtime systems. This kind of parallel programming model has been regarded as a promising paradigm that brings intricate parallel programming techniques to a larger audience of programmers because of its high programmability. There have been many languages (e.g., OpenMP, Cilk Plus) and libraries (e.g., Intel TBB, Qthreads, MassiveThreads) supporting task parallelism. However, the nondeterministic nature of task parallel execution, which hides runtime scheduling mechanisms from programmers, has made it difficult for programmers to understand the cause of suboptimal performance of their programs. As an effort to tackle this problem, and also to clarify differences between task parallel runtime systems, we have developed a toolset that captures and visualizes the trace of an execution of a task parallel program in the form of a directed acyclic graph (DAG). A computation DAG of a task parallel program's run is extracted automatically by our lightweight portable wrapper around all five systems, which requires no intervention in the target systems' code. The DAG is stored in a file and then visualized to analyze performance. We leverage the hierarchical structure of the DAG to enhance the DAG file format and DAG visualization, and make them manageable even with a huge DAG with an arbitrarily large number of nodes. This DAG visualization provides a task-centric view of the program, which is different from other popular visualizations such as thread-centric timeline visualization and code-centric hotspots analysis. In addition, DAGViz provides a timeline visualization which is constructed from individual nodes of the DAG, and is useful in directing user attention to low-parallelism areas on the DAG. We demonstrate the usefulness of our DAG visualizations in some case studies. We expect to build other kinds of effective visualizations based on this computation DAG in future work, and make DAGViz an effective tool supporting the process of analyzing task parallel performance and developing scheduling algorithms for task parallel runtime schedulers.
  •  
4.
  • Oikonomou, Vasileios, et al. (author)
  • The Role of Interferon-γ in Autoimmune Polyendocrine Syndrome Type 1.
  • 2024
  • In: The New England journal of medicine. - 1533-4406. ; 390:20, pp. 1873-1884
  • Journal article (peer-reviewed). Abstract:
    • Autoimmune polyendocrine syndrome type 1 (APS-1) is a life-threatening, autosomal recessive syndrome caused by autoimmune regulator (AIRE) deficiency. In APS-1, self-reactive T cells escape thymic negative selection, infiltrate organs, and drive autoimmune injury. The effector mechanisms governing T-cell-mediated damage in APS-1 remain poorly understood. We examined whether APS-1 could be classified as a disease mediated by interferon-γ. We first assessed patients with APS-1 who were participating in a prospective natural history study and evaluated mRNA and protein expression in blood and tissues. We then examined the pathogenic role of interferon-γ using Aire-/-Ifng-/- mice and Aire-/- mice treated with the Janus kinase (JAK) inhibitor ruxolitinib. On the basis of our findings, we used ruxolitinib to treat five patients with APS-1 and assessed clinical, immunologic, histologic, transcriptional, and autoantibody responses. Patients with APS-1 had enhanced interferon-γ responses in blood and in all examined autoimmunity-affected tissues. Aire-/- mice had selectively increased interferon-γ production by T cells and enhanced interferon-γ, phosphorylated signal transducer and activator of transcription 1 (pSTAT1), and CXCL9 signals in multiple organs. Ifng ablation or ruxolitinib-induced JAK-STAT blockade in Aire-/- mice normalized interferon-γ responses and averted T-cell infiltration and damage in organs. Ruxolitinib treatment of five patients with APS-1 led to decreased levels of T-cell-derived interferon-γ, normalized interferon-γ and CXCL9 levels, and remission of alopecia, oral candidiasis, nail dystrophy, gastritis, enteritis, arthritis, Sjögren's-like syndrome, urticaria, and thyroiditis. No serious adverse effects from ruxolitinib were identified in these patients. Our findings indicate that APS-1, which is caused by AIRE deficiency, is characterized by excessive, multiorgan interferon-γ-mediated responses. JAK inhibition with ruxolitinib in five patients showed promising results. (Funded by the National Institute of Allergy and Infectious Diseases and others.)
  •  
5.
  • Pericas, Miquel, 1979, et al. (author)
  • Scalable analysis of multicore data reuse and sharing
  • 2014
  • In: Proceedings of the International Conference on Supercomputing. - New York, NY, USA : ACM. - 9781450326421, pp. 353-362
  • Conference paper (peer-reviewed). Abstract:
    • The performance and energy efficiency of multicore systems are increasingly dominated by the costs of communication. As hardware parallelism grows, developers require more powerful tools to assess the data sharing and reuse properties of their algorithms. The reuse distance is an effective metric to study the temporal locality of programs and model private and shared caches. But the application of this method is challenging. First, generating memory traces is very expensive in storage and very intrusive on execution, possibly distorting the parallel schedule. And second, the algorithm is computationally very expensive, limiting the length, memory size and parallelism of analyzable programs. This paper introduces a novel coarse-grained reuse distance method, called Kernel Reuse Distance (KRD), which addresses these challenges. KRD enables a quick assessment of data locality by studying the reuse characteristics of the kernels' inputs and outputs. We analyze the performance of the initial prototype implementation and show two use cases comparing different parallel implementations. On a 24-core system, analyzing a trace from a matrix multiplication representing 24 threads, 1.37 terabytes of streamed data and 800 million distinct accesses, the parallel KRD implementation is able to compute the coherence-aware kernel reuse distance histogram for one socket (six cores) in 11.1 seconds. © 2014 ACM.
  •  
6.
  •  