SwePub

Result list for search "L773:0743 7315 OR L773:1096 0848"

  • Results 11-20 of 51
11.
  • Grahn, Håkan, et al. (author)
  • Comparative evaluation of latency-tolerating and -reducing techniques for hardware-only and software-only directory protocols
  • 2000
  • In: Journal of Parallel and Distributed Computing. - San Diego : Academic Press Inc. - 0743-7315 .- 1096-0848. ; 60:7, pp. 807-834
  • Journal article (peer-reviewed). Abstract:
    • We study in this paper how effective latency-tolerating and -reducing techniques are at cutting the memory access times for shared-memory multiprocessors with directory cache protocols managed by hardware and software. A critical issue for the relative efficiency is how many protocol operations such techniques trigger. This paper presents a framework that makes it possible to reason about the expected relative efficiency of a latency-tolerating or -reducing technique by focusing on whether the technique increases, decreases, or does not change the number of protocol operations at the memory module. Since software-only directory protocols handle these operations in software, they will perform relatively worse unless the technique reduces the number of protocol operations. Our experimental results from detailed architectural simulations driven by six applications from the SPLASH-2 parallel program suite confirm this expectation. We find that while prefetching performs relatively worse on software-only directory protocols due to useless prefetches, there are examples of protocol optimizations, e.g., optimizations for migratory data, that do relatively better on software-only directory protocols. Overall, this study shows that latency-tolerating techniques must be more carefully selected for software-centric than for hardware-centric implementations of distributed shared-memory systems. (C) 2000 Academic Press.
12.
  • Grahn, Håkan, et al. (author)
  • Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection
  • 1996
  • In: Journal of Parallel and Distributed Computing. - San Diego : Academic. - 0743-7315 .- 1096-0848. ; 39:2, pp. 168-180
  • Journal article (peer-reviewed). Abstract:
    • Although directory-based write-invalidate cache coherence protocols have a potential to improve the performance of large-scale multiprocessors, coherence misses limit the processor utilization. Therefore, so-called competitive-update protocols (hybrid protocols that on a per-block basis dynamically switch between write-invalidate and write-update) have been considered as a means to reduce the coherence miss rate and have been shown to be a better coherence policy for a wide range of applications. Unfortunately, such protocols may cause high traffic peaks for applications with extensive use of migratory objects. These traffic peaks can offset the performance gain of a reduced miss rate if the network bandwidth is not sufficient. We propose in this study to extend a competitive-update protocol with a previously published adaptive mechanism that can dynamically detect migratory objects and reduce the coherence traffic they cause. Detailed architectural simulations based on five scientific and engineering applications show that this adaptive protocol outperforms a write-invalidate protocol by reducing the miss rate and bandwidth needed by up to 71% and 26%, respectively.
13.
  • Grahn, Håkan (author)
  • Transactional Memory
  • 2010
  • In: Journal of Parallel and Distributed Computing. - : Elsevier. - 0743-7315 .- 1096-0848. ; 70:10, pp. 993-1008
  • Journal article (peer-reviewed). Abstract:
    • Current and future processor generations are based on multicore architectures where the performance increase comes from an increasing number of cores on a chip. In order to utilize the performance potential of multicore architectures the programs also need to be parallel, but writing parallel programs is a non-trivial task. Transactional memory tries to ease parallel program development by providing atomic and isolated execution of code sequences, enabling software composability and protected access to shared data. In addition, transactional memory has the ability to execute atomic code sequences in parallel as long as no data conflicts occur. Transactional memory implementation proposals exist for both hardware and software, as well as hybrid solutions. This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and finally, includes five articles that advance the state of the art in transactional memory research.
14.
  • Guo, Yao, et al. (author)
  • Synchronization coherence : A transparent hardware mechanism for cache coherence and fine-grained synchronization
  • 2008
  • In: Journal of Parallel and Distributed Computing. - : Elsevier BV. - 0743-7315 .- 1096-0848. ; 68:2, pp. 165-181
  • Journal article (peer-reviewed). Abstract:
    • The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes, which is perhaps a key reason why wide adoption has not followed. In this paper, we propose a novel approach called synchronization coherence that can provide transparent fine-grained synchronization and caching in a multiprocessor machine and single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols. It reduces network utilization as well as synchronization-related processing overheads while adding minimal hardware complexity as compared to cache coherence mechanisms or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% improvement in performance and up to 24% improvement in energy efficiency with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases up to 38% when simulating with an ideal L2 cache system.
15.
  • Ha, Phuong, 1976, et al. (author)
  • Self-tuning reactive diffracting trees
  • 2007
  • In: Journal of Parallel and Distributed Computing. - : Elsevier BV. - 1096-0848 .- 0743-7315. ; 67:6, pp. 674-694
  • Journal article (peer-reviewed). Abstract:
    • Reactive diffracting trees are efficient distributed objects that support synchronization, by distributing sets of memory accesses to different memory banks in a coordinated manner. They adjust their size in order to retain their efficiency in the presence of different contention levels. Their adjustment is sensitive to parameters that have to be manually determined after experimentation. Since these parameters depend on the application as well as on the system configuration and load, determining their optimal values is difficult in practice. Moreover, the adjustments are done one level at a time, hence the cost of multi-level adjustments can be high. This paper presents a new method for reactive diffracting trees, without the need for hand-tuned parameters. The new self-tuning trees (ST-trees) can balance, in an online manner, the trade-off between the tree-traversal latency and the latency due to contention on accessing the leaf nodes (i.e. the nodes where the desirable computation takes place). Moreover, the paper presents a data structure that enables the trees to grow or shrink by several levels in one adjustment step. The behavior of the reactive diffracting trees is illustrated in the paper via experiments performed on a well-known ccNUMA multiprocessor system. The experiments study the new self-tuning trees, also in connection with the original hand-tuned reactive diffracting trees. The experiments have shown that the new self-tuning trees are efficient, and that they react in the same way (i.e. select the same tree depth for the same contention level) as the hand-tuned trees, while they are able to adjust more quickly than the latter (as they are able to grow or shrink by several levels in one adjustment step).
16.
  • Ho, Ching-Tien, et al. (author)
  • An Efficient Algorithm for Gray-to-Binary Permutation on Hypercubes
  • 1994
  • In: Journal of Parallel and Distributed Computing. - 0743-7315 .- 1096-0848. ; 20:1, pp. 114-120
  • Journal article (peer-reviewed). Abstract:
    • Both Gray code and binary code are frequently used in mapping arrays into hypercube architectures. While the former is preferred when communication between adjacent array elements is needed, the latter is preferred for FFT-type communication. When different phases of computations have different types of communication patterns, the need arises to remap the data. We give a nearly optimal algorithm for permuting data from a Gray code mapping to a binary code mapping on a hypercube with communication restricted to one input and one output channel per node at a time. Our algorithm improves over the best previously known algorithm [6] by nearly a factor of two and is optimal to within a factor of n/(n-1) with respect to data transfer time on an n-cube. The expected speedup is confirmed by measurements on an Intel iPSC/2 hypercube.
17.
  • Hoepman, Jaap-Henk, et al. (author)
  • Self-Stabilization in Wait-Free Shared Memory Objects
  • 2002
  • In: Journal of Parallel and Distributed Computing. - : Elsevier BV. - 1096-0848 .- 0743-7315. ; 62:5, pp. 818-842
  • Journal article (peer-reviewed). Abstract:
    • This paper proposes a general definition of self-stabilizing wait-free shared memory objects. The definition ensures that, even in the face of processor failures, every execution after a transient memory failure is linearizable except for an a priori bounded number of actions. Shared registers have been used extensively as a communication medium in self-stabilizing protocols. As an application of our theory, we therefore focus on self-stabilizing implementations of such registers, thus providing a large body of previous research with a more solid foundation. In particular, we prove that one cannot construct a self-stabilizing single-reader single-writer regular bit from self-stabilizing single-reader single-writer safe bits, using only a single bit for the writer. This leads us to postulate a self-stabilizing dual-reader single-writer safe bit as the minimal hardware needed to achieve self-stabilizing wait-free interprocess communication and synchronization. Based on this hardware, adaptations of well-known wait-free implementations of regular and atomic shared registers are proven to be self-stabilizing.
18.
  • Johnsson, Lennart, et al. (author)
  • Boolean Cube Emulation of Butterfly Networks Encoded by Gray Code
  • 1994
  • In: Journal of Parallel and Distributed Computing. - 0743-7315 .- 1096-0848. ; 20:3, pp. 261-179
  • Journal article (peer-reviewed). Abstract:
    • The authors present algorithms for butterfly emulation on binary-reflected Gray coded data that require the same number of element transfers in sequence in a Boolean cube network as for a binary encoding. The required code conversion is either performed in local memories, or through concurrent exchanges not affecting the number of element transfers in sequence. The emulation of a butterfly network with one or two elements per processor requires n communication cycles on an n-cube. For more than two elements per processor, one additional communication cycle is required for every pair of elements. The encoding on completion can be either binary, or binary-reflected Gray code, or any combination thereof, without affecting the communication complexity.
19.
  • Johnsson, Lennart (author)
  • Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures
  • 1987
  • In: Journal of Parallel and Distributed Computing. - 0743-7315 .- 1096-0848. ; 4:2, pp. 133-179
  • Journal article (peer-reviewed). Abstract:
    • This paper presents a few algorithms for embedding loops and multidimensional arrays in hypercubes with emphasis on proximity preserving embeddings. A proximity preserving embedding minimizes the need for communication bandwidth in computations requiring nearest neighbor communication. Two storage schemes for "large" problems on "small" machines are suggested and analyzed, and algorithms for matrix transpose, multiplying matrices, factoring matrices, and solving triangular linear systems are presented. A few complete binary tree embeddings are described and analyzed. The data movement in the matrix algorithms is analyzed, and it is shown that in the majority of cases the directed routing paths intersect only at nodes of the hypercube, allowing for a maximum degree of pipelining.
20.
  • Kennedy, K., et al. (author)
  • Telescoping languages : A strategy for automatic generation of scientific problem-solving systems from annotated libraries
  • 2001
  • In: Journal of Parallel and Distributed Computing. - : Elsevier BV. - 0743-7315 .- 1096-0848. ; 61:12, pp. 1803-1826
  • Journal article (peer-reviewed). Abstract:
    • As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap, the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.