SwePub

Result list for search "LAR1:bth ;lar1:(bth);srt2:(1995-1999);pers:(Stenström Per)"

Search: LAR1:bth > Blekinge Tekniska Högskola > (1995-1999) > Stenström Per

  • Results 1-5 of 5
1.
  • Grahn, Håkan, et al. (author)
  • Efficient Strategies for Software-Only Directory Protocols in Shared-Memory Multiprocessors
  • 1995
  • Conference paper (peer-reviewed)
    • The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important performance limitation of such software-only protocols is that the software latency associated with directory management ends up on the critical memory access path for read-miss transactions. We propose five strategies that support efficient data transfers in hardware, while directory management is handled in the background, at a slower pace, by software handlers. Simulations show that this approach can remove the directory-management latency from the memory access path. Although the directory is managed in software, the hardware mechanisms must access the memory state to enable high-speed data transfers. Overall, our strategies reach between 60% and 86% of the performance of the hardware-based protocol.
2.
  • Grahn, Håkan, et al. (author)
  • Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection
  • 1996
  • Published in: Journal of Parallel and Distributed Computing. San Diego: Academic. ISSN 0743-7315, 1096-0848. Vol. 39, no. 2, pp. 168-180
  • Journal article (peer-reviewed)
    • Although directory-based write-invalidate cache coherence protocols have the potential to improve the performance of large-scale multiprocessors, coherence misses limit processor utilization. Therefore, so-called competitive-update protocols (hybrid protocols that dynamically switch between write-invalidate and write-update on a per-block basis) have been considered as a means to reduce the coherence miss rate, and have been shown to be a better coherence policy for a wide range of applications. Unfortunately, such protocols may cause high traffic peaks for applications that make extensive use of migratory objects. These traffic peaks can offset the performance gain of a reduced miss rate if the network bandwidth is insufficient. In this study, we propose to extend a competitive-update protocol with a previously published adaptive mechanism that can dynamically detect migratory objects and reduce the coherence traffic they cause. Detailed architectural simulations based on five scientific and engineering applications show that this adaptive protocol outperforms a write-invalidate protocol, reducing the miss rate and the required bandwidth by up to 71% and 26%, respectively.
3.
  • Grahn, Håkan, et al. (author)
  • Implementation and Evaluation of Update-Based Cache Protocols Under Relaxed Memory Consistency Models
  • 1995
  • Published in: Future Generation Computer Systems. Amsterdam: North-Holland. ISSN 0167-739X, 1872-7115. Vol. 11, no. 3, pp. 247-271
  • Journal article (peer-reviewed)
    • Invalidation-based cache coherence protocols have been studied extensively in the context of large-scale shared-memory multiprocessors. Under a relaxed memory consistency model, most of the write latency can be hidden, whereas cache misses still cause a severe performance problem. By contrast, update-based protocols have the potential to reduce both write and read penalties under relaxed memory consistency models, because coherence misses can be completely eliminated. This paper compares update- and invalidation-based protocols in terms of their ability to reduce or hide memory access latencies and their ease of implementation under relaxed memory consistency models.
4.
  • Grahn, Håkan, et al. (author)
  • Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques
  • 1997
  • Conference paper (peer-reviewed)
    • In both hardware-only and software-only directory protocols, performance is often limited by memory access stall times. To increase performance, several latency-tolerating and latency-reducing techniques have been proposed and shown to be effective for hardware-only directory protocols. For software-only directory protocols, the efficiency of a technique depends not only on how effective it is as seen by the local processor, but also on how it affects the software handler execution overhead in the node where a memory block is allocated. Based on architectural simulations and case studies of three techniques, we find that prefetching can degrade the performance of software-only directory protocols because of useless prefetches. A relaxed memory consistency model hides all write latency for software-only directory protocols, but the software handler overhead is virtually unaffected and now constitutes a larger portion of the execution time. Overall, latency-tolerating techniques for software-only directory protocols must be chosen with more care than for hardware-only directory protocols.
5.
  • Stenström, Per, et al. (author)
  • Boosting the Performance of Shared Memory Multiprocessors
  • 1997
  • Published in: Computer. Long Beach, Calif.: IEEE Computer Society. ISSN 0018-9162, 1558-0814. Vol. 30, no. 7, pp. 63-70
  • Journal article (peer-reviewed)
    • Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared memory multiprocessors is cache-coherent nonuniform memory access (CC-NUMA) machines, which have private caches and a cache coherence protocol. Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and hybrid update/invalidate with a write cache. The four optimizations differ with respect to which application features they attack, what hardware resources they require, and what constraints they impose on the application software. The authors measured the degree of performance improvement using the four optimizations in isolation and in combination, looking at the trade-offs in hardware and programming complexity. Although one combination of the proposed optimizations (prefetching and migratory sharing detection) can boost a sequentially consistent machine to perform as well as a machine with release consistency, release consistency models offer significant performance improvements across a broad application domain at little extra complexity in the machine design. Moreover, a combination of sequential prefetching and hybrid update/invalidate with a write cache cuts the execution time of a sequentially consistent machine in half with fairly modest changes to the second-level cache and the cache protocol. The authors expect designers to turn increasingly to the release consistency model.
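Result 2 concerns a competitive-update protocol, in which each cached copy of a block decides whether to keep receiving updates or to drop out. A common formulation of the competitive rule, assumed here rather than taken from the paper, gives each copy a counter that is incremented on every update received from another processor and reset on every local access; once the counter reaches a threshold, the copy is invalidated and stops receiving updates. The Python sketch below models a single block under that assumed rule. The class name `CompetitiveBlock`, its methods, and the default threshold of 4 are hypothetical and chosen only for illustration; the migratory-data detection mechanism described in the same abstract is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CompetitiveBlock:
    """Toy model of one memory block under a competitive-update policy.

    Assumed rule (illustrative, not taken from the paper): a copy is
    invalidated after it has received `threshold` updates from other
    processors without an intervening local access.
    """

    threshold: int = 4  # hypothetical competitive threshold
    valid: set[int] = field(default_factory=set)  # processors holding a valid copy
    counters: dict[int, int] = field(default_factory=dict)  # per-copy update counters
    misses: int = 0
    update_messages: int = 0

    def _access(self, proc: int) -> None:
        # A local access to a missing copy is a cache miss that (re)loads the block.
        if proc not in self.valid:
            self.misses += 1
            self.valid.add(proc)
        # Any local access shows the copy is still in use, so its counter restarts.
        self.counters[proc] = 0

    def read(self, proc: int) -> None:
        self._access(proc)

    def write(self, proc: int) -> None:
        self._access(proc)
        # Write-update: send the new value to every other valid copy.
        for other in sorted(self.valid - {proc}):
            self.update_messages += 1
            self.counters[other] += 1
            # Competitive rule: too many unused updates, so the copy self-invalidates.
            if self.counters[other] >= self.threshold:
                self.valid.discard(other)


# Write run: P1 reads the block once, then only P0 writes to it.
block = CompetitiveBlock(threshold=4)
block.read(1)
for _ in range(6):
    block.write(0)
print(block.valid, block.update_messages, block.misses)  # prints: {0} 4 2
```

In this toy run, a pure write-update policy would send all six updates to P1's unused copy, while the threshold cuts that off after four. If P1 instead read the block between P0's writes, its counter would keep resetting and the copy would stay valid, which is the case where update-based coherence avoids coherence misses.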
Publication type
journal article (3)
conference paper (2)
Content type
peer-reviewed (5)
Author/editor
Grahn, Håkan (5)
Dubois, Michel (2)
Brorsson, Mats (1)
Dahlgren, Fredrik (1)
Language
English (5)
Research subject (UKÄ/SCB)
Natural sciences (5)
