SwePub

Result list for search "WFRF:(Davari Mahdad)"

  • Results 1-10 of 10
1.
  • Besharati, Farshid, et al. (author)
  • The EVI Distributed Shared Memory System
  • 2015
  • Report (other academic/artistic). Abstract:
    • With the data handled by companies and research institutes getting larger and larger every day, there is a clear need for faster computing. At the same time, we have reached the limit of power consumption and more power efficient computing is also called for, both in the datacenter and in the supercomputer room. For that, there is a great push, both in industry and academia, towards increasing the amount of computing power per watt consumed. With this shift towards a different computing paradigm, many older ideas are looked upon in a new light. One of these is the distributed shared memory (DSM) system. It is becoming harder and harder to achieve higher performance and better power efficiency at the same form factor as we have always had. Furthermore, while the constant increase of processor speeds has stopped, network communication speeds keep increasing. Software-implemented DSM is again a viable solution for high performance computing, without the need for sacrificing ease of programming for performance gains. The goal of this course was to develop such a system, and learn in the process. We chose to work with the Adapteva Parallella boards and design a DSM system there. Over one semester we designed and developed that system.
  •  
2.
  • Davari, Mahdad (author)
  • Advances Towards Data-Race-Free Cache Coherence Through Data Classification
  • 2017
  • Doctoral thesis (other academic/artistic). Abstract:
    • Providing a consistent view of the shared memory based on precise and well-defined semantics (the memory consistency model) has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to remove from the programmers the burden of dealing with the memory inconsistency that emerges in the presence of the private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times. In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to receive the consistent data should they need the data later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely the Total Store Order (TSO), in order to optimize the performance for more common cases. Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. As a result, hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This results in manifold optimizations, such as reducing the energy consumption and improving the performance and scalability.
The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding the consistent data does not affect the correctness, but results in performance loss and increased energy consumption. In this thesis we show how data classification can be leveraged as an effective tool to simplify the cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics.
  •  
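The self-invalidation idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis's hardware design: the `PrivateCache` class, the `acquire` method, and the per-line `is_shared` flag are hypothetical names, and a plain dictionary stands in for the shared memory. The sketch assumes that a classification bit is already available for each line, and shows that only lines classified as shared need to be discarded at a synchronization point, while private lines stay cached.

```python
class PrivateCache:
    """Toy private cache that self-invalidates shared lines at synchronization."""

    def __init__(self, shared_memory):
        self.mem = shared_memory   # stand-in for the shared memory (assumption)
        self.lines = {}            # address -> (cached value, is_shared flag)

    def read(self, addr, is_shared):
        # On a miss, fetch the value from shared memory and remember its class.
        if addr not in self.lines:
            self.lines[addr] = (self.mem[addr], is_shared)
        return self.lines[addr][0]

    def acquire(self):
        # At a synchronization point, drop only the lines classified as shared;
        # private lines cannot have been modified by another core, so keep them.
        self.lines = {a: v for a, v in self.lines.items() if not v[1]}


mem = {0: 1, 8: 2}
cache = PrivateCache(mem)
cache.read(0, is_shared=True)
cache.read(8, is_shared=False)
mem[0] = 5                         # another core updates the shared location
cache.acquire()                    # synchronization: shared lines are discarded
fresh = cache.read(0, is_shared=True)
```

After `acquire`, the read of address 0 misses and fetches the updated value, while address 8 is still served from the private cache. Discarding address 8 as well would still be correct, but it would cost an unnecessary miss, which is the inefficiency the abstract describes.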
3.
  • Davari, Mahdad, et al. (author)
  • An efficient, self-contained, on-chip directory : DIR1-SISD
  • 2015
  • In: Proc. 24th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society. ISBN 9781467395243, pp. 317-330
  • Conference paper (peer-reviewed)
  •  
4.
  •  
5.
  •  
6.
  • Davari, Mahdad, et al. (author)
  • System and method for data classification and efficient virtual cache coherence without reverse translation
  • 2013
  • Patent (popular science, debate, etc.). Abstract:
    • An on-chip memory hierarchy organization for a multicore processing system is disclosed. The hierarchy supports virtual-addressed private caches and a physical-addressed shared cache. The hierarchy classifies cache line data as private or shared to support a one-directional request-response protocol. The classification can be determined from the generational behavior of a cache line in the private caches. Cache lines having a single generation in a private cache are Private, and cache lines having overlapping generations in two or more private caches are Shared. The Private or Shared classification is performed dynamically at run-time in hardware using a single translation lookaside buffer at the interface between the private and shared caches. The coherence protocol uses the data classification in a dynamic write policy for both shared data-race-free data and private data, differentiating when data is put back to the shared cache based on the classification.
  •  
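The generation-based rule in the abstract above can be sketched in software. This is only an illustration under stated assumptions, not the patented hardware: `GenerationalClassifier`, `fill`, `evict`, and `classify` are hypothetical names, and a generation is modeled as the interval between a line being filled into a core's private cache and being evicted from it. The sketch also assumes that a line reverts to Private once its generations stop overlapping, which the abstract does not state.

```python
from collections import defaultdict


class GenerationalClassifier:
    """Classify cache lines by overlapping generations in private caches."""

    def __init__(self):
        # line address -> set of cores whose private cache currently holds it
        self._live = defaultdict(set)

    def fill(self, core, line):
        # A generation of `line` begins in the private cache of `core`.
        self._live[line].add(core)

    def evict(self, core, line):
        # The generation of `line` in the private cache of `core` ends.
        self._live[line].discard(core)
        if not self._live[line]:
            del self._live[line]

    def classify(self, line):
        holders = self._live.get(line, ())
        if not holders:
            return "uncached"
        # One live generation: Private. Overlapping generations: Shared.
        return "private" if len(holders) == 1 else "shared"


clf = GenerationalClassifier()
clf.fill(core=0, line=0x40)
first = clf.classify(0x40)    # only core 0 holds the line
clf.fill(core=1, line=0x40)
second = clf.classify(0x40)   # generations in cores 0 and 1 overlap
```

Here `first` is "private" and `second` is "shared". Evicting the line from core 0 afterwards leaves a single live generation, so under the reverting assumption the line is classified as private again.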
7.
  •  
8.
  •  
9.
  • Davari, Mahdad, et al. (author)
  • The effects of granularity and adaptivity on private/shared classification for coherence
  • 2015
  • In: ACM Transactions on Architecture and Code Optimization (TACO). Association for Computing Machinery (ACM). ISSN 1544-3566, 1544-3973. 12:3
  • Journal article (peer-reviewed). Abstract:
    • Classification of data into private and shared has proven to be a catalyst for techniques to reduce coherence cost, since private data can be taken out of coherence and resources can be concentrated on providing coherence for shared data. In this article, we examine how granularity (page-level versus cache-line-level) and adaptivity (going from shared to private) affect the outcome of classification and its final impact on coherence. We create a classification technique, called Generational Classification, and a coherence protocol called Generational Coherence, which treats data as private or shared based on cache-line generations. We compare two coherence protocols based on self-invalidation/self-downgrade with respect to data classification. Our findings are enlightening: (i) Some programs benefit from finer granularity, some benefit further from adaptivity, but some do not benefit from either. (ii) Reducing the amount of shared data has no perceptible impact on coherence misses caused by self-invalidation of shared data, hence no impact on performance. (iii) In contrast, classifying more data as private has implications for protocols that employ write-through as a means of self-downgrade, resulting in a network traffic reduction of up to 30% by reducing write-through traffic.
  •  
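The effect of classification granularity discussed in the abstract above can be shown with a small sketch. It is a simplified, non-adaptive model rather than the article's method: `classify` and `shared_lines` are hypothetical helpers, a block counts as shared as soon as more than one core has touched it, and the 4096-byte page and 64-byte line sizes are assumed values.

```python
def classify(trace, block_bytes):
    """Label each block private or shared from a trace of (core, address) pairs."""
    owners = {}
    for core, addr in trace:
        owners.setdefault(addr // block_bytes, set()).add(core)
    return {blk: ("private" if len(cores) == 1 else "shared")
            for blk, cores in owners.items()}


def shared_lines(trace, granularity, line_bytes=64):
    """Count accessed cache lines that end up treated as shared."""
    labels = classify(trace, granularity)
    lines = {addr // line_bytes for _, addr in trace}
    return sum(1 for line in lines
               if labels[(line * line_bytes) // granularity] == "shared")


# Cores 0 and 1 touch different lines of the first page and the same line
# of the second page.
trace = [(0, 0x0000), (1, 0x0040), (1, 0x0080), (0, 0x1000), (1, 0x1000)]
page_level = shared_lines(trace, granularity=4096)
line_level = shared_lines(trace, granularity=64)
```

In this trace, page-level classification treats all four accessed lines as shared, because both cores touch each page, while line-level classification marks only the one line that both cores actually access. Whether that difference matters in practice depends on the program, as finding (i) notes.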
10.
  • Ros, Alberto, et al. (author)
  • Hierarchical private/shared classification : The key to simple and efficient coherence for clustered cache hierarchies
  • 2015
  • In: Proc. 21st International Symposium on High Performance Computer Architecture. IEEE Computer Society Digital Library. ISBN 9781479989300, pp. 186-197
  • Conference paper (peer-reviewed). Abstract:
    • Hierarchical clustered cache designs are becoming an appealing alternative for multicores. Grouping cores and their caches in clusters reduces network congestion by localizing traffic among several hierarchical levels, potentially enabling much higher scalability. While such architectures can be formed recursively by replicating a base design pattern, keeping the whole hierarchy coherent requires more effort and consideration. The reason is that, in hierarchical coherence, even basic operations must be recursive. As a consequence, intermediate-level caches behave both as directories and as leaf caches. This leads to an explosion of states, protocol races, and protocol complexity. While there have been previous efforts to extend directory-based coherence to hierarchical designs, their increased complexity and verification cost is a serious impediment to their adoption. We aim to address these concerns by encapsulating all hierarchical complexity in a simple function: that of determining when a data block is shared entirely within a cluster (sub-tree of the hierarchy) and is private from the outside. This allows us to eliminate complex recursive operations that span the hierarchy and instead employ simple coherence mechanisms such as self-invalidation and write-through, now restricted to operate within the cluster where a data block is shared. We examine two inclusivity options and discuss the relation of our approach to the recently proposed Hierarchical-Race-Free (HRF) memory models. Finally, comparisons to a hierarchical directory-based MOESI, VIPS-M, and TokenCMP protocols show that, despite its simplicity, our approach results in competitive performance and decreased network traffic.
  •  
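The "simple function" mentioned in the abstract above, deciding in which cluster a data block is entirely shared, can be sketched as a lowest-common-cluster computation. This is an illustrative model rather than the paper's mechanism: `sharing_level` is a hypothetical helper, cores are assumed to be numbered so that consecutive ids fall in the same cluster, and `fanouts` lists the assumed number of children per cluster at each level, from the leaves upward.

```python
def sharing_level(sharers, fanouts):
    """Return the lowest hierarchy level whose cluster contains all sharers.

    Level 0 means the block is private to one core, level 1 means it is
    shared only inside one first-level cluster, and so on.
    """
    ids = set(sharers)
    if not ids:
        raise ValueError("a block needs at least one sharer")
    level = 0
    for fanout in fanouts:
        if len(ids) == 1:
            return level
        # Map every core (or cluster) id to the id of its parent cluster.
        ids = {i // fanout for i in ids}
        level += 1
    if len(ids) != 1:
        raise ValueError("sharers do not fit under a single root")
    return level


# 16 cores: 4 cores per cluster, 4 clusters per chip (assumed topology).
fanouts = [4, 4]
private_block = sharing_level({5}, fanouts)
cluster_block = sharing_level({4, 6}, fanouts)
chip_block = sharing_level({1, 9}, fanouts)
```

With this topology, cores 4 and 6 sit in the same first-level cluster, so the block is private from the outside of that cluster and coherence actions such as self-invalidation could stay inside it. Cores 1 and 9 sit in different clusters, so the block is shared at chip level.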