SwePub

Hit list for search "WFRF:(Stefanos Kaxiras) srt2:(2010-2014)"

Search: WFRF:(Stefanos Kaxiras) > (2010-2014)

  • Results 1-10 of 26
1.
  • Cebrian, Juan M., et al. (author)
  • Efficient inter-core power and thermal balancing for multicore processors
  • 2013
  • Published in: Computing. Springer Science and Business Media LLC. ISSN 0010-485X, 1436-5057; 95:7, pp. 537-566
  • Journal article (peer-reviewed)
    • Nowadays the market is dominated by processor architectures that employ multiple cores per chip. These architectures behave differently depending on the applications running on the processor (parallel, multiprogrammed, sequential), but all of them hit what is called the power and temperature wall. For future technologies (below 22 nm) and a fixed die size, it is still uncertain what percentage of the processor can be powered on simultaneously. Power-saving and power-budget mechanisms can be useful to precisely control the amount of power being dissipated by the processor. After an initial analysis, we found that legacy power-saving techniques work properly for matching a power budget in thread-independent and multiprogrammed workloads, but not in parallel workloads. When running parallel shared-memory applications, sacrificing some performance in a single core (thread) in order to be more energy-efficient can unintentionally delay the rest of the cores (threads) at synchronization points (locks/barriers), hurting global performance. To solve this problem we propose power token balancing (PTB), which aims to accurately match an external power constraint by balancing the power consumed among the different cores. Experimental results show that PTB matches a predefined power budget (50% of the original peak power) more accurately than other mechanisms such as DVFS. The total energy consumed over the budget is reduced to only 8% for a 16-core CMP, with only a 3% energy increase (overhead). We also introduce a novel mechanism named "Nitro", which overclocks the core that enters a critical section (delimited by locks) in order to release the lock as soon as possible.
Experimental results show that Nitro reduces the execution time of lock-intensive applications by more than 4% by overclocking the frequency by 15% during selected program phases that together span 22% of the total execution time. We conclude the work with an analysis of the thermal effects of PTB in different CMP configurations using realistic power numbers and heatsink/fan configurations. Results show that PTB not only balances temperature between the different cores, reducing the temperature gradient and increasing signal reliability, but also reduces both average and peak temperatures by 28-30% for the studied benchmarks when a peak power budget of 50% is exceeded.
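As a rough illustration of the balancing idea behind PTB, the Python sketch below redistributes a global token budget among cores for one cycle: every core first gets an equal share, tokens left unused by cores below their share go into a pool, and the pool is handed to the cores that want more. This is not the authors' mechanism; the equal per-core share, integer token units, the pool, and the function name `balance_power_tokens` are all assumptions made for illustration.

```python
def balance_power_tokens(demand, global_budget):
    """Hypothetical per-cycle token balancing across cores.

    demand: tokens each core would like to consume this cycle.
    Returns per-core grants whose sum never exceeds global_budget.
    """
    n = len(demand)
    share = global_budget // n                 # assumed equal per-core share
    grants = [min(d, share) for d in demand]   # nobody gets more than it asks for
    # Tokens not used by cores below their share (plus any rounding
    # remainder) form a common pool.
    pool = global_budget - sum(grants)
    # Give pooled tokens to the cores with the largest unmet demand first.
    order = sorted(range(n), key=lambda i: demand[i] - grants[i], reverse=True)
    for i in order:
        extra = min(demand[i] - grants[i], pool)
        grants[i] += extra
        pool -= extra
    return grants


if __name__ == "__main__":
    # Core 0 is in a power-hungry phase; cores 1 and 3 need little.
    print(balance_power_tokens([10, 2, 8, 4], 20))
```

In this sketch, a core whose demand exceeds its final grant would have to throttle, which is where the budget is actually enforced; spare power from lightly loaded cores is spent on the busiest one instead of being wasted.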
3.
  • Cebrian, Juan M., et al. (author)
  • Managing power constraints in a single-core scenario through power tokens
  • 2014
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484; 68:1, pp. 414-442
  • Journal article (peer-reviewed)
    • Current microprocessors face constant thermal and power-related problems during their everyday use, usually solved by applying a power budget to the processor/core. Dynamic voltage and frequency scaling (DVFS) has been an effective technique that allows microprocessors to match a predefined power budget. However, the continuous increase in leakage power due to technology scaling, along with the low resolution of DVFS, makes DVFS less attractive for matching a predefined power budget as technology moves into the deep-submicron regime. In this paper, we propose the use of microarchitectural techniques to accurately match a power constraint while maximizing the energy efficiency of the processor. We predict the processor power dissipation at cycle level (power token throttling) or at basic-block level (basic block level mechanism), translating the dissipated power into tokens that are used to select among different power-saving microarchitectural techniques. We also introduce a two-level approach in which DVFS acts as a coarse-grain technique to lower the average power dissipation towards the power budget, while microarchitectural techniques focus on removing the numerous power spikes. Experimental results show that using power-saving microarchitectural techniques in conjunction with DVFS is up to six times more precise, in terms of total energy consumed over the power budget, than using DVFS alone to match a predefined power budget.
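The two-level approach above (coarse DVFS for the average, fine-grained microarchitectural techniques for the spikes) can be sketched as a toy controller. Everything concrete here is hypothetical: the technique names and their token savings, the DVFS scale factors, the window length, and the simplification that DVFS scales per-cycle tokens linearly. The metric it reports, total energy above the budget, is the quantity the abstract uses for comparison.

```python
"""Hypothetical two-level power-budget controller (illustrative only)."""

# Hypothetical techniques, cheapest (least performance loss) first,
# each removing a fixed number of tokens per cycle.
TECHNIQUES = [("none", 0), ("light", 2), ("medium", 5), ("heavy", 10)]
DVFS_LEVELS = (1.0, 0.9, 0.8, 0.7)  # hypothetical scale factors, fastest first


def energy_over_budget(trace, budget):
    """Total tokens consumed above the budget across the trace."""
    return sum(max(0, p - budget) for p in trace)


def select_technique(tokens, budget):
    """Pick the cheapest technique whose savings cover this cycle's excess."""
    excess = tokens - budget
    for name, saved in TECHNIQUES:
        if saved >= excess:
            return name, saved
    return TECHNIQUES[-1]  # excess too large: use the strongest technique


def dvfs(trace, budget, window=4):
    """Coarse grain: one scale factor per window, chosen from the window average."""
    out = []
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]
        avg = sum(chunk) / len(chunk)
        scale = next((s for s in DVFS_LEVELS if avg * s <= budget), DVFS_LEVELS[-1])
        out.extend(round(p * scale) for p in chunk)
    return out


def two_level(trace, budget, window=4):
    """DVFS lowers the average; per-cycle techniques then remove remaining spikes."""
    result = []
    for tokens in dvfs(trace, budget, window):
        _, saved = select_technique(tokens, budget)
        result.append(max(0, tokens - saved))
    return result


if __name__ == "__main__":
    trace = [8, 8, 14, 8, 6, 6, 6, 6]  # made-up per-cycle power in tokens
    for name, t in [("raw", trace), ("dvfs", dvfs(trace, 9)), ("two-level", two_level(trace, 9))]:
        print(name, energy_over_budget(t, 9))
```

The window average hides the single spike from DVFS, so DVFS alone leaves part of it over the budget; the per-cycle selector is what catches it, which mirrors the division of labour the abstract describes.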
5.
  • Davari, Mahdad, et al. (author)
  • System and method for data classification and efficient virtual cache coherence without reverse translation
  • 2013
  • Patent (popular science, debate, etc.)
    • An on-chip memory hierarchy organization for a multicore processing system is disclosed. The hierarchy supports virtual-addressed private caches and a physical-addressed shared cache. The hierarchy classifies cache line data as private or shared to support a one-directional request-response protocol. The classification can be determined from the generational behavior of a cache line in the private caches: cache lines having a single generation in a private cache are Private, and cache lines having overlapping generations in two or more private caches are Shared. The Private or Shared classification is performed dynamically at run time in hardware, using a single translation lookaside buffer at the interface between the private and shared caches. The coherence protocol uses the data classification in a dynamic write policy for both shared data-race-free data and private data, differentiating when data is written back to the shared cache based on the classification.
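The generation rule above (one generation means Private, overlapping generations in two or more private caches means Shared) can be modelled with a small table. In this sketch a generation is taken to run from the fill that brings a line into a private cache until its eviction; the single software table, the "once Shared, always Shared" rule, the class name `GenerationClassifier`, and the mapping to write-back points are assumptions for illustration, not the patented hardware.

```python
from collections import defaultdict


class GenerationClassifier:
    """Hypothetical tracker of cache-line generations across private caches."""

    def __init__(self):
        self.live = defaultdict(set)  # line -> cores holding a live generation
        self.shared = set()           # lines that ever had overlapping generations

    def fill(self, core, line):
        """`core` starts a new generation of `line` in its private cache."""
        if self.live[line] - {core}:
            # Another private cache still holds the line: generations overlap.
            self.shared.add(line)
        self.live[line].add(core)
        return self.classify(line)

    def evict(self, core, line):
        """The generation of `line` in `core`'s private cache ends."""
        self.live[line].discard(core)

    def classify(self, line):
        return "Shared" if line in self.shared else "Private"

    def write_back_point(self, line):
        # Assumed dynamic write policy: private data goes back to the shared
        # cache on eviction, shared (data-race-free) data at synchronization.
        return "at synchronization" if line in self.shared else "on eviction"
```

Note that a line that migrates from one core to another without overlap (evicted before the next fill) stays Private under this rule; only simultaneous residency makes it Shared.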
7.
  • Goel, Bhavishya, 1981, et al. (author)
  • Infrastructures for Measuring Power
  • 2011
  • Report (other academic/artistic)
    • Energy-aware resource management requires some means of measuring power consumption. We present three approaches to measuring processor power. The easiest and least intrusive approach places a power meter between the system and the power outlet. Unfortunately, this provides only a single, system-wide measurement, and its acuity is limited by the device's sampling frequency. Another method samples power at the PSU voltage outputs using current transducers. This logs consumption separately per component, but requires custom hardware and an expensive analog acquisition device. A more accurate alternative samples power directly at the processor voltage regulator's current-sensing pin, but requires motherboard intrusion. We explain the implementation of each approach step by step.
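Once per-rail currents are being sampled (as in the PSU-output approach above), turning them into per-component power and energy comes down to P = V * I per sample and integrating over time. A minimal sketch, assuming constant rail voltages, currents already converted to amperes by the acquisition device, a fixed sampling rate, and hypothetical rail names; it does not model the transducers or wiring the report describes.

```python
def rail_power(currents_a, volts):
    """Instantaneous power (W) for one rail: P = V * I for every sample."""
    return [volts * i for i in currents_a]


def energy_joules(power_w, sample_hz):
    """Integrate sampled power over time with the trapezoidal rule."""
    dt = 1.0 / sample_hz
    return sum((a + b) / 2 * dt for a, b in zip(power_w, power_w[1:]))


def per_component_energy(samples, rail_volts, sample_hz):
    """samples: {rail: [amps, ...]}; rail_volts: {rail: volts} (assumed constant)."""
    return {
        rail: energy_joules(rail_power(amps, rail_volts[rail]), sample_hz)
        for rail, amps in samples.items()
    }


if __name__ == "__main__":
    # Hypothetical rail name and readings, sampled at 10 Hz.
    print(per_component_energy({"cpu_12v": [2.0, 2.0, 3.0]}, {"cpu_12v": 12.0}, 10))
```

Treating the rail voltage as constant is the main simplification; if the acquisition device also samples voltage, multiplying matched voltage and current samples gives a more faithful power trace.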
9.
  • Kaxiras, Stefanos, et al. (author)
  • A New Perspective for Efficient Virtual-Cache Coherence
  • 2013
  • Published in: Proceedings of the 40th Annual International Symposium on Computer Architecture. New York, NY, USA: ACM. ISBN 9781450320795; pp. 535-546
  • Conference paper (peer-reviewed)
    • Coherent shared virtual memory (cSVM) is highly coveted for heterogeneous architectures, as it will simplify programming across different cores and manycore accelerators. In this context, virtual L1 caches can be used to great advantage, e.g., saving energy by eliminating address translation for hits. Unfortunately, multicore virtual-cache coherence is complex and costly because it requires reverse translation for any coherence request directed towards a virtual L1. The reason is the ambiguity of the virtual address due to the possibility of synonyms. In this paper, we take a radically different approach from all prior work, which focuses on reverse translation. We examine the problem from the perspective of the coherence protocol. We show that if a coherence protocol adheres to certain conditions, it operates effortlessly with virtual caches, without requiring reverse translations even in the presence of synonyms. We show that these conditions hold in a new class of simple and efficient request-response protocols that use both self-invalidation and self-downgrade. This results in a new solution for virtual-cache coherence, significantly less complex and more efficient than prior proposals. We study design choices for TLB placement under our proposal and compare them against those under a directory-MESI protocol. Our approach allows for choices that are particularly effective, for example combining all per-core TLBs into a single logical TLB in front of the last-level cache. Significant area, energy, and performance benefits ensue as a result of simplifying the entire multicore memory organization.
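The key observation above is that if L1s never receive coherence requests, because they self-invalidate at acquires and self-downgrade at releases, a virtually addressed L1 only ever needs forward (virtual-to-physical) translation, even with synonyms. The toy model below illustrates that under several assumptions: word-sized lines, one page table standing in for a TLB in front of the last-level cache, data-race-free software, and every L1 line treated as shared. The class and method names are hypothetical; this is not the paper's protocol.

```python
PAGE = 4096


class SharedLLC:
    """Physically addressed last-level cache with a (toy) TLB in front of it."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> physical page
        self.mem = {}                 # physical address -> value

    def translate(self, va):
        # Forward translation only (VA -> PA); nothing ever maps PA -> VA.
        return self.page_table[va // PAGE] * PAGE + va % PAGE


class VirtualL1:
    """Virtually addressed private L1 that never receives coherence requests."""

    def __init__(self, llc):
        self.llc = llc
        self.lines = {}  # virtual address -> [value, dirty]

    def read(self, va):
        if va not in self.lines:  # miss: translate on the way to the LLC
            self.lines[va] = [self.llc.mem.get(self.llc.translate(va), 0), False]
        return self.lines[va][0]

    def write(self, va, value):
        self.lines[va] = [value, True]

    def self_downgrade(self):
        """At a release: push dirty data to the LLC using forward translation."""
        for va, line in self.lines.items():
            if line[1]:
                self.llc.mem[self.llc.translate(va)] = line[0]
                line[1] = False

    def self_invalidate(self):
        """At an acquire: drop clean copies so later reads refetch from the LLC."""
        self.lines = {va: line for va, line in self.lines.items() if line[1]}
```

For example, with virtual pages 1 and 2 both mapped to the same physical page, a value written by one core through the first alias and released becomes visible to another core reading through the second alias after its acquire, and no step ever asks which virtual address a physical line belongs to.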
10.
  • Kaxiras, Stefanos, et al. (author)
  • Efficient, snoopless, System-on-Chip coherence
  • 2012
  • Published in: SOC Conference (SOCC), 2012 IEEE International. ISBN 9781467312950; pp. 230-235
  • Conference paper (peer-reviewed)
    • Coherence in a System-on-Chip (SoC) introduces complexity and overhead (snooping caches/directory, state bits, invalidations, etc.) in exchange for a clean and uniform shared memory model. As is typical today, an SoC comprises a variety of cores with local caches, accelerators with local memories, and some form of shared last-level cache (LLC), all interconnected with shared buses. We propose a very simple coherence protocol, suited to this environment, that eliminates L1 snooping and its associated complexity and costs (power). In essence, we remove all coherence decisions from local caches by simply determining at the LLC whether data are private or shared. This makes a write-through policy a practical and effective alternative for maintaining coherence. In the local caches, we dynamically select between write-back for private data and write-through for shared data. Self-invalidation of shared data at synchronization points eliminates the need to snoop, requiring only a data-race-free guarantee from software. Our evaluation shows that this simple protocol outperforms a traditional snooping protocol while significantly reducing L1, shared cache, and bus energy consumption.
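The snoop-free scheme above can be illustrated with a small functional model: the LLC decides whether a line is private or shared, each L1 writes back private lines and writes through shared ones, and shared lines are self-invalidated at synchronization points. Several details are assumptions of this sketch rather than the paper's design: classification is first-touch (the first core to access a line owns it until another core touches it), the private-to-shared transition is handled by a one-time recall of the owner's copy, and lines are word-sized. Class names are hypothetical.

```python
class LLC:
    """Shared last-level cache that classifies lines as private or shared."""

    def __init__(self):
        self.mem = {}
        self.owner = {}      # line -> first core to touch it (assumed first-touch)
        self.shared = set()  # lines touched by more than one core
        self.l1s = {}        # core id -> L1, used only for the one-time recall

    def classify(self, core, addr):
        owner = self.owner.setdefault(addr, core)
        if owner != core and addr not in self.shared:
            self.shared.add(addr)
            # Private -> Shared transition: recall the owner's copy once
            # (an assumption here, not a snoop on every access).
            self.l1s[owner].flush(addr)
        return addr in self.shared


class L1:
    """Local cache: write-back for private lines, write-through for shared lines."""

    def __init__(self, core, llc):
        self.core, self.llc = core, llc
        self.lines = {}  # addr -> [value, dirty, shared]
        llc.l1s[core] = self

    def read(self, addr):
        if addr not in self.lines:
            shared = self.llc.classify(self.core, addr)
            self.lines[addr] = [self.llc.mem.get(addr, 0), False, shared]
        return self.lines[addr][0]

    def write(self, addr, value):
        line = self.lines.get(addr)
        shared = line[2] if line else self.llc.classify(self.core, addr)
        if shared:
            self.llc.mem[addr] = value               # write-through
            self.lines[addr] = [value, False, True]
        else:
            self.lines[addr] = [value, True, False]  # write-back (dirty until evicted)

    def flush(self, addr):
        line = self.lines.pop(addr, None)
        if line and line[1]:
            self.llc.mem[addr] = line[0]

    def sync(self):
        """At a synchronization point, self-invalidate every shared line."""
        self.lines = {a: l for a, l in self.lines.items() if not l[2]}
```

In this model a reader may keep a stale copy of shared data until its next synchronization point, which is exactly the window that the data-race-free guarantee from software makes safe.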