Search results for "WFRF:(Stefanos Kaxiras) srt2:(2010-2014)"

Search: WFRF:(Stefanos Kaxiras) > (2010-2014)

Results 1-10 of 26


No.	Reference
1.	Cebrian, Juan M., et al. (author) Efficient inter-core power and thermal balancing for multicore processors 2013 In: Computing. Springer Science and Business Media LLC. ISSN 0010-485X, 1436-5057. Vol. 95, no. 7, pp. 537-566. Journal article (peer-reviewed). Abstract: Nowadays the market is dominated by processor architectures that employ multiple cores per chip. These architectures behave differently depending on the applications running on the processor (parallel, multiprogrammed, sequential), but all run into what is called the power and temperature wall. For future technologies (below 22 nm) and a fixed die size, it is still uncertain what percentage of the processor can be powered on simultaneously. Power-saving and power-budget mechanisms can be useful to precisely control the amount of power being dissipated by the processor. An initial analysis shows that legacy power-saving techniques work properly for matching a power budget in thread-independent and multiprogrammed workloads, but not in parallel workloads. When running parallel shared-memory applications, sacrificing some performance in a single core (thread) to be more energy-efficient can unintentionally delay the remaining cores (threads) at synchronization points (locks/barriers), hurting global performance. To solve this problem we propose power token balancing (PTB), which aims to accurately match an external power constraint by balancing the power consumed among the different cores. Experimental results show that PTB matches a predefined power budget (50 % of the original peak power) more accurately than other mechanisms such as DVFS. The total energy consumed over the budget is reduced to only 8 % for a 16-core CMP, with only a 3 % energy increase (overhead). We also introduce a novel mechanism named "Nitro".
Nitro overclocks the core that enters a critical section (delimited by locks) so that it releases the lock as soon as possible. Experimental results show that Nitro reduces the execution time of lock-intensive applications by more than 4 % by overclocking the frequency by 15 % in selected program phases, over a period that represents 22 % of the total execution time. We conclude the work with an analysis of the thermal effects of PTB in different CMP configurations using realistic power numbers and heatsink/fan configurations. Results show that PTB not only balances temperature between the different cores, reducing the temperature gradient and increasing signal reliability, but also reduces both average and peak temperatures by 28-30 % for the studied benchmarks when a peak power budget of 50 % is exceeded.
2.	Cebrián, Juan M., et al. (author) Leakage-efficient design of value predictors through state and non-state preserving techniques 2011 In: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 55, no. 1, pp. 28-50. Journal article (peer-reviewed).
3.	Cebrian, Juan M., et al. (author) Managing power constraints in a single-core scenario through power tokens 2014 In: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 68, no. 1, pp. 414-442. Journal article (peer-reviewed). Abstract: Current microprocessors face constant thermal and power-related problems during everyday use, usually solved by applying a power budget to the processor/core. Dynamic voltage and frequency scaling (DVFS) has been an effective technique for letting microprocessors match a predefined power budget. However, the continuous increase of leakage power due to technology scaling, along with the low resolution of DVFS, makes it less attractive for matching a predefined power budget as technology moves to deep submicron. In this paper, we propose the use of microarchitectural techniques to accurately match a power constraint while maximizing the energy efficiency of the processor. We predict the processor power dissipation at cycle level (power token throttling) or at basic-block level (basic block level mechanism), and use the dissipated power, translated into tokens, to select between different power-saving microarchitectural techniques. We also introduce a two-level approach in which DVFS acts as a coarse-grain technique to lower the average power dissipation towards the power budget, while microarchitectural techniques focus on removing the numerous power spikes. Experimental results show that using power-saving microarchitectural techniques in conjunction with DVFS is up to six times more precise, in terms of total energy consumed over the power budget, than using DVFS alone to match a predefined power budget.
4.	Cebrián, Juan M., et al. (author) Power Token Balancing: Adapting CMPs to power constraints for parallel multithreaded workloads 2011 In: Proc. 25th International Parallel and Distributed Processing Symposium. Piscataway, NJ: IEEE. ISBN 9781612843728, pp. 431-442. Conference paper (peer-reviewed).
5.	Davari, Mahdad, et al. (author) System and method for data classification and efficient virtual cache coherence without reverse translation 2013 Patent (popular science, debate, etc.). Abstract: An on-chip memory hierarchy organization for a multicore processing system is disclosed. The hierarchy supports virtual-addressed private caches and a physical-addressed shared cache. It classifies cache line data as private or shared to support a one-directional request-response protocol. The classification can be determined from the generational behavior of a cache line in the private caches: cache lines having a single generation in a private cache are Private, and cache lines having overlapping generations in two or more private caches are Shared. The Private or Shared classification is performed dynamically at run time in hardware, using a single translation lookaside buffer at the interface between the private and shared caches. The coherence protocol uses the data classification in a dynamic write policy for both shared data-race-free data and private data, differentiating when data is put back into the shared cache based on the classification.
6.	Davari, Mahdad, et al. (author) The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence 2014 Conference paper (peer-reviewed).
7.	Goel, Bhavishya, 1981, et al. (author) Infrastructures for Measuring Power 2011 Report (other academic/artistic). Abstract: Energy-aware resource management requires some means of measuring power consumption. We present three approaches to measuring processor power. The easiest and least intrusive places a power meter between the system and the power outlet. Unfortunately, this provides a single system-wide measurement, and acuity is limited by the device's sampling frequency. Another method samples power at the PSU voltage outputs using current transducers. This logs consumption separately per component, but requires custom hardware and an expensive analog acquisition device. A more accurate alternative samples power directly at the processor voltage regulator's current-sensing pin, but requires motherboard intrusion. We explain the implementation of each approach step by step.
8.	Jimborean, Alexandra, et al. (author) Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling 2014 In: Proc. 12th International Symposium on Code Generation and Optimization. New York: ACM Press. ISBN 9781450326704, pp. 262-272. Conference paper (peer-reviewed).
9.	Kaxiras, Stefanos, et al. (author) A New Perspective for Efficient Virtual-Cache Coherence 2013 In: Proceedings of the 40th Annual International Symposium on Computer Architecture. New York, NY, USA: ACM. ISBN 9781450320795, pp. 535-546. Conference paper (peer-reviewed). Abstract: Coherent shared virtual memory (cSVM) is highly coveted for heterogeneous architectures, as it will simplify programming across different cores and manycore accelerators. In this context, virtual L1 caches can be used to great advantage, e.g., saving energy by eliminating address translation for hits. Unfortunately, multicore virtual-cache coherence is complex and costly because it requires reverse translation for any coherence request directed towards a virtual L1. The reason is the ambiguity of the virtual address due to the possibility of synonyms. In this paper, we take a radically different approach from all prior work, which focused on reverse translation. We examine the problem from the perspective of the coherence protocol. We show that if a coherence protocol adheres to certain conditions, it operates effortlessly with virtual caches, without requiring reverse translations even in the presence of synonyms. We show that these conditions hold in a new class of simple and efficient request-response protocols that use both self-invalidation and self-downgrade. This results in a new solution for virtual-cache coherence that is significantly less complex and more efficient than prior proposals. We study design choices for TLB placement under our proposal and compare them against those under a directory-MESI protocol. Our approach allows for particularly effective choices, for example combining all per-core TLBs into a single logical TLB in front of the last-level cache. Significant area, energy, and performance benefits ensue from simplifying the entire multicore memory organization.
10.	Kaxiras, Stefanos, et al. (author) Efficient, snoopless, System-on-Chip coherence 2012 In: SOC Conference (SOCC), 2012 IEEE International. ISBN 9781467312950, pp. 230-235. Conference paper (peer-reviewed). Abstract: Coherence in a System-on-Chip (SoC) introduces complexity and overhead (snooping caches/directory, state bits, invalidations, etc.) in exchange for a clean and uniform shared-memory model. As is typical today, a SoC comprises a variety of cores with local caches, accelerators with local memories, and some form of shared last-level cache (LLC), all interconnected with shared buses. We propose a very simple coherence protocol, fit for this environment, that eliminates L1 snooping and its associated complexity and costs (power). In essence, we remove all coherence decisions from the local caches by simply determining at the LLC whether data are private or shared. This makes a write-through policy a practical and effective alternative for maintaining coherence. In the local caches, we dynamically select between write-back for private data and write-through for shared data. Self-invalidation of shared data at synchronization points eliminates the need to snoop, requiring only a data-race-free guarantee from software. Our evaluation shows that this simple protocol outperforms a traditional snooping protocol while significantly reducing L1, shared-cache, and bus energy consumption.
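Entries 1 and 4 describe power token balancing (PTB): a global power budget is matched by balancing power among cores, so that the budget is not enforced by slowing down one thread that the others then wait for at a lock or barrier. The Python sketch below illustrates only the redistribution idea and is not the authors' mechanism. It assumes per-core power demand for an interval is already known in abstract "token" units; the function name `balance_power_tokens` and the proportional sharing rule are hypothetical choices made for illustration.

```python
from typing import List


def balance_power_tokens(demand: List[float], budget: float) -> List[float]:
    """Return per-core power allowances whose sum never exceeds `budget`.

    Hypothetical sketch: every core starts with an equal share of the budget.
    Cores that need less than their share leave the surplus in a shared pool,
    and the pool is split among over-demand cores in proportion to how far
    each one exceeds its share.
    """
    n = len(demand)
    share = budget / n
    # Each core gets at most its equal share to begin with.
    allowance = [min(d, share) for d in demand]
    # Spare tokens left unused by cores below their share.
    pool = budget - sum(allowance)
    # How much each core wants beyond its equal share.
    excess = [max(d - share, 0.0) for d in demand]
    total_excess = sum(excess)
    if total_excess > 0:
        # Never hand out more than is available or more than is wanted.
        grant = min(pool, total_excess)
        for i, e in enumerate(excess):
            allowance[i] += grant * e / total_excess
    return allowance
```

For example, with demands of 10, 2, 8 and 4 tokens and a budget of 16, each core's equal share is 4; the second core leaves 2 tokens unused, and they are split 1.2/0.8 between the first and third cores. A core whose allowance ends up below its demand would still have to throttle (for example with power-saving techniques such as those in entry 3); proportional sharing is just one simple policy for splitting the pool.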
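Entries 1 and 3 measure accuracy as the total energy consumed over the power budget, E_over = sum over t of max(0, P_t - B) * dt, and entry 3 combines coarse-grain DVFS (lowering average power towards the budget) with fine-grain microarchitectural techniques (removing power spikes). The sketch below shows on a toy power trace why the two levels complement each other. It is an idealization, and all names and numbers in it are hypothetical: `dvfs_scale` applies one scale factor directly to power instead of modelling voltage and frequency, and `clip_spikes` assumes a perfect per-cycle throttle that caps power exactly at the budget.

```python
from typing import List


def energy_over_budget(power: List[float], budget: float, dt: float = 1.0) -> float:
    """Energy spent above `budget`: sum of max(0, P_t - budget) * dt."""
    return sum(max(0.0, p - budget) * dt for p in power)


def dvfs_scale(power: List[float], budget: float) -> List[float]:
    """Idealized coarse-grain step: one scale factor for the whole window,
    chosen so that the *average* power does not exceed the budget."""
    avg = sum(power) / len(power)
    s = min(1.0, budget / avg) if avg > 0 else 1.0
    return [p * s for p in power]


def clip_spikes(power: List[float], budget: float) -> List[float]:
    """Idealized fine-grain step: cap every cycle's power at the budget."""
    return [min(p, budget) for p in power]
```

On the trace [4, 4, 12, 4] with a budget of 5, the raw energy over budget is 7 units. Scaling alone brings the average down to 5 but still leaves 5 units over budget in the spike cycle, and `clip_spikes(dvfs_scale(trace, 5.0), 5.0)` removes the remainder. The coarse step handles the average so the fine step only has to deal with the spikes, mirroring the division of labour described in entry 3.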
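Entries 5 and 10 classify data as private or shared and let that classification drive the write policy: write-back for private data, write-through for shared data, plus self-invalidation of shared data at synchronization points. The sketch below models this at cache-line granularity with a hypothetical `Classifier` class. It assumes a line's generation in a private cache lasts from the fetch until the eviction, and that a line becomes Shared once generations in two private caches overlap. Treating non-overlapping generations on different cores as still Private, keeping Shared sticky, and the per-line tracking dictionaries are simplifications chosen for illustration, not details taken from the patent or papers (entry 6 studies other classification granularities).

```python
from enum import Enum
from typing import Dict, Set


class Kind(Enum):
    PRIVATE = "private"
    SHARED = "shared"


class Classifier:
    """Hypothetical model of generation-based private/shared classification."""

    def __init__(self) -> None:
        # line address -> cores currently holding a live generation of it
        self.live: Dict[int, Set[int]] = {}
        # line address -> current classification (Shared is sticky here)
        self.kind: Dict[int, Kind] = {}

    def fetch(self, core: int, line: int) -> Kind:
        """A private cache brings `line` in, starting a new generation."""
        holders = self.live.setdefault(line, set())
        if holders - {core}:
            # Overlaps a live generation in another private cache.
            self.kind[line] = Kind.SHARED
        else:
            # First or non-overlapping generation: keep any earlier verdict.
            self.kind.setdefault(line, Kind.PRIVATE)
        holders.add(core)
        return self.kind[line]

    def evict(self, core: int, line: int) -> None:
        """Eviction ends this core's generation of `line`."""
        self.live.get(line, set()).discard(core)

    def write_policy(self, line: int) -> str:
        """Dynamic write policy driven by the classification."""
        return "write-through" if self.kind.get(line) is Kind.SHARED else "write-back"

    def self_invalidate(self, core: int) -> int:
        """At a synchronization point, drop this core's copies of shared
        lines; returns how many lines were invalidated."""
        dropped = 0
        for line, holders in self.live.items():
            if core in holders and self.kind.get(line) is Kind.SHARED:
                holders.discard(core)
                dropped += 1
        return dropped
```

Under this model, a line that migrates from one core to another keeps the cheaper write-back policy, while a line cached by two cores at the same time switches to write-through and is dropped from a core's cache at its next synchronization point, so no snooping of the private caches is needed.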


Breakdown of results

Publication type: conference paper (17); journal article (4); report (3); book (1); patent (1)

Content type: peer-reviewed (20); other academic/artistic (5); popular science, debate, etc. (1)

Author/editor: Kaxiras, Stefanos (25); Spiliopoulos, Vasile ... (8); Keramidas, Georgios (7); Ros, Alberto (5); Cebrian, Juan M. (4); Aragon, Juan L. (4); Black-Schaffer, Davi ... (3); Koukos, Konstantinos (3); Själander, Magnus, 1 ... (2); Davari, Mahdad (2); Sanchez, Daniel (2); Petoumenos, Pavlos (2); Själander, Magnus (1); Goel, Bhavishya, 198 ... (1); McKee, Sally A, 1963 (1); Hagersten, Erik (1); Hansson, Andreas (1); Shariati Nilsson, Ni ... (1); Jimborean, Alexandra (1); García, José M. (1); Ros, A. (1); Spiliopoulos, Vasile ... (1); Keramidas, Georgios, ... (1); Kaxiras, Stefanos, 1 ... (1); Efstathiou, Konstant ... (1); Sembrant, Andreas, 1 ... (1); Psychou, Georgia (1); Cebrián Gonzalez, Ju ... (1); Aragón, Juan Luis (1); Martonosi, Margaret (1); Bagdia, Akash (1); Aldworth, Peter (1); Efstathiou, Konstant ... (1); Spiliopoulos, Vasile ... (1); Strikos, Nikolaos (1)

Institution: Uppsala University (25); Chalmers University of Technology (1)

Language: English (26)

Research subject (UKÄ/SCB): Natural sciences (16); Engineering and technology (8)


Copyright © LIBRIS, the Swedish national library system (Nationella bibliotekssystem)
LIBRIS.kb.se
