
Result list for search "WFRF:(McKee Sally A 1963) srt2:(2007-2009)"


Results 1-7 of 7


No.	Reference
1.	Bhadauria, Major, et al. (author). Accomodating diversity in CMPs with heterogeneous frequencies. 2009. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Berlin, Heidelberg : Springer Berlin Heidelberg. - 1611-3349 .- 0302-9743. - 9783540929895 ; 5409 LNCS, pp. 248-262. Conference paper (peer-reviewed). Abstract: Shrinking process technologies and growing chip sizes have profound effects on process variation. This leads to Chip Multiprocessors (CMPs) where not all cores operate at maximum frequency. Instead of simply disabling the slower cores or using guard banding (running all at the frequency of the slowest logic block), we investigate keeping them active, and examine performance and power efficiency of using frequency-heterogeneous CMPs on multithreaded workloads. With uniform workload partitioning, one might intuitively expect slower cores to degrade performance. However, with non-uniform workload partitioning, we find that using both low and high frequency cores improves performance and reduces energy consumption over just running faster cores. Thread scheduling and workload partitioning naturally play significant roles in these improvements. We find that using under-performing cores improves performance by 16% on average and saves CPU energy by up to 16% across the NAS and SPEC-OMP benchmarks on a quad-core AMD platform. Workload balancing via dynamic partitioning yields results within 5% of the overall ideal value. Finally, we show feasible methods to determine at run time whether using a heterogeneous configuration is beneficial. We validate our work through evaluation on a real CMP.
2.	Bhadauria, Major, et al. (author). Understanding parsec performance on contemporary CMPS. 2009. In: Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IISWC 2009. - 9781424451562, pp. 98-107. Conference paper (peer-reviewed). Abstract: PARSEC is a reference application suite used in industry and academia to assess new Chip Multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks. This understanding is crucial in guiding future CMP designs for these kinds of emerging workloads. We use hardware performance counters, taking a systems-level approach and varying common architectural parameters: number of out-of-order cores, memory hierarchy configurations, number of multiple simultaneous threads, number of memory channels, and processor frequencies. We find these programs to be largely compute-bound, and thus limited by number of cores, micro-architectural resources, and cache-to-cache transfers, rather than by off-chip memory or system bus bandwidth. Half the suite fails to scale linearly with increasing number of threads, and some applications saturate performance at few threads on all platforms tested. Exploiting thread level parallelism delivers greater payoffs than exploiting instruction level parallelism. To reduce power and improve performance, we recommend increasing the number of arithmetic units per core, increasing support for TLP, and reducing support for ILP.
3.	Bronevetsky, Greg, et al. (author). Compiler-enhanced incremental checkpointing for openMP applications. 2009. In: 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009; Rome; Italy; 23 May 2009 through 29 May 2009. - 9781424437504. Conference paper (peer-reviewed). Abstract: As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures, enabling applications to periodically save their state and restart computation after a failure. Although many automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, reduces checkpoint sizes by as much as 80% and enables asynchronous checkpointing.
4.	Islam, Mafijul, 1975, et al. (author). Cancellation of Loads that Return Zero Using Zero-Value Caches. 2009. In: 23rd International Conference on Supercomputing, ICS'09; Yorktown Heights, NY; United States; 8 June 2009 through 12 June 2009, pp. 493-494. Conference paper (other academic/artistic). Abstract: The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads (loads accessing memory locations that contain the value “zero”) to improve performance and energy dissipation. Our study shows that such loads comprise as many as 18% of the total number of dynamic loads. We show that a significant fraction of zero loads ends up on the critical memory-access path in out-of-order cores. We propose a non-speculative microarchitectural technique, the Zero-Value Cache (ZVC), to capitalize on zero loads and explore critical design options of such caches. We show that with modest investment (typically a 512-byte structure), we can obtain speedups up to 32%. Most importantly, zero-value caches never cause performance loss.
5.	Islam, Mafijul, 1975, et al. (author). Zero-Value Caches: Cancelling Loads that Return Zero. 2009. In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. - 1089-795X. - 9780769537719, pp. 237-245. Report (other academic/artistic). Abstract: The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads (loads accessing memory locations that contain the value “zero”) to improve performance and energy dissipation. Our study shows that such loads comprise as many as 18% of the total number of dynamic loads. We show that a significant fraction of zero loads ends up on the critical memory-access path in out-of-order cores. We propose a non-speculative microarchitectural technique, the Zero-Value Cache (ZVC), to capitalize on zero loads and explore critical design options of such caches. We show that with modest investment (typically a 576-byte structure), we can obtain speedups up to 78% and reduce the overall energy dissipation up to 39%. Most importantly, zero-value caches never cause performance loss.
6.	Transactions on HiPEAC. 2007. Edited collection (editorship) (peer-reviewed).
7.	Weaver, Vincent M., et al. (author). Code density concerns for new architectures. 2009. In: Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors. - 1063-6404. - 9781424450282, pp. 459-464. Conference paper (peer-reviewed). Abstract: Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone. We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores. We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture.


Refine results

Type of publication: conference paper (5); edited collection (editorship) (1); report (1)

Type of content: peer-reviewed (5); other academic/artistic (2)

Author/editor: McKee, Sally A, 1963 (7); Stenström, Per, 1957 (3); Weaver, Vincent M. (3); Bhadauria, Major (2); Islam, Mafijul, 1975 (2); Bronevetsky, Greg (1); Marques, Daniel (1); Pingali, Keshav (1); Rugina, Radu (1); Cintra, Marcelo (1); O'Boyle, Michael (1); Bodin, Francois (1)

Higher education institution: Chalmers tekniska högskola (7)

Language: English (7)

Research subject (UKÄ/SCB): Natural sciences (7)


Copyright © LIBRIS - National Library Systems
LIBRIS.kb.se
