SwePub
Search the SwePub database

Hit list for the search "WFRF:(McKee Sally A 1963) srt2:(2015-2018)"

  • Results 1-10 of 17
1.
  • Brown, Martin K., et al. (author)
  • Agave: a Benchmark Suite for Exploring the Complexities of the Android Software Stack
  • 2016
  • In: ISPASS 2016 - International Symposium on Performance Analysis of Systems and Software. - 9781509019526 ; 31 May 2016, pp. 157-158
  • Conference paper (peer-reviewed), abstract:
    • Traditional suites used for benchmarking high-performance computing platforms or for architectural design space exploration use much simpler virtual memory layouts and multitasking/multithreading schemes, which means that they cannot be used to study the complex interactions among the layers of the Android software stack. To demonstrate this, we present memory reference and concurrency data showing how Android applications differ from traditional C benchmarks. We propose the Agave suite of open-source applications as the basis for a standard, multipurpose Android benchmark suite. We make all sources and tools available in hopes that the community will adopt and build on this initial version of Agave.
2.
  • Chang, Y. S., et al. (author)
  • Extending on-chip interconnects for rack-level remote resource access
  • 2016
  • In: Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016. - 1063-6404. - 9781509051427 ; pp. 56-63
  • Conference paper (peer-reviewed), abstract:
    • The need to perform data analytics on exploding data volumes coupled with the rapidly changing workloads in cloud computing places great pressure on data-center servers. To improve hardware resource utilization across servers within a rack, we propose Direct Extension of On-chip Interconnects (DEOI), a high-performance and efficient architecture for remote resource access among server nodes. DEOI extends an SoC server node's on-chip interconnect to access resources in adjacent nodes with no protocol changes, allowing remote memory and network resources to be used as if they were local. Our results on a four-node FPGA prototype show that the latency of user-level, cross-node, random reads to DEOI-connected remote memory is as low as 1.16 µs, which beats current commercial technologies. We exploit DEOI remote access to improve performance of the Redis in-memory key-value framework by 47%. When using DEOI to access remote network resources, we observe an 8.4% average performance degradation and only a 2.52 µs ping-pong latency disparity compared to using local assets. These results suggest that DEOI can be a promising mechanism for increasing both performance and efficiency in next-generation data-center servers.
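
The "as if they were local" claim above is easiest to picture from the software side. The following is a minimal sketch of what user-level access to a DEOI-style remote window could look like, assuming the window is exposed as an mmap()-able device file; the path /dev/deoi0 and the mapping interface are illustrative assumptions, not details from the paper, whose prototype is a four-node FPGA system.

    /* Hypothetical user-level view of DEOI-style remote memory.  The
     * device path and mmap interface are assumptions for illustration;
     * the paper does not describe its prototype at this level. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/deoi0", O_RDWR);   /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 1 << 20;                  /* 1 MiB remote window */
        uint64_t *remote = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (remote == MAP_FAILED) { perror("mmap"); return 1; }

        /* Plain loads and stores now traverse the extended on-chip
         * interconnect; no RDMA verbs or protocol conversion appear
         * at the software level. */
        remote[0] = 42;                        /* remote store */
        uint64_t v = remote[0];                /* remote load  */
        printf("read back %llu\n", (unsigned long long)v);

        munmap(remote, len);
        close(fd);
        return 0;
    }
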
3.
  • Cui, Z. H., et al. (author)
  • Twin-Load: Bridging the Gap between Conventional Direct-Attached and Buffer-on-Board Memory Systems
  • 2016
  • In: Memsys 2016: Proceedings of the International Symposium on Memory Systems. - New York, NY, USA : ACM. - 9781450343053 ; pp. 164-176
  • Conference paper (peer-reviewed), abstract:
    • Conventional systems with direct-attached DRAM struggle to meet growing memory capacity demands: the number of channels is limited by pin count, and the number of modules per channel is limited by signal integrity issues. Recent buffer-on-board (BOB) designs move some memory controller functionality to a separate buffer chip, which lets them support larger capacities (by adding more DRAM or denser, non-volatile components). Nonetheless, lower-cost, lower-latency, direct-attached DRAM still represents a better price-performance solution for many applications. Most processors exclusively implement either the direct-attached or the BOB approach. Combining both technologies within one processor has obvious benefits, but current memory-interface requirements complicate this straightforward solution. The standard DRAM interface is DDR, which requires data to be returned at a fixed latency. In contrast, the BOB interface supports diverse memory technologies precisely because it allows asynchrony. We propose Twin-Load technology to enable one processor to support both direct-attached and BOB memory. We show how to use Twin-Load to support BOB memory over standard DDR interfaces with minimal processor modifications. We build an asynchronous protocol over the existing, synchronous interface by splitting each memory read into twinned loads. The first acts as a prefetch to the buffer chip, and the second asynchronously fetches the data. We describe three methods for generating twinned loads, each leveraging different layers of the system stack.
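
The twinned-load mechanism is concrete enough to sketch in code. Below is a minimal illustration assuming a purely software-generated pair; the BOB_DELAY constant and the busy-wait are placeholders, and the paper's three actual generation methods are not reproduced here.

    /* A minimal sketch of a twinned load, assuming software generates
     * the pair.  BOB_DELAY and the delay loop are illustrative only. */
    #include <stdint.h>

    #define BOB_DELAY 64   /* assumed cycles for the buffer-chip fetch */

    static inline uint64_t twin_load(volatile uint64_t *addr) {
        /* First load: value ignored; it acts as a prefetch that tells
         * the buffer chip to start fetching *addr from far memory. */
        (void)*addr;

        /* Give the asynchronous fetch time to complete. */
        for (int i = 0; i < BOB_DELAY; i++)
            __asm__ volatile("" ::: "memory");   /* GCC/Clang barrier */

        /* Second load: the buffer chip can now return the data within
         * the fixed latency the synchronous DDR interface demands. */
        return *addr;
    }
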
4.
  • Dickov, B., et al. (author)
  • Self-Tuned Software-Managed Energy Reduction in Infiniband Links
  • 2016
  • In: 21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015, Melbourne, Australia, 14-17 December 2015. - 1521-9097. - 9780769557854 ; pp. 649-657
  • Conference paper (peer-reviewed), abstract:
    • One of the biggest challenges in high-performance computing is to reduce power and energy consumption. Research in energy efficiency has focused mainly on energy consumption at the node level. Less attention has been given to the interconnect, which is becoming a significant source of energy inefficiency. Although supercomputers undoubtedly require a high-performance interconnect, previous work has shown that network links have low average utilization. It is therefore possible to save energy using low-power modes, but link wake-up latencies must not lead to a loss in performance. This paper proposes the Self-tuned Pattern Prediction System (SPPS), a self-tuned algorithm for energy proportionality, which reduces interconnect energy consumption without needing any application-specific configuration parameters. The algorithm uses prediction to discover repetitive patterns in the application's communication, and it is implemented inside the MPI library, so that existing MPI programs do not need to be modified. We build on previous work, which showed how the application structure can be successfully exploited to predict the communication idle intervals. The previous work, however, required the manual adjustment of a critical idle interval length, whose value depends on the application and has a major effect on energy savings. The new technique automatically discovers the optimal value of this parameter, resulting in a self-tuned algorithm that obtains large interconnect energy savings at little performance cost. We study the effectiveness of our approach using ten real applications and benchmarks. Our simulations show average energy savings in the network links of up to 21%. Moreover, the link wake-up latencies and additional computation times have a negligible effect on performance, with an average penalty of less than 1%.
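
To make the self-tuning idea concrete, here is a minimal sketch of a threshold that adapts to observed idle-interval lengths; the adaptation rule and constants are illustrative assumptions, not the actual SPPS algorithm.

    /* Illustrative self-tuned threshold for link power-down decisions.
     * The adaptation rule and constants are assumptions, not SPPS. */
    #include <stdint.h>

    typedef struct {
        uint64_t threshold;  /* min predicted idle worth sleeping for */
    } link_predictor_t;

    /* After each observed idle gap, nudge the threshold: long gaps mean
     * we can sleep more eagerly; short gaps mean sleeping would cost
     * more in wake-up latency than it saves. */
    static void observe_idle(link_predictor_t *p, uint64_t idle_cycles) {
        if (idle_cycles > 2 * p->threshold)
            p->threshold -= p->threshold / 8;       /* sleep more eagerly */
        else if (idle_cycles < p->threshold)
            p->threshold += p->threshold / 8 + 1;   /* sleep less eagerly */
    }

    /* Called with the idle interval predicted for the upcoming MPI
     * phase; the link sleeps only when the prediction clears the bar. */
    static int should_sleep(const link_predictor_t *p, uint64_t predicted) {
        return predicted >= p->threshold;
    }
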
5.
  • Dong, Jianbo, et al. (author)
  • Venice: Exploring Server Architectures for Effective Resource Sharing
  • 2016
  • In: Proceedings - International Symposium on High-Performance Computer Architecture. - 1530-0897. - 9781467392112 ; 2016-April, pp. 507-518
  • Conference paper (peer-reviewed), abstract:
    • Consolidated server racks are quickly becoming the backbone of IT infrastructure for science, engineering, and business alike. These servers are still largely built and organized as when they were distributed, individual entities. Given that many fields increasingly rely on analytics of huge datasets, it makes sense to support flexible resource utilization across servers to improve cost-effectiveness and performance. We introduce Venice, a family of data-center server architectures that builds a strong communication substrate as a first-class resource for server chips. Venice provides a diverse set of resource-joining mechanisms that enables user programs to efficiently leverage non-local resources. To better understand the implications of design decisions about system support for resource sharing, we have constructed a hardware prototype that allows us to more accurately measure end-to-end performance of at-scale applications and to explore tradeoffs among performance, power, and resource-sharing transparency. We present results from our initial studies analyzing these tradeoffs when sharing memory, accelerators, or NICs. We find that it is particularly important to reduce or hide latency, that data-sharing access patterns should match the features of the communication channels employed, and that inter-channel collaboration can be exploited for better performance.
6.
  • Goel, Bhavishya, 1981, et al. (author)
  • A Methodology for Modeling Dynamic and Static Power Consumption
  • 2016
  • In: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. - 9781509021406 ; pp. 273-282
  • Conference paper (peer-reviewed), abstract:
    • System designers and application programmers must consider trade-offs between performance and energy. Making energy-aware decisions when designing an application or runtime system requires quantitative information about power consumed by different processor components. We present a methodology to model static and dynamic power consumption of individual cores and the uncore components, and we validate our power model for both sequential and parallel benchmarks at different voltage-frequency pairs on an Intel Haswell platform. Our power models yield the following insights about energy-efficient scaling. (1) We show that uncore energy accounts for up to 74% of total energy. In particular, uncore static energy can be as high as 61% of total energy, potentially making it a major source of energy inefficiency. (2) We find that the frequency at which an application expends the lowest energy depends on how memory-bound it is. (3) We demonstrate that even though using more cores may improve performance, the energy consumed by stalled cores during serial portions of the program can make using fewer cores more energy-efficient.
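
The general shape of such a model can be sketched as follows. The split into activity-scaled dynamic power (alpha * C * V^2 * f) plus voltage- and temperature-dependent static power is standard CMOS modeling; all coefficients below are placeholders, not values fitted in the paper.

    /* Sketch of a per-component power model of the general form the
     * paper studies.  Coefficients are placeholders, not fitted values. */
    #include <stdio.h>

    /* Dynamic power: alpha * C_eff * V^2 * f (activity-scaled switching). */
    static double p_dynamic(double activity, double cap_eff,
                            double volts, double freq_hz) {
        return activity * cap_eff * volts * volts * freq_hz;
    }

    /* Static power: leakage rises with voltage and temperature; a common
     * empirical form is V * (a + b*T), with a and b fitted per chip. */
    static double p_static(double volts, double temp_c, double a, double b) {
        return volts * (a + b * temp_c);
    }

    int main(void) {
        double v = 1.05, f = 2.4e9, t = 55.0;  /* placeholder V/f/T point */
        double core   = p_dynamic(0.35, 1.1e-9, v, f) + p_static(v, t, 0.9, 0.02);
        double uncore = p_dynamic(0.20, 2.3e-9, v, f) + p_static(v, t, 2.1, 0.04);
        printf("core %.2f W, uncore %.2f W\n", core, uncore);
        return 0;
    }
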
7.
  • Jiang, Tao, et al. (author)
  • Adapting Memory Hierarchies for Emerging Datacenter Interconnects
  • 2015
  • In: Journal of Computer Science and Technology. - Springer Science and Business Media LLC. - 1860-4749 .- 1000-9000. ; 30:1, pp. 97-109
  • Journal article (peer-reviewed), abstract:
    • Efficient resource utilization requires that emerging datacenter interconnects support both high performance communication and efficient remote resource sharing. These goals require that the network be more tightly coupled with the CPU chips. Designing a new interconnection technology thus requires considering not only the interconnection itself, but also the design of the processors that will rely on it. In this paper, we study memory hierarchy implications for the design of high-speed datacenter interconnects, particularly as they affect remote memory access, and we use PCIe as the vehicle for our investigations. To that end, we build three complementary platforms: a PCIe-interconnected prototype server with which we measure and analyze current bottlenecks; a software simulator that lets us model microarchitectural and cache hierarchy changes; and an FPGA prototype system with Thunder, a streamlined, switchless, customized protocol with which we study hardware optimizations outside the processor. We highlight several architectural modifications to better support remote memory access and communication, and quantify their impact and limitations.
8.
  • Lidman, Jacob, 1985, et al. (author)
  • Verifying reliability properties using the hyperball abstract domain
  • 2017
  • In: ACM Transactions on Programming Languages and Systems. - Association for Computing Machinery (ACM). - 1558-4593 .- 0164-0925. ; 40:1
  • Journal article (peer-reviewed), abstract:
    • Modern systems are increasingly susceptible to soft errors that manifest themselves as bit flips and possibly alter the semantics of an application. We would like to measure the quality degradation on semantics due to such bit flips, and thus we introduce a Hyperball abstract domain that allows us to determine the worst-case distance between expected and actual results. Similar to intervals, hyperballs describe a connected and dense space. The semantics of low-level code in the presence of bit flips is hard to accurately describe in such a space. We therefore combine the Hyperball domain with an existing affine system abstract domain that we extend to handle bit flips, which are introduced as disjunctions. Bit flips can reduce the precision of our analysis, and we therefore introduce the Scale domain as a disjunctive refinement to minimize precision loss. This domain bounds the number of disjunctive elements by quantifying the over-approximation of different partitions and uses submodular optimization to find a good partitioning (within a bound of optimal). We evaluate these domains to show benefits and potential problems. For the application we examine here, adding the Scale domain to the Hyperball abstraction improves accuracy by up to two orders of magnitude. Our initial results demonstrate the feasibility of this approach, although we would like to further improve execution efficiency.
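
To give a feel for the domain, here is a minimal sketch of a hyperball element (a center vector plus a radius bounding worst-case distance) with a join that returns the smallest ball enclosing two balls; the fixed dimension and Euclidean metric are illustrative assumptions, not the paper's formalization.

    /* Sketch of a hyperball abstract element: center vector + radius
     * bounding worst-case distance.  Dimension and metric are
     * illustrative assumptions.  Compile with -lm. */
    #include <math.h>
    #include <stdio.h>

    #define DIM 3

    typedef struct {
        double center[DIM];
        double radius;
    } hyperball;

    /* Join = smallest ball enclosing both arguments: the abstract
     * union, which over-approximates the union of the two balls. */
    static hyperball hb_join(hyperball a, hyperball b) {
        double diff[DIM], d = 0.0;
        for (int i = 0; i < DIM; i++) {
            diff[i] = b.center[i] - a.center[i];
            d += diff[i] * diff[i];
        }
        d = sqrt(d);
        if (d + b.radius <= a.radius) return a;   /* a contains b */
        if (d + a.radius <= b.radius) return b;   /* b contains a */
        hyperball j;
        j.radius = (d + a.radius + b.radius) / 2.0;
        double t = (d > 0.0) ? (j.radius - a.radius) / d : 0.0;
        for (int i = 0; i < DIM; i++)
            j.center[i] = a.center[i] + t * diff[i];
        return j;
    }

    int main(void) {
        hyperball a = {{0, 0, 0}, 1.0}, b = {{4, 0, 0}, 2.0};
        hyperball j = hb_join(a, b);
        printf("center x=%.2f radius=%.2f\n", j.center[0], j.radius);
        return 0;   /* prints: center x=2.50 radius=3.50 */
    }
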
9.
  • Radulovic, M., et al. (author)
  • Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
  • 2015
  • In: ACM International Conference Proceeding Series - Proceedings of the 1st International Symposium on Memory Systems, MEMSYS 2015, Washington, United States, 14-15 August 2015. - New York, NY, USA : ACM. - 9781450336048 ; 05-08-October-2015, pp. 31-36
  • Conference paper (peer-reviewed), abstract:
    • First defined two decades ago, the memory wall remains a fundamental limitation to system performance. Recent innovations in 3D-stacking technology enable DRAM devices with much higher bandwidths than traditional DIMMs. The first such products will soon hit the market, and some of the publicity claims that they will break through the memory wall. Here we summarize our analysis and expectations of how such 3D-stacked DRAMs will affect the memory wall for a set of representative HPC applications. We conclude that although 3D-stacked DRAM is a major technological innovation, it cannot eliminate the memory wall.
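
The wall referenced in the title goes back to the classic average-access-time argument of Wulf and McKee (1995), which a few lines make concrete; the hit rate and latencies below are illustrative, not measurements from this paper.

    /* The classic memory-wall model: t_avg = p*t_c + (1-p)*t_m, where
     * p is the cache hit rate, t_c the cache latency, and t_m the
     * memory latency.  Numbers are illustrative. */
    #include <stdio.h>

    int main(void) {
        double p   = 0.99;     /* cache hit rate */
        double t_c = 1.0;      /* cache access, cycles */
        double t_m = 200.0;    /* DRAM access, cycles */
        double t_avg = p * t_c + (1.0 - p) * t_m;
        /* Even at 99% hits, the miss term (2.0 cycles) dominates t_avg
         * (2.99 cycles).  Higher bandwidth raises throughput but does
         * not remove this latency term, which is why stacked DRAM
         * alone cannot eliminate the wall. */
        printf("t_avg = %.2f cycles\n", t_avg);
        return 0;
    }
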
10.
  • Sanchez, Carlos, et al. (author)
  • Redesigning a tagless access buffer to require minimal ISA changes
  • 2016
  • In: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016. - New York, NY, USA : ACM. - 9781450321389 - 9781450344821 ; Article number 2968504
  • Conference paper (peer-reviewed), abstract:
    • Energy efficiency is a first-order design goal for nearly all classes of processors, but it is particularly important in mobile and embedded systems. Data caches in such systems account for a large portion of the processor's energy usage, and thus techniques to improve the energy efficiency of the cache hierarchy are likely to have high impact. Our prior work reduced data cache energy via a tagless access buffer (TAB) that sits at the top of the cache hierarchy. Strided memory references are redirected from the level-one data cache (L1D) to the smaller, more energy-efficient TAB. These references need not access the data translation lookaside buffer (DTLB), and they can avoid unnecessary transfers from lower levels of the memory hierarchy. The original TAB implementation requires changing the immediate field of load and store instructions, necessitating substantial ISA modifications. Here we present a new TAB design that requires minimal instruction set changes, gives software more explicit control over TAB resource management, and remains compatible with legacy (non-TAB) code. With a line size of 32 bytes, a four-line TAB can eliminate 31% of L1D accesses, on average. Together, the new TAB, L1D, and DTLB use 22% less energy than a TAB-less hierarchy, and the TAB system decreases execution time by 1.7%.
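
A small software model conveys the redirection idea: in-line strided accesses are served by the buffer, and the L1D and DTLB are touched only on line crossings. The four-line, 32-byte geometry matches the evaluation above; the refill policy coded here is an illustrative simplification, not the actual hardware design.

    /* Software model of TAB-style redirection: strided references hit
     * a small line buffer; lower levels are touched only on refills. */
    #include <stdint.h>
    #include <string.h>

    #define TAB_LINES  4
    #define LINE_BYTES 32

    typedef struct {
        uintptr_t base;               /* aligned base of buffered line */
        uint8_t   data[LINE_BYTES];
        int       valid;
    } tab_line_t;

    static tab_line_t tab[TAB_LINES];

    /* One strided 32-bit load through TAB entry `way`: in-line accesses
     * are served from the buffer, skipping the L1D and DTLB models;
     * crossing into a new line triggers a single refill. */
    static uint32_t tab_load32(int way, const uint8_t *addr) {
        uintptr_t base = (uintptr_t)addr & ~(uintptr_t)(LINE_BYTES - 1);
        tab_line_t *l = &tab[way];
        if (!l->valid || l->base != base) {
            memcpy(l->data, (const void *)base, LINE_BYTES);  /* refill */
            l->base  = base;
            l->valid = 1;
        }
        uint32_t v;
        memcpy(&v, &l->data[(uintptr_t)addr - base], sizeof v);
        return v;
    }
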