SwePub
Search the SwePub database


Hit list for the search "LAR1:uu ;mspu:(report);pers:(Hagersten Erik)"

Search: LAR1:uu > Report > Hagersten Erik

  • Results 1-10 of 31
1.
  • Berg, Erik, et al. (author)
  • A Statistical Multiprocessor Cache Model
  • 2005
  • Report (other academic/artistic) abstract
    • The introduction of general-purpose microprocessors running multiple threads will put a focus on methods and tools that help a programmer write efficient parallel applications. Such a tool should be fast enough to meet a software developer's need for short turn-around times, but also accurate and flexible enough to provide trend-correct and intuitive feedback. This paper describes an efficient and flexible approach for modeling the memory system of a multiprocessor, such as that of a chip multiprocessor (CMP). Sparse data is sampled during a multithreaded execution. The data collected consist of the reuse distance and invalidation distribution for a small subset of the memory accesses. Based on the sampled data from a single run, a new mathematical formula can be used to estimate the miss rate for a memory hierarchy built from caches of arbitrary size, cache-line size and degree of sharing. The formula also divides the misses into six categories to further aid the software developer. The method is evaluated using a large number of commercial and technical multithreaded applications. The results produced by our algorithm, fed with sparse sampling data, are shown to be consistent with results gathered during traditional architecture simulation.
  •  
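The miss-rate formula itself is not reproduced in the abstract, but the flavor of such reuse-distance-based estimates can be sketched. The following is a hypothetical, minimal sketch (not the authors' formula): it assumes a single fully associative cache with random replacement, where a sampled reuse of distance d — with each of the d intervening references missing with ratio m — evicts the block with probability 1 - (1 - 1/L)^(m·d), which makes the miss ratio m a fixed point. The reuse-distance samples are invented.

```python
def estimate_miss_ratio(reuse_distances, num_lines, iters=100):
    """Fixed-point iteration for the miss ratio m of a fully associative
    cache with num_lines lines and random replacement (a sketch, not the
    paper's model)."""
    m = 0.5  # initial guess
    for _ in range(iters):
        m = sum(1 - (1 - 1 / num_lines) ** (m * d)
                for d in reuse_distances) / len(reuse_distances)
    return m

# Hypothetical sampled reuse distances (intervening references per reuse).
samples = [10, 50, 200, 1000, 5000, 20000]

small = estimate_miss_ratio(samples, num_lines=64)
large = estimate_miss_ratio(samples, num_lines=4096)
print(f"64-line cache:   miss ratio ~ {small:.3f}")
print(f"4096-line cache: miss ratio ~ {large:.3f}")
```

Note that the same set of samples yields an estimate for any cache size, which is the property the abstracts emphasize: one run, arbitrary cache configurations.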
2.
  • Berg, Erik, et al. (author)
  • Efficient Data-Locality Analysis of Long-Running Applications
  • 2004
  • Report (other academic/artistic) abstract
    • Analysis of application data cache behavior is important for program optimization and architectural design decisions. Current methods include hardware monitoring and simulation, but these suffer from either limited flexibility or a run-time overhead so large that it precludes realistic workloads. This paper describes a new fast and flexible tool based on StatCache. The tool is based on a probabilistic cache model instead of a functional cache simulator and uses sparsely sampled run-time information instead of complete traces or sampled contiguous subtraces. A post-run analyzer calculates miss ratios of fully associative caches of arbitrary size and cache-line size from statistics gathered in a single run. It can also produce various data-locality metrics and give data-structure-centric data-locality figures. The implementation relies on simple hardware and operating-system support available in most operating systems and runs uninstrumented, optimized code. We evaluate the method on the SPEC benchmark suite with the largest (ref) input sets and show that its accuracy is high. We also show the run-time overhead of this flexible “cache simulator” to be less than 20% for long-running applications, much faster than current simulators.
  •  
3.
  • Berg, Erik, et al. (author)
  • Low-Overhead Spatial and Temporal Data Locality Analysis
  • 2003
  • Report (other academic/artistic) abstract
    • Performance is getting increasingly sensitive to cache behavior because of the growing gap between processor cycle time and memory latency. To improve performance, applications need to be optimized for data locality. Run-time analysis of spatial and temporal data locality can be used to facilitate this and should help both manual tuning and feedback-based compiler optimizations. Identifying the cache behavior of individual data structures further enhances the optimization process. Current methods for such analysis include simulation combined with set sampling or time sampling, and hardware monitoring. Sampling often suffers from either poor accuracy or large run-time overhead, while hardware measurements have limited flexibility. We present DLTune, a prototype tool that performs spatial and temporal data-locality analysis at run time. It measures both spatial and temporal locality for the entire application and for individual data structures in a single run, and it effectively exposes poor data locality based on miss-ratio estimates of fully associative caches. The tool is based on an elaborate and novel sampling technique that allows all information to be collected in a single run, with an overall sampling rate as low as one memory reference in ten million and an average slowdown below five on large workloads.
  •  
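The data-structure-centric figures the abstract mentions can be illustrated by attributing sampled accesses to data structures via their allocation ranges. This is a hypothetical sketch, not DLTune's implementation; the structure names, address ranges, and samples below are all invented.

```python
# name -> (start address, end address), hypothetical allocation ranges
structures = {
    "matrix_a": (0x1000, 0x5000),
    "index_tbl": (0x8000, 0x8400),
}

# (sampled address, measured reuse distance), invented for illustration
samples = [
    (0x1040, 120_000), (0x2FF8, 95_000), (0x4A00, 110_000),
    (0x8010, 40), (0x8200, 12), (0x83F0, 25),
]

def per_structure_reuse(samples, structures):
    """Bucket sampled reuse distances by the data structure whose
    allocation range contains the sampled address."""
    stats = {name: [] for name in structures}
    for addr, dist in samples:
        for name, (lo, hi) in structures.items():
            if lo <= addr < hi:
                stats[name].append(dist)
    return {name: sum(d) / len(d) for name, d in stats.items() if d}

for name, mean_dist in per_structure_reuse(samples, structures).items():
    print(f"{name}: mean reuse distance {mean_dist:,.0f}")
```

A structure with large mean reuse distances (here, the invented "matrix_a") is the natural target for locality optimizations, which is the kind of guidance such per-structure figures provide.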
4.
  • Berg, Erik, et al. (author)
  • StatCache: A Probabilistic Approach to Efficient and Accurate Data Locality Analysis
  • 2003
  • Report (other academic/artistic) abstract
    • The widening memory gap reduces the performance of applications with poor data locality. This problem can be analyzed using working-set graphs. Current methods to generate such graphs include set sampling and time sampling, but cold-start effects and unrepresentative set selection impair accuracy. In this paper we present StatCache, a novel sample-based method that can perform data-locality analysis on realistic workloads. During the execution of an application, sparse discrete memory accesses are sampled and their reuse distances are measured using a simple watchpoint mechanism. StatCache uses the information collected from a single run to accurately estimate miss ratios of fully associative caches of arbitrary sizes and to generate working-set graphs. We evaluate StatCache using the SPEC CPU2000 benchmarks and show that it gives accurate results with a sampling rate as low as 10⁻⁴. We also provide a proof-of-concept implementation and discuss potentially very fast implementation alternatives.
  •  
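The watchpoint mechanism can be sketched in plain Python over a synthetic address trace. The real tool arms hardware watchpoints on a live, uninstrumented process; this toy version merely walks a trace, "arms a watchpoint" on a sparsely sampled cache line, and counts references until the same line is touched again. The trace, line size, and sampling period are invented.

```python
import random

LINE_SIZE = 64  # assumed cache-line size in bytes

def sample_reuse_distances(trace, sample_period):
    """Sample roughly one in sample_period references; for each sample,
    count intervening references until its cache line is reused."""
    random.seed(1)                 # deterministic for the example
    armed = {}                     # line -> index where it was sampled
    distances = []
    for i, addr in enumerate(trace):
        line = addr // LINE_SIZE
        if line in armed:                          # the "watchpoint" fires
            distances.append(i - armed.pop(line) - 1)
        if random.randrange(sample_period) == 0:
            armed[line] = i                        # arm on this access
    return distances

# Synthetic trace: cyclic sweep over a 4 KB array in 8-byte steps, so each
# 64-byte line is touched 8 times in a row and revisited every 512 refs.
trace = [(i * 8) % 4096 for i in range(100_000)]
distances = sample_reuse_distances(trace, sample_period=1000)
print(f"{len(distances)} samples, distinct distances: {sorted(set(distances))}")
```

For this regular trace the sampled distances come out as 0 (the next access falls in the same line) or 504 (the line is next touched on the following sweep), showing how even sparse samples capture the trace's reuse pattern.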
5.
  •  
6.
  •  
7.
  • Eklöv, David, et al. (author)
  • A Profiling Method for Analyzing Scalability Bottlenecks on Multicores
  • 2012
  • Report (other academic/artistic) abstract
    • A key quality metric for multi-threaded programs is how their execution times scale as the number of threads increases. However, several bottlenecks can limit the scalability of a multi-threaded program, e.g., contention for shared cache capacity, contention for off-chip memory bandwidth, and synchronization overheads. In order to improve the scalability of a multi-threaded program, it is vital to be able to quantify how the program is impacted by these scalability bottlenecks. We present a software-only profiling method for obtaining speedup stacks. A speedup stack reports how much each scalability bottleneck limits the scalability of a multi-threaded program, thereby quantifying how much scalability can be improved by eliminating a given bottleneck. A software developer can use this information to determine which optimizations are most likely to improve scalability, while a computer architect can use it to analyze the resource demands of emerging workloads. The proposed method profiles the program on real commodity multicores (i.e., no simulations required) using existing performance counters. Consequently, the obtained speedup stacks accurately account for all idiosyncrasies of the machine on which the program is profiled. While the main contribution of this paper is the profiling method for obtaining speedup stacks, we present several examples of how speedup stacks can be used to analyze the resource requirements of multi-threaded programs. Furthermore, we discuss how their scalability can be improved by both software developers and computer architects.
  •  
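The shape of a speedup stack can be sketched from already-measured cycle components. This is a simplified, hypothetical attribution (the gap between ideal and achieved speedup is split proportionally among the measured bottleneck cycles), not the paper's method, and all cycle counts are invented.

```python
def speedup_stack(t, serial_cycles, compute_cycles, bottleneck_cycles):
    """Build a speedup stack for a t-thread run: the achieved-speedup bar
    plus one loss bar per bottleneck, summing to the ideal speedup t.
    bottleneck_cycles maps each bottleneck to its extra cycles."""
    parallel_cycles = compute_cycles + sum(bottleneck_cycles.values())
    achieved = serial_cycles / parallel_cycles
    gap = t - achieved                            # speedup lost overall
    total_extra = sum(bottleneck_cycles.values())
    stack = {"achieved speedup": achieved}
    for name, cycles in bottleneck_cycles.items():
        stack[name] = gap * cycles / total_extra  # proportional attribution
    return stack

stack = speedup_stack(
    t=8,
    serial_cycles=8_000_000,
    compute_cycles=1_000_000,     # ideal 8-thread share of the serial run
    bottleneck_cycles={           # hypothetical profiled components
        "cache contention": 300_000,
        "bandwidth saturation": 500_000,
        "synchronization": 200_000,
    },
)
for name, bar in stack.items():
    print(f"{name:22s} {bar:.2f}")
```

The bars sum to the ideal speedup (here 8), so the height of each loss bar directly answers the question the abstract poses: how much scalability a given bottleneck costs.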
8.
  •  
9.
  • Eklöv, David, et al. (author)
  • Design and Evaluation of the Bandwidth Bandit
  • 2012
  • Report (other academic/artistic) abstract
    • Applications that are co-scheduled on a multicore compete for shared resources, such as cache capacity and memory bandwidth. The performance degradation resulting from this contention can be substantial, which makes it important to manage these shared resources effectively. This, however, requires an understanding of how applications are impacted by such contention. While the effects of contention for cache capacity have been studied extensively, less is known about the effects of contention for memory bandwidth. This is largely due to its complex nature: sensitivity to bandwidth contention depends on bottlenecks at several levels of the memory system and on the interaction and locality properties of the application's access stream. This paper explores the contention effects of increased latency and decreased memory parallelism at different points in the memory hierarchy, both of which reduce the available bandwidth. To understand the impact of such contention on applications, it also presents a method whereby an application's overall sensitivity to different degrees of bandwidth contention can be measured directly. This method is used to demonstrate the varying contention sensitivity across a selection of benchmarks, and to explain why some of them experience substantial slowdowns long before the overall memory bandwidth saturates.
  •  
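The measurements such a method produces pair each degree of contending "bandit" traffic with the target application's slowdown; summarized as a curve, they expose the sensitivity the abstract describes. The sketch below only interpolates such measurement pairs, and every data point is invented.

```python
# (contending traffic in GB/s, measured target slowdown), hypothetical
measurements = [(0, 1.00), (2, 1.02), (4, 1.10), (6, 1.35), (8, 1.90), (10, 2.60)]

def slowdown_at(bandit_gbps):
    """Linear interpolation between measured sensitivity points."""
    pts = sorted(measurements)
    if bandit_gbps <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if bandit_gbps <= x1:
            return y0 + (y1 - y0) * (bandit_gbps - x0) / (x1 - x0)
    return pts[-1][1]       # beyond the last measurement: clamp

print(f"slowdown with 5 GB/s of contending traffic: {slowdown_at(5):.2f}x")
```

Note how, in this invented curve, the slowdown is already substantial in the mid range, well before the machine's bandwidth would saturate — the phenomenon the abstract sets out to explain.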
10.
  • Eklöv, David, et al. (author)
  • StatCC: Design and Evaluation
  • 2010
  • Report (other academic/artistic) abstract
    • This work presents StatCC, a simple and efficient model for estimating the shared cache miss ratios of co-scheduled applications on architectures with a hierarchy of private and shared caches. StatCC leverages the StatStack cache model to estimate the co-scheduled applications' cache miss ratios from their individual memory reuse distance distributions, and a simple performance model that estimates their CPIs based on the shared cache miss ratios. These methods are combined into a system of equations that explicitly models the CPIs in terms of the shared miss ratios and can be solved to determine both. The result is a fast algorithm with a 2% error across the SPEC CPU2006 benchmark suite compared to a simulated in-order processor and a hierarchy of private and shared caches.
  •  
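A toy version of such a coupled system might look as follows. The cache model, constants, and coupling below are invented stand-ins (StatCC derives miss ratios from StatStack, not from this formula); the sketch only shows how fixed-point iteration can resolve the mutual dependence between the co-runners' CPIs and their shared-cache miss ratios.

```python
MISS_PENALTY = 200   # assumed cycles per shared-cache miss
MEM_PER_INSN = 0.3   # assumed memory accesses per instruction

def miss_ratio(share):
    """Invented stand-in for a reuse-distance-based cache model: the miss
    ratio falls as the app's effective share of the shared cache grows."""
    return 0.10 / (0.5 + share)

def solve(cpi_base, iters=200):
    """Iterate the coupled equations: each CPI depends on its miss ratio,
    which depends on both apps' access rates (~ 1/CPI) into the cache."""
    cpi = list(cpi_base)                       # initial guess: no contention
    for _ in range(iters):
        rate = [1 / c for c in cpi]            # faster app issues more accesses
        share = [r / sum(rate) for r in rate]  # ...and claims more of the cache
        m = [miss_ratio(s) for s in share]
        cpi = [b + MISS_PENALTY * MEM_PER_INSN * mi
               for b, mi in zip(cpi_base, m)]
    return cpi, m

cpi, m = solve([1.0, 2.0])                     # hypothetical base CPIs
print(f"CPIs: {cpi[0]:.2f}, {cpi[1]:.2f}; miss ratios: {m[0]:.3f}, {m[1]:.3f}")
```

As in the paper's formulation, the CPIs and miss ratios are determined together rather than in sequence, which is what makes a simultaneous (fixed-point) solution necessary.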


 