SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "LAR1:bth ;lar1:(bth);srt2:(1995-1999);pers:(Lundberg Lars)"

Sökning: LAR1:bth > Blekinge Tekniska Högskola > (1995-1999) > Lundberg Lars

  • Resultat 1-10 av 18
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Broberg, Magnus, et al. (författare)
  • Performance Optimization using Critical Path Analysis in Multithreaded Programs on Multiprocessors
  • 1999
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • Efficient performance tuning of parallel programs is often hard. Optimization is often done when the program is written as a last effort to increase the performance. With sequential programs each (executed) code segment will affect the total execution time of the program. Thus, any code segment that is optimized in a sequential program will decrease the execution time. In the case of a parallel program executed on a multiprocessor this is not always true. This is due to dependencies between the different threads. As a result, certain code segments of the execution may not affect the total execution time of the program. Thus, optimization of such code segments will not increase the performance. In this paper we present a new approach to perform the optimization phase. Our approach finds the critical path of the multithreaded program and the optimization is only done on those specific code segments of the program. We have implemented the critical path analysis in a performance optimization tool.
  •  
2.
  • Broberg, Magnus, et al. (författare)
  • Visualization and performance prediction of multithreaded Solaris programs by tracing kernel threads
  • 1999
  • Konferensbidrag (refereegranskat)abstract
    • Efficient performance tuning of parallel programs is often hard. We present a performance prediction and visualization tool called VPPB. Based on a monitored uni-processor execution, VPPB shows the (predicted) behaviour of a multithreaded program using any number of processors and the program behaviour is visualized as a graph. The first version of VPPB was unable to handle I/O operations. This version has, by an improved tracing technique, added the possibility to trace activities at the kernel level as well. Thus, VPPB is now able to trace various I/O activities, e.g., manipulation of OS internal buffers, physical disk I/O, socket I/O, and RPC. VPPB allows flexible performance tuning of parallel programs developed for shared memory multiprocessors using a standardized environment; C/C++ programs that lues the thread package in Solaris 2.X.
  •  
3.
  • Broberg, Magnus, et al. (författare)
  • VPPB : A Visualization and Performance Prediction Tool for Multitreaded Solaris Programs
  • 1998
  • Konferensbidrag (refereegranskat)abstract
    • Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uni-processor execution of a multithreaded program as reference to simulate a multiprocessor execution. The speed-up is predicted, and the program behaviour is visualized as a graph, which can be used in the performance tuning process. The simulator considers scheduling as well as hardware parameters, e.g., the thread priority, no. of LWPs, and no. of CPUs. The visualization part shows the simulated execution in two graphs: one showing the threads’ behaviour over time and the other the amount of parallel-ism over time. In the first graph is it possible to relate an event in the graph to the code line causing the event. Validation using a Sun multiprocessor with eight processors and five scientific parallel applications shows that the speed-up predictions are within +/-6% of a real execution.
  •  
4.
  • Lennerstad, Håkan, et al. (författare)
  • An Optimal Execution Time Estimate of Static Versus Dynamic Allocation in Multiprocessor Systems
  • 1995
  • Ingår i: SIAM journal on computing (Print). - Philadelphia, Pa. : SIAM publications. - 0097-5397 .- 1095-7111. ; 24:4, s. 751-764
  • Tidskriftsartikel (refereegranskat)abstract
    • Consider a multiprocessor with k identical processors, executing parallel programs consisting of n processes. Let T$-s$/(P) and T$-d$/(P) denote the execution times for the program P with optimal static and dynamic allocations, respectively, i.e., allocations giving minimal execution time. We derive a general and explicit formula for the following maximal execution time ratio: g(n, k) $EQ max T$-s$/(P)/T$-d$/(P), where the maximum is taken over all programs P consisting of n processes. Any interprocess dependency structure for the programs P is allowed only by avoiding deadlock. Overhead for synchronization and reallocation is neglected. Basic properties of the function g(n, k) are established, from which we obtain a global description of the function. Plots of g(n, k) are included. The results are obtained by investigating a mathematical formulation. The mathematical tools involved are essentially tools of elementary combinatorics. The formula is a combinatorial function applied on certain extremal matrices corresponding to extremal programs. It is mathematically complicated but rapidly computed for reasonable n and k, in contrast to the np-completeness of the problems of finding optimal allocations.
  •  
5.
  • Lennerstad, Håkan, et al. (författare)
  • Combinatorics for multiprocessor scheduling optimization and other contexts in computer architecture
  • 1996
  • Konferensbidrag (refereegranskat)abstract
    • The method described consists of two steps. First, unnecessary programs are eliminated through a sequence of program transformations. Second, within the remaining set of programs, sometimes regarded as matrices, those where all possible combinations of synchronizations occur equally frequently are proven to be extremal. At this stage we obtain a formulation which is simple enough to allow explicit formulas to be derived. It turns out that the same method can be used for obtaining worst-case bounds on other NP-hard problems within computer architecture.
  •  
6.
  • Lennerstad, Håkan, et al. (författare)
  • Optimal Scheduling Results for Parallel Computing
  • 1996
  • Ingår i: Applications on advanced architecture computers. - Philadelphia, USA : SIAM. - 0898713684 ; , s. 155-164
  • Bokkapitel (refereegranskat)abstract
    • Load balancing is one of many possible causes of poor performance on parallel machines. If good load balancing of the decomposed algorithm or data is not achieved, much of the potential gain of the parallel algorithm is lost to idle processors. Each of the two extremes of load balancing - static allocation and dynamic allocation - has advantages and disadvantages. This chapter illustrates the relationship between static and dynamic allocation of tasks.
  •  
7.
  • Lennerstad, Håkan, et al. (författare)
  • Optimal Worst Case Formulas Comparing Cache Memory Associativity
  • 1995
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • Consider an arbitrary program $P$ which is to be executed on a computer with two alternative cache memories. The first cache is set associative or direct mapped. It has $k$ sets and $u$ blocks in each set, this is called a (k,u)$-cache. The other is a fully associative cache with $q$ blocks - a $(1,q)$-cache. We present formulas optimally comparing the performance of a $(k,u)$-cache compared to a $(1,q)$-cache for worst case programs. Optimal mappings of the program variables to the cache blocks are assumed. Let $h(P,k,u)$ denote the number of cache hits for the program $P$, when using a $(k,u)$-cache and an optimal mapping of the program variables of $P$ to the cache blocks. We establish an explicit formula for the quantity $$\inf_P \frac{h(P,k,u)}{h(P,1,q)},$$ where the infimum is taken over all programs $P$ which contain $n$ variables. The formula is a function of the parameters $n,k,u$ and $q$ only. We also deduce a formula for the infimum taken over all programs of any number of variables, this formula is a function of $k,u$ and $q$. We further prove that programs which are extremal for this minimum may have any hit ratio, i.e. any ratio $h(P,1,q)/m(P)$. Here $m(P)$ is the total number of memory references for the program P. We assume the commonly used LRU replacemant policy, that each variable can be stored in one memory block, and is free to be stored in any block. Since the problems of finding optimal mappings are NP-hard, the results provide optimal bounds for NP-hard quantities. The results on cache hits can easily be transformed to results on access times for different cache architectures.
  •  
8.
  • Lundberg, Lars, et al. (författare)
  • An Optimal Lower Bound on the Maximal Speedup in Multiprocessors with Clusters
  • 1995
  • Konferensbidrag (refereegranskat)abstract
    • We consider an ideal multiprocessor system with q processors and a centralized scheduler without overhead that select processes from one common pool, permitting dynamic relocation of processes. A parallel program P consisting of n processes is executed on this system and terminates when all processes are completed. Due to synchronizations, processes may be blocked while waiting for events in other processes. The parallel program is executed using some schedule of processes to processors, resulting in a speedup $sigma@. We then consider an ideal multiprocessor with k clusters containing u processors each. In this system processes may not be relocated between clusters. Finding a schedule which results in maximum speedup is NP-hard. Here, we present a formula for the optimal lower bound on the maximum speedup for program P, as a function of q, n, $sigma@, k and u. We also present a formula for the optimal lower bound when the number of processes (n) is unknown. Using these results we are able to decide if a certain schedule is close to optimal or if it is worth-while to look for other schedules. This is demonstrated by evaluating the speedup of a specific schedule of a particular program.
  •  
9.
  • Lundberg, Lars, et al. (författare)
  • Bounding the gain of changing the number of memory modules in shared memory multiprocessors
  • 1997
  • Ingår i: Nordic Journal of Computing. - Finland : Publishing Assoc. Nordic Journal of Comput. - 1236-6064. ; 4:3, s. 233-58
  • Tidskriftsartikel (refereegranskat)abstract
    • We consider a multiprocessor, with p processors and m memory modules. If more than one processor want to access a memory module simultaneously, these accesses will be serialized due to memory contention. The completion time of a program executing on this system is thus affected by the way addresses are mapped onto memory modules, and finding a mapping which results in minimal completion time is NP-hard. If we change the number of memory modules from m to m’, while keeping the number processors constant, we will generally change the minimal completion time. The gain of changing the number of memory modules from m to m’ for a certain program is defined as the ratio between the minimal completion times, using m and m’ modules respectively. Here, we present an optimal upper bound on the maximum gain of changing the number of memory modules, as a function of m, m’ and p, including the case when m’ is infinitely large. The bound is obtained by investigating a mathematical formulation. The mathematical tools involved are essentially elementary combinatorics. The formula for calculating the bound is mathematically complicated but can be rapidly computed for reasonable m, m’ and p. This bound makes it possible to do price-performance trade-offs and to compare any mapping of addresses to memory modules with the optimal case. The results are applicable to different multiprocessor architectures, e.g. systems with crossbar networks and systems with multiple buses. The results also make it possible to calculate optimal performance bounds for multiprocessors containing cache memories: as well as for multiprocessors with no cache memories. Moreover, we discuss how the results can be used for calculating bounds for programs with as well as without synchronizations.
  •  
10.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-10 av 18
Typ av publikation
konferensbidrag (10)
tidskriftsartikel (5)
rapport (2)
bokkapitel (1)
Typ av innehåll
refereegranskat (16)
övrigt vetenskapligt/konstnärligt (2)
Författare/redaktör
Lennerstad, Håkan (10)
Grahn, Håkan (3)
Broberg, Magnus (3)
Roos, Marco (1)
Lärosäte
Språk
Engelska (18)
Forskningsämne (UKÄ/SCB)
Naturvetenskap (18)

År

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy