SwePub
Search the SwePub database

  Advanced search

Result list for search "L773:1040 3108 OR L773:1096 9128"

Search: L773:1040 3108 OR L773:1096 9128

  • Results 1-5 of 5
Numbering | Reference | Cover image | Find
1.
  • Brunschen, C., et al. (author)
  • OdinMP/CCp - a portable implementation of OpenMP for C
  • 2000
  • In: Concurrency. - 1040-3108 .- 1096-9128. ; 12:12, pp. 1193-1203
  • Journal article (peer-reviewed), abstract:
    • We describe here the design and performance of OdinMP/CCp, which is a portable compiler for C-programs using the OpenMP directives for parallel processing with shared memory. OdinMP/CCp was written in Java for portability reasons and takes a C-program with OpenMP directives and produces a C-program for POSIX threads. We describe some of the ideas behind the design of OdinMP/CCp and show some performance results achieved on an SGI Origin 2000 and a Sun E10000. Speedup measurements relative to a sequential version of the test programs show that OpenMP programs using OdinMP/CCp exhibit excellent performance on the Sun E10000 and reasonable performance on the Origin 2000.
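The record above describes a source-to-source translation from OpenMP C to POSIX threads. As a rough illustration of that kind of rewriting (not OdinMP/CCp's actual output; the worker/chunk names and the fixed NUM_THREADS are assumptions made for this sketch), an OpenMP parallel loop might be lowered to pthreads roughly like this:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NUM_THREADS 4            /* assumed fixed thread count for the sketch */

    static double a[N];

    /* Conceptual OpenMP original:
     *   #pragma omp parallel for
     *   for (int i = 0; i < N; i++) a[i] = 2.0 * i;
     */

    struct chunk { int begin; int end; };   /* per-thread loop bounds */

    static void *worker(void *arg)
    {
        struct chunk *c = (struct chunk *)arg;
        for (int i = c->begin; i < c->end; i++)
            a[i] = 2.0 * i;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_THREADS];
        struct chunk ck[NUM_THREADS];
        int per = N / NUM_THREADS;

        /* Static block scheduling: each thread gets a contiguous slice. */
        for (int t = 0; t < NUM_THREADS; t++) {
            ck[t].begin = t * per;
            ck[t].end   = (t == NUM_THREADS - 1) ? N : (t + 1) * per;
            pthread_create(&tid[t], NULL, worker, &ck[t]);
        }
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }
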
2.
3.
  • Gerogiannis, D.C., et al. (author)
  • Histogram Computation on Distributed Memory Architectures
  • 1989
  • In: Concurrency: Practice and Experience. - : Wiley. - 1040-3108 .- 1096-9128. ; 1:2, pp. 219-237
  • Journal article (peer-reviewed), abstract:
    • One data-independent and one data-dependent algorithm for the computation of image histograms on parallel computers are presented, analysed and implemented on the Connection Machine system CM-2. The data-dependent algorithm has a lower requirement on communication bandwidth by only transferring bins with a non-zero count. Both algorithms perform all-to-all reduction, which is implemented through a sequence of exchanges as defined by a butterfly network. The two algorithms are compared based on predicted and actual performance on the Connection Machine CM-2. With few pixels per processor the data-dependent algorithm requires in the order of √B data transfers for B bins compared to B data transfers for the data-independent algorithm. As the number of pixels per processor grows the advantage of the data-dependent algorithm decreases. The advantage of the data-dependent algorithm increases with the number of bins of the histogram.
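The abstract above relies on all-to-all reduction realized as a sequence of butterfly exchanges. A minimal serial simulation of that pattern (assuming P simulated processors with P a power of two and B bins, all names invented here; this corresponds to the data-independent variant, not the CM-2 implementation) could look as follows:

    #include <stdio.h>

    #define P 8      /* number of simulated processors; must be a power of two */
    #define B 16     /* number of histogram bins */

    /* hist[p][b] holds processor p's local count for bin b. */
    static int hist[P][B];

    int main(void)
    {
        /* Fill local histograms with arbitrary per-processor counts. */
        for (int p = 0; p < P; p++)
            for (int b = 0; b < B; b++)
                hist[p][b] = (p + b) % 3;

        /* All-to-all reduction as a butterfly: in step k every processor
         * exchanges its partial histogram with the partner whose index
         * differs in bit k, and both sides add the received bins.  After
         * log2(P) steps every processor holds the global histogram.  The
         * data-dependent variant would transfer only the non-zero bins. */
        for (int k = 1; k < P; k <<= 1) {
            for (int p = 0; p < P; p++) {
                int partner = p ^ k;
                if (p < partner) {              /* handle each pair once */
                    for (int b = 0; b < B; b++) {
                        int sum = hist[p][b] + hist[partner][b];
                        hist[p][b] = sum;
                        hist[partner][b] = sum;
                    }
                }
            }
        }

        /* Every row is now identical: print processor 0's copy. */
        for (int b = 0; b < B; b++)
            printf("bin %2d: %d\n", b, hist[0][b]);
        return 0;
    }
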
4.
  • Kessler, Christoph, 1966-, et al. (author)
  • Classification and generation of schedules for VLIW processors
  • 2007
  • In: Concurrency. - : Wiley. - 1040-3108 .- 1096-9128. ; 19, pp. 2369-2389
  • Journal article (peer-reviewed), abstract:
    • We identify and analyze different classes of schedules for instruction-level parallel processor architectures. The classes are induced by various common techniques for generating or enumerating them, such as integer linear programming or list scheduling with backtracking. In particular, we study the relationship between VLIW schedules and their equivalent linearized forms (which may be used, e.g., with superscalar processors), and we identify classes of VLIW schedules that can be created from a linearized form using an in-order VLIW compaction heuristic, which is just the static equivalent of the dynamic instruction dispatch algorithm of in-order issue superscalar processors. We formulate and give a proof of the dominance of greedy schedules for instruction-level parallel architectures where all instructions have multiblock reservation tables, and we show how scheduling anomalies can occur in the presence of instructions with non-multiblock reservation tables. We also show that, in certain situations, certain schedules generally cannot be constructed by incremental scheduling algorithms that are based on topological sorting of the data dependence graph. We also discuss properties of strongly linearizable schedules, out-of-order schedules and non-dawdling schedules, and show their relationships to greedy schedules and to general schedules. We summarize our findings as a hierarchy of classes of VLIW schedules. Finally we provide an experimental evaluation showing the sizes of schedule classes in the above hierarchy, for different benchmarks and example VLIW architectures, including a single-cluster version of the TI C62x DSP processor and variants of that. Our results can sharpen the interpretation of the term optimality used with various methods for optimal VLIW scheduling, and help to identify sets of schedules that can be safely ignored when searching for a time-optimal schedule.
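The record above refers to an in-order VLIW compaction heuristic that mirrors the dispatch of in-order issue superscalar processors. One plausible simplified reading (ignoring reservation tables, with made-up instructions, latencies, and a width of two issue slots) packs a linearized instruction sequence greedily into long-instruction words like this:

    #include <stdio.h>

    #define WIDTH  2             /* issue slots per VLIW word (assumed) */
    #define NINSTR 6

    /* Tiny linearized code: instruction i depends on dep[i][...] (-1 = none),
     * lat[i] is its latency.  All values are invented for illustration. */
    static int dep[NINSTR][2] = {
        {-1, -1},   /* 0: load  a   */
        {-1, -1},   /* 1: load  b   */
        { 0,  1},   /* 2: add a, b  */
        { 1, -1},   /* 3: mul b, b  */
        { 2,  3},   /* 4: add       */
        { 4, -1},   /* 5: store     */
    };
    static int lat[NINSTR] = { 2, 2, 1, 3, 1, 1 };

    int main(void)
    {
        int issue[NINSTR];
        int cycle = 0, used = 0;

        /* In-order compaction: walk the linearized schedule and place each
         * instruction in the current word if its operands are ready and a
         * slot is free; otherwise advance the cycle.  Earlier words are
         * never reopened, mirroring in-order superscalar dispatch. */
        for (int i = 0; i < NINSTR; i++) {
            int ready = 0;
            for (int d = 0; d < 2; d++)
                if (dep[i][d] >= 0) {
                    int t = issue[dep[i][d]] + lat[dep[i][d]];
                    if (t > ready) ready = t;
                }
            if (ready > cycle)      { cycle = ready;     used = 0; }
            else if (used == WIDTH) { cycle = cycle + 1; used = 0; }
            issue[i] = cycle;
            used++;
        }

        for (int i = 0; i < NINSTR; i++)
            printf("instr %d -> cycle %d\n", i, issue[i]);
        return 0;
    }
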
5.
  • Laure, Erwin, et al. (author)
  • On the implementation of the Opus coordination language
  • 2000
  • In: Concurrency: Practice and Experience. - 1096-9128. ; 12:4, pp. 227-249
  • Journal article (peer-reviewed), abstract:
    • Opus is a new programming language designed to assist in coordinating the execution of multiple, independent program modules. With the help of Opus, coarse grained task parallelism between data parallel modules can be expressed in a clean and structured way. In this paper we address the problems of how to build a compilation and runtime support system that can efficiently implement the Opus constructs. Our design considers the often-conflicting goals of efficiency and modular construction through software re-use. In particular, we present the system requirements for an efficient Opus implementation, the Opus runtime system, and describe how they work together to provide the underlying services that the Opus compiler needs for a broad class of machines. Copyright (C) 2000 John Wiley & Sons, Ltd.