SwePub

Result list for search "WFRF:(Johnsson Lennart) srt2:(1980-1989)"

  • Results 1-10 of 47
1.
  • Baillie, Clive, et al. (author)
  • QCD with Dynamical Fermions on the Connection Machine
  • 1989
  • Conference paper (peer-reviewed)
    • We have implemented Quantum Chromo-Dynamics (QCD) on the massively parallel Connection Machine in *Lisp. The code uses dynamical Wilson fermions and the Hybrid Monte Carlo Algorithm (HMCA) to update the lattice. We describe our program and give performance measurements for it. With no tuning or optimization, the code runs at approximately 500 to 1000 MFLOPS on a 64K-processor Connection Machine, model CM-2, depending on the virtual processor (VP) ratio.
2.
  • Chiang, Chao-Lin, et al. (author)
  • Residue Arithmetic and VLSI
  • 1983
  • Conference paper (peer-reviewed)
    • In the residue number system, arithmetic is carried out on each digit individually. There is no carry chain. This locality is of particular interest in VLSI. An evaluation of different implementations of residue arithmetic is carried out, and the effects of reduced feature sizes are estimated. At the current state of technology, the traditional table lookup method is preferable for a range that requires a maximum modulus represented by up to 4 bits, while an array of adders offers the best performance for 7 or more bits. A combination of adders and tables covers 5 and 6 bits best. At 0.5 μm feature size, table lookup is competitive only up to 3 bits. These conclusions are based on sample designs in nMOS.
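The carry-free, digit-by-digit arithmetic described in this abstract can be illustrated with a minimal Python sketch. The moduli (3, 5, 7) and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch says nothing about the VLSI designs it evaluates. The lookup variant only mirrors the idea of replacing a modular adder with a precomputed table.

```python
from math import prod

# Pairwise coprime moduli, chosen only for illustration (range 0..104).
MODULI = (3, 5, 7)

def to_rns(x, moduli=MODULI):
    """Represent x by its residue digits, one per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add digit by digit; no carry moves between digit positions."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiply digit by digit, again without any carry chain."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

# Table lookup alternative: one m-by-m addition table per modulus.
ADD_TABLES = {m: [[(x + y) % m for y in range(m)] for x in range(m)]
              for m in MODULI}

def rns_add_lookup(a, b, moduli=MODULI):
    """Same result as rns_add, but each digit sum is read from a table."""
    return tuple(ADD_TABLES[m][x][y] for x, y, m in zip(a, b, moduli))

def from_rns(digits, moduli=MODULI):
    """Recover the integer via the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(digits, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m
```

Results are only meaningful while they stay below the product of the moduli (105 here); larger values wrap around.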
3.
  •  
4.
  • Cohen, D, et al. (author)
  • Mathematical Approach to Computational Networks
  • 1983
  • Conference paper (peer-reviewed)
    • This report deals with design principles for iterative computational networks. Such networks are used for performing repetitive computations which typically are not data-dependent. Most signal processing algorithms, such as the FFT and filtering, belong to this class. The main idea in this report is the development of a mathematical notation for expressing such designs. This notation captures the important features and properties of these computational networks, and can be used for analyzing, designing, and objectively evaluating them.
5.
  •  
6.
  • Gerogiannis, D.C, et al. (author)
  • Histogram Computation on Distributed Memory Architectures
  • 1989
  • In: Concurrency: Practice and Experience. Wiley. ISSN 1040-3108, 1096-9128. 1:2, pp. 219-237
  • Journal article (peer-reviewed)
    • One data-independent and one data-dependent algorithm for the computation of image histograms on parallel computers are presented, analysed and implemented on the Connection Machine system CM-2. The data-dependent algorithm has a lower requirement on communication bandwidth because it transfers only bins with a non-zero count. Both algorithms perform all-to-all reduction, which is implemented through a sequence of exchanges as defined by a butterfly network. The two algorithms are compared based on predicted and actual performance on the Connection Machine CM-2. With few pixels per processor, the data-dependent algorithm requires on the order of √B data transfers for B bins, compared to B data transfers for the data-independent algorithm. As the number of pixels per processor grows, the advantage of the data-dependent algorithm decreases. The advantage of the data-dependent algorithm increases with the number of bins of the histogram.
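The all-to-all reduction through butterfly exchanges mentioned in this abstract can be sketched as a sequential Python simulation. This is the simplest data-independent variant, in which every exchange carries all B bins; it is an illustrative assumption and not necessarily the scheme the paper implements on the CM-2. The function names and the list-of-lists representation of processors are hypothetical.

```python
def local_histogram(pixels, bins):
    """Count the pixel values held by one processor."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts

def butterfly_all_reduce(hists):
    """Sum per-processor histograms so that every processor ends up
    with the global histogram. The processor count must be a power
    of two; stage d pairs processor i with processor i XOR 2**d."""
    p = len(hists)
    assert p > 0 and p & (p - 1) == 0, "processor count must be a power of two"
    current = [list(h) for h in hists]
    step = 1
    while step < p:
        # All pairs exchange simultaneously, so build the new state
        # from the old one rather than updating in place.
        current = [
            [a + b for a, b in zip(current[i], current[i ^ step])]
            for i in range(p)
        ]
        step <<= 1
    return current
```

With P processors the loop runs log2(P) stages. In the data-dependent algorithm of the abstract, each exchange would instead carry only the bins with a non-zero count.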
7.
  • Harris, Tim, et al. (author)
  • Matrix Multiplication on the Connection Machine
  • 1989
  • Conference paper (peer-reviewed)
    • A data parallel implementation of the multiplication of matrices of arbitrary shapes and sizes is presented. The implementation uses a systolic algorithm based on a rectangular processor layout. All processors contain submatrices of the same size for a given operand. Matrix-vector multiplication is used as a primitive for local matrix-matrix multiplication in the Connection Machine system CM-2 implementation. The peak performance of the local matrix-matrix multiplication is in excess of 20 Gflops s⁻¹. The overall algorithm, including all required data motion, has a peak performance of 5.8 Gflops s⁻¹.
8.
  • Ho, Ching-Tien, et al. (author)
  • Algorithms for Matrix Transposition on Boolean Cube Configured Ensemble Architectures
  • 1987
  • Conference paper (peer-reviewed)
    • In a multiprocessor with distributed storage the data structures have a significant impact on the communication complexity. In this paper we present a few algorithms for performing matrix transposition on a Boolean $n$-cube. One algorithm performs the transpose in a time proportional to the lower bound both with respect to communication start-ups and to element transfer times. We present algorithms for transposing a matrix embedded in the cube by a binary encoding, a binary-reflected Gray code encoding of rows and columns, or combinations of these two encodings. The transposition of a matrix when several matrix elements are assigned to a node by consecutive or cyclic partitioning is also considered, and lower-bound algorithms are given. Experimental data are provided for the Intel iPSC and the Connection Machine.
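The two encodings named in this abstract can be illustrated with a short Python sketch. It shows the binary-reflected Gray code, under which consecutive indices map to neighbouring cube nodes, and the cube distance an element travels in a transpose when rows and columns are both binary encoded. The address layout (row bits above column bits) and the helper names are assumptions made for illustration; the paper's transposition algorithms themselves are not reproduced here.

```python
def gray(i):
    """Binary-reflected Gray code of index i."""
    return i ^ (i >> 1)

def gray_inverse(g):
    """Recover the index from its Gray code."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i

def hamming(a, b):
    """Number of cube dimensions in which two node addresses differ."""
    return bin(a ^ b).count("1")

def binary_transpose_distance(row, col, bits):
    """Cube distance from the node holding element (row, col) to the
    node holding (col, row), assuming the node address is the row
    index in the high `bits` bits and the column index in the low ones."""
    src = (row << bits) | col
    dst = (col << bits) | row
    return hamming(src, dst)
```

Under this layout the distance is twice the Hamming distance between the row and column indices, so anti-diagonal elements cross every cube dimension.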
9.
  • Ho, Ching-Tien, et al. (author)
  • Dilation d Embeddings of a Hyper–Pyramid into a Hypercube
  • 1989
  • Conference paper (peer-reviewed)
    • A $P(k, d)$ hyper-pyramid is a level structure of $k$ Boolean cubes where the cube at level $i$ is of dimension $id$, and a node at level $i - 1$ connects to every node in a $d$-dimensional Boolean subcube at level $i$, except for the leaf level $k$. Hyper-pyramids contain pyramids as proper subgraphs. We show that a $P(k, d)$ hyper-pyramid can be embedded in a Boolean cube with minimal expansion and dilation $d$. The congestion is bounded from above by $2^{d+1}/(d+2)$ and from below by $1 + (2^d - d)/(kd+1)$. For $P(k, 2)$ hyper-pyramids we present a dilation 2 and congestion 2 embedding. As a corollary, a complete $n$-ary tree can be embedded in a Boolean cube with dilation $\max(2, \log_2 n)$ and expansion $2^{k \log_2 n + 1} \big/ \frac{n^{k+1}-1}{n-1}$. We also discuss multiple pyramid embeddings.
10.
  • Ho, Ching-Tien, et al. (author)
  • Distributed Routing Algorithms for Broadcasting and Personalized Communication in Hypercubes
  • 1986
  • Conference paper (peer-reviewed)
    • High communication bandwidth in standard technologies is more expensive to realize than a high rate of arithmetic or logic operations. The effective utilization of communication resources is crucial for good overall performance in highly concurrent systems. In this paper we address two different communication problems in Boolean n-cube configured multiprocessors: 1) broadcasting, i.e., distribution of common data from a single source to all other nodes, and 2) sending personalized data from a single source to all other nodes. The well-known spanning tree algorithm obtained by bit-wise complementation of leading zeroes (referred to as the SBT algorithm, for Spanning Binomial Tree) is compared with an algorithm using multiple spanning binomial trees (MSBT). The MSBT algorithm offers a potential speed-up over the SBT algorithm by a factor of log2 N. We also present a balanced spanning tree algorithm (BST) that offers a lower complexity than the SBT algorithm for Case 2. The potential improvement is by a factor of 3 log2 N. The analysis takes into account the size of the data sets, the communication bandwidth, and the overhead in communication. We also provide some experimental data for the Intel iPSC/d7.
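The spanning binomial tree described in this abstract, obtained by complementing leading zeroes, can be sketched in Python as follows. The sketch assumes a simple round-based model in which every node that already holds the data sends across one cube dimension per round, lowest dimension first; that schedule and the function names are illustrative assumptions, and the MSBT and BST algorithms are not shown.

```python
def sbt_children(node, source, n):
    """Children of `node` in the spanning binomial tree rooted at
    `source` in an n-cube: complement, one at a time, each leading
    zero of the address relative to the source."""
    rel = node ^ source
    first_leading_zero = rel.bit_length()
    return [node ^ (1 << j) for j in range(first_leading_zero, n)]

def sbt_broadcast(source, n):
    """Simulate the broadcast. In round j + 1 every node that already
    holds the data forwards it across dimension j. Returns a dict
    mapping each node to the round in which it received the data
    (round 0 for the source)."""
    received_round = {source: 0}
    for j in range(n):
        # Snapshot the current holders; nodes reached in this round
        # start forwarding in the next one.
        for node in list(received_round):
            received_round[node ^ (1 << j)] = j + 1
    return received_round
```

All N = 2^n nodes are reached after n = log2 N rounds, and each node receives the data exactly once, from its parent in the tree.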