SwePub

Result list for search "L773:0163 5964 OR L773:1943 5851"

  • Results 1-10 of 13
1.
  • Bengtsson, Jerker, et al. (author)
  • A Domain-specific Approach for Software Development on Manycore Platforms
  • 2008
  • In: SIGARCH Computer Architecture News. - New York : ACM Press. - 0163-5964 .- 1943-5851. ; 36:5, pp. 2-10
  • Journal article (peer-reviewed)
    • The programming complexity of increasingly parallel processors calls for new tools that assist programmers in utilising the parallel hardware resources. In this paper we present a set of models that we have developed as part of a tool for mapping dataflow graphs onto manycores. One of the models captures the essentials of manycores identified as suitable for signal processing, and which we use as target for our algorithms. As an intermediate representation we introduce timed configuration graphs, which describe the mapping of a model of an application onto a machine model. Moreover, we show how a timed configuration graph by very simple means can be evaluated using an abstract interpretation to obtain performance feedback. This information can be used by our tool and by the programmer in order to discover improved mappings.
2.
  • Fang, Huan, et al. (author)
  • Scalable directory architecture for distributed shared memory chip multiprocessors
  • 2008
  • In: SIGARCH Computer Architecture News. - : ACM Press. - 0163-5964 .- 1943-5851. ; 36:5, pp. 56-64
  • Journal article (peer-reviewed)
    • Traditional Directory-based cache coherence protocol is far from optimal for large-scale cache coherent shared memory multiprocessors due to the increasing latency to access directories stored in DRAM memory. Instead of keeping directories in main memory, we consider distributing the directory together with L2 cache across all nodes on a Chip Multiprocessor. Each node contains a processing unit, a private L1 cache, a slice of L2 cache, memory controller and a router. Both L2 cache and memories are distributed shared and interleaved by a subset of memory address bits. All nodes are interconnected through a low latency two dimensional Mesh network. Directory, being a split component to L2 cache, only stores sharing information for blocks while L2 cache stores only data blocks exclusive with L1 cache. Shared L2 cache can increase total effective cache capacity on chip, but also increase the miss latency when data is on a remote node. Being different from Directory Cache structure, our proposal totally removes the directory from memory, which saves memory space and reduces access latency. Compared to L2 cache that combines directory information internally, our L2 cache structure saves up to 88% cache space and achieves similar performance.
3.
  • Grahn, Håkan (author)
  • Introduction to the Special Issue on the First Swedish Workshop on Multi-Core Computing
  • 2008
  • In: SIGARCH Computer Architecture News. - : ACM. - 0163-5964 .- 1943-5851. ; 36:5, p. 1
  • Journal article (other academic/artistic)
    • Multicore processors have become the main computing platform for current and future computer systems. This calls for a forum to discuss the challenges and opportunities of both designing and using multicore systems. The objective of this workshop was to bring together researchers and practitioners from academia and industry to present and discuss the recent work in the area of multicore computing. The workshop was the first of its kind in Sweden, and was co-organized by Blekinge Institute of Technology (http://www.bth.se/) and the Swedish Multicore Initiative (http://www.swedishmulticore.se/). The technical program was put together by a distinguished program committee consisting of people from both academia and industry in Sweden. We received 16 extended abstracts. Each abstract was sent to four members of the program committee. In total, we collected 64 review reports. The abstracts were judged based on their merits in terms of relevance to the workshop, significance and originality, as well as the scientific and presentation quality. Based on the reviews, the program committee decided to accept 12 papers for inclusion in the workshop, giving an acceptance rate of 75%. The accepted papers cover a broad range of topics, such as programming techniques and languages, compiler and library support, coherence and consistency issues, and verification techniques for multicore systems. This workshop was the result of many people’s effort. First of all, I would like to thank Monica Nilsson and Madeleine Rovegård for their help with many practical arrangements and organizational issues around the workshop. Then, I would like to thank the program committee for their dedicated and hard work, especially finishing all reviews on time despite the short time frame so we could send out author notifications as scheduled. 
      I would also like to thank the steering committee of the Swedish Multicore Initiative for valuable and fruitful discussions about how to make this workshop successful. Finally, I would like to thank the SIGARCH Computer Architecture News editor Doug DeGroot for his help when compiling this special issue. On behalf of the program committee, I hope you find the included papers interesting. The workshop will continue as an annual activity within the Swedish Multicore Initiative.
4.
  • Jonsson, Bengt, 1957- (author)
  • State-space exploration for concurrent algorithms under weak memory orderings
  • 2008
  • In: SIGARCH Computer Architecture News. - : Association for Computing Machinery (ACM). - 0163-5964 .- 1943-5851. ; 36:5, pp. 65-71
  • Journal article (peer-reviewed)
    • Several concurrent implementations of familiar data abstractions such as queues, sets, or maps typically do not follow locking disciplines, and often use lock-free synchronization to gain performance. Since such algorithms are exposed to a weak memory model, they are notoriously hard to get correct, as witnessed by many bugs found in published algorithms. We outline a technique for analyzing correctness of concurrent algorithms under weak memory models, in which a model checker is used to search for correctness violations. The algorithm to be analyzed is transformed into a form where statements may be reordered according to a particular weak memory ordering. The transformed algorithm can then be analyzed by a model-checking tool, e.g., by enumerative state exploration. We illustrate the approach on a small example of a queue, which allows an enqueue operation to be concurrent with a dequeue operation, which we analyze with respect to the RMO memory model defined in SPARC v9.
5.
  • Lundvall, Håkan, 1975-, et al. (author)
  • Automatic Parallelization of Simulation Code for Equation-based Models with Software Pipelining and Measurements on Three Platforms
  • 2008
  • In: Proceedings from the First Swedish Workshop on Multi-Core Computing, MCC-08, November 27-28, 2008, Ronneby, Sweden. - Ronneby, Sweden : Blekinge Institute of Technology. ; 36:5
  • Conference paper (peer-reviewed)
    • In this work we report results from a new integrated method of automatically generating parallel code from Modelica models by combining parallelization at two levels of abstraction. Performing inline expansion of a Runge-Kutta solver combined with fine-grained automatic parallelization of the right-hand side of the resulting equation system opens up new possibilities for generating high performance code, which is becoming increasingly relevant when multi-core computers are becoming commonplace. An implementation, in the form of a backend module for the OpenModelica compiler, has been developed and used for measurements on two architectures: Intel Xeon and SGI Altix 3700 Bx2. This paper also contains some very recent results of a prototype implementation of this parallelization approach on the Cell BE processor architecture.
6.
  • Malik, Jamshaid Sarwar, et al. (author)
  • Effort, resources, and abstraction vs performance in high-level synthesis : finding new answers to an old question
  • 2012
  • In: SIGARCH Computer Architecture News. - : Association for Computing Machinery (ACM). - 0163-5964 .- 1943-5851. ; 40:5, pp. 64-69
  • Journal article (peer-reviewed)
    • This work provides new perspectives on impact of design effort, consumed resources and design abstraction on hardware performance in a high-level synthesis flow. We have shown that counter to published literature as well as intuition; more design effort may not always result in better performance. We developed a kernel that simulates Brownian motion, and investigated improvement in hardware performance with design effort at various abstraction levels. Our results indicate that a designer should be careful in putting more effort at a particular abstraction level. In our case, we achieved best performance/effort ratio at algorithm level rather than lower abstraction levels. This strongly suggests that design effort is not always proportional to corresponding improvement in performance.
7.
  • Naeem, Abdul, et al. (author)
  • Scalability of Relaxed Consistency Models in NoC based Multicore Architectures
  • 2009
  • In: SIGARCH Computer Architecture News. - : ACM Press. - 0163-5964 .- 1943-5851. ; 37:5, pp. 8-15
  • Journal article (other academic/artistic)
    • This paper studies realization of relaxed memory consistency models in the network-on-chip based distributed shared memory (DSM) multi-core systems. Within DSM systems, memory consistency is a critical issue since it affects not only the performance but also the correctness of programs. We investigate the scalability of the relaxed consistency models (weak, release consistency) implemented by using transaction counters. Our experimental results compare the average and maximum code, synchronization and data latencies of the two consistency models for various network sizes with regular mesh topologies. The observed latencies rise for both the consistency models as the network size grows. However, the scaling behaviors are different. With the release consistency model these latencies grow significantly slower than with the weak consistency due to better optimization potential by means of overlapping, reordering and program order relaxations. The release consistency improves the performance by 15.6% and 26.5% on average in the code and consistency latencies over the weak consistency model for the specific application, as the system grows from single core to 64 cores. The latency of data transactions grows 2.2 times faster on the average with a weak consistency model than with a release consistency model when the system scales from single core to 64 cores.
8.
  • Sundell, Håkan, 1968, et al. (author)
  • NOBLE: non-blocking programming support via lock-free shared abstract data types
  • 2009
  • In: SIGARCH Computer Architecture News. - : ACM, Association for Computing Machinery, Inc.. - 0163-5964 .- 1943-5851. ; 36:5, pp. 80-87
  • Journal article (peer-reviewed)
    • An essential part of programming for multi-core and multi-processor includes efficient and reliable means for sharing data. Lock-free data structures are known as very suitable for this purpose, although experienced to be very complex to design. In this paper, we present a software library of non-blocking abstract data types that have been designed to facilitate lock-free programming for non-experts. The system provides: i) efficient implementations of the most commonly used data types in concurrent and sequential software design, ii) a lock-free memory management system, and iii) a run-time system. The library provides clear semantics that are at least as strong as those of corresponding lock-based implementations of the respective data types. Our software library can be used for facilitating lock-free programming; its design enables the programmer to: i) replace lock-based components of sequential or parallel code easily and efficiently, ii) use well-tuned concurrent algorithms inside a software or hardware transactional system. In the paper we describe the design and functionality of the system. We also provide experimental results that show that the library can considerably improve the performance of software systems.
9.
  • Gidenstam, Anders, et al. (author)
  • LFTHREADS : a lock-free thread library
  • 2008
  • In: SIGARCH Computer Architecture News. - : Association for Computing Machinery, Inc.. - 0163-5964. ; , pp. 88-92
  • Conference paper (peer-reviewed)
    • This extended abstract presents LFTHREADS, a thread library entirely based on lock-free methods, i.e. no spinlocks or similar synchronization mechanisms are employed in the implementation of the multithreading. Since lock-freedom is highly desirable in multiprocessors/multicores due to its advantages in parallelism, fault-tolerance, convoy-avoidance and more, there is an increased demand in lock-free methods in parallel applications, hence also in multiprocessor/multicore system services. LFTHREADS is the first thread library that provides a lock-free implementation of blocking synchronization primitives for application threads; although the latter may sound like a contradicting goal, such objects have several benefits: e.g. library operations that block and unblock threads on the same synchronization object can make progress in parallel while maintaining the desired thread-level semantics and without having to wait for any "slow" operations among them. Besides, as no spin-locks or similar synchronization mechanisms are employed, memory contention can be reduced and processors/cores are able to do useful work. As a consequence, applications, too, can enjoy enhanced parallelism and fault-tolerance. For the synchronization in LFTHREADS we have introduced a new method, which we call responsibility hand-off (RHO), that does not need any special kernel support. The RHO method is also of independent interest, as it can also serve as a tool for lock-free token passing, management of contention and interaction between scheduling and synchronization. This paper gives an outline and the context of LFTHREADS. For more details the reader is referred to [7] and [8].
10.
  • Ha, Phuong, 1976, et al. (author)
  • Non-blocking programming on multi-core graphics processors
  • 2009
  • In: SIGARCH Computer Architecture News. - 0163-5964. ; 36:5, pp. 19-28
  • Journal article (peer-reviewed)
    • This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need of synchronization primitives other than reads and writes. Moreover, based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need of strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.