SwePub

Result list for search "WFRF:(Anshus Otto)"

  • Results 1-7 of 7
1.
  • Ha, Phuong, 1976, et al. (author)
  • Brief Announcement: Wait-free Programming for General Purpose Computations on Graphics Processors
  • 2008
  • In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing. - 9781595939890 ; , p. 452-
  • Conference paper (peer-reviewed)
    • This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the new wait-free objects have time complexity O(N), where N is the number of concurrent processes. The wait-free objects have space complexity O(N^2), which is optimal. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.
2.
  • Ha, Phuong, 1976, et al. (author)
  • NB-FEB: A Universal Scalable Easy-to-Use Synchronization Primitive for Manycore Architectures
  • 2009
  • In: Proceedings of the 13th International Conference on Principles of Distributed Systems (OPODIS 2009), Lecture Notes in Computer Science. - Berlin, Heidelberg : Springer Berlin Heidelberg. - 1611-3349. - 9783642108761 ; 5923, pp. 189-203
  • Conference paper (peer-reviewed)
    • This paper addresses the problem of universal synchronization primitives that can support scalable thread synchronization for large-scale manycore architectures. The universal synchronization primitives that have been deployed widely in conventional architectures are the compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) primitives. However, such synchronization primitives are expected to reach their scalability limits in the evolution to manycore architectures with thousands of cores. We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on manycore architectures. We show that the NB-FEB primitive is universal, scalable, feasible and easy to use. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently makes NB-FEB scalable (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility). We construct, on top of NB-FEB, a non-blocking software transactional memory system called NBFEB-STM, which can be used as an abstraction to handle concurrent threads easily. NBFEB-STM is space efficient: the space complexity of each object updated by N concurrent threads/transactions is $\Theta(N)$, which is optimal.
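
The universality claim in this abstract can be illustrated with a small software simulation. The sketch below is not taken from the paper: the class `FebWord`, the operation name `test_flag_and_set`, and its exact semantics (store only if the word is empty, always return the old value and flag) are assumptions made for illustration, and a lock stands in for the atomicity that hardware would provide. Under those assumptions, every thread decides on the proposal that reached the word first.

```python
import threading


class FebWord:
    """A memory word paired with a full/empty flag (software simulation only)."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: models one atomic hardware step
        self._value = None
        self._full = False

    def test_flag_and_set(self, value):
        """Hypothetical non-blocking operation: it never waits on the flag.

        Returns (old_value, old_flag) and stores `value` only if the word was empty.
        """
        with self._lock:
            old = (self._value, self._full)
            if not self._full:
                self._value = value
                self._full = True
            return old


def decide(word, proposal):
    """Each process makes one call; the first proposal stored becomes the decision."""
    old_value, was_full = word.test_flag_and_set(proposal)
    return old_value if was_full else proposal


def run_consensus(proposals):
    """Run one thread per proposal against a shared word and collect the decisions."""
    word = FebWord()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(word, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Calling `run_consensus(list(range(8)))` returns eight identical values, all equal to one of the proposals. Because each thread makes a single call that always returns, no thread can be blocked by another, which is the property the abstract contrasts with the original waiting full/empty bit.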
3.
  • Ha, Phuong, 1976, et al. (author)
  • NB-FEB: An Easy-to-Use and Scalable Universal Synchronization Primitive for Parallel Programming
  • 2008
  • Report (other academic/artistic)
    • This paper addresses the problem of universal synchronization primitives that can support scalable thread synchronization for large-scale many-core architectures. The universal synchronization primitives that have been deployed widely in conventional architectures are the compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) primitives. However, such synchronization primitives are expected to reach their scalability limits in the evolution to many-core architectures with thousands of cores. We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on many-core architectures. We show that the NB-FEB primitive is universal, scalable, feasible and convenient to use. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility). The original full/empty bit is well known as a special-purpose primitive for fast producer-consumer synchronization and has been used extensively in that specific application domain. In this paper, we show that NB-FEB can be deployed easily as a general-purpose primitive. Using NB-FEB, we construct a non-blocking software transactional memory system called NBFEB-STM, which can be used to handle concurrent threads conveniently. NBFEB-STM is space efficient: the space complexity of each object updated by N concurrent threads/transactions is Θ(N), which is optimal.
4.
  • Ha, Phuong, 1976, et al. (author)
  • Non-blocking programming on multi-core graphics processors
  • 2009
  • In: SIGARCH Computer Architecture News. - 0163-5964. ; 36:5, pp. 19-28
  • Journal article (peer-reviewed)
    • This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need for synchronization primitives other than reads and writes. Moreover, based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.
5.
  • Ha, Phuong, 1976, et al. (author)
  • Preliminary results on NB-FEB, a synchronization primitive for parallel programming
  • 2009
  • In: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming. - New York, NY, USA : ACM. - 9781605583976 ; , pp. 295-296
  • Conference paper (peer-reviewed)
    • We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on many-core architectures. We show that the NB-FEB primitive is universal, scalable and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).
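
The combinability property described here can be sketched as a check that one memory request serves a whole batch. This is an illustration under assumptions, not the paper's construction: the function names are hypothetical, the word is modelled as a `(value, full)` pair, and the operation is assumed to store only into an empty word while always returning the old pair. Under that assumption, once the first request of a batch has been applied the word is certainly full, so the replies to the remaining requests can be derived without touching memory again.

```python
def apply_request(state, value):
    """Apply one store-if-empty request to state = (value, full).

    Returns (new_state, reply), where reply is the old (value, full) pair.
    """
    old_value, full = state
    if full:
        return state, (old_value, True)
    return (value, True), (old_value, False)


def serial(state, values):
    """Reference behaviour: every request travels to memory one after another."""
    replies = []
    for v in values:
        state, reply = apply_request(state, v)
        replies.append(reply)
    return state, replies


def combined(state, values):
    """Combined behaviour: only the first request of a non-empty batch reaches memory."""
    state, first_reply = apply_request(state, values[0])
    # The word is full after the first request, so later replies are known locally.
    later_replies = [(state[0], True) for _ in values[1:]]
    return state, [first_reply] + later_replies
```

For both an empty word `(None, False)` and a full word `(7, True)`, `combined` returns the same final state and the same replies as `serial`, while issuing a single memory access. That is the sense in which combining requests can relieve a synchronization hot spot.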
6.
  • Ha, Phuong, 1976, et al. (author)
  • The Synchronization Power of Coalesced Memory Accesses
  • 2008
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Berlin, Heidelberg : Springer Berlin Heidelberg. - 1611-3349 .- 0302-9743. - 9783540877783 ; 5218, pp. 320-334
  • Conference paper (peer-reviewed)
    • Multicore processor architectures have established themselves as the new generation of processor architectures. As part of the evolution from one core to many cores, memory access mechanisms have advanced rapidly. Several new memory access mechanisms have been implemented in many modern commodity multicore processors. Memory access mechanisms, by determining how processing cores access the shared memory, directly influence the synchronization capabilities of the multicore processors. Therefore, it is crucial to investigate the synchronization power of these new memory access mechanisms. This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need for synchronization primitives other than reads and writes. In the case of the contemporary CUDA processors, our results imply that the coalesced memory access mechanisms have consensus numbers up to sixteen.
7.
  • Ha, Phuong, 1976, et al. (author)
  • Wait-free Programming for General Purpose Computations on Graphics Processors
  • 2008
  • In: Proceedings of the 22nd International Parallel and Distributed Processing Symposium (IPDPS 2008). - 1530-2075. - 9781424416936 ; , pp. 1-12
  • Conference paper (peer-reviewed)
    • The fact that graphics processors (GPUs) are today’s most powerful computational hardware for the dollar has motivated researchers to utilize the ubiquitous and powerful GPUs for general-purpose computing. Recent GPUs feature the single-program multiple-data (SPMD) multicore architecture instead of the single-instruction multiple-data (SIMD) architecture. However, unlike CPUs, GPUs devote their transistors mainly to data processing rather than data caching and flow control, and consequently most of the powerful GPUs with many cores do not support any synchronization mechanisms between their cores. This prevents GPUs from being deployed more widely for general-purpose computing. This paper aims at bridging the gap between the lack of synchronization mechanisms in recent GPU architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.
Publication type
conference paper (5)
report (1)
journal article (1)
Content type
peer-reviewed (6)
other academic/artistic (1)
Author/editor
Tsigas, Philippas, 1 ... (7)
Ha, Phuong, 1976 (7)
Anshus, Otto (7)
University
Chalmers tekniska högskola (7)
Language
English (7)
Research subject (UKÄ/SCB)
Natural sciences (7)