SwePub

Hit list for search "db:Swepub ;pers:(Jantsch Axel)"

  • Results 21-30 of 356
21.
  • Chen, Xiaowen, et al. (author)
  • Hybrid distributed shared memory space in multi-core processors
  • 2011
  • In: Journal of Software. - : International Academy Publishing (IAP). - 1796-217X. ; 6:12 SPEC. ISSUE, pp. 2369-2378
  • Journal article (peer-reviewed). Abstract:
    • On multi-core processors, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart. The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.
22.
  • Chen, Xiaowen, et al. (author)
  • Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism
  • 2010
  • In: Proceedings of the IEEE Norchip Conference. - 9781424489732
  • Conference paper (peer-reviewed). Abstract:
    • In this paper, we propose a fast barrier synchronization mechanism, targeting Network-on-Chip based many-core architectures. Its salient feature is that, once the barrier condition is reached, the "barrier release" acknowledgement is routed to all processor nodes in a broadcast way in order to save area by avoiding storing source node information and to minimize completion time by eliminating serialization of barrier releasing. Then, we construct a multi-FPGA platform using Xilinx® Virtex 5 as FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.
23.
  • Chen, Xiaowen, et al. (author)
  • Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property
  • 2013
  • In: Computers & electrical engineering. - : Elsevier BV. - 0045-7906 .- 1879-0755. ; 39:2, pp. 596-612
  • Journal article (peer-reviewed). Abstract:
    • In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.
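
The partitioning idea in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's design: the class name `HybridDsmNode`, the rule that addresses below the boundary are private, and the dictionary used as a translation table are all hypothetical. It shows private accesses skipping V2P translation and the boundary being moved at run time.

```python
class HybridDsmNode:
    """Toy model of one node's local memory split into a private and a shared part.

    Assumption (not from the abstract): addresses below `boundary` are private
    and used directly as physical addresses; addresses at or above it are
    virtual and must be looked up in a translation table.
    """

    def __init__(self, size, boundary, v2p_table):
        self.size = size
        self.v2p_table = v2p_table  # hypothetical virtual -> physical map
        self.translations = 0       # number of V2P translations performed
        self.boundary = 0
        self.set_boundary(boundary)

    def set_boundary(self, boundary):
        """Move the private/shared boundary (static or run-time configuration)."""
        if not 0 <= boundary <= self.size:
            raise ValueError("boundary must lie within the local memory")
        self.boundary = boundary

    def resolve(self, addr):
        """Return the physical address used for an access to `addr`."""
        if not 0 <= addr < self.size:
            raise ValueError("address outside the local memory")
        if addr < self.boundary:
            return addr  # private data: no translation needed
        self.translations += 1  # shared data: pay the V2P translation cost
        return self.v2p_table[addr]


node = HybridDsmNode(size=8, boundary=4, v2p_table={4: 7, 5: 6, 6: 5, 7: 4})
node.resolve(2)       # private access, no translation
node.resolve(5)       # shared access, one translation
node.set_boundary(6)  # data at address 5 becomes private at run time
node.resolve(5)       # now resolved without translation
```

Counting `translations` before and after moving the boundary gives a rough feel for where the saved overhead comes from; the real saving depends on the hardware and workload described in the paper.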
24.
  • Chen, Xiaowen, 1982-, et al. (author)
  • Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips
  • 2010
  • In: The 3rd IEEE International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010). - : Institute of Electrical and Electronics Engineers (IEEE). ; pp. 39-46
  • Conference paper (peer-reviewed). Abstract:
    • On multi-core Network-on-Chips (NoCs), memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core NoC platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart. The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.
25.
  • Chen, Xiaowen, et al. (author)
  • Speedup Analysis of Data-parallel Applications on Multi-core NoCs
  • 2009
  • In: Proceedings of the IEEE International Conference on ASIC (ASICON). - 9781424438686 ; pp. 105-108
  • Conference paper (peer-reviewed). Abstract:
    • As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on Multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's law. The speedup analysis considers the effect of network topology, network size, traffic model and computation/communication ratio. We also study the speedup efficiency. In our Multi-core NoC platform, a real data-parallel application, i.e. matrix multiplication, is used to validate the analysis. Our theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled up. Such analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing computation proportion in the program.
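
The general shape of such an analysis can be sketched by adding a communication term to Amdahl's law. The abstract does not give the paper's formula, so everything specific here is an assumption: the term `comm_overhead`, the `per_hop_cost` parameter, and the choice of a square 2D mesh with uniform random traffic are hypothetical, chosen only to show how a latency term that grows with network size makes efficiency fall.

```python
import math


def comm_overhead(n, per_hop_cost=0.002):
    """Hypothetical communication cost as a fraction of single-core run time.

    Assumes a side x side 2D mesh (n = side**2) with uniform random traffic,
    whose mean Manhattan distance between two nodes is 2*(side**2 - 1)/(3*side).
    """
    side = math.sqrt(n)
    mean_hops = 2.0 * (side ** 2 - 1.0) / (3.0 * side)
    return per_hop_cost * mean_hops


def speedup(n, serial_fraction, per_hop_cost=0.002):
    """Amdahl's law with an added communication term (illustrative only)."""
    parallel_fraction = 1.0 - serial_fraction
    normalized_time = (serial_fraction
                       + parallel_fraction / n
                       + comm_overhead(n, per_hop_cost))
    return 1.0 / normalized_time


def efficiency(n, serial_fraction, per_hop_cost=0.002):
    """Speedup per core."""
    return speedup(n, serial_fraction, per_hop_cost) / n


for cores in (1, 4, 16, 64):
    print(cores, round(speedup(cores, 0.1), 2), round(efficiency(cores, 0.1), 3))
```

With these assumed parameters the speedup keeps rising with core count while the efficiency drops, which matches the qualitative trend the abstract reports; the actual numbers depend entirely on the made-up cost model.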
26.
  • Chen, Xiaowen, et al. (author)
  • Supporting Distributed Shared Memory on Multi-core Network-on-Chips Using a Dual Microcoded Controller
  • 2010
  • In: Proceedings of the conference for Design Automation and Test in Europe. ; pp. 39-44
  • Conference paper (peer-reviewed). Abstract:
    • Supporting Distributed Shared Memory (DSM) is essential for multi-core Network-on-Chips for the sake of reusing huge amount of legacy code and easy programmability. We propose a microcoded controller as a hardware module in each node to connect the core, the local memory and the network. The controller is programmable where the DSM functions such as virtual-to-physical address translation, memory access and synchronization etc. are realized using microcode. To enable concurrent processing of memory requests from the local and remote cores, our controller features two mini-processors, one dealing with requests from the local core and the other from remote cores. Synthesis results suggest that the controller consumes 51k gates for the logic and can run up to 455 MHz in 130 nm technology. To evaluate its performance, we use synthetic and application workloads. Results show that, when the system size is scaled up, the delay overhead incurred by the controller may become less significant when compared with the network delay. In this way, the delay efficiency of our DSM solution is close to hardware solutions on average but still has all the flexibility of software solutions.
27.
  • Chen, Xiaowen, 1982-, et al. (author)
  • Supporting Efficient Synchronization in Multi-core NoCs Using Dynamic Buffer Allocation Technique
  • 2010
  • In: Proceedings of the IEEE Annual Symposium on VLSI. - : Institute of Electrical and Electronics Engineers (IEEE). ; pp. 462-463
  • Conference paper (peer-reviewed). Abstract:
    • This paper explores a dynamic buffer allocation technique to guide a distributed synchronization architecture to support efficient synchronization on multi-core Network-on-Chips (NoCs). The synchronization architecture features two physical buffers to be able to concurrently queue and handle synchronization requests issued by the local processor and remote processors via the on-chip network. Using the dynamic buffer allocation technique, the two physical buffers are dynamically allocated to form multiple virtual buffers in order to improve buffers' utilization. Experiments are carried out to evaluate buffers' utilization.
28.
  • Chen, Zhipeng, et al. (author)
  • A Worst Case Performance Model for TDM Virtual Circuit in NoCs
  • 2010
  • In: Network and Parallel Computing. - Berlin : Springer Berlin/Heidelberg. - 9783642156717 ; pp. 452-461
  • Conference paper (peer-reviewed). Abstract:
    • In Network-on-Chip (NoC), Time-Division-Multiplexing (TDM) Virtual Circuit (VC) is well recognized as being capable to provide guaranteed services in both latency and bandwidth. We propose a method of modeling TDM based VC by using Network Calculus. We derive a tight upper bound of end-to-end delay and buffer requirement for individual VC. The performance analysis using Latency-Rate server is also presented in comparison with our Performance model for TDM Virtual Circuit in NoCs (Pemvin). We conducted experiments on comparing Pemvin to the Latency-Rate server model. Our experiment results show the improvement of Pemvin on tightening the upper bound of end-to-end delay and buffer requirement.
29.
  •  
30.
  •  
Publication type
conference paper (238)
journal article (66)
report (14)
book chapter (14)
doctoral thesis (10)
licentiate thesis (5)
other publication (3)
edited collection (2)
book (2)
proceedings (editor) (1)
research review (1)
Content type
peer-reviewed (305)
other academic/artistic (51)
Author/editor
Lu, Zhonghai (116)
Hemani, Ahmed (44)
Sander, Ingo (40)
Tenhunen, Hannu (39)
Öberg, Johnny (33)
Ellervee, Peeter (22)
O'Nils, Mattias (18)
Kumar, Shashi (17)
Chen, Xiaowen (16)
Millberg, Mikael (16)
Liu, Ming (15)
Chen, Shuming (11)
Svantesson, Bengt (11)
Zhu, Jun (9)
Pamunuwa, Dinesh (9)
Feng, Chaochao (8)
Zheng, Li-Rong (7)
Bjureus, Per (7)
Zhou, Dian (7)
Liu, Shaoteng (7)
Jantsch, Axel, Profe ... (7)
Isoaho, Jouni (7)
Weerasekera, Roshan (6)
Weldezion, Awet Yema ... (6)
Nabiev, Rustam (6)
Liljeberg, Pasi (6)
Liu, Hengzhu (6)
Zhang, Minxuan (6)
Eslami Kiasari, Abba ... (6)
Grange, Matt (6)
Haghbayan, Mohammad- ... (6)
Bertozzi, Davide (5)
Benini, Luca (5)
O'Nils, Mattias, 196 ... (5)
Postula, Adam (5)
Öberg, Johnny (5)
Ebrahimi, Masoumeh (4)
Wang, Qiang (4)
Zheng, Lirong (4)
Al-Khatib, Iyad (4)
Poletti, Francesco (4)
Bechara, Mohamed (4)
Khalifeh, Hasan (4)
Rahmani, Amir M. (4)
Xu, Hao (4)
Li, Jinwen (4)
Forsell, Martti (4)
Soininen, Juha-Pekka (4)
Shallari, Irida (4)
Higher education institution
Kungliga Tekniska Högskolan (345)
Mittuniversitetet (11)
Linköpings universitet (3)
Jönköping University (2)
RISE (1)
Language
English (356)
Research subject (UKÄ/SCB)
Engineering and Technology (290)
Natural Sciences (57)
