SwePub

Result list for search "(db:Swepub) pers:(Jantsch Axel) srt2:(2005-2009) mspu:(conferencepaper) srt2:(2009)"

  • Result 1-10 of 16
1.
  • Chen, Xiaowen, et al. (author)
  • Speedup Analysis of Data-parallel Applications on Multi-core NoCs
  • 2009
  • In: Proceedings of the IEEE International Conference on ASIC (ASICON). - 9781424438686, pp. 105-108
  • Conference paper (peer-reviewed)
    • As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on Multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's law. The speedup analysis considers the effect of network topology, network size, traffic model and computation/communication ratio. We also study the speedup efficiency. In our Multi-core NoC platform, a real data-parallel application, i.e. matrix multiplication, is used to validate the analysis. Our theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled up. Such analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing computation proportion in the program.
2.
  • Grange, Matt, et al. (author)
  • Physical mapping and performance study of a multi-clock 3-Dimensional Network-on-Chip mesh
  • 2009
  • In: 2009 IEEE INTERNATIONAL CONFERENCE ON 3D SYSTEMS INTEGRATION. - San Francisco : IEEE conference proceedings. - 9781424445110, pp. 345-351
  • Conference paper (peer-reviewed)
    • The physical performance of a 3-Dimensional Network-on-Chip (NoC) mesh architecture employing through-silicon vias (TSVs) for vertical connectivity is investigated with a cycle-accurate RTL simulator. The physical latency and area impact of TSVs, switches, and the on-chip interconnect are evaluated to extract the maximum signaling speeds through the switches and the horizontal and vertical network links. The relatively low parasitics of TSVs compared to the on-chip 2-D interconnect allow for higher signaling speeds between chip layers. The system-level impact on overall network performance as a result of clocking vertical packets at a higher rate through the TSV interconnect is simulated and reported.
3.
  • Liu, Ming, et al. (author)
  • A Reconfigurable Design Framework for FPGA Adaptive Computing
  • 2009
  • In: 2009 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS. - IEEE. - 9781424452934, pp. 439-444
  • Conference paper (peer-reviewed)
    • Partial Reconfiguration (PR) offers the possibility to adaptively change part of the FPGA design without stopping the remaining system. In this paper, we present a comprehensive framework for adaptive computing, in which the key design points of hardware processes, system interconnections, Operating Systems (OS), device drivers, scheduler software and context switching are each addressed in different hardware/software layers. A case study demonstrates swapping a Flash memory controller and an SRAM controller in response to diverse memory access needs. Analysis of the results reveals more efficient resource utilization (52.1% I/O pads, 86.5% LUTs and 81.3% flip-flops) compared to the static design with the same functionality. A small context-switching reconfiguration overhead is measured, ranging from hundreds of microseconds to milliseconds. Moreover, technical perspectives are analyzed, and the proposed design framework is expected to bring great benefits in target applications of particle physics experiments.
4.
  • Liu, Ming, et al. (author)
  • Run-time Partial Reconfiguration Speed Investigation and Architectural Design Space Exploration
  • 2009
  • In: FPL 09. - 9781424438914, pp. 498-502
  • Conference paper (peer-reviewed)
    • Run-time Partial Reconfiguration (PR) speed is significant in applications, especially when fast IP core switching is required. In this paper, we propose to use Direct Memory Access (DMA), Master (MST) burst, and a dedicated Block RAM (BRAM) cache respectively to reduce the reconfiguration time. Based on the Xilinx PR technology and the Internal Configuration Access Port (ICAP) primitive in the FPGA fabric, we discuss multiple design architectures and thoroughly investigate their performance with measurements for different partial bitstream sizes. Compared to the reference OPB_HWICAP and XPS_HWICAP designs, experimental results show that DMA_HWICAP and MST_HWICAP reduce the reconfiguration time by one order of magnitude, with little resource consumption overhead. The BRAM_HWICAP design can even approach the reconfiguration speed limit of the ICAP primitive at the cost of large Block RAM utilization.
5.
  • Liu, Ming, et al. (author)
  • Trigger algorithm development on FPGA-based Compute Nodes
  • 2009
  • In: 2009 16th IEEE-NPSS Real Time Conference. - New York : IEEE. - 9781424457960, pp. 478-484
  • Conference paper (peer-reviewed)
    • Based on the ATCA computation architecture and Compute Nodes (CN), investigation and implementation work has been carried out for HADES and PANDA trigger algorithms. We present our designs for HADES track reconstruction processing, Cherenkov ring recognition, Time-Of-Flight processing, electromagnetic shower recognition, and the PANDA straw tube tracking algorithm. They will appear as co-processors in the uniform system design to undertake the detector-specific computing. The algorithm principles are explained and the hardware designs are described in the paper. The current progress reveals the feasibility of implementing these algorithms on FPGAs. Experimental results also demonstrate the performance speedup compared to alternative software solutions, as well as the potential capability of high-speed parallel/pipelined processing in Data Acquisition and Trigger systems.
6.
  • Lu, Zhonghai, et al. (author)
  • A Flow Regulator for On-Chip Communication
  • 2009
  • In: IEEE INTERNATIONAL SOC CONFERENCE, PROCEEDINGS. - 9781424452200, pp. 151-154
  • Conference paper (peer-reviewed)
    • We have proposed (sigma, rho)-based flow regulation as a design instrument for System-on-Chip (SoC) architects to control quality-of-service and achieve cost-effective communication, where sigma bounds the traffic burstiness and rho the traffic rate. In this paper, we present a hardware implementation of the regulator. We discuss its microarchitecture. Based on this microarchitecture, we design, implement and synthesize a multi-flow regulator for AXI. Our experiments show the effectiveness of such a regulation device on the control of delay, jitter and buffer requirements.
7.
  • Lu, Zhonghai, et al. (author)
  • Flow Regulation for On-Chip Communication
  • 2009
  • In: DATE. - 9781424437818, pp. 578-581
  • Conference paper (peer-reviewed)
    • We propose (sigma, rho)-based flow regulation as a design instrument for System-on-Chip (SoC) architects to control quality-of-service and achieve cost-effective communication, where sigma bounds the traffic burstiness and rho the traffic rate. This regulation changes the burstiness and timing of traffic flows, and can be used to decrease delay and reduce buffer requirements in the SoC infrastructure. In this paper, we define and analyze the regulation spectrum, which bounds the upper and lower limits of regulation. Experiments on a Network-on-Chip (NoC) with guaranteed service demonstrate the benefits of regulation. We conclude that flow regulation may exert a significant positive impact on communication performance and buffer requirements.
8.
  • Lu, Zhonghai, et al. (author)
  • Trends of Terascale Computing Chips in the Next Ten Years
  • 2009
  • In: 2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS. - NEW YORK : IEEE, pp. 62-66
  • Conference paper (peer-reviewed)
    • Moore's law steadily continues though facing a number of challenges. This paper identifies ongoing and desirable trends to exploit the technology capacity and further Moore's law for terascale on-chip computing architectures in the next ten years. Four foreseeable trends are: from single core to many cores, from bus-based to network-based interconnect, from centralized memory to distributed memory, and from 2D integration to 3D integration. We motivate these trends and show that the number of design choices for computing chips is increasing rapidly, leading to an exploding design space with uncountable opportunities for the innovative architect. Moreover, we envision that the multicore Network-on-Chip will become an infrastructure backbone and accumulate many other infrastructural functions such as memory, power and resource management, testing and diagnostic services.
9.
  • Millberg, Mikael, et al. (author)
  • Priority Based Forced Requeue to Reduce Worst-Case Latencies for Bursty Traffic
  • 2009
  • In: DATE. - IEEE. - 9781424437818, pp. 1070-1075
  • Conference paper (peer-reviewed)
    • In this paper we introduce Priority Based Forced Requeue to decrease worst-case latencies in NoCs offering best effort services. Forced Requeue prematurely lifts low priority packets out of the network and requeues them outside using priority queues. The first benefit of this approach, applicable to any NoC offering best effort services, is that packets that have not yet entered the network now compete with packets inside the network and hence tighter bounds on admission times can be given. The second benefit, which is more specific to deflective routing as in the Nostrum NoC, is that packet "reshuffling" dramatically reduces the latency inside the network for bursty traffic due to a lowered risk of collisions at the exit of the network. This paper studies Forced Requeue on a mesh with varying burst sizes and traffic scenarios. The experimental results show a 50% reduction in worst-case latency from a system perspective thanks to a reshaped latency distribution whilst keeping the average latency the same.
10.
  •  