SwePub

Result list for search "(db:Swepub) pers:(Lu Zhonghai) conttype:(scientificother) srt2:(2005-2009)"


  • Results 1-7 of 7
1.
  • Liu, Ming, 1982- (author)
  • A High-end Reconfigurable Computation Platform for Particle Physics Experiments
  • 2008
  • Licentiate thesis (other academic/artistic). Abstract:
    • Modern nuclear and particle physics experiments run at very high reaction rates and can deliver data rates of up to a hundred GBytes/s. This data rate is far beyond the storage and on-line analysis capability. Fortunately, physicists are interested in only a very small proportion of this huge amount of data. Therefore, in order to select the interesting data and reject the background through sophisticated pattern recognition processing, it is essential to realize an efficient data acquisition and trigger system that reduces the data rate by several orders of magnitude. Motivated by the requirements of multiple experiment applications, we are developing a high-end reconfigurable computation platform for data acquisition and triggering. The system consists of a scalable number of compute nodes, which are fully interconnected by high-speed communication channels. Each compute node features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes of DDR2 memory. A hardware/software co-design approach is proposed for developing custom applications on the platform, partitioning performance-critical calculation to the FPGA hardware fabric while leaving flexible and slow controls to the embedded CPU and the operating system. The system is expected to be high-performance and general-purpose for various applications, especially in the physics experiment domain. As a case study, the particle track reconstruction algorithm for HADES has been developed and implemented on the computation platform in the form of processing engines. The Tracking Processing Unit (TPU) recognizes peak bins on the projection plane and reconstructs particle tracks in real time. Implementation results demonstrate acceptable resource utilization and the feasibility of implementing the module together with the system design on the FPGA. Experimental results show that the online track reconstruction computation achieves a speedup of 10.8 to 24.3 times per TPU module compared to the software solution on a 2.4 GHz Xeon commodity server.
2.
  • Lu, Zhonghai (author)
  • Design and Analysis of On-Chip Communication for Network-on-Chip Platforms
  • 2007
  • Doctoral thesis (other academic/artistic). Abstract:
    • Due to the interplay between increasing chip capacity and complex applications, System-on-Chip (SoC) development is confronted by severe challenges, such as managing deep submicron effects, scaling communication architectures and bridging the productivity gap. Network-on-Chip (NoC) has developed rapidly as a concept in recent years to tackle this crisis, with a focus on network-based communication. NoC problems span the whole SoC spectrum, ranging from specification, design and implementation to validation, and from design methodology to tool support. In this thesis, we formulate and address problems in three key NoC areas, namely on-chip network architectures, NoC network performance analysis, and NoC communication refinement. Quality and cost are major constraints for micro-electronic products, particularly in high-volume application domains. We have developed a number of techniques to facilitate the design of systems with low area and high, predictable performance. From the perspective of flit admission and ejection, we investigate area optimization for a classical wormhole architecture. The proposals are simple but effective. Besides offering unicast services, on-chip networks should also provide effective support for multicast. We suggest a connection-oriented multicasting protocol which can dynamically establish multicast groups with quality-of-service awareness. Based on the concept of a logical network, we develop theorems to guide the construction of contention-free virtual circuits, and employ a back-tracking algorithm to systematically search for feasible solutions. Network performance analysis plays a central role in the design of NoC communication architectures. Within a layered NoC simulation framework, we develop and integrate traffic generation methods in order to simulate network performance and evaluate network architectures. Using these methods, traffic patterns may be adjusted with locality parameters and configured per pair of tasks. We also propose an algorithm-based analysis method to estimate whether a wormhole-switched network can satisfy the timing constraints of real-time messages. This method is built on traffic assumptions and based on a contention tree model that captures direct and indirect network contentions and concurrent link usage. In addition to NoC platform design, application design targeting such a platform is an open issue. Following the trends in SoC design, we use an abstract and formal specification as the starting point of our design flow. Based on the synchronous model of computation, we propose a top-down communication refinement approach. This approach decouples the tight global synchronization into process-local synchronization, and uses synchronizers to achieve process synchronization consistency during refinement. Meanwhile, protocol refinement can be incorporated to satisfy design constraints such as reliability and throughput. The thesis summarizes the major research results on the three topics.
3.
  •  
4.
  • Lu, Zhonghai, et al. (author)
  • NNSE: Nostrum Network-on-Chip Simulation Environment
  • 2005
  • In: Proceedings of Swedish System-on-Chip Conference, Stockholm, Sweden, April 2005.
  • Conference paper (other academic/artistic). Abstract:
    • A main challenge for Network-on-Chip (NoC) design is to select a network architecture that suits a particular application. NNSE enables analysis of the performance impact of NoC configuration parameters. It allows one to (1) configure a network with respect to topology, flow control, routing algorithm, etc.; (2) configure various regular and application-specific traffic patterns; and (3) evaluate the network with these traffic patterns in terms of latency and throughput.
5.
  • Lu, Zhonghai (author)
  • Using wormhole switching for networks on chip : feasibility analysis and microarchitecture adaptation
  • 2005
  • Licentiate thesis (other academic/artistic). Abstract:
    • Network-on-Chip (NoC) is proposed as a systematic approach to address future System-on-Chip (SoC) design difficulties. Due to its good performance and small buffering requirement, wormhole switching is being considered as a main network flow control mechanism for on-chip networks. Applying wormhole switching to NoCs is challenging with respect to both NoC application design and switch complexity reduction. In a NoC design flow, mapping an application onto the network should include a feasibility analysis in order to determine whether the messages' timing constraints can be satisfied, and whether the network can be efficiently utilized. This is necessary because network contentions lead to nondeterministic behavior in message delivery. For wormhole-switched networks, we have formulated a contention tree model to accurately capture network contentions and reflect the concurrent use of links. Based on this model, the timing bounds of real-time messages can be derived. Furthermore, we have developed an algorithm to test the feasibility of real-time messages in the networks. At the wormhole switch micro-architecture level, switch complexity should be minimized to reduce cost while incurring only a reasonable performance penalty. We have investigated the flit admission and flit ejection problems, which concern how the flits of packets are admitted into and ejected from the network, respectively. For flit admission, we propose a novel coupling scheme which binds a flit-admission queue to an output physical channel. Our results show that this scheme achieves a reduction of up to 8% in switch area and up to 35% in switch power over other comparable solutions. For flit ejection, we propose a p-sink model which differs from a typical ideal ejection model in that it uses only p flit sinks to eject flits instead of the p · v flit sinks required by the ideal model, where p is the number of physical channels of a switch and v is the number of virtual channels per physical channel. With this model, the buffering cost of flit sinks depends only on p, i.e., it is independent of v. We have evaluated the coupled flit-admission technique and the p-sink model in a 2D 4 × 4 mesh network. In our experiments, they exhibit only limited performance penalties in some cases. We believe that these cost-effective models are promising candidates for use in wormhole-switched on-chip networks.
6.
  • Naeem, Abdul, et al. (author)
  • Scalability of Relaxed Consistency Models in NoC based Multicore Architectures
  • 2009
  • In: SIGARCH Computer Architecture News. - : ACM Press. - 0163-5964 .- 1943-5851. ; 37:5, pp. 8-15
  • Journal article (other academic/artistic). Abstract:
    • This paper studies the realization of relaxed memory consistency models in network-on-chip based distributed shared memory (DSM) multi-core systems. Within DSM systems, memory consistency is a critical issue since it affects not only the performance but also the correctness of programs. We investigate the scalability of the relaxed consistency models (weak and release consistency) implemented using transaction counters. Our experimental results compare the average and maximum code, synchronization and data latencies of the two consistency models for various network sizes with regular mesh topologies. The observed latencies rise for both consistency models as the network size grows. However, the scaling behaviors are different. With the release consistency model these latencies grow significantly more slowly than with weak consistency, due to better optimization potential by means of overlapping, reordering and program order relaxations. Release consistency improves the performance by 15.6% and 26.5% on average in the code and consistency latencies over the weak consistency model for the specific application, as the system grows from a single core to 64 cores. The latency of data transactions grows 2.2 times faster on average with the weak consistency model than with the release consistency model when the system scales from a single core to 64 cores.
7.
  • Wolf, Pieter van der, et al. (author)
  • Definition of Device Level Interface with QoS : Draft Specification
  • 2007
  • Report (other academic/artistic). Abstract:
    • The extensions to standard IP communication interfaces proposed in SPRINT WP3 document D3.1 are defined. Flow identification signals are added to the DLI signal level interface so that transactions can indicate the services they require. These services are specified as Contracts that define the flow characteristics required for correct operation. These characteristics are the main input to an analysis method to validate that a SoC design achieves its performance targets. DLI-Guard units are defined that enforce Contracts by regulating an IP module's identified flows. Monitoring of flow characteristics, such as latency, is also optionally provided. A configuration API for DLI-Guards is outlined, together with example code to illustrate its use. This specification is successfully applied to AMBA AXI, the prime example DLI.
