SwePub
Search the SwePub database

  Advanced search

Hit list for the search "WFRF:(Lu Zhonghai)"

Search: WFRF:(Lu Zhonghai)

  • Results 1-10 of 292
Sort/group the hit list
   
1.
  • Chen, X., et al. (author)
  • Achieving memory access equalization via round-trip routing latency prediction in 3D many-core NoCs
  • 2015
  • In: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI. - : IEEE. ; , pp. 398-403
  • Conference paper (peer-reviewed) abstract
    • 3D many-core NoCs are emerging architectures for future high-performance single chips, as they integrate many processor cores and memories by stacking multiple layers. In such an architecture, because processor cores and memories reside in different locations (center, corner, edge, etc.), memory accesses behave differently due to their different communication distances, and the performance (latency) gap between memory accesses grows as the network size is scaled up. This phenomenon can leave some memory accesses with very high latencies, degrading system performance. To achieve high performance, it is crucial to reduce the number of memory accesses with very high latencies. However, this must be done with care, since shortening the latency of one memory access can worsen the latency of another because of shared network resources. The goal should therefore be to narrow the latency differences between memory accesses. In this paper, we address this goal by prioritizing memory access packets based on predicted round-trip routing latencies. The communication distance and the number of occupied buffer entries along the remaining routing path are used to predict the round-trip latency of a memory access. The predicted round-trip routing latency is then used as the basis for arbitrating memory access packets, so that accesses with potentially high latency are transferred as early and as fast as possible, equalizing memory access latencies as much as possible. Experiments with varied network sizes and packet injection rates show that our approach achieves memory access equalization and outperforms classic round-robin arbitration in terms of maximum latency, average latency, and LSD. In the experiments, the maximum improvements of the maximum latency, the average latency, and the LSD are 80%, 14%, and 45%, respectively.
  •  
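A rough illustration of the arbitration idea in the abstract above, assuming a simple linear predictor over remaining hop count and downstream buffer occupancy; the struct fields, weights, and function names below are illustrative assumptions, not the paper's implementation:

```c
#include <stddef.h>

/* Illustrative view of a memory-access packet contending at a router port. */
typedef struct {
    int hops_remaining;      /* communication distance left on the round trip */
    int buffered_downstream; /* occupied buffer slots along the remaining path */
} mem_access_pkt;

/* Assumed cost weights per hop and per buffered item (illustrative only). */
#define HOP_COST   2
#define QUEUE_COST 1

/* Predict the round-trip routing latency from distance and congestion. */
static int predict_latency(const mem_access_pkt *p)
{
    return HOP_COST * p->hops_remaining + QUEUE_COST * p->buffered_downstream;
}

/* Grant the output to the contender with the highest predicted latency, so
 * potentially slow accesses move early and the latency spread narrows. */
static size_t arbitrate(const mem_access_pkt *pkts, size_t n)
{
    size_t winner = 0;
    for (size_t i = 1; i < n; i++)
        if (predict_latency(&pkts[i]) > predict_latency(&pkts[winner]))
            winner = i;
    return winner;
}
```

Prioritizing the access with the worst predicted round-trip latency, rather than serving ports round-robin, is what narrows the latency spread, at the cost of slightly delaying accesses that are already fast.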
2.
  • Liu, Ming, et al. (author)
  • ATCA-based Computation Platform for Data Acquisition and Triggering in Particle Physics Experiments
  • 2008
  • In: 2008 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE AND LOGIC APPLICATIONS, VOLS 1 AND 2. ; , pp. 287-292
  • Conference paper (peer-reviewed) abstract
    • An ATCA-based computation platform for data acquisition and trigger applications in nuclear and particle physics experiments has been developed. Each Compute Node (CN), which appears as a Field Replaceable Unit (FRU) in an ATCA shelf, features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes of DDR2 memory. Connectivity is provided by 8 optical links and 5 Gigabit Ethernet ports mounted on each board to receive data from detectors and forward results to outer shelves or PC farms with attached mass storage. Fast point-to-point on-board interconnections between FPGAs, as well as the full-mesh shelf backplane, provide the flexibility and high bandwidth needed to partition algorithms and correlate results among them. The system represents a highly reconfigurable and scalable solution for multiple applications.
  •  
3.
  • Liu, Ming, et al. (author)
  • Trigger algorithm development on FPGA-based Compute Nodes
  • 2009
  • In: 2009 16th IEEE-NPSS Real Time Conference. - New York : IEEE. - 9781424457960 ; , pp. 478-484
  • Conference paper (peer-reviewed) abstract
    • Based on the ATCA computation architecture and Compute Nodes (CN), investigation and implementation work has been carried out for HADES and PANDA trigger algorithms. We present our designs for HADES track reconstruction processing, Cherenkov ring recognition, Time-Of-Flight processing, and electromagnetic shower recognition, as well as the PANDA straw tube tracking algorithm. They will appear as co-processors in the uniform system design to undertake the detector-specific computing. The algorithm principles are explained and the hardware designs are described in the paper. The current progress demonstrates the feasibility of implementing these algorithms on FPGAs. Experimental results also demonstrate the performance speedup compared to alternative software solutions, as well as the potential for high-speed parallel/pipelined processing in Data Acquisition and Trigger systems.
  •  
4.
  • Liu, Weihua, et al. (author)
  • Characterizing the Reliability and Threshold Voltage Shifting of 3D Charge Trap NAND Flash
  • 2019
  • In: 2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE). - : IEEE. - 9783981926323 ; , pp. 312-315
  • Conference paper (peer-reviewed) abstract
    • 3D charge trap (CT) triple-level cell (TLC) NAND flash is gradually becoming a mainstream storage component due to its high storage capacity and performance, but it raises concerns about reliability. Fault tolerance and data management schemes are capable of improving reliability. Designing a more efficient solution, however, requires understanding the reliability characteristics of 3D CT TLC NAND flash. To facilitate such understanding, we use a real-world testing platform to investigate the reliability characteristics, including the raw bit error rate (RBER) and the threshold voltage (Vth) shifting features, after the flash is subjected to various disturbances. We analyze why these characteristics arise in 3D CT TLC NAND flash. We hope these observations can guide designers in proposing highly efficient solutions to the reliability problem.
  •  
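For reference, the raw bit error rate (RBER) measured in such a characterization is simply the fraction of bits that read back differently from what was programmed. A minimal sketch follows (illustrative only; the function name and interface are assumptions, not the authors' test platform):

```c
#include <stddef.h>
#include <stdint.h>

/* Raw bit error rate: mismatched bits divided by total bits compared.
 * 'written' is the programmed data, 'readback' the data read from the page. */
static double raw_bit_error_rate(const uint8_t *written, const uint8_t *readback,
                                 size_t nbytes)
{
    size_t bit_errors = 0;
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t diff = written[i] ^ readback[i];
        while (diff) {               /* count set bits in the XOR */
            bit_errors += diff & 1u;
            diff >>= 1;
        }
    }
    return (double)bit_errors / (double)(nbytes * 8u);
}
```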
5.
  • Wang, Qiang, et al. (author)
  • Hardware/Software Co-design of an ATCA-based Computation Platform for Data Acquisition and Triggering
  • 2009
  • In: 16th IEEE NPSS Real Time Conference. - 9781424457960 ; , pp. 485-489
  • Conference paper (peer-reviewed) abstract
    • An ATCA-based computation platform for data acquisition and trigger (TDAQ) applications has been developed for multiple future projects such as PANDA, HADES, and BESIII. Each Compute Node (CN) appears as one of the fourteen Field Replaceable Units (FRUs) in an ATCA shelf, which in total features 1890 Gbps of inter-FPGA on-board channels, 1456 Gbps of inter-board backplane connections, 728 Gbps of full-duplex optical links, 70 Gbps of Ethernet, 140 GBytes of DDR2 SDRAM, and the computing resources of 70 Xilinx Virtex-4 FX60 FPGAs. Corresponding to the system architecture, a hardware/software co-design approach is proposed to ease and accelerate development for different experiments. In the uniform system design, application-specific computation is implemented as customized hardware co-processors, while the embedded PowerPC processor takes charge of flexible slow controls and transmission protocol processing.
  •  
6.
  • Anagnostopoulos, Iraklis, et al. (author)
  • Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations
  • 2011
  • In: IEEE Embedded Systems Letters. - 1943-0663. ; 3:2, pp. 66-69
  • Journal article (peer-reviewed) abstract
    • Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions that address the constraints and opportunities presented by the physical distribution of the platform's multiple memory nodes. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with a distributed memory organization. Customization is enabled at both the application and platform level. Results show that customized microcoded DMM can serve approximately 7× more allocation requests than pure distributed memory platforms and performs 25% faster than the corresponding high-level implementation in the C language.
  •  
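A highly simplified sketch of how allocation requests might be directed to node-local heaps in a distributed memory organization; the paper uses microcoded memory-management units per node, so the bump-pointer allocator and all names below are purely illustrative assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_NODES 4   /* assumed number of distributed memory nodes */

/* Hypothetical descriptor for a node-local heap served by its own manager. */
typedef struct {
    uint8_t *base;    /* start of the node's local memory region */
    size_t   used;    /* bytes already handed out */
    size_t   size;    /* capacity of the node's region */
} node_heap;

static node_heap heaps[NUM_NODES];

/* Bind a node's heap to its local region (done once at platform bring-up). */
static void dmm_init(int node, uint8_t *region, size_t size)
{
    heaps[node].base = region;
    heaps[node].used = 0;
    heaps[node].size = size;
}

/* Serve an allocation from the requesting node's local heap.  A bump pointer
 * stands in for whatever allocation policy the microcode would implement. */
static void *dmm_alloc(int requesting_node, size_t bytes)
{
    node_heap *h = &heaps[requesting_node % NUM_NODES];
    if (h->used + bytes > h->size)
        return NULL;              /* node-local heap exhausted */
    void *p = h->base + h->used;
    h->used += bytes;
    return p;
}
```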
7.
  • Badawi, Mohammad, 1981- (author)
  • Adaptive Coarse-grain Reconfigurable Protocol Processing Architecture
  • 2016
  • Doctoral thesis (other academic/artistic) abstract
    • Digital signal processors and their variants have provided significant benefits for the efficient implementation of the Physical Layer (PHY) of the Open Systems Interconnection (OSI) seven-layer protocol stack compared to general-purpose processors. Protocol processors promise to provide a similar advantage for implementing the higher layers of the OSI model. This thesis addresses the problem of designing customizable coarse-grain reconfigurable protocol processing fabrics as a solution for achieving high performance and computational efficiency. A key requirement that this thesis addresses is the ability to adapt not only to varying applications and standards, and to different modes within each standard, but also to time-varying load and performance demands, while maintaining quality of service. This thesis presents a tile-based multicore protocol processing architecture that can be customized at design time to meet the requirements of the target application. The architecture can then be reconfigured at boot time and tuned to suit the desired use-case. The architecture includes a packet-oriented memory system with deterministic access time and access energy costs, which can therefore be accurately dimensioned to fulfill the requirements of the desired use-case. Moreover, to maintain the predicted quality of service while minimizing the use of energy and resources, the architecture encompasses an elastic management scheme that controls run-time configuration to deploy processing resources based on use-case and traffic demands. To evaluate the architecture presented in this thesis, different case studies were conducted, using quantitative and qualitative metrics for assessment. Energy-delay product, energy efficiency, area efficiency, and throughput show the improvements achieved by the processing cores and the memory of the presented architecture compared with other solutions. Furthermore, the results show the reduction in latency and power consumption required to evaluate controlling states when using the elastic management scheme. The elasticity of the scheme also reduces the total area required for the controllers that serve multiple processing cores in comparison with other designs. Finally, the results validate the ability of the presented architecture to support quality of service without misusing the available energy during a real-life case study of a multi-participant Voice over Internet Protocol (VoIP) call.
  •  
8.
  • Badawi, Mohammad, et al. (author)
  • Customizable Coarse-grained Energy-efficient Reconfigurable Packet Processing Architecture
  • 2014
  • In: Proceedings of the 2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP). - : IEEE. ; , pp. 30-35
  • Conference paper (peer-reviewed) abstract
    • In this paper, we present a highly customizable and rapidly reconfigurable multi-core packet processing architecture that provides energy and area efficiency while retaining flexibility. The presented architecture, with its agile reconfigurability, permits time-critical adaptability: resources can be re-clustered at run time within a few cycles, thus maintaining efficiency if the requirements of the use-case change. We elaborate on the flexibility and adaptability of our architecture and report its evaluation results. For evaluation, we implemented the widely used UDP/IP protocol and compared our proposed architecture to low-power 32-bit general-purpose processors, a custom ASIC implementation, and a programmable protocol processor. Compared to the GPP-based solutions, our architecture is 20-34 times more energy efficient while providing 2.4-4.1 times higher throughput. While retaining programmability, the proposed solution achieves 78% of the energy efficiency of the hardwired ASIC implementation. Compared to the programmable protocol processor, our solution has 2.6 times higher throughput and requires only a third of the gate count. Lastly, we quantify the worst-case and average-case time required for time-critical adaptability when reconfiguration occurs under real-life Voice over IP traffic.
  •  
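As a concrete example of the per-packet work such a UDP/IP processing element performs, the standard Internet checksum (RFC 1071) can be sketched as follows; this is generic reference code, not the paper's implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over a header/payload buffer: sum 16-bit
 * big-endian words with end-around carry, then return the one's complement. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len  -= 2;
    }
    if (len)                          /* odd trailing byte, zero-padded */
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```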
9.
  • Badawi, Mohammad, et al. (author)
  • Elastic Management and QoS Provisioning Scheme for Adaptable Multi-core Protocol Processing Architecture
  • 2016
  • In: 19TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2016). - : IEEE. - 9781509028160 ; , pp. 575-583
  • Conference paper (peer-reviewed) abstract
    • Adaptable protocol processing architectures can offer quality-of-service (QoS) while improving energy efficiency and resource utilization. However, a key condition for adaptable architectures to support QoS is that the latency required for processor adaptation does not violate the packet processing delay bound. Moreover, the adaptation latency must not cause packets to accumulate until memory becomes full and packets are dropped. In this paper, we present an elastic management scheme for an agile, adaptable multi-core protocol processing architecture that facilitates processor adaptation when QoS has to be maintained. The proposed management scheme encompasses a set of reconfigurable finite state machines (FSMs), each dimensioned to be associated with a single processing element (PE). During processor adaptation, the needed FSMs can be rapidly clustered to provide the control needed for the newly adapted structure. We use a real-life application to demonstrate how our proposed management scheme supports maintaining QoS during processor adaptation. We also quantify the time needed for processor adaptation as well as the reductions in energy, latency, and area achieved when using our scheme.
  •  
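A schematic sketch of the FSM-clustering idea described above: one control FSM per processing element, grouped on demand during adaptation. Structure fields, limits, and function names are illustrative assumptions rather than the paper's design:

```c
#include <stdbool.h>

#define MAX_PES 8   /* assumed number of processing elements */

/* Hypothetical control FSM, one per processing element (PE). */
typedef struct {
    int  state;       /* current control state of the associated PE */
    int  cluster_id;  /* cluster this FSM currently serves */
    bool active;      /* false while the FSM (and its PE) is idle */
} pe_fsm;

static pe_fsm fsms[MAX_PES];

/* During processor adaptation, group 'count' idle FSMs into a new cluster so
 * the adapted structure has its distributed control ready immediately.
 * Returns how many FSMs were actually clustered. */
static int cluster_fsms(int cluster_id, int count)
{
    int clustered = 0;
    for (int i = 0; i < MAX_PES && clustered < count; i++) {
        if (!fsms[i].active) {
            fsms[i].active = true;
            fsms[i].cluster_id = cluster_id;
            fsms[i].state = 0;    /* start from the cluster's initial state */
            clustered++;
        }
    }
    return clustered;
}
```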
10.
  • Badawi, Mohammad, et al. (author)
  • Quality-of-service-aware adaptation scheme for multi-core protocol processing architecture
  • 2017
  • In: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 54, pp. 47-59
  • Journal article (peer-reviewed) abstract
    • Employing adaptable protocol processing architectures has shown high potential for provisioning Quality-of-Service (QoS) while retaining efficient use of the available energy budget. Nevertheless, successful QoS provisioning using adaptable protocol processing architectures requires adaptation to be agile and to have low latency. That is, a long adaptation latency might violate the desired packet processing latency or throughput, or lead to loss of packets if the memory fails to accommodate packet accumulation. This paper presents an elastic management scheme that permits agile and QoS-aware adaptation of the processing elements (PEs) within the protocol processing architecture, such that the desired QoS is maintained. Moreover, our proposed scheme has the potential to reduce energy consumption since it employs the PEs on demand. We quantify the latency required for PE adaptation and the reductions in energy and area that can be achieved using our scheme. We also consider two different real-life use cases to demonstrate the effectiveness of our proposed management scheme in maintaining QoS while conserving the available energy.
  •  
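The on-demand deployment of PEs can be caricatured as a simple sizing rule: activate only as many PEs as the offered packet rate requires, bounded by what the architecture provides. This is an assumption-laden sketch, not the paper's adaptation scheme, which also reacts to measured QoS:

```c
#include <math.h>

/* Activate only as many PEs as the offered load requires: enough that the
 * aggregate service rate covers the packet arrival rate, clamped to what the
 * architecture provides. */
static int pes_needed(double packet_rate_pps, double pe_service_rate_pps,
                      int max_pes)
{
    if (pe_service_rate_pps <= 0.0)
        return max_pes;                           /* defensive fallback */
    int n = (int)ceil(packet_rate_pps / pe_service_rate_pps);
    if (n < 1)
        n = 1;
    return n > max_pes ? max_pes : n;
}
```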
