SwePub

Result list for search "WFRF:(Zheng Lirong) srt2:(2020-2021)"


  • Results 1-5 of 5
1.
  • Huang, Boming, et al. (author)
  • IECA : An In-Execution Configuration CNN Accelerator With 30.55 GOPS/mm(2) Area Efficiency
  • 2021
  • In: IEEE Transactions on Circuits and Systems Part 1. - : Institute of Electrical and Electronics Engineers (IEEE). - 1549-8328 .- 1558-0806. ; 68:11, pp. 4672-4685
  • Journal article (peer-reviewed)
    • It remains challenging for a Convolutional Neural Network (CNN) accelerator to maintain high hardware utilization and low processing latency with restricted on-chip memory. This paper presents an In-Execution Configuration Accelerator (IECA) that realizes an efficient control scheme, exploring architectural data reuse, unified in-execution controlling, and pipelined latency hiding to minimize configuration overhead out of the computation scope. The proposed IECA achieves row-wise convolution with tiny distributed buffers and reduces the size of total on-chip memory by removing 40% of redundant memory storage with shared delay chains. By exploiting a reconfigurable Sequence Mapping Table (SMT) and Finite State Machine (FSM) control, the chip realizes cycle-accurate Processing Element (PE) control, automatic loop tiling, and latency hiding without extra time slots for pre-configuration. Evaluated on AlexNet and VGG-16, the IECA retains over 97.3% PE utilization and over 95.6% memory access time hiding on average. The chip is designed and fabricated in a UMC 55-nm process, runs at a frequency of 250 MHz, and achieves an area efficiency of 30.55 GOPS/mm² and 0.244 GOPS/KGE (kilo-gate-equivalent), improvements of over 2.0x and 2.1x, respectively, compared with previous related works. Implementation of the IEC control scheme uses only 0.55% of the area of the 2.75 mm² core.
2.
  • Jin, Yi, et al. (author)
  • Self-aware distributed deep learning framework for heterogeneous IoT edge devices
  • 2021
  • In: Future generations computer systems. - : Elsevier BV. - 0167-739X .- 1872-7115. ; 125, pp. 908-920
  • Journal article (peer-reviewed)
    • Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves a move from the cloud to the heterogeneous and low-power edge, following an urgent demand for deploying complex training tasks in a distributed and reliable manner. This work proposes a self-aware distributed deep learning (DDL) framework for IoT applications, which is applicable to heterogeneous edge devices and aims to improve adaptivity and amortize the training cost. The self-aware design, including the dynamic self-organizing approach and the self-healing method, enhances system reliability and resilience. Three typical edge devices are adopted with cross-platform Docker deployment: Personal Computers (PC) for general computing devices, Raspberry Pi 4Bs (Rpi) for resource-constrained edge devices, and Jetson Nanos (Jts) for AI-enabled edge devices. Benchmarked with ResNet-32 on CIFAR-10, the training efficiency of the tested distributed clusters is increased by 8.44x compared to the standalone Rpi. The cluster with 11 heterogeneous edge devices achieves a training efficiency of 200.4 images/s and an accuracy of 92.45%. Results show that the self-organizing approach functions well under dynamic changes such as devices being removed or added. The self-healing method is evaluated with various stabilities, cluster scales, and breakdown cases, showing that reliability can be greatly enhanced for extensively distributed deployments. The proposed DDL framework shows excellent performance for training with heterogeneous edge devices in IoT applications, with a high degree of scalability and reliability.
3.
  • Liu, Lizheng, et al. (author)
  • A FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network
  • 2020
  • In: 2020 IEEE Nordic Circuits and Systems Conference, NORCAS 2020 - Proceedings. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • The Bayesian Confidence Propagation Neural Network (BCPNN) has been applied to higher levels of cognitive intelligence (e.g., working memory, associative memory). However, in the spike-based version of this learning rule, the presynaptic, postsynaptic, and coincident activity is traced in three low-pass filtering stages, so the weight-update calculations are very computationally intensive. In this paper, a hardware architecture for the updating process in lazy update mode is proposed for updating 8 local synaptic state variables. Parallelism is optimized by decomposing the calculation steps of the formulas based on their inherent data dependencies. The FPGA-based hardware accelerator of BCPNN is designed and implemented. The experimental results show that the updating process on the FPGA can be accomplished within 110 ns at a clock frequency of 200 MHz, and that the updating speed is greatly enhanced compared with the CPU test. The trade-off between performance, accuracy, and resources on dedicated hardware is evaluated, as is the impact of module reuse on resource consumption and computing performance.
4.
  • Liu, Lizheng, et al. (author)
  • An Autonomous Error-Tolerant Architecture Featuring Self-reparation for Convolutional Neural Networks
  • 2020
  • In: Proceeding of the IEEE Vehicular Technology Conference. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • Convolutional neural networks (CNNs) are widely used in the artificial intelligence and Internet of Things areas. As the scale of convolutional neural networks expands, more and more processing units are provided for them. Such systems are prone to errors, and a computing problem in any layer of the network will lead to wrong output results. Traditional multimode redundancy methods make the systems more complex and increase power consumption. This paper proposes an autonomous error-tolerant (AET) architecture for convolutional neural networks. Taking LeNet-5 as an example, the network layers of the CNN are mapped onto the AET architecture, an error-tolerant synapse is designed to discover errors, and an active evolution scheme is designed to handle unrecoverable errors and implement network reconfiguration. The design is implemented on an FPGA, and the experimental results show that the architecture realizes effective error tolerance for convolutional neural networks and recovers from errors quickly while maintaining the same recognition accuracy.
5.
  • Xu, Jianqiang, et al. (author)
  • Design of Smart Unstaffed Retail Shop Based on IoT and Artificial Intelligence
  • 2020
  • In: IEEE Access. - : Institute of Electrical and Electronics Engineers (IEEE). - 2169-3536. ; 8, pp. 147728-147737
  • Journal article (peer-reviewed)
    • Unstaffed retail shops have emerged recently and are noticeably changing how we shop. For these shops, the design of the vending machine is critical to the user shopping experience. The conventional design typically uses weighing sensors, which are incapable of sensing what the customer is taking. In the present study, a smart unstaffed retail shop scheme is proposed based on artificial intelligence and the Internet of Things, as an attempt to enhance the user shopping experience remarkably. To analyze multiple target features of commodities, the SSD (300x300) algorithm is employed; the recognition accuracy is further enhanced by adding a sub-prediction structure. Using a data set of 18,000 images in different practical scenarios containing 20 different types of stock keeping units, the comparison experiments reveal that the proposed SSD (300x300) model outperforms the original SSD (300x300) in goods detection. The mean average precision of the developed method reaches 96.1% on the test dataset, revealing that the system can make up for the deficiency of the conventional unmanned container. The practical test shows that the system can meet the requirements of new retail, which greatly increases customer flow and transaction volume.