Search: onr:"swepub:oai:DiVA.org:kth-278437"
An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective
-
- Chen, Qinyu (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Huang, Yan (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Sun, Rui (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Song, Wenqing (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Lu, Zhonghai (author)
- KTH, Electronics and Embedded Systems
-
- Fu, Yuxiang (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Li, Li (author)
- Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
-
- Institute of Electrical and Electronics Engineers (IEEE), 2020
- 2020
- English.
-
In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Institute of Electrical and Electronics Engineers (IEEE). ISSN 1063-8210, E-ISSN 1557-9999; 28:6, pp. 1540-1544
- Related link:
-
https://urn.kb.se/re...
-
-
https://doi.org/10.1...
-
Abstract
- Convolutional neural networks (CNNs) have become one of the most widely used approaches in many fields, and they deliver better performance as they grow deeper and larger. However, the heavy computation and large storage requirements impede hardware implementation. To address this problem, quantized networks have been proposed. In addition, various convolutional structures have been designed to meet the requirements of different applications. For example, compared with the traditional convolutions (CONVs) used for image classification, networks for image generation usually combine traditional CONVs, dilated CONVs, and transposed CONVs, leading to a difficult hardware mapping problem. In this brief, we translate the difficult mapping problem into a sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs by exploiting their sparsity and low bit-width characteristics. To this end, we propose an ineffectual data removing (IDR) mechanism that removes both regular and irregular sparsity based on dual-channel processing elements (PEs). In addition, a flexible layered load balance (LLB) mechanism is introduced to alleviate load imbalance. The accelerator is implemented in a 65-nm technology with a core size of 2.56 mm². It achieves 3.72-TOPS/W energy efficiency at 50.1 mW, which makes it a promising design for embedded devices.
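The ineffectual data removing (IDR) idea in the abstract amounts to skipping operations whose operands are zero. As a rough software analogue (a minimal sketch, not the paper's dual-channel PE design; the function name, valid padding, and single-channel shapes are assumptions for illustration), a ternary 2-D convolution can enumerate only the non-zero kernel taps up front:

```python
import numpy as np

def ternary_conv2d_zero_skip(x, w):
    """Direct 2-D convolution (valid padding, no kernel flip) with a
    ternary kernel, skipping ineffectual (zero) weights.
    x: integer activation map, w: kernel with values in {-1, 0, +1}.
    Illustrative sketch only, not the paper's hardware mechanism."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # Enumerate the non-zero kernel taps once; zero taps cost nothing.
    taps = [(i, j, int(w[i, j]))
            for i in range(kh) for j in range(kw) if w[i, j] != 0]
    y = np.zeros((oh, ow), dtype=x.dtype)
    for i, j, wij in taps:
        # With +1/-1 weights, each surviving tap is just an add or subtract.
        y += wij * x[i:i + oh, j:j + ow]
    return y
```

Because ternary weights are restricted to {-1, 0, +1}, each surviving tap reduces to an addition or subtraction; the hardware mechanism described in the abstract extends the same skipping idea to sparse activations, while the LLB mechanism rebalances work across PEs when the non-zero operands are unevenly distributed.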
Subject headings
- NATURVETENSKAP -- Data- och informationsvetenskap -- Datorseende och robotik (hsv//swe)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)
Keywords
- Data processing
- Computer architecture
- Very large scale integration
- Hardware
- Registers
- Microsoft Windows
- Kernel
- Dilated convolutions (DCONVs) and transposed convolutions (TCONVs)
- load balance
- sparsity
- VLSI
Publication and content type
- ref (subject category)
- art (subject category)
Find via library
To the institution's database