Smilodon : An Efficient Accelerator for Low Bit-Width CNNs with Task Partitioning
- Chen, Qinyu (author), Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
- Fu, Yuxiang (author), Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
- Cheng, Kaifeng (author), Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
- Song, Wenqing (author), Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
- Lu, Zhonghai (author), KTH, Electronics and Embedded Systems.
- Li, Li (author), Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
- Zhang, Chuan (author), Southeast Univ, Lab Efficient Architectures Digital Commun & Sign, Nanjing, Jiangsu, Peoples R China; Southeast Univ, Natl Mobile Commun Res Lab, Nanjing, Jiangsu, Peoples R China.
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China; Electronics and Embedded Systems (creator_code:org_t)
- IEEE, 2019
- Language: English.
Part of: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE. ISBN 9781728103976
- Related links:
https://urn.kb.se/re...
https://doi.org/10.1...
Abstract
- Convolutional Neural Networks (CNNs) have been widely applied in fields such as image and video recognition, recommender systems, and natural language processing. However, their large model size and heavy computational load hinder practical deployment, especially on embedded systems. Low bit-width CNNs have been proposed as a highly competitive candidate for efficient implementation. In this paper, we propose Smilodon, a scalable, efficient accelerator for low bit-width CNNs based on a parallel streaming architecture and optimized with a task partitioning strategy. We also present 3D systolic-like computing arrays suited to convolutional layers. Our design is implemented on a Zynq XC7Z020 FPGA and meets real-time requirements with a throughput of 1,622 FPS while consuming 2.1 W. To the best of our knowledge, our accelerator offers a better tradeoff among throughput, power efficiency, and area efficiency than state-of-the-art works.
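As a rough reading of the reported figures, dividing throughput by power gives energy efficiency in frames per joule (FPS / W = frames / J), and the reciprocal gives average energy per frame. The sketch below only performs that arithmetic on the two numbers in the abstract. It assumes the 2.1 W figure is the average power drawn while sustaining 1,622 FPS, which the abstract does not state explicitly; the function names are hypothetical and not taken from the paper.

```python
# Reported figures from the abstract (Zynq XC7Z020 implementation).
# Assumption: 2.1 W is the average power while sustaining this throughput.
THROUGHPUT_FPS = 1622.0  # frames per second
POWER_W = 2.1            # watts


def frames_per_joule(fps: float, watts: float) -> float:
    """Energy efficiency: frames processed per joule (fps / W = frames / J)."""
    return fps / watts


def energy_per_frame_mj(fps: float, watts: float) -> float:
    """Average energy spent on one frame, converted from joules to millijoules."""
    return watts / fps * 1000.0


if __name__ == "__main__":
    print(f"{frames_per_joule(THROUGHPUT_FPS, POWER_W):.1f} frames/J")
    print(f"{energy_per_frame_mj(THROUGHPUT_FPS, POWER_W):.2f} mJ/frame")
```

Under that assumption the design processes roughly 772 frames per joule, or about 1.29 mJ per frame; the paper itself should be consulted for how power was measured.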
Subject terms
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering (hsv//eng)
Keywords
- Low bit-width CNNs
- 3D systolic-like array
- task partitioning
- parallel streaming architecture
Publication and content type
- ref: peer-reviewed (content type)
- kon: conference paper (publication type)