Smilodon : An Efficient Accelerator for Low Bit-Width CNNs with Task Partitioning

Chen, Qinyu (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
Fu, Yuxiang (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
Cheng, Kaifeng (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
Song, Wenqing (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
Lu, Zhonghai (author)
KTH, Electronics and Embedded Systems
Li, Li (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Jiangsu, Peoples R China.
Zhang, Chuan (author)
Southeast Univ, Lab Efficient Architectures Digital Commun & Sign, Nanjing, Jiangsu, Peoples R China.;Southeast Univ, Natl Mobile Commun Res Lab, Nanjing, Jiangsu, Peoples R China.
IEEE, 2019
English.
In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE. ISBN 9781728103976
  • Conference paper (peer-reviewed)
Abstract
  • Convolutional neural networks (CNNs) have been widely applied in fields such as image and video recognition, recommender systems, and natural language processing. However, their massive size and intensive computation load hinder practical deployment, especially on embedded systems. Low bit-width CNNs have been proposed as a highly competitive alternative that enables efficient implementation. In this paper, we propose Smilodon, a scalable, efficient accelerator for low bit-width CNNs based on a parallel streaming architecture and optimized with a task partitioning strategy. We also present 3D systolic-like computing arrays suited to convolutional layers. Our design is implemented on a Zynq XC7Z020 FPGA and meets real-time requirements with a throughput of 1,622 FPS while consuming 2.1 W. To the best of our knowledge, our accelerator outperforms state-of-the-art designs in the tradeoff among throughput, power efficiency, and area efficiency.

Subject headings

ENGINEERING AND TECHNOLOGY  -- Electrical Engineering, Electronic Engineering, Information Engineering (hsv//eng)

Keywords

Low bit-width CNNs
3D systolic-like array
task partitioning
parallel streaming architecture

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)
