SwePub
onr:"swepub:oai:DiVA.org:kth-278437"
 

Sökning: onr:"swepub:oai:DiVA.org:kth-278437" > An Efficient Accele...

An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

Chen, Qinyu (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Huang, Yan (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Sun, Rui (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Song, Wenqing (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Lu, Zhonghai (author)
KTH, Electronics and Embedded Systems
Fu, Yuxiang (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Li, Li (author)
Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China.
Institute of Electrical and Electronics Engineers (IEEE), 2020
English.
In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. - Institute of Electrical and Electronics Engineers (IEEE). - ISSN 1063-8210, E-ISSN 1557-9999. Vol. 28, no. 6, pp. 1540-1544
  • Journal article (peer-reviewed)
Abstract
  • Convolutional neural networks (CNNs) have emerged as one of the most popular approaches in many fields. These networks deliver better performance as they grow deeper and larger. However, their complicated computation and huge storage requirements impede hardware implementation. To address this problem, quantized networks have been proposed. In addition, various convolutional structures are designed to meet the requirements of different applications. For example, compared with the traditional convolutions (CONVs) used for image classification, the convolutions used for image generation usually comprise traditional CONVs, dilated CONVs, and transposed CONVs, leading to a difficult hardware mapping problem. In this brief, we translate the difficult mapping problem into a sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs by exploiting their sparsity and low bit-width characteristics. To this end, we propose an ineffectual data removing (IDR) mechanism that removes both regular and irregular sparsity based on dual-channel processing elements (PEs). In addition, a flexible layered load balance (LLB) mechanism is introduced to alleviate load imbalance. The accelerator is implemented in 65-nm technology with a core size of 2.56 mm². It achieves 3.72-TOPS/W energy efficiency at 50.1 mW, which makes it a promising design for embedded devices.
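The "sparsity perspective" in the abstract can be illustrated with a short sketch: a dilated convolution is an ordinary convolution whose kernel has zeros inserted between taps, and a transposed convolution is an ordinary convolution over an input with zeros inserted between pixels. The NumPy code below is a minimal illustration under that reading, not the paper's accelerator architecture or its IDR/LLB mechanisms; the helper names conv2d, dilate_kernel, and upsample_input are hypothetical.

# Minimal sketch (illustrative assumption, not the paper's design): dilated and
# transposed convolutions expressed as plain convolutions over zero-inserted,
# i.e. sparse, kernels and inputs. Function names are hypothetical.
import numpy as np

def conv2d(x, w):
    # Plain "valid" 2-D cross-correlation.
    H, W = x.shape
    K = w.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out

def dilate_kernel(w, rate):
    # A dilated CONV is a plain CONV with zeros inserted between kernel taps.
    K = w.shape[0]
    wd = np.zeros(((K - 1) * rate + 1, (K - 1) * rate + 1))
    wd[::rate, ::rate] = w
    return wd

def upsample_input(x, stride):
    # A transposed CONV is a plain CONV over an input with zeros inserted
    # between pixels.
    H, W = x.shape
    xu = np.zeros((H * stride - stride + 1, W * stride - stride + 1))
    xu[::stride, ::stride] = x
    return xu

# Ternary operands (values in {-1, 0, +1}): their zeros are the "irregular"
# sparsity, while the zeros inserted above are the "regular" sparsity.
x = np.random.randint(-1, 2, size=(8, 8)).astype(float)
w = np.random.randint(-1, 2, size=(3, 3)).astype(float)

y_dil = conv2d(np.pad(x, 2), dilate_kernel(w, rate=2))     # 8x8 output
y_tra = conv2d(np.pad(upsample_input(x, stride=2), 1), w)  # 15x15 output
print(y_dil.shape, y_tra.shape)

In this view, the inserted zeros provide the regular sparsity and the zeros of binary/ternary operands the irregular sparsity that, as the abstract describes, the dual-channel PEs and IDR mechanism are designed to remove.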

Subject terms

NATURAL SCIENCES -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)

Keywords

Data processing
Computer architecture
Very large scale integration
Hardware
Registers
Microsoft Windows
Kernel
Dilated convolutions (DCONVs) and transposed convolutions (TCONVs)
load balance
sparsity
VLSI

Publication and content type

ref (peer-reviewed)
art (journal article)
