SwePub
Enabling Energy-Efficient Inference for Self-Attention Mechanisms in Neural Networks

Chen, Qinyu (author)
Univ Shanghai Sci & Technol, Inst Photon Chips, Shanghai, Peoples R China.
Sun, Congyi (author)
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Peoples R China.
Lu, Zhonghai (author)
KTH, Electronics and Embedded Systems
Gao, Chang (author)
Univ Zurich, Inst Neuroinformat, Zurich, Switzerland.;Swiss Fed Inst Technol, Zurich, Switzerland.
Institute of Electrical and Electronics Engineers (IEEE), 2022
2022
English.
In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022). Institute of Electrical and Electronics Engineers (IEEE), pp. 25-28
  • Conference paper (peer-reviewed)
Abstract
The study of specialized accelerators tailored for neural networks has become a promising topic in recent years. Existing neural network accelerators are usually designed for convolutional neural networks (CNNs) or recurrent neural networks (RNNs); however, less attention has been paid to attention mechanisms, an emerging neural network primitive able to identify the relations within input entities. Self-attention-oriented models such as the Transformer have achieved great performance in natural language processing, computer vision, and machine translation. However, the self-attention mechanism has intrinsically expensive computational workloads, which grow quadratically with the number of input entities. Therefore, in this work we propose a software-hardware co-design solution for energy-efficient self-attention inference. A prediction-based approximate self-attention mechanism is introduced to substantially reduce the runtime as well as power consumption, and a specialized hardware architecture is then designed to further increase the speedup. The design is implemented on a Xilinx XC7Z035 FPGA, and the results show that the energy efficiency is improved by 5.7x with less than 1% accuracy loss.
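
This record does not detail the prediction-based approximation beyond the summary above. As a rough, hypothetical sketch of the general idea — cheaply predicting which attention scores matter and computing exact attention only for those — the Python/NumPy example below quantizes Q and K to low precision to predict scores, keeps the top-k keys per query, and evaluates exact softmax attention over the surviving keys. The function names, the 4-bit prediction, and the top-k selection are illustrative assumptions, not the method of the paper.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def exact_self_attention(Q, K, V):
    # Standard scaled dot-product self-attention: the n x n score
    # matrix makes the cost grow quadratically with sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

def approx_self_attention(Q, K, V, k=16, pred_bits=4):
    # Illustrative prediction-based approximation (an assumption, not
    # the paper's scheme):
    #   1. predict scores with coarsely quantized Q and K (cheap),
    #   2. keep only the top-k keys per query,
    #   3. compute exact attention over the kept keys.
    n, d = Q.shape
    scale = 2 ** (pred_bits - 1) - 1
    q_lo = np.round(Q / np.abs(Q).max() * scale)
    k_lo = np.round(K / np.abs(K).max() * scale)
    predicted = q_lo @ k_lo.T                      # cheap score estimate
    keep = np.argsort(-predicted, axis=-1)[:, :k]  # top-k keys per query

    out = np.empty((n, V.shape[-1]))
    for i in range(n):
        idx = keep[i]
        s = Q[i] @ K[idx].T / np.sqrt(d)           # exact scores, k terms only
        out[i] = softmax(s) @ V[idx]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 64, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    full = exact_self_attention(Q, K, V)
    approx = approx_self_attention(Q, K, V, k=16)
    print("mean absolute deviation:", np.abs(full - approx).mean())

With k much smaller than n, the exact softmax work in this sketch drops from O(n^2) to O(n*k) per query set, which is the kind of saving the abstract attributes to its prediction step; how well the approximation holds depends entirely on how well the cheap predictor ranks the true scores.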

Subject headings

TEKNIK OCH TEKNOLOGIER  -- Elektroteknik och elektronik (hsv//swe)
ENGINEERING AND TECHNOLOGY  -- Electrical Engineering, Electronic Engineering, Information Engineering (hsv//eng)

Keyword

Self-attention
approximate computing
VLSI

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)



