SwePub

id:"swepub:oai:research.chalmers.se:d8f56696-ca12-4fd2-903e-55f7cd78bccb"

An Automated and Controlled Numerical Precision Reduction Framework for GPUs

Angerd, Alexandra, 1988 (author)
Chalmers tekniska högskola, Chalmers University of Technology
Gothenburg, 2018
English.
  • Licentiate thesis (other academic/artistic)
Abstract
  • Reducing the precision of floating-point values is an effective way to achieve both higher performance and higher energy efficiency. This is especially true for GPUs, since many of their common tasks are inherently insensitive to precision reduction. A substantially lower bit width enables many novel microarchitectural optimizations, such as resource-efficient register files, functional units, and cache memory subsystems. However, to reduce the precision of floating-point values in a controlled manner, a connection has to be established between the application and the microarchitecture, since it is at the application level that it is decided whether deviations from the exact answer are tolerable. This thesis proposes a GPU framework that establishes such a connection. The first part of the framework is a method for automatically selecting an appropriate precision for each floating-point value given the tolerable output deviation. The results show that by allowing a small, but acceptable, degradation of output quality, the number of bits needed to represent the floating-point values can be significantly reduced. The second part of the framework is a novel GPU register file organization together with a register allocation algorithm capable of leveraging the precision-reduced floats produced by the first part. The register allocation algorithm uses the precision-reduced floats to lower the register footprint of each thread. This is of great importance for GPUs since, unlike traditional CPU architectures, GPUs hide latency by keeping a large number of threads in flight simultaneously. Also, to enable fast context switching, the state of all active threads is kept readily available in the register file. Because the thread register footprint limits the number of active threads, a large footprint can impede latency hiding. Our evaluation shows that, with the proposed GPU register file organization, the increase in active threads translates into a significant performance improvement, at a lower cost than increasing the number of threads by using a larger register file.
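
The two ideas in the abstract can be illustrated with a small Python sketch. It is not the thesis's method: the mantissa truncation, the toy sum-of-squares kernel, the linear search, the first-fit-decreasing packing, and all function names and sizes below are assumptions chosen only to make the reasoning concrete. The first half picks the smallest float32 mantissa width that keeps a toy output within a relative tolerance; the second half shows how narrower values could shrink a thread's register footprint and so raise the number of threads that fit in a fixed register file.

```python
import struct


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Truncate x (as an IEEE 754 float32) to `kept_bits` mantissa bits."""
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be in 0..23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Clear the low (23 - kept_bits) mantissa bits; sign and exponent stay.
    mask = (0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


def toy_kernel(values, kept_bits):
    """Hypothetical application: sum of squares over truncated inputs.

    Accumulation uses Python floats (doubles), which is a simplification.
    """
    return sum(truncate_mantissa(v, kept_bits) ** 2 for v in values)


def min_mantissa_bits(values, tolerance):
    """Smallest mantissa width whose relative output error is <= tolerance."""
    exact = toy_kernel(values, 23)
    for kept_bits in range(23):
        approx = toy_kernel(values, kept_bits)
        if abs(approx - exact) <= tolerance * abs(exact):
            return kept_bits
    return 23  # full float32 precision always matches the reference


def registers_needed(widths_bits, register_bits=32):
    """Pack values of the given bit widths into registers (first-fit decreasing)."""
    free = []  # unused bits remaining in each allocated register
    for width in sorted(widths_bits, reverse=True):
        if width > register_bits:
            raise ValueError("value wider than a register")
        for i, room in enumerate(free):
            if width <= room:
                free[i] -= width
                break
        else:
            free.append(register_bits - width)
    return len(free)


def max_resident_threads(register_file_size, registers_per_thread):
    """Threads that fit when each one holds `registers_per_thread` registers."""
    return register_file_size // registers_per_thread


if __name__ == "__main__":
    data = [0.1 * i for i in range(1, 9)]
    bits = min_mantissa_bits(data, tolerance=1e-2)
    width = 1 + 8 + bits  # sign + exponent + kept mantissa bits
    full = registers_needed([32] * len(data))
    packed = registers_needed([width] * len(data))
    # 1024 registers is an arbitrary, hypothetical register file size.
    print(bits, full, packed,
          max_resident_threads(1024, full),
          max_resident_threads(1024, packed))
```

In this sketch a looser tolerance can never require more mantissa bits than a stricter one, and a smaller per-thread footprint can never reduce the number of resident threads, which mirrors the chain of reasoning in the abstract.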

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Engineering (hsv//eng)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
ENGINEERING AND TECHNOLOGY  -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)

Keywords

Microarchitecture
Floating-Point Precision
Approximate Computing
Register File
GPU

Publication and content type

lic (subject category)
vet (subject category)