SwePub

  • Landelius, Tomas (author)

Greedy adaptive critics for LPQ [i.e. LQR] problems : Convergence Proofs

  • Book, English, 1996

Publisher, publication year, extent ...

  • Linköping, Sweden : Linköping University, Department of Electrical Engineering, 1996
  • 20 pp.
  • electronic (rdacarrier)

Numbers

  • LIBRIS-ID:oai:DiVA.org:liu-53354
  • URI: https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53354

Supplementary language notes

  • Language:English
  • Summary in:English

Classification

  • Subject category:vet swepub-contenttype
  • Subject category:rap swepub-publicationtype

Series

  • LiTH-ISY-R, 1400-3902 ; 1896

Notes

  • A number of success stories have been told where reinforcement learning has been applied to problems in continuous state spaces using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situations. To our knowledge, only two such results have been presented for systems in the continuous state space domain. The first is due to Werbos and is concerned with linear function approximation and heuristic dynamic programming. Here no optimal strategy can be found, which is why the result is of limited importance. The second result is due to Bradtke and deals with linear quadratic systems and quadratic function approximators. Bradtke's proof is limited to ADHDP and policy iteration techniques, where the optimal solution is found by a number of successive approximations. This paper deals with greedy techniques, where the optimal solution is directly aimed for. Convergence proofs for a number of adaptive critics, HDP, DHP, ADHDP and ADDHP, are presented. Optimal controllers for linear quadratic regulation (LQR) systems can be found by standard techniques from control theory, but the assumptions made in control theory can be weakened if adaptive critic techniques are employed. The main point of this paper is, however, not to emphasize the differences but to highlight the similarities and, by so doing, contribute to a theoretical understanding of adaptive critics.

Subject headings and genre

  • Linear quadratic regulation
  • Reinforcement learning
  • TECHNOLOGY
  • TEKNIKVETENSKAP

Added entries (persons, corporate bodies, meetings, titles ...)

  • Knutsson, Hans, Linköpings universitet, Bildbehandling, Tekniska högskolan (Swepub:liu) hankn25 (author)
  • Bildbehandling (creator_code:org_t)
