SwePub

Policy Evaluation in Distributional LQR

Wang, Zifan (author)
KTH, Reglerteknik
Gao, Yulong (author)
Department of Computer Science, University of Oxford, UK
Wang, Siyi (author)
Information-oriented Control, Technical University of Munich, Germany
Zavlanos, Michael M. (author)
Department of Mechanical Engineering and Materials Science, Duke University, USA
Abate, Alessandro (author)
Department of Computer Science, University of Oxford, UK
Johansson, Karl H., 1967- (author)
KTH, Reglerteknik
ML Research Press, 2023
English.
In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference, L4DC 2023. ML Research Press, pp. 1245-1256.
  • Conference paper (peer-reviewed)
Abstract
Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on a discounted linear quadratic regulator (LQR) for control, advocating a new distributional approach to LQR, which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return which, remarkably, is applicable under any exogenous disturbance on the dynamics, as long as the disturbances are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that it can be approximated by a finite number of random variables, and that the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
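
The paper's closed-form return distribution is not reproduced in this record, so the following is only a minimal Monte Carlo sketch of the two ideas the abstract describes: the random discounted return of an LQR policy under i.i.d. disturbances, truncated to finitely many terms (mirroring the paper's finite approximation of the infinite representation), and an empirical CVaR of that return. The scalar system and all numeric parameter values are hypothetical, chosen for illustration.

```python
# Sketch only: Monte Carlo samples of the random discounted LQR return
# and an empirical CVaR. The system and all parameter values below are
# assumed for illustration; this is not the paper's closed-form method.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system x_{t+1} = a x_t + b u_t + w_t with u_t = -k x_t.
a, b = 1.0, 1.0
q, r, gamma = 1.0, 0.1, 0.95      # quadratic cost weights and discount factor
sigma_w = 0.1                      # std of the i.i.d. Gaussian disturbance w_t

def sampled_return(k, x0=1.0, horizon=200, n_samples=10_000):
    """Sample the discounted return G = sum_t gamma^t (q x_t^2 + r u_t^2).

    Truncating the infinite sum at `horizon` mirrors the paper's idea of
    approximating the exact return distribution, which involves infinitely
    many random variables, by finitely many of them.
    """
    G = np.zeros(n_samples)
    x = np.full(n_samples, float(x0))
    for t in range(horizon):
        u = -k * x
        G += gamma**t * (q * x**2 + r * u**2)
        x = (a - b * k) * x + sigma_w * rng.standard_normal(n_samples)
    return G

def empirical_cvar(samples, alpha=0.1):
    """CVaR_alpha of a cost: the mean of the worst alpha-fraction of samples."""
    var = np.quantile(samples, 1.0 - alpha)
    return samples[samples >= var].mean()

G = sampled_return(k=0.5)
print(f"mean return: {G.mean():.3f}, CVaR_0.1: {empirical_cvar(G):.3f}")
```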
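
Building on the sketch above, the snippet below illustrates a two-point zeroth-order gradient step on the CVaR objective with respect to the feedback gain, in the spirit of the risk-averse algorithm the abstract mentions. The smoothing radius `mu`, step size `lr`, and iteration count are assumed values, not the paper's algorithm or tuning.

```python
def zo_step(k, mu=0.05, lr=0.02, alpha=0.1):
    """One two-point zeroth-order update of the gain k against CVaR_alpha.

    Gradient estimate: g ~ (J(k + mu d) - J(k - mu d)) d / (2 mu),
    with d a random direction (just a sign for a scalar gain).
    """
    d = rng.choice([-1.0, 1.0])
    j_plus = empirical_cvar(sampled_return(k + mu * d, n_samples=2_000), alpha)
    j_minus = empirical_cvar(sampled_return(k - mu * d, n_samples=2_000), alpha)
    g = (j_plus - j_minus) * d / (2 * mu)
    return k - lr * g

k = 0.5
for _ in range(20):
    k = zo_step(k)
print(f"gain after 20 zeroth-order steps: {k:.3f}")
```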

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keywords

Distributional LQR
distributional RL
policy evaluation
risk-averse control

Publication and content type

ref (peer-reviewed)
kon (conference paper)
