SwePub
  • Krishnan, S. (author)

SWIRL : A Sequential Windowed Inverse Reinforcement Learning Algorithm for Robot Tasks With Delayed Rewards

  • Article/chapter, English, 2020

Publisher, publication year, extent ...

  • 2020-05-07
  • Cham : Springer Nature, 2020
  • print (rdacarrier)

Numbers

  • LIBRIS-ID: oai:DiVA.org:kth-301017
  • URI: https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301017
  • DOI: https://doi.org/10.1007/978-3-030-43089-4_43

Supplementary language notes

  • Language: English
  • Summary in: English

Classification

  • Subject category: ref (swepub-contenttype)
  • Subject category: art (swepub-publicationtype)

Notes

  • QC 20210914
  • Inverse Reinforcement Learning (IRL) allows a robot to generalize from demonstrations to previously unseen scenarios by learning the demonstrator’s reward function. However, in multi-step tasks, the learned rewards might be delayed and hard to optimize directly. We present Sequential Windowed Inverse Reinforcement Learning (SWIRL), a three-phase algorithm that partitions a complex task into shorter-horizon subtasks based on linear dynamics transitions that occur consistently across demonstrations. SWIRL then learns a sequence of local reward functions that describe the motion between transitions. Once these reward functions are learned, SWIRL applies Q-learning to compute a policy that maximizes the rewards. We compare SWIRL (demonstrations to segments to rewards) with Supervised Policy Learning (SPL - demonstrations to policies) and Maximum Entropy IRL (MaxEnt-IRL - demonstrations to rewards) on standard Reinforcement Learning benchmarks: Parallel Parking with noisy dynamics, Two-Link Acrobot, and a 2D GridWorld. We find that SWIRL converges to a policy with similar success rates (60%) in 3x fewer time-steps than MaxEnt-IRL, and requires 5x fewer demonstrations than SPL. In physical experiments using the da Vinci surgical robot, we evaluate the extent to which SWIRL generalizes from linear cutting demonstrations to cutting sequences of curved paths.
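
The final phase described in the summary (Q-learning over a sequence of local rewards) can be illustrated with a minimal sketch. This is not the authors' implementation: the 4x4 grid, the hard-coded `WAYPOINTS`, the Manhattan-distance `local_reward`, and all hyperparameters are assumptions chosen for illustration. In SWIRL the transitions and local rewards would be learned from demonstrations; here they are fixed by hand, and the state is augmented with the index of the active segment so that the agent knows which local reward applies.

```python
import random

# Hypothetical 4x4 grid and waypoint sequence. In SWIRL the transitions would
# come from segmenting demonstrations; here they are hard-coded assumptions.
SIZE = 4
WAYPOINTS = [(3, 0), (3, 3)]
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down


def local_reward(pos, seg):
    """Stand-in local reward: negative Manhattan distance to the active waypoint."""
    wx, wy = WAYPOINTS[seg]
    return -(abs(pos[0] - wx) + abs(pos[1] - wy))


def step(pos, seg, action):
    """Deterministic move clipped to the grid; returns (pos, seg, reward, done)."""
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    new_pos = (x, y)
    reward = local_reward(new_pos, seg)
    if new_pos == WAYPOINTS[seg]:
        seg += 1  # waypoint reached: switch to the next local reward
    return new_pos, seg, reward, seg == len(WAYPOINTS)


def greedy_action(q, state):
    """Index of the highest-valued action; unseen pairs default to 0.0."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning over the augmented state (position, segment index)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pos, seg = (0, 0), 0
        for _ in range(max_steps):
            s = (pos, seg)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, s)          # exploit
            pos, seg, r, done = step(pos, seg, ACTIONS[a])
            s2 = (pos, seg)
            future = 0.0 if done else q.get((s2, greedy_action(q, s2)), 0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * future - old)
            if done:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned greedy policy from the start cell; returns visited cells."""
    pos, seg, path = (0, 0), 0, [(0, 0)]
    for _ in range(max_steps):
        pos, seg, _, done = step(pos, seg, ACTIONS[greedy_action(q, (pos, seg))])
        path.append(pos)
        if done:
            break
    return path
```

Calling `greedy_rollout(train())` returns the cells visited by the learned policy. Because each local reward only looks ahead to the next waypoint, the effective horizon of each subtask is short, which is the motivation the summary gives for segmenting tasks with delayed rewards.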

Added entries (persons, corporate bodies, meetings, titles ...)

  • Garg, A. (author)
  • Liaw, R. (author)
  • Thananjeyan, B. (author)
  • Miller, L. (author)
  • Pokorny, Florian T., 1980- (author), KTH, Robotik, perception och lärande, RPL (Swepub:kth)u1sxkfwe
  • Goldberg, K. (author)
  • KTH, Robotik, perception och lärande, RPL (creator_code:org_t)

Related titles

  • In: Springer Proceedings in Advanced Robotics. Cham : Springer Nature. Vol. 13, pp. 672-687. ISSN 2511-1256
