SwePub
Record id: swepub:oai:DiVA.org:kth-247830

SWIRL : A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards

Krishnan, Sanjay (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
Garg, Animesh (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.; Stanford Univ, Stanford, CA 94305 USA.
Liaw, Richard (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
Thananjeyan, Brijen (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
Miller, Lauren (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
Pokorny, Florian T., 1980- (author)
KTH, Robotics, Perception and Learning, RPL
Goldberg, Ken (author)
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
2018-07-25
2019
English.
In: The International Journal of Robotics Research. - : SAGE PUBLICATIONS LTD. - 0278-3649 .- 1741-3176. ; 38:2-3, pp. 126-145
  • Journal article (peer-reviewed)
Abstract
We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
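The core idea from the abstract, approximating a long-horizon task as a sequence of local reward functions with transition conditions and running Q-learning over that approximation, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy chain world, the two hand-written segments, and all hyperparameters are illustrative assumptions; in SWIRL the segments and rewards are learned from demonstrations rather than specified by hand.

```python
import numpy as np

N_STATES = 6          # toy 1-D chain world (hypothetical task)
ACTIONS = [-1, +1]    # move left / right

# Two subtask segments, each with a local reward and a transition
# condition: first reach state 3, then reach state 5.
segments = [
    {"reward": lambda s: 1.0 if s == 3 else 0.0, "done": lambda s: s == 3},
    {"reward": lambda s: 1.0 if s == 5 else 0.0, "done": lambda s: s == 5},
]

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Q-table indexed by (segment, state, action): the agent's state is
    # augmented with the index of the current subtask segment.
    Q = np.zeros((len(segments), N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s, k = 0, 0                      # start state, first segment
        for _ in range(50):              # step cap per episode
            if rng.random() < eps:       # epsilon-greedy exploration
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(Q[k, s]))
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = segments[k]["reward"](s2)
            reached = segments[k]["done"](s2)
            done = reached and k == len(segments) - 1
            k2 = k + 1 if reached and not done else k
            target = r if done else r + gamma * Q[k2, s2].max()
            Q[k, s, a] += alpha * (target - Q[k, s, a])
            s, k = s2, k2
            if done:
                break
    return Q

Q = q_learning()
print(int(np.argmax(Q[0, 0])))  # greedy action index at state 0, segment 0
```

After training, the greedy policy in segment 0 moves right toward state 3 and, once the segment's transition condition fires, switches to the second segment's local reward and continues toward state 5.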

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)

Keywords

Reinforcement learning
inverse reinforcement learning
learning from demonstrations
medical robots and systems

Publication and content type

ref (peer-reviewed)
art (journal article)

