Search: id:"swepub:oai:DiVA.org:kth-247830"
SWIRL : A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
-
- Krishnan, Sanjay (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
-
- Garg, Animesh (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.; Stanford Univ, Stanford, CA 94305 USA.
-
- Liaw, Richard (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
-
- Thananjeyan, Brijen (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
-
- Miller, Lauren (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
-
- Pokorny, Florian T., 1980- (author)
- KTH, Robotics, Perception and Learning, RPL
-
- Goldberg, Ken (author)
- Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.
-
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA.; Stanford Univ, Stanford, CA 94305 USA. (creator_code:org_t)
- 2018-07-25
- 2019
- English.
-
In: The international journal of robotics research. - : SAGE PUBLICATIONS LTD. - 0278-3649 .- 1741-3176. ; 38:2-3, pp. 126-145
- Related link:
-
https://urn.kb.se/re...
-
-
https://doi.org/10.1...
Abstract
- We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
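The abstract describes approximating a long-horizon task as a sequence of local reward functions with subtask transition conditions, and running Q-learning over that approximation. The following is a minimal illustrative sketch of that idea only, not the authors' implementation: the toy chain environment, the function names, and all hyperparameters are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): Q-learning over a task
# decomposed into a sequence of local reward functions, where the active
# reward switches when the current segment's transition condition holds.
import random

def q_learning_over_segments(n_states, rewards, transitions,
                             episodes=2000, alpha=0.5, gamma=0.9,
                             eps=0.3, seed=0):
    """Tabular Q-learning on a 1-D chain with actions 0=left, 1=right.
    One Q-table per (segment, state, action); the segment index advances
    when the current segment's transition condition is satisfied."""
    rng = random.Random(seed)
    n_seg = len(rewards)
    Q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_seg)]
    for _ in range(episodes):
        s, seg = 0, 0
        for _ in range(4 * n_states):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[seg][s][0] >= Q[seg][s][1] else 1
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = rewards[seg](s2)                      # local reward
            seg2 = seg + 1 if (transitions[seg](s2)   # subtask switch
                               and seg + 1 < n_seg) else seg
            Q[seg][s][a] += alpha * (r + gamma * max(Q[seg2][s2])
                                     - Q[seg][s][a])
            s, seg = s2, seg2
            if seg == n_seg - 1 and transitions[seg](s):
                break  # final subtask completed
    return Q

# Toy two-subtask chain: reach state 3 first, then state 6.
rewards = [lambda s: 1.0 if s == 3 else 0.0,
           lambda s: 1.0 if s == 6 else 0.0]
transitions = [lambda s: s == 3, lambda s: s == 6]
Q = q_learning_over_segments(7, rewards, transitions)
```

In the learned tables, the greedy action moves right toward each subtask's goal; note that in SWIRL proper the segments and local rewards come from unsupervised learning on expert demonstrations, whereas here they are specified by hand.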
Subject headings
- NATURVETENSKAP -- Data- och informationsvetenskap (hsv//swe)
- NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)
Keywords
- Reinforcement learning
- inverse reinforcement learning
- learning from demonstrations
- medical robots and systems
Publication and content type
- ref (subject category)
- art (subject category)
Find via library
To the institution's database