Learning from Interactions: Forward and Inverse Decision-Making for Autonomous Dynamical Systems

de Miranda de Matos Lourenço, Inês, 1994- (author): KTH,Reglerteknik

Wahlberg, Bo, Professor, 1959- (thesis advisor): KTH,Reglerteknik

Hirche, Sandra, Professor (opponent): Technical University of Munich, Munich, Germany

ISBN 9789180407434
Stockholm : KTH Royal Institute of Technology, 2023
English. xvi, 215 pp.
Series: TRITA-EECS-AVL ; 2023:76

Related links: https://kth.diva-por... (primary); https://urn.kb.se/re...

Doctoral thesis (other academic/artistic)

Abstract

Decision-making is the mechanism of using available information to generate solutions to given problems by forming preferences and beliefs and by selecting courses of action among several alternatives. In this thesis, we study the mechanisms that generate behavior (the forward problem) and how their characteristics can explain observed behavior (the inverse problem). Both problems play a pivotal role in contemporary research because of the desire to design sophisticated autonomous agents that serve as the building blocks of a smart society amid complexity, risk, and uncertainty. This work explores different parts of the autonomous decision-making process in which agents learn from interacting with each other and with the environment that surrounds them. We address fundamental problems of behavior modeling, of parameter estimation in the form of beliefs, distributions, and reward functions, and finally of interactions with other agents, which together lay the foundation for a complete and integrative framework for decision-making and learning. The thesis is divided into four parts, each featuring a different information exchange paradigm.

First, we model the forward problem of how a decision-maker forms beliefs about the world and the inverse problem of estimating these beliefs from the agent’s behavior. The private belief (posterior distribution) on the state of the world is formed according to a hidden Markov model by filtering private information. The ability to estimate private beliefs forms a foundation for predicting and counteracting future actions. We answer the problems of i) how the private belief of the decision-maker can be estimated by observing its decisions (under two different scenarios), and ii) how the decision-maker can protect its private belief from an adversary by confusing it. We exemplify the applicability of our frameworks in regime-switching Markovian portfolio allocation.

In the second part, we study forward decision-making of biological systems and the inverse problem of how to obtain insight into their intrinsic characteristics. We focus on time perception, that is, how humans and animals perceive the passage of time, and design a biologically inspired decision-making framework using reinforcement learning that replicates timing mechanisms. We show that a simulated robot equipped with our framework is able to perceive time similarly to animals, and that by analyzing its performed actions we are able to estimate the parameters of timing mechanisms.

Next, we consider teacher-student settings where a teacher agent can intervene in the decision-making process of a student agent to assist it in performing a task. In the third part, we propose correctional learning as an approach where the teacher can intercept the observations the student collects from the system and modify them to improve the estimation process of the student. We provide finite-sample results for batch correctional learning in system identification, generalize it to more complex systems using optimal transport, and lower-bound improvements on the estimate’s variance for the online case.

Decision-making in teacher-student settings like the previous one requires both agents to have aligned models of each other. In the fourth and last part of this thesis, the teacher can instead alter the decisions of the decision-maker in a human-robot interaction setting. We use a confidence-based misalignment detection method that enables the robot to update its knowledge proportionally to its confidence in the human corrections, and we propose a framework to disambiguate between misalignment caused by incorrectly learned features that do not generalize to new environments and misalignment caused by features entirely missing from the robot’s model. We demonstrate the proposed framework on a 7-degrees-of-freedom robot manipulator with physical human corrections and show how to initiate the model realignment process once misalignment is detected.
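
As an illustration of the belief formation mentioned in the first part, the following is a minimal sketch of a standard hidden Markov model filter, not the thesis's own implementation. The function name `hmm_filter_step`, the two-regime transition matrix `P`, and the observation matrix `B` are hypothetical values chosen only for this example.

```python
import numpy as np

def hmm_filter_step(belief, transition, likelihood):
    """One HMM filter update: predict with the transition matrix,
    then correct with the likelihood of the new observation."""
    predicted = transition.T @ belief        # prior over the next state
    unnormalized = likelihood * predicted    # weight by P(observation | state)
    return unnormalized / unnormalized.sum() # normalize to a posterior

# Hypothetical two-regime example (all numbers are assumptions).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # P[i, j] = probability of moving from state i to j
B = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # B[x, y] = probability of observing y in state x

belief = np.array([0.5, 0.5])  # initial private belief
for y in [0, 0, 1]:            # a short, made-up sequence of private observations
    belief = hmm_filter_step(belief, P, B[:, y])
```

After the loop, `belief` holds the posterior over the two regimes given the three observations. The inverse problem described in the abstract would start from the decisions driven by such a belief rather than from the belief itself.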

Decision-making is a complex process in which available information is used to generate solutions to given problems. This process involves, among other things, forming preferences and beliefs and selecting courses of action among several alternatives. In this thesis, we explore the mechanisms that generate behavior (the forward problem) and how their characteristics can explain observed actions (the inverse problem). Both problems play a decisive role in current research on developing advanced autonomous agents, which form the building blocks of a smart society that accounts for complexity, risk, and uncertainty. This work explores different aspects of the autonomous decision-making process in which agents learn by interacting with other agents and with the environment around them. We address fundamental problems in behavior modeling and in parameter estimation in the form of beliefs, probability distributions, and reward functions. Finally, we also study interactions with other agents, which lays the foundation for a complete and integrated framework for decision-making and learning.

In the first part of the thesis, we model both the forward problem, in which decision-makers form beliefs about their surroundings, and the inverse problem, in which these beliefs are estimated from the agent's actions. We use a hidden Markov model to filter private information and form the private belief (posterior distribution) about the state of the world. The ability to estimate private beliefs provides a basis for predicting and counteracting future actions. We discuss how these private beliefs can be estimated from the decision-maker's actions and how the decision-maker can protect its beliefs from an adversary. We apply our framework to a regime-switching portfolio allocation problem.

In the second part, we study forward decision-making in biological systems and how insight into their characteristics can be gained by solving the inverse problem. We focus on time perception, namely how humans and animals perceive the passage of time. Inspired by biological systems, we also design a decision-making framework based on reinforcement learning that reproduces biological timing mechanisms. We show that a simulated robot equipped with our framework can perceive time in the same way as animals, and that by analyzing its performed actions we can estimate the parameters of biological timing systems.

In the third part of the thesis, we consider master-apprentice settings, in which an expert agent (teacher) helps a student agent perform tasks by intervening in its decision-making process. We propose correctional learning, in which a teacher observes and modifies the data collected by the student agent in order to improve the student agent's estimation process. We present results for a batch scenario with a finite number of samples. We also generalize the framework using tools from optimal transport so that it applies to estimation problems of higher complexity. Finally, we extend the framework to an online scenario and, in that setting, derive a lower bound on the improvement of the estimate's variance.

In the fourth and final part of this thesis, the teacher can instead modify the decision-making agent's actions in a human-robot interaction. We use a confidence-based method to detect misalignment, which the robot can then use to update its knowledge. We present a framework for distinguishing between misalignment caused by incorrectly learned features that do not generalize to new environments and misalignment caused by features missing from the robot's model. We demonstrate our framework by applying it to a robot arm whose actions can be corrected by a human. We also show how the realignment process is initiated when misalignment is detected.

About the subject

ENGINEERING AND TECHNOLOGY: ENGINEERING AND ...; and Electrical Engin ...
