SwePub
Search the SwePub database


Results list for search "WFRF:(Källström Johan 1976 ) srt2:(2023)"

Search: WFRF:(Källström Johan 1976 ) > (2023)

  • Results 1-5 of 5
1.
  • Hayes, Conor F., et al. (author)
  • A Brief Guide to Multi-Objective Reinforcement Learning and Planning
  • 2023
  • In: Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS). ISBN 9781450394321, pp. 1988-1990
  • Conference paper (peer-reviewed), abstract:
    • Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple, often conflicting, objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem, and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision making problems. Our extended paper [4] serves as a guide for the application of explicitly multi-objective methods to difficult problems. (A small weighted-sum sketch follows this entry.)
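The "predefined weighted sum over the objectives" mentioned in the abstract is linear scalarisation. The sketch below is a hypothetical numeric illustration (not code from the paper) of why it can oversimplify the problem: for the three example policies, no choice of non-negative weights makes the weighted sum prefer the balanced policy B, which lies in a concave region of the Pareto front, while a simple nonlinear utility does prefer it.

```python
import numpy as np

# Hypothetical vector returns of three policies over two objectives.
# B is Pareto optimal but lies in a concave region of the front.
returns = {"A": np.array([1.0, 0.0]),
           "B": np.array([0.45, 0.45]),
           "C": np.array([0.0, 1.0])}

def linear_utility(v, w):
    """Predefined weighted sum over the objectives (linear scalarisation)."""
    return float(np.dot(w, v))

def nonlinear_utility(v):
    """Example nonlinear utility that favours balanced outcomes."""
    return float(np.prod(v + 0.1))

# Sweeping the weights: the weighted sum always picks A or C, never B.
for w0 in np.linspace(0.0, 1.0, 11):
    w = np.array([w0, 1.0 - w0])
    best = max(returns, key=lambda k: linear_utility(returns[k], w))
    assert best in ("A", "C")

# A user whose true utility values balance would, however, prefer B.
print(max(returns, key=lambda k: nonlinear_utility(returns[k])))  # -> B
```

This is the usual argument for explicitly multi-objective methods: if the user's utility is unknown, nonlinear, or changes over time, committing to a single weighted sum up front can lose exactly the solutions the user wants.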
2.
  • Källström, Johan, 1976-, et al. (author)
  • Model-Based Actor-Critic for Multi-Objective Reinforcement Learning with Dynamic Utility Functions
  • 2023
  • In: Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS). ISBN 9781450394321, pp. 2818-2820
  • Conference paper (peer-reviewed), abstract:
    • Many real-world problems require a trade-off between multiple conflicting objectives. Decision-makers' preferences over solutions to such problems are determined by their utility functions, which convert multi-objective values to scalars. In some settings, utility functions change over time, and the goal is to find methods that can efficiently adapt an agent's policy to changes in utility. Previous work on learning with dynamic utility functions has focused on model-free methods, which often suffer from poor sample efficiency. In this work, we instead propose a model-based actor-critic, which explores with diverse utility functions through imagined rollouts within a learned world model between interactions with the real environment. An experimental evaluation on Minecart, a well-known benchmark for multi-objective reinforcement learning, shows that learning a model of the environment improves the quality of the agent's policy compared to model-free algorithms. (A minimal sketch of the imagined-rollout idea follows this entry.)
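Below is a minimal numpy sketch of the idea described in the abstract: between interactions with the real environment, the agent performs imagined rollouts inside a learned world model, each under a freshly sampled utility function, and updates a vector-valued critic on the imagined experience. The deterministic toy "model", the tabular critic, and the greedy action selection are simplifying assumptions made for illustration; this is not the paper's algorithm or code.

```python
import numpy as np

rng = np.random.default_rng(0)
N_S, N_A, N_OBJ = 5, 2, 2          # toy sizes: states, actions, objectives
GAMMA, ALPHA, HORIZON = 0.95, 0.1, 10

# Stand-in for a learned world model: a predicted next state and a predicted
# vector reward for every (state, action). In the paper this model would be
# learned from real interactions; here it is random for illustration.
pred_next = rng.integers(0, N_S, size=(N_S, N_A))
pred_reward = rng.random(size=(N_S, N_A, N_OBJ))

# Vector-valued critic: one value estimate per objective and state.
V = np.zeros((N_S, N_OBJ))

def sample_utility_weights():
    """Sample a linear utility (weights on the simplex) for one imagined rollout."""
    w = rng.random(N_OBJ)
    return w / w.sum()

def imagined_rollout(state):
    """Explore inside the model under a freshly sampled utility function."""
    w = sample_utility_weights()
    for _ in range(HORIZON):
        # Act greedily w.r.t. the sampled utility applied to one-step lookahead.
        scores = [w @ (pred_reward[state, a] + GAMMA * V[pred_next[state, a]])
                  for a in range(N_A)]
        a = int(np.argmax(scores))
        r_vec, next_state = pred_reward[state, a], pred_next[state, a]
        # TD update of the vector-valued critic on imagined experience only.
        V[state] += ALPHA * (r_vec + GAMMA * V[next_state] - V[state])
        state = next_state

# Between (here: simulated) interactions with the real environment, run many
# imagined rollouts with diverse utilities, so the policy can adapt when the
# user's utility changes.
for _ in range(200):
    imagined_rollout(state=int(rng.integers(N_S)))

print(np.round(V, 2))
```

Because the rollouts are imagined, exploring many utility functions costs no additional real-environment samples, which is the claimed source of the sample-efficiency gain over model-free methods.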
3.
  • Källström, Johan, 1976-, et al. (author)
  • Model-Based Multi-Objective Reinforcement Learning with Dynamic Utility Functions
  • 2023
  • In: Proceedings of the Adaptive and Learning Agents Workshop (ALA) at AAMAS 2023, pp. 1-9
  • Conference paper (peer-reviewed), abstract:
    • Many real-world problems require a trade-off between multiple conflicting objectives. Decision-makers' preferences over solutions to such problems are determined by their utility functions, which convert multi-objective values to scalars. In some settings, utility functions change over time, and the goal is to find methods that can efficiently adapt an agent's policy to changes in utility. Previous work on learning with dynamic utility functions has focused on model-free methods, which often suffer from poor sample efficiency. In this work, we instead propose a model-based actor-critic, which explores with diverse utility functions through imagined rollouts within a learned world model between interactions with the real environment. An experimental evaluation shows that learning a model of the environment improves the performance of the agent compared to model-free algorithms.
4.
  • Källström, Johan, 1976- (author)
  • Reinforcement Learning for Improved Utility of Simulation-Based Training
  • 2023
  • Doctoral thesis (other academic/artistic), abstract:
    • Team training in complex domains often requires a substantial number of resources, e.g. vehicles, machines, and role-players. For this reason, it may be difficult to realise efficient and effective training scenarios in a real-world setting. Instead, part of the training can be conducted in synthetic, computer-generated environments. In these environments trainees can operate simulators instead of real vehicles, while synthetic actors can replace human role-players to increase the complexity of the simulated scenario at low operating cost. However, constructing behaviour models for synthetic actors is challenging, especially for the end users, who typically do not have expertise in artificial intelligence. In this dissertation, we study how machine learning can be used to simplify the construction of intelligent agents for simulation-based training. A simulation-based air combat training system is used as a case study. The contributions of the dissertation are divided into two parts.
      The first part aims at improving the understanding of reinforcement learning in the domain of simulation-based training. First, a user study is conducted to identify important capabilities and characteristics of learning agents that are intended to support training of fighter pilots. One of the most important capabilities identified is that the agents' behaviour can be adapted to different phases of training, as well as to the training needs of individual human trainees. Second, methods for learning how to coordinate with other agents are studied in simplified training scenarios, to investigate how the design of the agent's observation space, action space, and reward signal affects the performance of learning. It is found that temporal abstractions and hierarchical reinforcement learning can improve the efficiency of learning, while also providing support for modelling of doctrinal behaviour. In more complex settings, curriculum learning and related methods are expected to help find novel tactics even when sparse, abstract reward signals are used. Third, based on the results from the user study and the practical experiments, a system concept for a user-adaptive training system is developed to support further research.
      The second part focuses on methods for utility-based multi-objective reinforcement learning, which incorporates knowledge of the user's utility function in the search for policies that balance multiple conflicting objectives. Two new agents for multi-objective reinforcement learning are proposed: the Tunable Actor (T-Actor) and the Multi-Objective Dreamer (MO-Dreamer). T-Actor provides decision support to instructors by learning a set of Pareto optimal policies, represented by a single neural network conditioned on objective preferences. This enables tuning of the agent's behaviour to fit trainees' current training needs. Experimental evaluations in gridworlds and in the target system show that T-Actor reduces the number of training steps required for learning. MO-Dreamer adapts online to changes in users' utility, e.g. changes in training needs. It does so by learning a model of the environment, which it can use for anticipatory rollouts with a diverse set of utility functions, to explore which policy to follow to optimise the return for a given set of objective preferences. An experimental evaluation shows that MO-Dreamer outperforms prior model-free approaches in terms of experienced regret, for frequent as well as sparse changes in utility.
      Overall, the research conducted in this dissertation contributes to improved knowledge about how to apply machine learning methods to the construction of simulation-based training environments. While our focus was on air combat training, the results are general enough to be applicable in other domains. (A sketch of a preference-conditioned actor, illustrating the T-Actor idea, follows this entry.)
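The dissertation describes T-Actor as a set of Pareto optimal policies represented by a single neural network conditioned on objective preferences, so that an instructor can tune the agent's behaviour by changing the preference weights alone. The PyTorch sketch below only illustrates what such preference conditioning can look like; the class name, architecture, and dimensions are assumptions made for the example, not the dissertation's implementation.

```python
import torch
import torch.nn as nn

class PreferenceConditionedActor(nn.Module):
    """Illustrative policy network conditioned on objective preferences.

    A single network maps (state, preference weights) to action probabilities,
    so behaviour can be tuned at run time by changing the weights alone.
    (Sketch of the idea only; not the dissertation's T-Actor implementation.)
    """

    def __init__(self, state_dim: int, n_objectives: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        # Concatenate the observation with the preference weights and
        # return a probability distribution over actions.
        logits = self.net(torch.cat([state, weights], dim=-1))
        return torch.softmax(logits, dim=-1)

# Usage: after training, the same network yields different behaviour for
# different preference weights (the weights below are illustrative only).
actor = PreferenceConditionedActor(state_dim=4, n_objectives=2, n_actions=3)
state = torch.randn(1, 4)
probs_obj0 = actor(state, torch.tensor([[0.9, 0.1]]))  # prefer objective 0
probs_obj1 = actor(state, torch.tensor([[0.1, 0.9]]))  # prefer objective 1
print(probs_obj0, probs_obj1, sep="\n")
```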
5.
  • Vamplew, Peter, et al. (author)
  • Scalar Reward is Not Enough
  • 2023
  • In: Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS). ISBN 9781450394321, pp. 839-841
  • Conference paper (peer-reviewed), abstract:
    • Silver et al. [14] posit that scalar reward maximisation is sufficient to underpin all intelligence and provides a suitable basis for artificial general intelligence (AGI). This extended abstract summarises the counter-argument from our JAAMAS paper [19].
