SwePub

Result list for search "WFRF:(Källström Johan 1976 ) srt2:(2022)"

  • Results 1-2 of 2
1.
  • Hayes, Conor F., et al. (authors)
  • A practical guide to multi-objective reinforcement learning and planning
  • 2022
  • In: Autonomous Agents and Multi-Agent Systems. - New York, NY, United States : Springer. - 1387-2532 .- 1573-7454. ; 36:1
  • Journal article (peer-reviewed). Abstract:
    • Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
2.
  • Vamplew, Peter, et al. (authors)
  • Scalar reward is not enough : a response to Silver, Singh, Precup and Sutton (2021)
  • 2022
  • In: Autonomous Agents and Multi-Agent Systems. - Springer. - 1387-2532 .- 1573-7454. ; 36:2
  • Journal article (peer-reviewed). Abstract:
    • The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.