SwePub
Hit list for search "WFRF:(Baumann Dominik Ph.D. 1991 )"


  • Results 1-7 of 7
1.
  • Baumann, Dominik, Ph.D. 1991-, et al. (author)
  • A computationally lightweight safe learning algorithm
  • 2023
  • In: 2023 62nd IEEE Conference on Decision and Control (CDC). - : Institute of Electrical and Electronics Engineers (IEEE). - 9798350301243 - 9798350301250 ; pp. 1022-1027
  • Conference paper (peer-reviewed)
    • Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
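The Nadaraya-Watson estimator named in this abstract predicts a value as a kernel-weighted average of observed outputs. The following is a minimal sketch of that general technique only, not the paper's safe learning algorithm: the Gaussian kernel, the `bandwidth` parameter, the function name, and the toy data are all assumptions chosen for illustration. This naive version costs time linear in the number of data points per query; the abstract does not say how the authors reach logarithmic scaling, so that part is not reproduced here.

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth=0.5):
    """Kernel-weighted average of ys, weighted by how close each x is to x_query.

    Assumption: a Gaussian kernel with a fixed, hand-picked bandwidth.
    """
    # One weight per data point; closer points get larger weights.
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the query is too far from every data point.
        raise ValueError("no data within kernel range of the query point")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical toy data: noiseless samples of y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
estimate = nadaraya_watson(1.5, xs, ys)
```

Because the estimate is a convex combination of the observed outputs, it always lies between the smallest and largest value in `ys`, and no matrix inversion is needed, in contrast to the cubic cost of Gaussian process inference mentioned in the abstract.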
2.
  •  
3.
  • Baumann, Dominik, Ph.D. 1991-, et al. (author)
  • Safe Reinforcement Learning in Uncertain Contexts
  • 2024
  • In: IEEE Transactions on Robotics. - : IEEE. - 1552-3098 .- 1941-0468. ; 40, pp. 1828-1841
  • Journal article (peer-reviewed)
    • When deploying machine learning algorithms in the real world, guaranteeing safety is an essential asset. Existing safe learning approaches typically consider continuous variables, i.e., regression tasks. However, in practice, robotic systems are also subject to discrete, external environmental changes, e.g., having to carry objects of certain weights or operating on frozen, wet, or dry surfaces. Such influences can be modeled as discrete context variables. In the existing literature, such contexts are, if considered, mostly assumed to be known. In this work, we drop this assumption and show how we can perform safe learning when we cannot directly measure the context variables. To achieve this, we derive frequentist guarantees for multiclass classification, allowing us to estimate the current context from measurements. Furthermore, we propose an approach for identifying contexts through experiments. We discuss under which conditions we can retain theoretical guarantees and demonstrate the applicability of our algorithm on a Furuta pendulum with camera measurements of different weights that serve as contexts.
4.
  • Gräfe, Alexander, et al. (author)
  • Towards remote fault detection by analyzing communication priorities
  • 2022
  • In: 2022 IEEE 61st Conference on Decision and Control (CDC). - : Institute of Electrical and Electronics Engineers (IEEE). - 9781665467612 - 9781665467605 - 9781665467629 ; pp. 1758-1763
  • Conference paper (peer-reviewed)
    • The ability to detect faults is an important safety feature for event-based multi-agent systems. In most existing algorithms, each agent tries to detect faults by checking its own behavior. But what if one agent becomes unable to recognize misbehavior, for example due to failure in its onboard fault detection? To improve resilience and avoid propagation of individual errors to the multi-agent system, agents should check each other remotely for malfunction or misbehavior. In this paper, we build upon a recently proposed predictive triggering architecture that involves communication priorities shared throughout the network to manage limited bandwidth. We propose a fault detection method that uses these priorities to detect errors in other agents. The resulting algorithm is not only able to detect faults, but can also run on a low-power microcontroller in real-time, as we demonstrate in hardware experiments.
5.
  • Hulme, Oliver, et al. (author)
  • Reply to "The Limitations of Growth-Optimal Approaches to Decision Making Under Uncertainty"
  • 2023
  • In: Econ Journal Watch. - : Institute of Spontaneous Order Economics. - 1933-527X. ; 20:2, pp. 335-348
  • Journal article (other academic/artistic)
    • In an article appearing concurrently with the present one, Matthew Ford and John Kay put forward their understanding of a decision theory which emerges in ergodicity economics. Their understanding leads them to believe that ergodicity economics evades the core problem of decisions under uncertainty and operates solely in a regime where there is no measurable uncertainty. If this were the case, then the authors' critical stance would be justified and, as the authors point out, the decision theory would yield only trivial results, identical to a flavor of expected-utility theory. Here we clarify that the critique is based on a theoretical misunderstanding, and that uncertainty, quantified in any reasonable way, is large in the regime where the model operates. Our resolution explains the success of recent laboratory experiments, where ergodicity economics makes predictions different from expected-utility theory, contrary to the claim of equivalence by Ford and Kay. Also, a state of the world is identified where ergodicity economics outperforms expected-utility theory empirically.
6.
  • Sukhija, Bhavya, et al. (author)
  • GOSAFEOPT : Scalable safe exploration for global optimization of dynamical systems
  • 2023
  • In: Artificial Intelligence. - : Elsevier BV. - 0004-3702 .- 1872-7921. ; 320
  • Journal article (peer-reviewed)
    • Learning optimal control policies directly on physical systems is challenging. Even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. This work proposes GOSAFEOPT as the first provably safe and optimal algorithm that can safely discover globally optimal policies for systems with high-dimensional state space. We demonstrate the superiority of GOSAFEOPT over competing model-free safe learning methods in simulation and hardware experiments on a robot arm. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
7.
  • Weichwald, Sebastian, et al. (author)
  • Learning by Doing : Controlling a Dynamical System using Causality, Control, and Reinforcement Learning
  • 2022
  • In: Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track. - : PMLR. ; 176, pp. 246-258
  • Conference paper (peer-reviewed)
    • Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced (https://github.com/LearningByDoingCompetition/learningbydoing-comp) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks.