SwePub
Search the SwePub database

  Advanced search

Results list for the search "WFRF:(Matni Nikolai)"

Search: WFRF:(Matni Nikolai)

  • Results 1-10 of 10
Sort/group the results list
1.
  • Lee, Bruce D., et al. (author)
  • The Fundamental Limitations of Learning Linear-Quadratic Regulators
  • 2023
  • In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers (IEEE), pp. 4053-4060
  • Conference paper (peer-reviewed). Abstract:
    • We present a local minimax lower bound on the excess cost of designing a linear-quadratic controller from offline data. The bound is valid for any offline exploration policy that consists of a stabilizing controller and an energy-bounded exploratory input. The derivation leverages a relaxation of the minimax estimation problem to Bayesian estimation, and an application of the van Trees inequality. We show that the bound aligns with system-theoretic intuition. In particular, we demonstrate that the lower bound increases when the optimal control objective value increases. We also show that the lower bound increases when the system is poorly excitable, as characterized by the spectrum of the controllability Gramian of the system mapping the noise to the state and the H∞ norm of the system mapping the input to the state. We further show that for some classes of systems, the lower bound may be exponential in the state dimension, demonstrating exponential sample complexity for learning the linear-quadratic regulator.
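The excess-cost notion in the abstract can be made concrete with a small numerical sketch: design the optimal LQR gain for the true system and for a slightly misidentified one, then compare their closed-loop costs on the true system. The 2-state system and all numbers below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

# Illustrative 2-state system (not from the paper)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR gain via the Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def closed_loop_cost(A, B, Q, R, K):
    """Steady-state cost trace(P_K) of u = -K x under unit process noise."""
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    return np.trace(P)

K_star = lqr_gain(A, B, Q, R)            # designed from the true model
A_hat = A + np.array([[0.0, 0.05],
                      [0.0, 0.0]])       # slightly wrong model estimate
K_hat = lqr_gain(A_hat, B, Q, R)         # designed from the estimate

excess = closed_loop_cost(A, B, Q, R, K_hat) - closed_loop_cost(A, B, Q, R, K_star)
print(excess)  # nonnegative: the certainty-equivalent design can only cost more
```

The paper lower-bounds how large this excess cost must be for any learner with bounded exploration energy; the sketch only shows the quantity being bounded.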
2.
  • Lindemann, Lars, et al. (author)
  • Learning Hybrid Control Barrier Functions from Data
  • 2020
  • In: Proceedings of the 2020 Conference on Robot Learning, CoRL 2020. ML Research Press, pp. 1351-1370
  • Conference paper (peer-reviewed). Abstract:
    • Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data. In particular, we assume a setting in which the system dynamics are known and in which data exhibiting safe system behavior are available. We propose hybrid control barrier functions for hybrid systems as a means to synthesize safe control inputs. Based on this notion, we present an optimization-based framework to learn such hybrid control barrier functions from data. Importantly, we identify sufficient conditions on the data such that feasibility of the optimization problem ensures correctness of the learned hybrid control barrier functions, and hence the safety of the system. We illustrate our findings in two simulation studies, including a compass-gait walker.
3.
  • Robey, Alexander, et al. (author)
  • Learning Control Barrier Functions from Expert Demonstrations
  • 2020
  • In: 2020 59th IEEE Conference on Decision and Control (CDC). IEEE, pp. 3717-3724
  • Conference paper (peer-reviewed). Abstract:
    • Inspired by the success of imitation and inverse reinforcement learning in replicating expert behavior through optimal control, we propose a learning-based approach to safe controller synthesis based on control barrier functions (CBFs). We consider the setting of a known nonlinear control-affine dynamical system and assume that we have access to safe trajectories generated by an expert; a practical example of such a setting would be a kinematic model of a self-driving vehicle with safe trajectories (e.g., trajectories that avoid collisions with obstacles in the environment) generated by a human driver. We then propose and analyze an optimization-based approach to learning a CBF that enjoys provable safety guarantees under suitable Lipschitz smoothness assumptions on the underlying dynamical system. A strength of our approach is that it is agnostic to the parameterization used to represent the CBF, assuming only that the Lipschitz constant of such functions can be efficiently bounded. Furthermore, if the CBF parameterization is convex, then under mild assumptions, so is our learning process. We end with extensive numerical evaluations of our results on both planar and realistic examples, using both random-feature and deep neural network parameterizations of the CBF. To the best of our knowledge, these are the first results that learn provably safe control barrier functions from data.
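As a toy illustration of the convexity point in the abstract (not the paper's actual formulation, which also enforces a derivative condition under the known dynamics), one can fit a linear candidate h(x) = a·x + b that is positive with margin on expert-visited states and negative on hypothetical unsafe states, posed as a linear feasibility program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Hypothetical data: expert (safe) states near the origin, unsafe states far away
safe = rng.uniform(-0.5, 0.5, size=(50, 2))
unsafe = rng.uniform(2.0, 3.0, size=(50, 2))

margin = 0.1
# Decision variables z = (a1, a2, b).
# Enforce a.x + b >= margin on safe states and a.x + b <= -margin on unsafe ones.
A_ub = np.vstack([
    -np.hstack([safe, np.ones((50, 1))]),   # -(a.x + b) <= -margin
    np.hstack([unsafe, np.ones((50, 1))]),  #  (a.x + b) <= -margin
])
b_ub = np.full(100, -margin)

res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub, bounds=[(-1, 1)] * 3)
a, b = res.x[:2], res.x[2]
print(res.success, (safe @ a + b > 0).all())
```

Every constraint is linear in the parameters, so richer convex parameterizations (e.g., random features) keep the learning problem convex, which is the property the paper exploits.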
4.
  • Russo, Alessio (author)
  • Efficient Exploration and Robustness in Controlled Dynamical Systems
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • In this thesis, we explore two distinct topics. The first part delves into efficient exploration in multi-task bandit models and model-free exploration in large Markov decision processes (MDPs), driven by the research community's recent interest in optimal methods for navigating complex decision-making problems. The second part examines attack vectors against MDPs and dynamical systems, motivated by the recent push to harden machine-learning models against malicious attacks. Additionally, we explore how to quantify uncertainty in an off-policy evaluation problem.

      In the first part, we investigate the sample complexity of identifying the optimal arm in multi-task bandit problems, where an agent can select a task and learn efficiently through cross-task knowledge transfer. We derive instance-specific lower bounds that any probably approximately correct (PAC) algorithm must satisfy, revealing both theoretically and empirically significant improvements in scaling over previous methods. Subsequently, we investigate model-free exploration in reinforcement learning (RL). Taking an information-theoretic viewpoint, we focus on the instance-specific lower bound on the number of samples needed to identify a nearly optimal policy, and develop an approximation of this bound involving quantities that can be inferred using model-free approaches. This leads to a new model-free exploration strategy applicable to continuous MDPs.

      In the second part, we begin by investigating two types of attacks on MDPs: those that alter the victim's observations and those that manipulate its control signals. We present strategies for designing optimal attacks that minimize the victim's collected reward, show how the uncertainties induced by the attack can be modeled using a partially observable MDP (POMDP) framework, and illustrate how to devise attacks with lower detectability from a statistical-detection perspective. Next, we explore the problem of an eavesdropper aiming to detect changes in an MDP. Again approaching this from a statistical-detection perspective, we introduce a novel definition of online privacy based on the average amount of information per observation of the underlying stochastic system, derive privacy upper bounds, and compute policies that attain higher privacy levels, supplementing our analysis with examples and numerical simulations. We also present a new off-policy estimation method based on conformal prediction that outputs an interval containing the target policy's true reward, demonstrating how to handle the distribution shift between target and behavior policies while maintaining the certainty level and reducing the interval length.

      Finally, we shift our focus to linear dynamical systems and study poisoning attacks on data-driven control methods, focusing on how slight changes in the dataset induced by an adversary can significantly deteriorate control methods and potentially destabilize the system. We propose a novel algorithm for computing effective attacks, providing a theoretical basis for our strategy. We also study the detectability of poisoning attacks, focusing on the impact of data poisoning on least-squares estimates. We establish conditions under which the set of models compatible with the data includes the true model of the system, analyze different poisoning strategies for the attacker, and, on the basis of these arguments, propose a stealthy data-poisoning attack on the least-squares estimator that can evade classical statistical tests.
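The thesis's data-poisoning theme can be illustrated on the simplest possible case, least-squares identification of a scalar system: an adversary who shifts a handful of recorded transitions can bias the estimate far more than the noise does. The system and perturbation below are a hypothetical sketch, not the thesis's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, T = 0.9, 500

# One trajectory of x_{t+1} = a x_t + w_t
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + 0.1 * rng.standard_normal()

X, Y = x[:-1], x[1:]
a_clean = (X @ Y) / (X @ X)          # least-squares estimate from clean data

# Adversary perturbs 10 recorded "next states" where x_t is largest,
# pushing the slope estimate upward
idx = np.argsort(X)[-10:]
Y_poisoned = Y.copy()
Y_poisoned[idx] += 1.0
a_poisoned = (X @ Y_poisoned) / (X @ X)

print(abs(a_clean - a_true), abs(a_poisoned - a_true))
```

Targeting high-leverage samples (large |x_t|) maximizes the bias per perturbed point, which is why the thesis studies both the attacker's optimal placement and statistical tests that might reveal it.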
5.
  • Tsiamis, Anastasios, et al. (author)
  • Learning to Control Linear Systems can be Hard
  • 2022
  • In: Proceedings of 35th Conference on Learning Theory, COLT 2022. ML Research Press, pp. 3820-3857
  • Conference paper (peer-reviewed). Abstract:
    • In this paper, we study the statistical difficulty of learning to control linear systems. We focus on two standard benchmarks: the sample complexity of stabilization, and the regret of the online learning of the Linear Quadratic Regulator (LQR). Prior results state that the statistical difficulty for both benchmarks scales polynomially with the system state dimension, up to system-theoretic quantities. However, this does not reveal the whole picture. By utilizing minimax lower bounds for both benchmarks, we prove that there exist nontrivial classes of systems for which learning complexity scales dramatically, i.e., exponentially, with the system dimension. This situation arises for underactuated systems, i.e., systems with fewer inputs than states. Such systems are structurally difficult to control, and their system-theoretic quantities can scale exponentially with the system dimension, dominating learning complexity. Under some additional structural assumptions (bounding systems away from uncontrollability), we provide qualitatively matching upper bounds: learning complexity is at most exponential in the controllability index of the system, that is, the degree of underactuation.
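The mechanism in the abstract, system-theoretic quantities scaling exponentially with dimension, can be seen numerically on a hypothetical underactuated chain (my illustration, not an example from the paper): a single input must propagate through weak couplings, so the controllability Gramian's smallest eigenvalue collapses as the state dimension grows.

```python
import numpy as np

def gramian_min_eig(n, a=0.5, c=0.3):
    """Smallest eigenvalue of the n-step controllability Gramian of an
    underactuated chain: one input enters state 1 and leaks down the
    chain through weak couplings c."""
    A = a * np.eye(n) + c * np.eye(n, k=-1)
    B = np.zeros((n, 1)); B[0, 0] = 1.0
    W, Ak = np.zeros((n, n)), np.eye(n)
    for _ in range(n):                     # W = sum_t A^t B B' (A')^t
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return np.linalg.eigvalsh(W).min()

mins = [gramian_min_eig(n) for n in (2, 4, 6)]
print(mins)  # shrinks rapidly as the dimension grows
```

Reaching the last state requires n steps, each attenuated by the coupling c, so the hardest direction costs on the order of c^(-2(n-1)) in input energy; this is the kind of exponential blow-up that the paper shows dominates learning complexity.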
6.
  • Tsiamis, Anastasios, et al. (author)
  • Statistical Learning Theory for Control: A Finite-Sample Perspective
  • 2023
  • In: IEEE Control Systems. Institute of Electrical and Electronics Engineers (IEEE). ISSN 1066-033X, e-ISSN 1941-000X. 43:6, pp. 67-97
  • Journal article (peer-reviewed). Abstract:
    • Learning algorithms have become an integral component to modern engineering solutions. Examples range from self-driving cars and recommender systems to finance and even critical infrastructure, many of which are typically under the purview of control theory. While these algorithms have already shown tremendous promise in certain applications [1], there are considerable challenges, in particular, with respect to guaranteeing safety and gauging fundamental limits of operation. Thus, as we integrate tools from machine learning into our systems, we also require an integrated theoretical understanding of how they operate in the presence of dynamic and system-theoretic phenomena. Over the past few years, intense efforts toward this goal - an integrated theoretical understanding of learning, dynamics, and control - have been made. While much work remains to be done, a relatively clear and complete picture has begun to emerge for (fully observed) linear dynamical systems. These systems already allow for reasoning about concrete failure modes, thus helping to indicate a path forward. Moreover, while simple at a glance, these systems can be challenging to analyze. Recently, a host of methods from learning theory and high-dimensional statistics, not typically in the control-theoretic toolbox, have been introduced to our community. This tutorial survey serves as an introduction to these results for learning in the context of unknown linear dynamical systems (see 'Summary'). We review the current state of the art and emphasize which tools are needed to arrive at these results. Our focus is on characterizing the sample efficiency and fundamental limits of learning algorithms. Along the way, we also delineate a number of open problems. More concretely, this article is structured as follows. We begin by revisiting recent advances in the finite-sample analysis of system identification. 
Next, we discuss how these finite-sample bounds can be used downstream to give guaranteed performance for learning-based offline control. The final technical section discusses the more challenging online control setting. Finally, in light of the material discussed, we outline a number of future directions.
7.
  • Ziemann, Ingvar, et al. (author)
  • A Tutorial on the Non-Asymptotic Theory of System Identification
  • 2023
  • In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers (IEEE), pp. 8921-8939
  • Conference paper (peer-reviewed). Abstract:
    • This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of (mainly linear) system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright inequality, and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares-based estimators for identifying the parameters in autoregressive models. We conclude by sketching how the ideas presented herein can be extended to certain nonlinear identification problems. Note: for reasons of space, proofs have been omitted from this version and are available in an online version: https://arxiv.org/abs/2309.03873.
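The estimators whose finite-sample performance the tutorial analyzes are of the following minimal form (system and numbers below are illustrative): regress the next state on the current one along a single trajectory of a stable autoregression.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, T = 0.8, 5000

# One trajectory of the autoregression x_{t+1} = a x_t + w_t
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + rng.standard_normal()

# Least-squares estimate of a from the single trajectory
a_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(a_hat)  # close to 0.8
```

The non-asymptotic question the tutorial addresses is how fast |a_hat - a| shrinks with T and with what probability, despite the regressors x_t being dependent across time; tools like self-normalized martingales handle exactly that dependence.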
8.
  • Ziemann, Ingvar (author)
  • Applications of Information Inequalities to Linear Systems: Adaptive Control and Security
  • 2021
  • Licentiate thesis (other academic/artistic). Abstract:
    • This thesis considers the application of information inequalities, Cramér-Rao-type bounds based on Fisher information, to linear systems. These tools are used to study the trade-offs between learning and performance in two application areas: adaptive control and control-systems security.

      In the first part of the thesis, we study stochastic adaptive control of linear quadratic regulators (LQR). Here, information inequalities are used to derive instance-dependent regret lower bounds. First, we consider a simplified version of LQR, a memoryless reference-tracking model, and show how regret can be linked to a cumulative estimation error. This is then exploited to derive a regret lower bound in terms of the Fisher information generated by the experiment of the optimal policy. It is shown that if the optimal policy has ill-conditioned Fisher information, then so does any low-regret policy. Combined with a Cramér-Rao bound, this gives a regret lower bound of order √T in the time horizon T for a class of instances we call uninformative. The lower bound holds for all policies that depend smoothly on the underlying parametrization. Second, we extend these results to the general LQR model and to arbitrary affine parametrizations of the instance parameters. The notion of uninformativeness is generalized to this situation to give a structure-dependent rank condition for when logarithmic regret is impossible. This is done by reducing regret to a cumulative Bellman error. Due to the quadratic nature of LQR, this Bellman error turns out to be a quadratic form, which can again be interpreted as an estimation error. Using this, we prove a local minimax regret lower bound, whose proof relates the minimax regret to a Bayesian estimation problem and then applies Van Trees' inequality. Again, it is shown that an appropriate information quantity of any low-regret policy is similar to that of the optimal policy, and that any uninformative instance suffers local minimax regret of at least order √T. Moreover, it is shown that the notion of uninformativeness, when specialized to certain well-understood scenarios, yields a tight characterization of √T-regret.

      In the second part of the thesis, we study control-systems security problems from a Fisher-information point of view. First, we consider a secure state-estimation problem and characterize the maximal impact an adversary can cause by means of least-informative distributions, those which maximize the Cramér-Rao bound. For a linear measurement equation, it is shown that the least-informative distribution, subject to variance and sparsity constraints, can be found by a semidefinite program, which becomes mixed-integer in the presence of sparsity constraints. Furthermore, by relying on well-known results on minimax and robust estimation, a game-theoretic interpretation of this characterization of the maximum impact is offered. Last, we consider a Fisher-information-regularized minimum-variance control objective, to study the trade-offs between parameter privacy and control performance. This can be motivated, for instance, by learning-based attacks, in which case one seeks to leak as little information as possible to a system-identification adversary. Supposing that the feedback law is linear, the noise distribution minimizing the trace of the Fisher information subject to a state-variance penalty is found to be conditionally Gaussian.
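The Cramér-Rao machinery running through the thesis can be checked in the simplest linear-Gaussian setting: for y = Cθ + v with v ~ N(0, I), the Fisher information is CᵀC and least squares attains the bound. A small Monte Carlo sketch (all numbers illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, C = 1.0, np.ones((5, 1))   # scalar parameter, 5 measurements

fisher = float(C.T @ C)           # Fisher information of y = C*theta + v
crb = 1.0 / fisher                # Cramer-Rao lower bound on estimator variance

# Least squares is efficient here: its empirical variance matches the bound
est = []
for _ in range(5000):
    y = C.flatten() * theta + rng.standard_normal(5)
    est.append(float(C.flatten() @ y) / fisher)
emp_var = np.var(est)
print(crb, emp_var)  # both close to 0.2
```

The thesis turns this logic around: an adversary who can shape the noise distribution to *minimize* Fisher information makes the bound as loose as possible, which is what the least-informative-distribution programs compute.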
9.
  • Ziemann, Ingvar, et al. (author)
  • How are policy gradient methods affected by the limits of control?
  • 2022
  • In: 2022 IEEE 61st Conference on Decision and Control (CDC). Institute of Electrical and Electronics Engineers (IEEE), pp. 5992-5999
  • Conference paper (peer-reviewed). Abstract:
    • We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Finally, we show how our results extend to partially observed systems.
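A minimal sketch of the abstract's main point, on an illustrative one-step system rather than the paper's setup: the variance of a score-function ("REINFORCE"-style) gradient estimate blows up as the open-loop dynamics get worse, because the estimator's variance scales with the magnitude of the incurred cost.

```python
import numpy as np

def grad_variance(a, n=20000, k=0.0, sigma=0.5, seed=0):
    """Variance of a one-step score-function policy-gradient estimate for
    x ~ N(0,1), Gaussian policy u = k*x + sigma*eps, cost = (a*x + u)^2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    u = k * x + sigma * eps
    cost = (a * x + u) ** 2
    # score of the Gaussian policy w.r.t. the gain k is eps * x / sigma
    g = cost * eps * x / sigma
    return g.var()

print(grad_variance(1.0), grad_variance(10.0))  # far noisier when a = 10
```

Here the larger `a` plays the role of an ill-conditioned plant: the same policy perturbation is buried under a much larger cost signal, so many more rollouts are needed for the same gradient accuracy.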
10.
  • Ziemann, Ingvar, et al. (author)
  • Single Trajectory Nonparametric Learning of Nonlinear Dynamics
  • 2022
  • In: Proceedings of 35th Conference on Learning Theory, COLT 2022. ML Research Press, pp. 3333-3364
  • Conference paper (peer-reviewed). Abstract:
    • Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least-squares estimator (LSE). More precisely, we give nonasymptotic expected L2-distance bounds between the LSE and the true regression function, where the expectation is evaluated on a fresh, counterfactual trajectory. We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes in terms of supremum-norm metric entropy and a subgaussian parameter. Next, we relate this subgaussian parameter to the stability of the underlying process using notions from dynamical systems theory. When combined, these developments lead to rate-optimal error bounds that scale as T^(-1/(2+q)) for suitably stable processes and hypothesis classes with metric-entropy growth of order δ^(-q). Here, T is the length of the observed trajectory, δ ∈ R+ is the packing granularity, and q ∈ (0, 2) is a complexity term. Finally, we specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of reproducing kernel Hilbert spaces (RKHS).
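The flavor of the T^(-1/(2+q)) result can be sanity-checked on a hypothetical 1-D example: estimate stable nonlinear dynamics from a single trajectory with kernel regression (a stand-in for the paper's least-squares estimator, not its method) and watch the error shrink as the trajectory grows.

```python
import numpy as np

def traj_and_fit(T, seed=0):
    """Simulate x_{t+1} = 0.8*tanh(x_t) + 0.1*w_t, then estimate the map
    with Nadaraya-Watson kernel regression from the single trajectory;
    returns the RMS error against the true map on a grid."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T + 1)
    for t in range(T):
        x[t + 1] = 0.8 * np.tanh(x[t]) + 0.1 * rng.standard_normal()
    X, Y = x[:-1], x[1:]
    h = T ** (-1.0 / 3.0)          # bandwidth matched to the q = 1 rate T^(-1/3)
    grid = np.linspace(-0.5, 0.5, 21)
    w = np.exp(-0.5 * ((grid[:, None] - X[None, :]) / h) ** 2)
    f_hat = (w @ Y) / w.sum(axis=1)
    return np.sqrt(np.mean((f_hat - 0.8 * np.tanh(grid)) ** 2))

print(traj_and_fit(100), traj_and_fit(10000))
```

Lipschitz dynamics in one dimension correspond to q = 1, hence the T^(-1/3)-matched bandwidth; with these seeds the longer trajectory gives a noticeably smaller error.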


 