SwePub
Search the SwePub database



Search: WFRF:(Bonet Blai)

  • Result 1-7 of 7
1.
  • Bonet, Blai, et al. (author)
  • General Policies, Subgoal Structure, and Planning Width
  • 2024
  • In: The journal of artificial intelligence research. - : AI Access Foundation. - 1076-9757 .- 1943-5037. ; 80, pp. 475-516
  • Journal article (peer-reviewed). Abstract:
    • It has been observed that many classical planning domains with atomic goals can be solved by means of a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width, which in these cases is bounded and small. Yet, while the notion of width has become part of state-of-the-art planning algorithms such as BFWS, there is no good explanation for why so many benchmark domains have bounded width when atomic goals are considered. In this work, we address this question by relating bounded width with the existence of general optimal policies that in each planning instance are represented by tuples of atoms of bounded size. We also define the notions of (explicit) serializations and serialized width that have a broader scope, as many domains have a bounded serialized width but no bounded width. Such problems are solved nonoptimally in polynomial time by a variant of the Serialized IW algorithm. Finally, the language of general policies and the semantics of serializations are combined to yield a simple, meaningful, and expressive language for specifying serializations in compact form in the form of sketches, which can be used for encoding domain control knowledge by hand or for learning it from examples. Sketches express general problem decompositions in terms of subgoals, and terminating sketches of bounded width express problem decompositions that can be solved in polynomial time.
  •  
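The IW procedure mentioned in the abstract above is simple enough to sketch. Below is a minimal, illustrative Python version of IW(k), i.e. breadth-first search with novelty-based pruning; the `goal` and `successors` interfaces and all names are assumptions made for the example, not the authors' implementation.

```python
from collections import deque
from itertools import combinations

def iw(initial_state, goal, successors, k=1):
    """Breadth-first search that prunes every state which does not make some
    new tuple of at most k atoms true (novelty-based pruning).
    States are frozensets of ground atoms; goal(state) tests the goal;
    successors(state) yields (action, next_state) pairs."""
    seen = set()

    def novel(state):
        new = [t for size in range(1, k + 1)
               for t in combinations(sorted(state), size)
               if t not in seen]
        seen.update(new)
        return bool(new)

    novel(initial_state)                  # register the initial state's tuples
    queue = deque([(initial_state, [])])
    while queue:
        state, plan = queue.popleft()
        if goal(state):
            return plan
        for action, nxt in successors(state):
            if novel(nxt):                # prune states with no new tuple
                queue.append((nxt, plan + [action]))
    return None                           # failure: problem width likely exceeds k
```

IW(k) runs in time exponential in k rather than in the number of atoms, which is what makes bounded width (and bounded serialized width, via Serialized IW) significant.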
2.
  • Bonet, Blai, et al. (author)
  • On Policy Reuse: An Expressive Language for Representing and Executing General Policies that Call Other Policies
  • 2024
  • In: Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2024).
  • Conference paper (peer-reviewed). Abstract:
    • Recently, a simple but powerful language for expressing and learning general policies and problem decompositions (sketches) has been introduced in terms of rules defined over a set of Boolean and numerical features. In this work, we consider three extensions of this language aimed at making policies and sketches more flexible and reusable: internal memory states, as in finite state controllers; indexical features, whose values are a function of the state and a number of internal registers that can be loaded with objects; and modules that wrap up policies and sketches and allow them to call each other by passing parameters. In addition, unlike general policies that select state transitions rather than ground actions, the new language allows for the selection of such actions. The expressive power of the resulting language for policies and sketches is illustrated through a number of examples.
  •  
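As a rough picture of the kind of feature-based policy language this abstract extends (without the new memory states, registers, or modules), the sketch below executes a policy given as condition/effect rules over state features. All identifiers are illustrative assumptions, not the paper's API.

```python
def run_general_policy(state, goal, successors, features, rules, max_steps=1000):
    """Execute a general policy given as rules over Boolean/numerical features.
    features(state) returns a dict of feature values; each rule is a pair
    (condition, effect) of predicates: condition(f) tests the current feature
    values, effect(f, f2) tests whether the change from f to f2 is allowed."""
    for _ in range(max_steps):
        if goal(state):
            return True
        f = features(state)
        chosen = None
        for condition, effect in rules:
            if not condition(f):
                continue
            # pick any successor whose feature change matches the rule's effect
            for _action, nxt in successors(state):
                if effect(f, features(nxt)):
                    chosen = nxt
                    break
            if chosen is not None:
                break
        if chosen is None:
            return False      # no rule applies: the policy is undefined here
        state = chosen
    return False              # step bound exceeded
```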
3.
  • Haslum, Patrik, 1973-, et al. (author)
  • New Admissible Heuristics for Domain-Independent Planning
  • 2005
  • In: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI). - : AAAI Press. - 157735236X ; pp. 1163-
  • Conference paper (peer-reviewed). Abstract:
    • Admissible heuristics are critical for effective domain-independent planning when optimal solutions must be guaranteed. Two useful heuristics are the h^m heuristics, which generalize the reachability heuristic underlying the planning graph, and pattern database (PDB) heuristics. These heuristics, however, have serious limitations: reachability heuristics capture only the cost of critical paths in a relaxed problem, ignoring the cost of other relevant paths, while PDB heuristics, additive or not, cannot accommodate too many variables in patterns, and methods for automatically selecting patterns that produce good estimates are not known. We introduce two refinements of these heuristics: first, the additive h^m heuristic, which yields an admissible sum of h^m heuristics using a partitioning of the set of actions; second, the constrained PDB heuristic, which uses constraints from the original problem to strengthen the lower bounds obtained from abstractions. The new heuristics depend on the way the actions or problem variables are partitioned. We advance methods for automatically deriving additive h^m and PDB heuristics from STRIPS encodings. Evaluation shows improvement over existing heuristics in several domains, although, not surprisingly, no heuristic dominates all the others over all domains.
  •  
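The additive h^m construction described above is easiest to see for m = 1 (h_max). The sketch below computes h^1 under a given cost function and then sums per-block estimates, where each block of an action partition keeps its own actions' unit costs and zeroes out the rest; this is one reading of the additive idea in illustrative code, not the authors' implementation.

```python
import math

def hmax(init_atoms, goal_atoms, actions, cost):
    """h^1 (h_max) for a STRIPS-like task.  `actions` is a list of
    (name, preconditions, add_effects) triples; `cost` maps names to numbers."""
    atoms = set(init_atoms) | set(goal_atoms)
    for _, pre, add in actions:
        atoms |= set(pre) | set(add)
    h = {p: (0.0 if p in init_atoms else math.inf) for p in atoms}
    changed = True
    while changed:                                   # Bellman-Ford style fixpoint
        changed = False
        for name, pre, add in actions:
            c = max((h[p] for p in pre), default=0.0) + cost[name]
            for p in add:
                if c < h[p]:
                    h[p], changed = c, True
    return max((h[p] for p in goal_atoms), default=0.0)

def additive_hmax(init_atoms, goal_atoms, actions, partition):
    """Admissible sum of h^1 values: each block of the action partition keeps
    its own actions' unit cost and sets every other action's cost to 0."""
    total = 0.0
    for block in partition:
        cost = {name: (1.0 if name in block else 0.0) for name, _, _ in actions}
        total += hmax(init_atoms, goal_atoms, actions, cost)
    return total
```

Zeroing the cost of actions outside a block is what keeps the sum of per-block estimates a lower bound on the true optimal cost.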
4.
  • Rodriguez, Ivan D., et al. (author)
  • FOND Planning with Explicit Fairness Assumptions
  • 2022
  • In: The journal of artificial intelligence research. - : AI Access Foundation ; AAAI Press. - 1076-9757 .- 1943-5037. ; 74, pp. 887-916
  • Journal article (peer-reviewed). Abstract:
    • We consider the problem of reaching a propositional goal condition in fully observable non-deterministic (FOND) planning under a general class of fairness assumptions that are given explicitly. The fairness assumptions are of the form A/B and say that state trajectories containing infinite occurrences of an action a from A in a state s, and finite occurrences of actions from B, must also contain infinite occurrences of action a in s followed by each one of its possible outcomes. The infinite trajectories that violate this condition are deemed unfair, and the solutions are policies for which all the fair trajectories reach a goal state. We show that strong and strong-cyclic FOND planning, as well as QNP planning, a planning model introduced recently for generalized planning, are all special cases of FOND planning with fairness assumptions of this form, which can also be combined. FOND+ planning, as this form of planning is called, combines the syntax of FOND planning with some of the versatility of LTL for expressing fairness constraints. A sound and complete FOND+ planner is implemented by reducing FOND+ planning to answer set programs, and its performance is evaluated in comparison with FOND and QNP planners and LTL synthesis tools. Two other FOND+ planners are introduced as well; they are more scalable but not complete.
  •  
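To fix intuitions about the A/B assumptions described above, the snippet below shows one plausible way to represent them as data, together with the two special cases the abstract mentions: strong-cyclic planning (every action assumed fair, unconditionally) and strong planning (no fairness at all). This is my reading of the abstract, not the authors' encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FairnessAssumption:
    """One A/B constraint: trajectories with infinitely many occurrences of an
    action from A in a state s, but only finitely many occurrences of actions
    from B, must contain every outcome of that action in s infinitely often."""
    A: frozenset   # actions assumed fair
    B: frozenset   # actions whose infinite occurrence cancels the assumption

def strong_cyclic_assumptions(all_actions):
    # strong-cyclic FOND: every action assumed fair, with no cancelling set
    return [FairnessAssumption(frozenset(all_actions), frozenset())]

def strong_assumptions():
    # strong FOND: purely adversarial, no fairness assumed
    return []
```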
5.
  • Ståhlberg, Simon, 1988-, et al. (author)
  • Learning General Optimal Policies with Graph Neural Networks : Expressive Power, Transparency, and Limits
  • 2022
  • In: Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling (ICAPS 2022). - Palo Alto : AAAI Press. - 9781577358749 ; pp. 629-637
  • Conference paper (peer-reviewed). Abstract:
    • It has been recently shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of k-variable counting logic (Ck) for k = 2, that has been shown to provide a tight characterization of the expressive power of graph neural networks. In this work, we make use of these results to understand the power and limits of using graph neural networks (GNNs) for learning optimal general policies over a number of tractable planning domains where such policies are known to exist. For this, we train a simple GNN in a supervised manner to approximate the optimal value function V*(s) of a number of sample states s. As predicted by the theory, it is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with C2 features but not in those requiring more expressive C3 features. In addition, it is observed that the features learned are in close correspondence with the features needed to express V* in closed form. The theory and the analysis of the domains let us understand the features that are actually learned as well as those that cannot be learned in this way, and let us move in a principled manner from a combinatorial optimization approach to learning general policies to a potentially more robust and scalable approach based on deep learning.
  •  
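As a concrete (and heavily simplified) picture of the supervised setup in the abstract above, the sketch below is a toy GNN that maps a planning state, given as an object graph, to a scalar approximating V*(s), trained by regression on sampled optimal values. The architecture, interfaces, and hyperparameters are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ValueGNN(nn.Module):
    """Toy message-passing network over the objects of a planning state."""
    def __init__(self, dim=32, rounds=3):
        super().__init__()
        self.dim, self.rounds = dim, rounds
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, 1)

    def forward(self, num_objects, edges):
        # edges: (i, j) object-index pairs derived from the atoms true in the state
        h = torch.zeros(num_objects, self.dim)
        for _ in range(self.rounds):
            msgs = [torch.zeros(self.dim) for _ in range(num_objects)]
            for i, j in edges:
                msgs[j] = msgs[j] + self.msg(torch.cat([h[i], h[j]]))
            h = self.upd(torch.cat([h, torch.stack(msgs)], dim=1))
        return self.out(h.sum(dim=0))        # scalar value estimate

# Supervised regression toward a sampled optimal value v_star (illustrative):
# model = ValueGNN()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = (model(num_objects, edges) - v_star).pow(2).mean()
# opt.zero_grad(); loss.backward(); opt.step()
```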
6.
  • Ståhlberg, Simon, 1988-, et al. (author)
  • Learning General Policies with Policy Gradient Methods
  • 2023
  • In: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning. - 9781956792027 ; pp. 647-657
  • Conference paper (peer-reviewed). Abstract:
    • While reinforcement learning methods have delivered remarkable results in a number of settings, generalization, i.e., the ability to produce policies that generalize in a reliable and systematic way, has remained a challenge. The problem of generalization has been addressed formally in classical planning, where provably correct policies that generalize over all instances of a given domain have been learned using combinatorial methods. The aim of this work is to bring these two research threads together to illuminate the conditions under which (deep) reinforcement learning approaches, and in particular policy optimization methods, can be used to learn policies that generalize like combinatorial methods do. We draw on lessons learned from previous combinatorial and deep learning approaches, and extend them in a convenient way. From the former, we model policies as state transition classifiers, as (ground) actions are not general and change from instance to instance. From the latter, we use graph neural networks (GNNs) adapted to deal with relational structures for representing value functions over planning states, and in our case, policies. With these ingredients in place, we find that actor-critic methods can be used to learn policies that generalize almost as well as those obtained using combinatorial approaches while avoiding the scalability bottleneck and the use of feature pools. Moreover, the limitations of the DRL methods on the benchmarks considered have little to do with deep learning or reinforcement learning algorithms, and result from the well-understood expressive limitations of GNNs, and the tradeoff between optimality and generalization (general policies cannot be optimal in some domains). Both of these limitations are addressed without changing the basic DRL methods by adding derived predicates and an alternative cost structure to optimize.
  •  
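One ingredient the abstract above highlights is representing the policy as a classifier over state transitions rather than over ground actions. The tiny sketch below shows that idea together with a one-step policy-gradient (actor-critic) term; the scorer and the advantage estimate are left abstract and are assumptions of the example, not the paper's code.

```python
import torch
import torch.nn.functional as F

def transition_logits(score_fn, state, successors):
    """Score each candidate successor state (e.g. with a GNN); the policy is
    the softmax over these logits, i.e. a classifier over state transitions."""
    return torch.stack([score_fn(state, nxt) for nxt in successors])

def actor_critic_term(logits, chosen_index, advantage):
    """One-step policy-gradient loss for the successor actually taken,
    weighted by an advantage estimate supplied by a critic."""
    log_probs = F.log_softmax(logits, dim=0)
    return -advantage * log_probs[chosen_index]
```

Because successor states, unlike ground actions, are described by the same relational features across instances, this parameterization transfers directly to larger instances of the domain.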
7.
  • Ståhlberg, Simon, 1988-, et al. (author)
  • Learning Generalized Policies without Supervision Using GNNs
  • 2022
  • In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning. - California : IJCAI Organization. - 9781956792010 ; pp. 474-483
  • Conference paper (peer-reviewed). Abstract:
    • We consider the problem of learning generalized policies for classical planning domains using graph neural networks from small instances represented in lifted STRIPS. The problem has been considered before but the proposed neural architectures are complex and the results are often mixed. In this work, we use a simple and general GNN architecture and aim at obtaining crisp experimental results and a deeper understanding: either the policy greedy in the learned value function achieves close to 100% generalization over instances larger than those used in training, or the failure must be understood, and possibly fixed, logically. For this, we exploit the relation established between the expressive power of GNNs and the C2 fragment of first-order logic (namely, FOL with 2 variables and counting quantifiers). We find for example that domains with general policies that require more expressive features can be solved with GNNs once the states are extended with suitable "derived atoms" encoding role compositions and transitive closures that do not fit into C2. The work follows an existing approach based on GNNs for learning optimal general policies in a supervised fashion, but the learned policies are no longer required to be optimal (which expands the scope, as many planning domains do not have general optimal policies) and are learned without supervision. Interestingly, value-based reinforcement learning methods that aim to produce optimal policies do not always yield policies that generalize, as the goals of optimality and generality are in conflict in domains where optimal planning is NP-hard.
  •  
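The fix described above for domains beyond C2, extending states with derived atoms such as transitive closures, can be illustrated with a few lines of preprocessing. The atom encoding and predicate names below are illustrative assumptions, not the authors' representation.

```python
def transitive_closure(pairs):
    """Transitive closure of a binary relation given as a set of (x, y) pairs
    (simple fixpoint iteration)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def add_derived_atoms(state_atoms, relation, derived_name):
    """Extend a state (a set of ground atoms, each a tuple like ('on', a, b))
    with derived atoms for the transitive closure of `relation`."""
    pairs = {(a[1], a[2]) for a in state_atoms if len(a) == 3 and a[0] == relation}
    return set(state_atoms) | {(derived_name, x, y) for (x, y) in transitive_closure(pairs)}
```

Adding such derived atoms as extra input predicates lets a C2-limited GNN read off information (e.g. "above" in Blocksworld) that it could not compute from the primitive atoms alone.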
