SwePub

Result list for search "WFRF:(Talebi Mazraeh Shahi Mohammad Sadegh 1982 ) "

  • Result 1-9 of 9
1.
  • Alinia, Bahram, et al. (author)
  • Competitive Online Scheduling Algorithms with Applications in Deadline-Constrained EV Charging
  • 2018
  • In: 2018 IEEE/ACM 26th International Symposium on Quality of Service, IWQoS 2018. - : IEEE. - 9781538625422
  • Conference paper (peer-reviewed)
    • This paper studies the classical problem of online scheduling of deadline-sensitive jobs with partial values and investigates its extension to Electric Vehicle (EV) charging scheduling by taking into account the processing rate limit of jobs and the charging station capacity constraint. The problem lies in the category of time-coupled online scheduling problems without availability of future information. This paper proposes two online algorithms, both of which are shown to be $(2-\frac{1}{U})$-competitive, where $U$ is the maximum scarcity level, a parameter that indicates the demand-to-supply ratio. The first proposed algorithm is deterministic, whereas the second is randomized and enjoys a lower computational complexity. When $U$ grows large, the performance of both algorithms approaches that of the state-of-the-art for the case where there are processing rate limits on the jobs. Nonetheless, in realistic cases, where $U$ is typically small, the proposed algorithms enjoy a much lower competitive ratio. To carry out the competitive analysis of our algorithms, we present a proof technique, which is novel to the best of our knowledge. This technique could also be used to simplify the competitive analysis of some existing algorithms, and thus could be of independent interest.
2.
  • Hajiesmaili, Mohammad Hassan, et al. (author)
  • Multiperiod Network Rate Allocation With End-to-End Delay Constraints
  • 2018
  • In: IEEE Transactions on Control of Network Systems. - : Institute of Electrical and Electronics Engineers (IEEE). - 2325-5870. ; 5:3, s. 1087-1097
  • Journal article (peer-reviewed)
    • QoS-aware networking applications such as real-time streaming and video surveillance systems require a nearly fixed average end-to-end delay over long periods to communicate efficiently, although they may tolerate some delay variations in short periods. This variability exhibits complex dynamics that make rate control of such applications a formidable task. This paper addresses rate allocation for heterogeneous QoS-aware applications that preserves the long-term end-to-end delay constraint while seeking the maximum network utility accumulated over a fixed time interval. To capture the temporal dynamics of sources, we incorporate a novel time-coupling constraint in which the delay sensitivity of sources is considered such that a certain end-to-end average delay for each source over a prespecified time interval is satisfied. We propose an algorithm, as a dual-based solution, which allocates source rates for the next time interval in a distributed fashion, given the knowledge of network parameters in advance. We also extend the algorithm to the case where the problem data is not fully known in advance, to capture more realistic scenarios. Through numerical experiments, we show that our proposed algorithm attains higher average link utilization and a wider range of feasible scenarios in comparison with the best, to our knowledge, rate control schemes that may guarantee such constraints on delay.
3.
  • Alinia, Bahram, et al. (author)
  • Maximizing Quality of Aggregation in Delay-Constrained Wireless Sensor Networks
  • 2013
  • In: IEEE Communications Letters. - : IEEE Press. - 1089-7798 .- 1558-2558. ; 17:11, s. 2084-2087
  • Journal article (peer-reviewed)
    • In this letter, both the number of participating nodes and spatial dispersion are incorporated to establish a bi-objective optimization problem for maximizing the quality of aggregation under interference and delay constraints in tree-based wireless sensor networks (WSNs). The formulated problem is proved to be NP-hard with respect to weighted-sum scalarization, and a distributed heuristic aggregation scheduling algorithm, named SDMAX, is proposed. Simulation results show that SDMAX not only gives a close approximation of the Pareto-optimal solution, but also outperforms the best, to our knowledge, existing alternative in the literature.
4.
  • Lelarge, Marc, et al. (author)
  • Spectrum Bandit Optimization
  • 2013
  • In: 2013 IEEE Information Theory Workshop, ITW 2013. - : IEEE conference proceedings. - 9781479913213 ; , s. 6691221-
  • Conference paper (peer-reviewed)
    • We consider the problem of allocating radio channels to links in a wireless network. Links interact through interference, modelled as a conflict graph (i.e., two interfering links cannot be simultaneously active on the same channel). We aim at identifying the channel allocation maximizing the total network throughput over a finite time horizon. Should we know the average radio conditions on each channel and on each link, an optimal allocation would be obtained by solving an Integer Linear Program (ILP). When radio conditions are unknown a priori, we look for a sequential channel allocation policy that converges to the optimal allocation while minimizing, on the way, the throughput loss or regret due to the need for exploring suboptimal allocations. We formulate this problem as a generic linear bandit problem, and analyze it in a stochastic setting where radio conditions are driven by an i.i.d. stochastic process, and in an adversarial setting where radio conditions can evolve arbitrarily. We provide, in both settings, algorithms whose regret upper bounds outperform those of existing algorithms.
5.
  • Talebi Mazraeh Shahi, Mohammad Sadegh, 1982-, et al. (author)
  • Learning proportionally fair allocations with low regret
  • 2018
  • In: SIGMETRICS 2018 - Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450358460 ; , s. 50-52
  • Conference paper (peer-reviewed)
    • We address the problem of learning Proportionally Fair (PF) allocations in parallel server systems with unknown service rates. We provide the first algorithms, to our knowledge, for learning such allocations with sub-linear regret.
6.
  • Talebi Mazraeh Shahi, Mohammad Sadegh, 1982- (author)
  • Minimizing Regret in Combinatorial Bandits and Reinforcement Learning
  • 2017
  • Doctoral thesis (other academic/artistic)
    • This thesis investigates sequential decision making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker repeatedly interacting with an environment modeled by an unknown finite Markov decision process (MDP), who wishes to maximize a notion of reward accumulated during her experience. Her performance can be measured through the notion of regret, which compares her accumulated expected reward against that achieved by an oracle algorithm always following an optimal behavior. In order to maximize her accumulated reward, or equivalently to minimize the regret, she needs to face a trade-off between exploration and exploitation. The first part of this thesis investigates combinatorial multi-armed bandit (MAB) problems, which are RL problems whose state-space is a singleton. It also addresses some applications that can be cast as combinatorial MAB problems. The number of arms in such problems generically grows exponentially with the number of basic actions, but the rewards of various arms are correlated. Hence, the challenge in such problems is to exploit the underlying combinatorial structure. For these problems, we derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any admissible algorithm and investigate how these bounds scale with the dimension of the underlying combinatorial structure. We then propose several algorithms and provide finite-time analyses of their regret. The proposed algorithms efficiently exploit the structure of the problem, provide better performance guarantees than existing algorithms, and significantly outperform these algorithms in practice. The second part of the thesis concerns RL in an unknown and discrete MDP under the average-reward criterion. We develop some variations of the transportation lemma that could serve as novel tools for the regret analysis of RL algorithms. Revisiting existing regret lower bounds allows us to derive alternative bounds, which suggest that the local variance of the bias function of the MDP, i.e., the variance with respect to next-state transition laws, could serve as a notion of problem complexity for regret minimization in RL. Leveraging these tools also allows us to report a novel regret analysis of the KL-UCRL algorithm for ergodic MDPs. The leading term in our regret bound depends on the local variance of the bias function, thus coinciding with observations obtained from our presented lower bounds. Numerical evaluations in some benchmark MDPs indicate that the leading term of the derived bound can provide an order of magnitude improvement over previously known results for this algorithm.
7.
  • Talebi Mazraeh Shahi, Mohammad Sadegh, 1982- (author)
  • Online Combinatorial Optimization under Bandit Feedback
  • 2016
  • Licentiate thesis (other academic/artistic)
    • Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of the $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms. As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback. The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice as well. In the fifth chapter, we investigate the online shortest-path routing problem, which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.
8.
  • Talebi Mazraeh Shahi, Mohammad Sadegh, 1982-, et al. (author)
  • Stochastic Online Shortest Path Routing : The Value of Feedback
  • 2018
  • In: IEEE Transactions on Automatic Control. - : Institute of Electrical and Electronics Engineers (IEEE). - 0018-9286 .- 1558-2523. ; 63:4, s. 915-930
  • Journal article (peer-reviewed)
    • This paper studies online shortest path routing over multihop networks. Link costs or delays are time varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. Three algorithms, with a tradeoff between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms. We also assess numerically the performance of the proposed algorithms and compare it to that of existing algorithms.
9.
  • Talebi Mazraeh Shahi, Mohammad Sadegh, 1982-, et al. (author)
  • Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
  • 2018
  • In: Proceedings of 29th International Conference on Algorithmic Learning Theory, ALT 2018. - : ML Research Press. ; , s. 770-805
  • Conference paper (peer-reviewed)
    • The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state without any reset. We revisit the minimax lower bound for that problem so that the local variance of the bias function appears in place of the diameter of the MDP. Furthermore, we provide a novel analysis of the KL-Ucrl algorithm establishing a high-probability regret bound scaling as $\widetilde{O}\big(\sqrt{S \sum_{s,a} V^\star_{s,a} T}\big)$ for this algorithm for ergodic MDPs, where $S$ denotes the number of states and where $V^\star_{s,a}$ is the variance of the bias function with respect to the next-state distribution following action $a$ in state $s$. The resulting bound improves upon the best previously known regret bound $\widetilde{O}(DS\sqrt{AT})$ for that algorithm, where $A$ and $D$ respectively denote the maximum number of actions (per state) and the diameter of the MDP. We finally compare the leading terms of the two bounds in some benchmark MDPs, indicating that the derived bound can provide an order of magnitude improvement in some cases. Our analysis leverages novel variations of the transportation lemma combined with Kullback-Leibler concentration inequalities, which we believe to be of independent interest.