In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm, under suitable conditions, yields an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is a question of great theoretical and practical interest, as such a bound would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we obtain such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain constants of this kind using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
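As a point of reference for the stopping rule discussed above, the following is a minimal sketch of value iteration with the classical tolerance-based stopping test on a small, entirely hypothetical DMDP; the paper's contribution is a structural, a priori bound on the number of iterations that would replace such a test.

```python
import numpy as np

# Hypothetical 3-state, 2-action discounted MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
              [[0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
              [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]])   # P[s, a, s']
r = np.array([[1.0, 0.5], [0.0, 2.0], [0.3, 0.9]])    # r[s, a]
beta, eps = 0.9, 1e-6                                 # discount factor, tolerance

v = np.zeros(3)
while True:
    q = r + beta * (P @ v)      # Q(s, a) = r(s, a) + beta * sum_s' P(s'|s, a) v(s')
    v_new = q.max(axis=1)
    # Classical sup-norm stopping test; the structural bound of the paper
    # would instead prescribe the number of iterations in advance.
    if np.max(np.abs(v_new - v)) < eps * (1 - beta) / (2 * beta):
        break
    v = v_new

policy = q.argmax(axis=1)       # greedy policy at termination
```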
We offer a quantitative estimate of the stability of risk-sensitive cost optimization in the problem of optimal stopping of a Markov chain on a Borel space X. It is supposed that the transition probability p(⋅|x), x∈X, is approximated by a transition probability p̃(⋅|x), x∈X, and that the stopping rule f̃∗, which is optimal for the process with the transition probability p̃, is applied to the process with the transition probability p. We give an upper bound (expressed in terms of the total variation distance sup_{x∈X} ∥p(⋅|x) − p̃(⋅|x)∥) for the additional cost incurred by using the rule f̃∗ instead of the (unknown) stopping rule f∗ that is optimal for p.
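To make the quantity in the bound concrete, here is a small finite-state stand-in (my own illustrative numbers, not the Borel-space setting of the paper) for computing sup_{x∈X} ∥p(⋅|x) − p̃(⋅|x)∥:

```python
import numpy as np

# Rows are the distributions p(.|x) and p~(.|x) for each state x (illustrative only).
p = np.array([[0.60, 0.30, 0.10],
              [0.20, 0.50, 0.30],
              [0.10, 0.10, 0.80]])
p_tilde = np.array([[0.55, 0.35, 0.10],
                    [0.25, 0.45, 0.30],
                    [0.05, 0.15, 0.80]])

# Total variation norm of p(.|x) - p~(.|x), taken here as the sum of absolute
# differences (one common convention; some authors use half of this value).
tv_per_state = np.abs(p - p_tilde).sum(axis=1)
sup_tv = tv_per_state.max()
print(sup_tv)   # the stability inequality bounds the extra cost in terms of this number
```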
This paper deals with a certain class of unbounded optimization problems that depend on a parameter. Firstly, conditions are established that guarantee the continuity, with respect to the parameter, of the minimum of the optimization problems under consideration, and the upper semicontinuity of the multifunction that maps each parameter to its set of minimizers. Moreover, under the additional condition that the minimizer is unique, its continuity is obtained. Some examples of nonconvex optimization problems satisfying the conditions of the article are supplied. Secondly, the theory developed is applied to discounted Markov decision processes with unbounded cost functions and possibly noncompact action sets in order to obtain continuous optimal policies. This part of the paper is illustrated with two examples of the controlled Lindley random walk, one of which has nonconstant action sets.
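A numerical illustration (with a hypothetical nonconvex objective of my own choosing, not one of the paper's examples) of the phenomenon these conditions address: the minimum value varies continuously in the parameter even where the minimizer itself jumps, consistent with upper semicontinuity of the argmin multifunction.

```python
import numpy as np

def f(x, theta):
    # Double-well objective, nonconvex in x; the term -theta*x tilts the wells, so the
    # global minimizer jumps from near -1 to near +1 as theta crosses 0, while the
    # minimum value m(theta) stays continuous.
    return (x**2 - 1.0)**2 - theta * x

xs = np.linspace(-2.0, 2.0, 4001)           # brute-force grid minimization
for theta in np.linspace(-0.2, 0.2, 9):
    vals = f(xs, theta)
    i = vals.argmin()
    print(f"theta={theta:+.3f}  m(theta)={vals[i]:+.5f}  argmin={xs[i]:+.3f}")
```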
This paper deals with Markov Control Processes (MCPs) on Euclidean spaces with an infinite horizon and a discounted total cost. Firstly, MCPs arising from deterministic controlled systems are analyzed. For such MCPs, conditions are given that permit establishing the equation known in the economics literature as the Euler Equation (EE). An example of a Markov Control Process with a deterministic controlled system is also presented, in which the EE, applied within the value iteration algorithm, is used to obtain the optimal value function. Secondly, MCPs that result from perturbing deterministic controlled systems with a random noise are dealt with. Conditions are also provided that allow the optimal value function and the optimal policy of a perturbed controlled system to be obtained in terms of the optimal value function and the optimal policy of the corresponding deterministic controlled system. Finally, several examples illustrating the latter case are presented.
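As a hedged illustration of how the EE can be exploited along an optimal trajectory, the following uses a standard textbook deterministic model (the Brock–Mirman growth model with logarithmic utility, not necessarily an example from the paper), for which the optimal policy is known in closed form and the Euler Equation residual can be checked numerically.

```python
# Brock-Mirman growth model (illustrative stand-in): maximize sum_t beta^t * log(c_t)
# with c_t = k_t**alpha - k_{t+1}.  The known optimal policy is
# k_{t+1} = alpha*beta*k_t**alpha, and the Euler Equation
#     1/c_t = beta * alpha * k_{t+1}**(alpha - 1) / c_{t+1}
# holds along the resulting trajectory.
alpha, beta = 0.3, 0.95
k = 0.5
for t in range(5):
    k_next = alpha * beta * k**alpha          # candidate optimal policy
    k_next2 = alpha * beta * k_next**alpha
    c, c_next = k**alpha - k_next, k_next**alpha - k_next2
    residual = 1.0 / c - beta * alpha * k_next**(alpha - 1) / c_next
    print(f"t={t}  EE residual = {residual:.2e}")   # ~0 up to rounding error
    k = k_next
```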
The authors introduce risk sensitivity into a model of sequential games in which players do not know beforehand which of them will make a choice at each stage of the game. It is shown that every such sequential game without a predetermined order of turns and with risk sensitivity has a Nash equilibrium, as does the case in which players have types that are chosen for them before the game starts and are kept hidden from the other players. A couple of examples are also given that show how the equilibria may change when the players are risk prone or risk averse.
The paper concerns Markov decision processes (MDPs) with both the state and decision spaces finite and with the total reward as the objective function. For this class of MDPs, the authors assume that the reward function is of a fuzzy type. Specifically, the fuzzy reward function has a suitable trapezoidal shape and is a function of a standard non-fuzzy reward. The fuzzy control problem consists of determining a control policy that maximizes the fuzzy expected total reward, where the maximization is carried out with respect to the partial order on the α-cuts of fuzzy numbers. The optimal policy and the optimal value function of the fuzzy optimal control problem are characterized by means of the dynamic programming equation of the standard optimal control problem and, as the main conclusions, it is obtained that the optimal policy of the standard problem and that of the fuzzy one coincide, and that the fuzzy optimal value function has a convenient trapezoidal form. As illustrations, fuzzy extensions of an optimal stopping problem and of a red-black gambling model are presented.
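The comparison of fuzzy rewards can be made concrete with a short sketch (my own encoding and numbers, not the paper's notation) of trapezoidal fuzzy numbers and the α-cut partial order used in the maximization.

```python
import numpy as np

def alpha_cut(trap, alpha):
    """Alpha-cut [left, right] of a trapezoidal fuzzy number trap = (a, b, c, d)."""
    a, b, c, d = trap
    return a + alpha * (b - a), d - alpha * (d - c)

def dominates(t1, t2, alphas=np.linspace(0.0, 1.0, 101)):
    """t1 >= t2 in the alpha-cut order: both endpoints dominate at every level."""
    return all(l1 >= l2 and r1 >= r2
               for (l1, r1), (l2, r2) in ((alpha_cut(t1, a), alpha_cut(t2, a)) for a in alphas))

# A crisp reward r widened into a trapezoid around it, as a stand-in for a fuzzy
# reward that is a trapezoidal function of the standard reward.
r = 3.0
fuzzy_r = (0.8 * r, 0.9 * r, 1.1 * r, 1.2 * r)
print(dominates(fuzzy_r, (1.0, 2.0, 2.5, 3.0)))   # True for these illustrative numbers
```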
Firstly, this paper considers a certain class of possibly unbounded optimization problems on Euclidean spaces for which conditions are given that permit obtaining monotone minimizers. Secondly, the theory developed in the first part of the paper is applied to Markov control processes (MCPs) on real spaces with possibly unbounded cost functions and possibly noncompact control sets, considering both the discounted and the average cost as the optimality criterion. In the context described, conditions for obtaining monotone optimal policies are provided. Several controlled models satisfying the conditions for MCPs presented in the article are supplied, including, in particular, two inventory/production systems and the linear regulator problem.
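As an illustration of the kind of structure such conditions deliver, here is a sketch of a small discounted inventory model (with illustrative parameters of my own, not those of the paper's examples) in which value iteration produces a monotone optimal policy: the order quantity is nonincreasing in the current stock level.

```python
import numpy as np

N = 20                                        # maximum stock level
beta = 0.95                                   # discount factor
c_order, c_hold, c_short = 2.0, 1.0, 10.0     # ordering, holding, shortage costs
demand_p = np.array([0.3, 0.4, 0.2, 0.1])     # P(D = 0, 1, 2, 3)

v = np.zeros(N + 1)
policy = np.zeros(N + 1, dtype=int)
for _ in range(500):                          # value iteration
    v_new = np.empty(N + 1)
    for s in range(N + 1):
        best = np.inf
        for a in range(N - s + 1):            # order a units; stock becomes s + a
            cost = c_order * a
            for d, p in enumerate(demand_p):
                stage = c_hold * max(s + a - d, 0) + c_short * max(d - s - a, 0)
                cost += p * (stage + beta * v[max(s + a - d, 0)])
            if cost < best:
                best, policy[s] = cost, a
        v_new[s] = best
    v = v_new

print(policy)                                 # order quantities by stock level
print(bool(np.all(np.diff(policy) <= 0)))     # monotone (nonincreasing) policy
```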
In this paper, conditions proposed by Flores-Hernández and Montes-de-Oca \cite{Flores}, which permit obtaining monotone minimizers of unbounded optimization problems on Euclidean spaces, are adapted in suitable versions to the study of noncooperative games on Euclidean spaces with noncompact sets of feasible joint strategies, in order to obtain increasing optimal best responses for each player. Moreover, in this noncompact framework, an algorithm to approximate the equilibrium points of noncooperative games is supplied.
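A minimal sketch of a best-response iteration of this flavor, on a hypothetical price-setting game with strategic complements (so that best responses are increasing, in the spirit of the structure sought by the paper); the paper's algorithm and game class are more general:

```python
# Two firms set prices p_i >= 0 with differentiated goods:
# profit_i = (p_i - c) * (a - b*p_i + d*p_j); the best response is increasing in p_j.
a, b, c, d = 10.0, 1.0, 1.0, 0.5

def best_response(p_other):
    # argmax over p_i >= 0 of the concave quadratic profit, closed form in this toy game
    return max((a + b * c + d * p_other) / (2.0 * b), 0.0)

p1, p2 = 0.0, 0.0
for _ in range(200):                       # iterate simultaneous best responses
    p1_new, p2_new = best_response(p2), best_response(p1)
    if abs(p1_new - p1) + abs(p2_new - p2) < 1e-10:
        break
    p1, p2 = p1_new, p2_new

print(p1, p2)                              # both approach (a + b*c) / (2*b - d) ≈ 7.333
```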