In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed to reach convergence is an issue of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors derive constants of this kind using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
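For orientation, the sketch below shows standard value iteration on a finite DMDP with the classical sup-norm stopping rule; it illustrates the kind of stopping criterion the abstract refers to, but does not reproduce the paper's structural bound or its ergodicity index. The transition array P, reward array r, and tolerance eps are illustrative assumptions.

```python
import numpy as np

def value_iteration(P, r, gamma, eps=1e-8, max_iter=10_000):
    """Standard value iteration for a finite discounted MDP (illustrative sketch).

    P : array of shape (A, S, S), P[a, s, s'] = transition probability.
    r : array of shape (A, S), r[a, s] = one-step reward.
    gamma : discount factor in (0, 1).

    Stops when the sup-norm change falls below eps*(1-gamma)/(2*gamma),
    the classical criterion guaranteeing an eps-optimal greedy policy.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    for _ in range(max_iter):
        q = r + gamma * (P @ v)        # one-step lookahead values, shape (A, S)
        v_new = q.max(axis=0)          # Bellman optimality update
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    policy = (r + gamma * (P @ v)).argmax(axis=0)  # greedy policy w.r.t. v
    return v, policy
```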
The paper deals with a class of discrete-time stochastic control processes under a discounted optimality criterion with random discount rate and possibly unbounded costs. The state process \{x_t\} and the discount process \{\alpha_t\} evolve according to the coupled difference equations x_{t+1}=F(x_t,\alpha_t,a_t,\xi_t), \alpha_{t+1}=G(\alpha_t,\eta_t), where the state and discount disturbance processes \{\xi_t\} and \{\eta_t\} are sequences of i.i.d. random variables with densities \rho^{\xi} and \rho^{\eta}, respectively. The main objective is to introduce approximation algorithms for the optimal cost function that lead to the construction of optimal or nearly optimal policies when the densities \rho^{\xi} and \rho^{\eta} are either known or unknown. In the latter case, we combine suitable estimation methods with control procedures to construct an asymptotically discounted optimal policy.
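As a rough illustration of the coupled recursions x_{t+1}=F(x_t,\alpha_t,a_t,\xi_t) and \alpha_{t+1}=G(\alpha_t,\eta_t), the following sketch simulates one trajectory and accumulates a randomly discounted cost. The specific F, G, policy, cost, and disturbance densities are placeholder assumptions, not the paper's model or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dynamics: F, G, the policy, the cost, and the disturbance
# distributions below are illustrative assumptions only.
def F(x, alpha, a, xi):
    # hypothetical linear state dynamics with additive noise
    return 0.9 * x + a + xi

def G(alpha, eta):
    # hypothetical discount process kept inside (0, 1)
    return float(np.clip(0.5 * alpha + eta, 0.01, 0.99))

def policy(x, alpha):
    # hypothetical stationary control rule
    return -0.5 * x

def simulate(x0, alpha0, T=100):
    """Roll out x_{t+1}=F(x_t, alpha_t, a_t, xi_t), alpha_{t+1}=G(alpha_t, eta_t)
    and accumulate the random-discount cost sum_t (alpha_0 ... alpha_{t-1}) c(x_t, a_t)."""
    x, alpha, disc, total = x0, alpha0, 1.0, 0.0
    for _ in range(T):
        a = policy(x, alpha)
        cost = x**2 + a**2                 # illustrative quadratic one-step cost
        total += disc * cost
        xi = rng.normal(0.0, 0.1)          # xi_t drawn from an assumed density rho^xi
        eta = rng.normal(0.3, 0.05)        # eta_t drawn from an assumed density rho^eta
        disc *= alpha                      # random discount accumulates multiplicatively
        x, alpha = F(x, alpha, a, xi), G(alpha, eta)
    return total

print(simulate(x0=1.0, alpha0=0.9))
```

When \rho^{\xi} and \rho^{\eta} are unknown, the abstract's estimation-and-control approach would replace the assumed Gaussian draws above with density estimates built from observed disturbances.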