Private and Common Information States in Decentralized Parallel Dynamic Programming for Delayed Sharing Patterns
Charalambos D. Charalambous, Umarbek Guvercin, Seddik Djouadi
This paper develops a dynamic programming (DP) approach for decentralized stochastic optimal control problems with delayed sharing information patterns, which exhibits the fundamental Properties of classical DP of centralized partially observable Markov decision problems (POMDPs): the value functions and information states depend on the actions of the minimizing controls and not their strategies. This is achieved by invoking the concept of Person-by-Person (PbP) optimality, in which each control strategy is associated with a value function conditioned on its assigned delayed sharing information pattern, when all other strategies are fixed to their optimal responses. The value functions satisfy generalized and simplified DP equations. These are used to derive necessary and sufficient conditions for PbP optimality. The simplified DP equations are obtained by invoking the structural property that optimal strategies are separated and functionals of two information states: 1) a private a posteriori probability distribution based on the information pattern of the strategy, and 2) a centralized a posteriori probability distribution based on the shared or common information to all strategies, each satisfying a Markov recursion. The DP approach of this paper, settles a long standing open problem since the appearance of T-step delayed sharing patterns in [1, Section IV.G], in terms of generalizing the fundamental properties of classical DP approach.
Read on ELI