M. A. Khan, D. Turgut, and L. Bölöni

Optimizing coalition formation for tasks with dynamically evolving rewards and nondeterministic action effects


Cite as:

M. A. Khan, D. Turgut, and L. Bölöni. Optimizing coalition formation for tasks with dynamically evolving rewards and nondeterministic action effects. Journal of Autonomous Agents and Multi-Agent Systems, 22(3):415–438, 2011.

Download:

Download 

Abstract:

We consider a problem domain where coalitions of agents are formed in order to execute tasks. Each task is assigned at most one coalition of agents, and the coalition can be reorganized during execution. Executing a task means bringing it to one of the desired terminal states, which might take several time steps. The state of the task evolves even if no coalition is assigned to its execution and depends nondeterministically on the cumulative actions of the agents in the coalition. Furthermore, we assume that the reward obtained for executing a task evolves in time: the more the execution of the task is delayed, the lesser the reward. A representative example of this class of problems is the allocation of firefighters to fires in a disaster rescue environment. We describe a practical methodology through which a problem of this class can be encoded as a Markov Decision Process. Due to the three levels of factoring in the resulting MDP (the states, actions and rewards are composites of the original features of the problem) the resulting MDP can be directly solved only for small problem instances. We describe two methods for parallel decomposition of the MDP: the MDP RSUA approach for random sampling and uniform allocation and the MDP REUSE method which reuses the lower level MDP to allocate resources to the parallel subproblems. Through an experimental study which models the problem domain using the fire simulation components of the Robocup Rescue simulator, we show that both methods significantly outperform heuristic approaches and MDP REUSE provides an overall higher performance than MDP RSUA.

BibTeX:

@article{Khan-2011-JAAMAS,
   title = "Optimizing coalition formation for tasks with dynamically evolving rewards and nondeterministic action effects",
   author = "M. A. Khan and D. Turgut and L. B{\"o}l{\"o}ni",
   journal = "Journal of Autonomous Agents and Multi-Agent Systems",
   year = "2011",
   volume = "22",
   number = "3",
   pages = "415-438",
   abstract = {
      We consider a problem domain where coalitions of agents are formed
      in order to execute tasks. Each task is assigned at most one
      coalition of agents, and the coalition can be reorganized during
      execution. Executing a task means bringing it to one of the
      desired terminal states, which might take several time steps. The
      state of the task evolves even if no coalition is assigned to its
      execution and depends nondeterministically on the cumulative
      actions of the agents in the coalition. Furthermore, we assume
      that the reward obtained for executing a task evolves in time: the
      more the execution of the task is delayed, the lesser the reward.
      A representative example of this class of problems is the
      allocation of firefighters to fires in a disaster rescue
      environment. We describe a practical methodology through which a
      problem of this class can be encoded as a Markov Decision Process.
      Due to the three levels of factoring in the resulting MDP (the
      states, actions and rewards are composites of the original
      features of the problem) the resulting MDP can be directly solved
      only for small problem instances. We describe two methods for
      parallel decomposition of the MDP: the MDP RSUA approach for
      random sampling and uniform allocation and the MDP REUSE method
      which reuses the lower level MDP to allocate resources to the
      parallel subproblems.
      Through an experimental study which models the problem domain
      using the fire simulation components of the Robocup Rescue
      simulator, we show that both methods significantly outperform
      heuristic approaches and MDP REUSE provides an overall higher
      performance than MDP RSUA.
      },
}

Generated by bib2html.pl (written by Patrick Riley, Lotzi Boloni ) on Fri Oct 06, 2017 18:15:24