The Research of High-level Stealing Strategy in RoboCup2D Based on Reinforcement Learning
|Course||Computer Software and Theory|
|Keywords||RoboCup; Keepaway; Reinforcement Learning; Stealing Strategy; Policy Reuse; Transfer Learning|
RoboCup, the Robot Soccer World Cup, is an international comprehensive event whose simulation 2D league poses a complex decision problem in a real-time, multi-agent environment. As the focus of artificial intelligence shifts from "solving single-agent problems in static, predictable environments" to "solving multi-agent problems in unpredictable environments", research on agent decision making in RoboCup2D represents a current theoretical direction of artificial intelligence, and solving the RoboCup2D problem contributes to the further development of society.
The key point in the RoboCup2D problem is high-level decision making, which can be approached with hand-coded strategies or with a range of artificial intelligence methods. Traditional high-level decision making relies on hand-coded strategies and suffers from subjectivity: decision-related parameters are set by experience, which does not guarantee optimality, and hand-coded strategies cannot cover every situation that may arise in a game, so they adapt poorly to dynamic changes in the environment and players perform badly. Artificial intelligence methods include reinforcement learning, decision tree learning, neural network learning and others. Because they learn from experience, they generally outperform hand-coded strategies.
In reinforcement learning, an agent gradually learns to take the best action in each situation by repeatedly trying actions, observing rewards and updating its knowledge, with the aim of maximizing its accumulated reward. The interactive nature of reinforcement learning matches the client/server interaction mode of RoboCup2D, and its sequential decision making matches the periodic decision cycle of RoboCup2D. These properties make reinforcement learning well suited to the high-level decision problem in RoboCup2D.
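The try, observe and update loop of reinforcement learning can be illustrated with a minimal sketch. It uses generic tabular Q-learning on a hypothetical one-dimensional chain, where the agent is rewarded only for reaching the last state. The environment, the function names and all parameter values are assumptions made for illustration; this is not the Keepaway task and not necessarily the algorithm used in this paper.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only)."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Try: explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), goal)
            # Observe: reward 1 only when the goal state is reached.
            r = 1.0 if s_next == goal else 0.0
            # Update: move the estimate toward reward plus discounted best next value.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the greedy action index for every state."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(q_learning())` gives the learned action for each state; in this toy chain the non-terminal states should all prefer stepping right, toward the rewarded state.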
The research in this paper is based on reinforcement learning methods.
Keepaway, in which two small teams compete for possession of the ball, is a typical subtask of RoboCup2D. There has been research on reinforcement-learning-based high-level protecting strategies that optimize the keepers' decisions, but there has been no research on high-level stealing strategies using reinforcement learning. In Keepaway, the stealing task and the protecting task have opposite goals and different task features, so their strategies should differ. The protecting task requires the keepers who do not have the ball to move to open areas so that the keeper with the ball has a passing route, while the stealing task requires all takers to close in on the keepers and try to touch the ball. The protecting task mainly concerns the decision of the keeper who currently has the ball. In the stealing task, by contrast, the taker closest to the ball has a fixed strategy (he must go for the ball, or his team will never win), so it is the decisions of the remaining takers that are of great research value.
Focusing on the features of the stealing task in Keepaway, this paper studies how to apply reinforcement learning to the high-level stealing strategy. The main work is as follows:
(1) Through analysis of the stealing task, we design a reasonable state space, action space and reward function for the takers' reinforcement learning model, and present a reinforcement learning algorithm. Experiments show that after reinforcement learning, takers make more reasonable high-level decisions that are much better than hand-coded decisions.
(2) By choosing a suitable scheme for transferring policy from a smaller-scale task to a larger-scale task and defining mappings between the two scales, we enable the taker in the larger-scale task to perform reinforcement learning while reusing the policy already learned in the smaller-scale task by normal reinforcement learning.
Experiments show that, for the same larger-scale task, the taker using reinforcement learning with policy transfer performs better, even at the beginning of learning, than the taker using normal reinforcement learning.
The results in this paper show the effectiveness of reinforcement learning for high-level stealing decisions in the Keepaway task. Traditionally, reinforcement learning has been applied only to low-level decisions. This paper shows that, with a reasonably designed high-level reward model, reinforcement learning can also be used for high-level decisions, demonstrating its wider applicability.
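The policy reuse described in item (2) can be pictured with a small sketch of one possible action-selection rule. The state mapping, the action mapping, the reuse probability `psi` and every function name below are hypothetical; the abstract does not specify the actual mappings or transfer scheme, so this is only an illustration of the general idea under those assumptions.

```python
import random


def map_state(big_state):
    """Hypothetical state mapping: keep only the features the smaller task also has."""
    return big_state[:2]


def map_action(small_action, n_big_actions):
    """Hypothetical action mapping: small-task action indices are assumed valid in the larger task."""
    return small_action if small_action < n_big_actions else 0


def choose_action(big_state, q_big, small_policy, n_big_actions, psi, epsilon, rng):
    """Reuse the mapped small-task policy with probability psi, else act epsilon-greedily on q_big."""
    if rng.random() < psi:
        # Reuse: look up the action the smaller-scale policy would take.
        return map_action(small_policy[map_state(big_state)], n_big_actions)
    if rng.random() < epsilon:
        # Explore: pick a random larger-task action.
        return rng.randrange(n_big_actions)
    # Exploit: pick the best action under the larger-task value estimates.
    values = q_big.get(big_state, [0.0] * n_big_actions)
    return max(range(n_big_actions), key=values.__getitem__)
```

In a rule of this kind, a high `psi` early in training would let the learner lean on the transferred policy before its own value estimates are reliable, which is consistent with the better initial performance reported above; `psi` would then usually be reduced as learning proceeds.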