Reinforcement Learning and Its Application in Robot Systems
|School||Guangdong University of Technology|
|Course||Control Theory and Control Engineering|
|Keywords||Reinforcement learning; Mobile robot; Least-squares method; RBF neural network; Fuzzy Sarsa learning; Path planning|
Reinforcement learning is an important machine learning method and has become one of the key research areas in intelligent control and artificial intelligence in recent years. Among the various learning methods, reinforcement learning offers strong on-line adaptability and self-learning ability for complex systems: it converges to the optimal policy through interaction with the environment. The technique has been successfully applied to nonlinear control systems, artificial intelligence, robot control, optimization, and multi-agent systems. However, reinforcement learning still faces hard problems; how to incorporate computational intelligence into the design of new algorithms is a key issue for reinforcement learning and its applications. Most current research on reinforcement learning is based on small, discrete state and action spaces, and learning remains difficult when the state and action spaces are large and continuous. Our research work concentrates on the theory, algorithms, and applications of reinforcement learning; using reinforcement learning to achieve self-adaptation in complex and unknown environments is therefore both important and meaningful.

This dissertation is dedicated to the algorithms and analysis of reinforcement learning. Building on a review of current studies in this domain at home and abroad, several improved or novel reinforcement learning algorithms are proposed. A summary of the obtained results is presented as follows.

(1) Multi-step temporal difference learning algorithm based on the recursive least-squares method

To solve the problem of slow convergence in reinforcement learning systems, a multi-step temporal difference learning algorithm using the recursive least-squares method (RLS-TD(λ)) is proposed, its convergence is proved, and its error-estimation formula is derived. Experiments on the maze problem demonstrate that the algorithm converges steadily to the optimal strategy.
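The recursive least-squares TD(λ) update at the core of such an algorithm can be sketched as follows. This is a minimal illustration only, assuming a linear value function over a feature map; the class name, defaults, and initialization are our assumptions, not the dissertation's formulation:

```python
import numpy as np

class RLSTDLambda:
    """Sketch of TD(lambda) with a recursive least-squares update.

    The value function is approximated as V(s) ~= theta . phi(s) for a
    feature map phi. Instead of a gradient step with a learning rate, the
    weights are updated through the inverse correlation matrix P, which is
    maintained by the Sherman-Morrison formula.
    """

    def __init__(self, n_features, lam=0.8, gamma=0.95, delta=1.0):
        self.theta = np.zeros(n_features)      # value-function weights
        self.z = np.zeros(n_features)          # eligibility trace
        self.P = np.eye(n_features) / delta    # inverse correlation matrix
        self.lam, self.gamma = lam, gamma

    def update(self, phi_s, reward, phi_next):
        """One transition (s, r, s'); phi_next is all-zero for a terminal state."""
        self.z = self.gamma * self.lam * self.z + phi_s
        d = phi_s - self.gamma * phi_next      # feature difference
        td_error = reward + self.gamma * self.theta @ phi_next - self.theta @ phi_s
        # Sherman-Morrison recursive least-squares step
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)                # gain vector
        self.theta = self.theta + k * td_error
        self.P = self.P - np.outer(k, d @ self.P)
        return td_error
```

On a simple deterministic chain the weights settle at the true discounted values after repeated episodes, which is the behavior the least-squares formulation is meant to buy over a step-size-tuned TD(λ).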
Moreover, it improves learning precision and speeds up the convergence of the learning process.

(2) Reinforcement learning based on an RBF neural network

To increase the generalization capability of basic actor-critic learning, a novel reinforcement learning algorithm based on an RBF neural network is put forward. It is proved to converge with probability 1 to the unique solution of a matrix equation. The algorithm shares the actor network and the critic network in a single RBF neural network, which is used to approximate the policy function of the actor and the value function of the critic simultaneously, and it learns on-line throughout the learning process. A self-tuning PID control strategy based on the algorithm (AC-PID) is proposed. Simulation results show that the proposed controller is efficient and has good adaptability and generalization.

(3) An improved fuzzy Sarsa learning algorithm based on the degree of exploration

Because it is difficult for fuzzy Sarsa learning (FSL) to balance exploration and exploitation, an improved fuzzy Sarsa learning algorithm (IFSL) based on the degree of exploration is proposed. An adaptive learning-rate generator that tunes the learning rate on-line and a fuzzy balancer that controls the degree of exploration are introduced, and it is proved that the weight vector of IFSL converges to a unique value under a stationary action-selection policy. Simulation results on the mountain-car problem show that IFSL manages the balance well and outperforms FSL in terms of learning speed and action quality.

(4) A novel fuzzy Sarsa learning algorithm incorporating ant colony optimization

Because it is difficult for fuzzy Sarsa learning to tune the learning rate on-line, a novel fuzzy Sarsa learning algorithm incorporating ant colony optimization (ACO-FSL) is proposed. It tunes the learning rate by updating pheromone levels, treats the fuzzy inference process as ants' foraging, and selects actions according to the pheromone matrix.
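A pheromone-driven action selection of this general kind can be sketched as below. This is an illustrative assumption about the mechanism, not the dissertation's actual ACO-FSL equations: each fuzzy rule keeps a pheromone row over a discrete action set, actions are picked with the ACO-style exploit/explore rule, and the TD error replaces a fixed learning rate as the deposit signal:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_actions(tau, firing, action_values, q0=0.9):
    """Pick one action per fuzzy rule from its pheromone row.

    tau:           pheromone matrix (rules x actions)
    firing:        normalized firing strengths of the rules
    action_values: the discrete action set shared by all rules
    q0:            ACO threshold between exploitation and exploration
    """
    chosen = np.empty(len(tau), dtype=int)
    for i, row in enumerate(tau):
        if rng.random() < q0:                 # exploit: strongest pheromone
            chosen[i] = np.argmax(row)
        else:                                 # explore: pheromone-proportional
            chosen[i] = rng.choice(len(row), p=row / row.sum())
    # continuous control output: firing-strength-weighted rule actions
    return chosen, float(firing @ action_values[chosen])

def deposit(tau, chosen, td_error, rho=0.1):
    """Evaporate all pheromone, then reinforce the selected rule-action
    pairs using the TD error as the (state-dependent) learning signal."""
    tau *= (1.0 - rho)                        # evaporation
    tau[np.arange(len(tau)), chosen] += rho * max(td_error, 0.0)
    np.clip(tau, 1e-3, None, out=tau)         # keep pheromone positive
    return tau
```

The point of the design is that the effective learning rate is carried by the pheromone dynamics rather than by a hand-tuned schedule.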
Simulation results on the mountain-car and truck problems show that ACO-FSL outperforms FSL in terms of learning performance.

(5) Reinforcement learning applied to path planning for mobile robots

Path planning for mobile robots with the ACO-FSL algorithm in a globally unknown environment is studied. The reward functions and value functions are given in detail. The algorithm eliminates dependence on static information about the globally unknown environment and on the motion information of dynamic obstacles. Simulation experiments show the feasibility and advantages of ACO-FSL.
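The dissertation's actual reward and value functions are not reproduced in this abstract; the following grid-world sketch only illustrates the kind of reward shaping typically used for path planning in an unknown environment (grid size, obstacle positions, and penalty values are our assumptions):

```python
GRID = 10                              # 10 x 10 grid world
GOAL = (9, 9)
OBSTACLES = {(4, 4), (4, 5), (5, 4)}   # unknown to the planner a priori
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def reward(pos):
    """Reward shaping: a large bonus at the goal, a large penalty for a
    collision, and a small step cost that favours short paths."""
    if pos == GOAL:
        return 100.0
    if pos in OBSTACLES:
        return -50.0
    return -1.0

def step(pos, move):
    """One robot move; only local sensing is assumed, so the learner
    discovers walls and obstacles through the rewards it receives."""
    nxt = (pos[0] + move[0], pos[1] + move[1])
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID):
        return pos, -1.0               # bumped a wall: stay, pay step cost
    r = reward(nxt)
    if nxt in OBSTACLES:
        return pos, r                  # collision: robot stays put
    return nxt, r
```

Because the penalties arrive through interaction rather than from a map, a learner driven by such rewards needs no static global information, which is the property the abstract claims for ACO-FSL.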