Learning action probabilities from delayed reinforcement
Authors:
S. I. AHSON,
R. SRINIVAS
Journal:
International Journal of Systems Science
(Taylor; available online 1993)
Volume/Issue:
Volume 24,
Issue 12
Pages: 2415-2421
ISSN: 0020-7721
Year: 1993
DOI: 10.1080/00207729308949639
Publisher: Taylor & Francis Group
Data source: Taylor
Abstract:
A reinforcement scheme for learning automata is presented that applies to real situations in which the reinforcement received from the environment is delayed. The scheme divides the state space into regions, following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates its action probabilities whenever the system state enters that region. The reward estimates are obtained from the reinforcement received during the period of eligibility. Results from computer simulations of the inverted pendulum problem are compared with those of the adaptive critic learning developed by Barto et al. (1983).