NSTL Retrospective Data Service Platform

Learning action probabilities from delayed reinforcement

Authors: S. I. AHSON, R. SRINIVAS

Journal: International Journal of Systems Science (Taylor, available online 1993)
Volume/Issue: Volume 24, Issue 12

Pages: 2415-2421

ISSN: 0020-7721

Year: 1993

DOI: 10.1080/00207729308949639

Publisher: Taylor & Francis Group

Data source: Taylor

Abstract:

A reinforcement scheme for learning automata, applicable to real situations where the reinforcement received from the environment is delayed, is presented. The scheme divides the state space into regions following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates action probabilities whenever the system state enters it. Estimates of reward characteristics are obtained using reinforcement received during the period of eligibility. Results obtained through computer simulation of the inverted pendulum problem are compared with the adaptive critic learning developed by Barto et al. (1983).
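
Illustrative sketch (not taken from the paper): the abstract does not give the update equations, so the Python below only illustrates the general structure it describes. The state space is split into boxes, each box holds a local automaton with per-action reward estimates, delayed reinforcement is credited to recently used (box, action) pairs through a decaying eligibility value, and action probabilities are updated when the state enters a box. The pursuit-style probability update, the running-mean estimate, and every name and parameter here (`BoxAutomaton`, `DelayedReinforcementLearner`, `box_index`, `step`, `decay`) are assumptions made for illustration.

```python
import random


def box_index(state, edges):
    """Map a continuous state to a box id by bucketing each coordinate.

    edges[i] is the sorted list of cut points for coordinate i (assumed layout).
    """
    idx = 0
    for x, cuts in zip(state, edges):
        bucket = sum(x >= c for c in cuts)      # which interval x falls in
        idx = idx * (len(cuts) + 1) + bucket    # mixed-radix encoding
    return idx


class BoxAutomaton:
    """Local learning automaton for one box (hypothetical design)."""

    def __init__(self, n_actions, step=0.1):
        self.probs = [1.0 / n_actions] * n_actions  # action probabilities
        self.reward_est = [0.0] * n_actions         # running reward estimates
        self.counts = [0] * n_actions               # credits seen per action
        self.step = step

    def choose(self, rng):
        return rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def record(self, action, credit):
        """Fold one (delayed) credit into the running mean for this action."""
        self.counts[action] += 1
        n = self.counts[action]
        self.reward_est[action] += (credit - self.reward_est[action]) / n

    def update_probs(self):
        """Pursuit-style step toward the action with the best estimate."""
        if not any(self.counts):
            return  # no reinforcement seen yet, keep the uniform prior
        best = max(range(len(self.probs)), key=lambda a: self.reward_est[a])
        for a in range(len(self.probs)):
            target = 1.0 if a == best else 0.0
            self.probs[a] += self.step * (target - self.probs[a])


class DelayedReinforcementLearner:
    def __init__(self, edges, n_actions, decay=0.9, seed=0):
        self.edges = edges
        self.n_actions = n_actions
        self.decay = decay            # eligibility decay per time step
        self.rng = random.Random(seed)
        self.boxes = {}               # box id -> BoxAutomaton
        self.eligibility = {}         # (box id, action) -> eligibility

    def act(self, state):
        b = box_index(state, self.edges)
        if b not in self.boxes:
            self.boxes[b] = BoxAutomaton(self.n_actions)
        automaton = self.boxes[b]
        automaton.update_probs()      # update when the state enters the box
        a = automaton.choose(self.rng)
        for key in self.eligibility:  # older decisions become less eligible
            self.eligibility[key] *= self.decay
        self.eligibility[(b, a)] = 1.0
        return a

    def reinforce(self, r):
        """Credit a delayed reinforcement r to all eligible (box, action) pairs."""
        for (b, a), e in self.eligibility.items():
            self.boxes[b].record(a, r * e)
        self.eligibility.clear()
```

In an inverted pendulum simulation, `act` would be called at every time step with the current state, and `reinforce` would be called only when the environment delivers a signal (for example, a negative value when the pole falls). The box boundaries and the reinforcement values would have to be chosen by the user; none are specified in this record.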
