On Howard's policy improvement method
Author:
Ulrich Rieder,
Journal:
Mathematische Operationsforschung und Statistik. Series Optimization
(Available online via Taylor & Francis, 1977)
Volume/Issue:
Volume 8, Issue 2
Pages: 227-236
ISSN:0323-3898
Year: 1977
DOI:10.1080/02331937708842420
Publisher: Akademie-Verlag
Data source: Taylor & Francis
Abstract:
We consider a stationary dynamic program with general state and action spaces and with an unbounded reward function. Taking a martingale approach to the optimization problem, we derive several necessary and sufficient conditions for the validity of Howard's policy improvement method. The conditions hold both in the positive and the negative case. By means of these results we can construct a sequence of stationary policies for which the expected rewards converge to the value function. The construction is a straightforward generalization of the method given by Frid [3].
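For orientation, the following is a minimal sketch of Howard's policy improvement method in its classical finite-state, discounted form; the paper itself treats general state and action spaces with unbounded rewards, which this toy example does not capture. The MDP data (`P`, `r`, discount factor) below are hypothetical and chosen only to illustrate the evaluation/improvement loop.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    """Howard's policy iteration for a finite MDP (illustrative sketch).

    P[a][s, s'] : transition probability from s to s' under action a
    r[a][s]     : one-step reward in state s under action a
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead.
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v  # no strict improvement possible: policy is optimal
        policy = new_policy
    return policy, v

# Hypothetical 2-state, 2-action MDP purely for demonstration.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.1, 0.9], [0.7, 0.3]])]
r = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
policy, v = policy_iteration(P, r)
```

Each improvement step produces a policy whose expected reward is at least as large in every state; the paper's contribution is identifying when this monotone-improvement property remains valid in the general (positive and negative) unbounded-reward setting.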