Value-Iteration Algorithm:

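Start from an initial estimate v_0 (for example, zero for every state), then repeat for each iteration k+1:

  a. for every state s ∈ S, compute v_{k+1}(s) from v_k with a one-step lookahead, keeping the best action (the Bellman optimality backup);

  b. stop once the algorithm converges, i.e. when v_{k+1} and v_k are (almost) identical.

The result is the optimal state-value function.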
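The loop above can be sketched in Python for a small tabular MDP. This is a minimal sketch, not the post's own code. The transition format `P[s][a] = [(probability, next_state, reward), ...]`, the names `value_iteration` and `EXAMPLE`, the discount `gamma` and the stopping threshold `theta` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular MDP format (an assumption, not from the post):
# P[s][a] is a list of (probability, next_state, reward) triples.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(P: MDP, gamma: float = 0.9, theta: float = 1e-8) -> Dict[str, float]:
    """Repeat Bellman optimality backups until the values stop changing."""
    v = {s: 0.0 for s in P}  # v_0: zero for every state
    while True:
        v_new: Dict[str, float] = {}
        for s, actions in P.items():
            if not actions:  # terminal state: no actions, value stays 0
                v_new[s] = 0.0
                continue
            # Step a: best one-step lookahead over all actions, using v_k only.
            v_new[s] = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        delta = max(abs(v_new[s] - v[s]) for s in P)
        v = v_new  # v_{k+1}
        if delta < theta:  # step b: converged
            return v


# Hypothetical 3-state example: from A you can stay or move to B;
# from B the only action reaches terminal T with reward 1.
EXAMPLE: MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"go": [(1.0, "T", 1.0)]},
    "T": {},
}

if __name__ == "__main__":
    print(value_iteration(EXAMPLE))
```

Here v(B) settles at 1 and v(A) at 0.9 (one discounted step before B), after which a further sweep changes nothing and the loop stops.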

Optimal State-Value Function

As mentioned in the previous post, the optimal state-value function is picked with the Bellman optimality equation. From state s there are several possible actions; for each one we combine the immediate reward with the (discounted) value of the next state, and keep the best combination.
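Written out, one common form of this equation is given below. The notation is an assumption, since the post does not define symbols: R_s^a is the expected immediate reward for taking action a in state s, P_{ss'}^a is the probability of landing in s', and γ is the discount factor. The second line is the same rule used as the value-iteration update from v_k to v_{k+1}.

```latex
v_*(s) = \max_{a \in \mathcal{A}} \Big( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_*(s') \Big)

v_{k+1}(s) = \max_{a \in \mathcal{A}} \Big( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_k(s') \Big)
```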

In a grid game, for example, this works much like information propagating backward from the terminal states.
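To make the backward propagation concrete, here is a small sketch of a hypothetical grid world. The grid size, the two terminal corners, the reward of -1 per move, γ = 1 and the helper names `step`, `sweep` and `solve` are assumptions chosen for illustration. After iteration k, every cell within k moves of a terminal already holds its final value, so the correct values spread outward one ring per sweep.

```python
from typing import List, Tuple

Grid = List[List[float]]

N = 4                                       # hypothetical 4x4 grid
TERMINALS = {(0, 0), (N - 1, N - 1)}        # assumed terminal corners
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0                          # assumed cost of every move


def step(cell: Tuple[int, int], move: Tuple[int, int]) -> Tuple[int, int]:
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else cell


def sweep(v: Grid) -> Grid:
    """One synchronous Bellman optimality backup over every cell (gamma = 1)."""
    new: Grid = [[0.0] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            if (r, c) in TERMINALS:
                continue  # terminal cells keep value 0
            # Best action = the move leading to the highest-valued neighbour.
            new[r][c] = max(STEP_REWARD + v[nr][nc]
                            for nr, nc in (step((r, c), m) for m in MOVES))
    return new


def solve(max_iters: int = 100) -> List[Grid]:
    """Run sweeps until nothing changes; return v_0, v_1, ... for inspection."""
    v: Grid = [[0.0] * N for _ in range(N)]
    history = [v]
    for _ in range(max_iters):
        nv = sweep(v)
        history.append(nv)
        if nv == v:  # no cell changed: converged
            break
        v = nv
    return history


if __name__ == "__main__":
    for k, grid in enumerate(solve()):
        print(f"k = {k}")
        for row in grid:
            print(" ".join(f"{x:5.1f}" for x in row))
```

Printing each v_k shows the -1, -2, -3 values filling in from the two terminal corners, and the converged value of each cell is minus its distance to the nearest terminal.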

From State-Value Function to Policy

After we have obtained the Optimal State-Value Function, the Optimal Policy can be acquired by maximizing the Action-Value Function. This means we try all possible actions from state s, compute each one's action value (immediate reward plus the discounted value of the next state), and choose the action with the largest value.
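A minimal sketch of this greedy extraction step follows. It assumes the same hypothetical transition format as a plain dict, `P[s][a] = [(probability, next_state, reward), ...]`; the names `q_value` and `greedy_policy`, the discount `gamma`, and the hard-coded values in `V_STAR` are illustrative assumptions, not the post's code.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format (assumption): P[s][a] -> [(prob, next_state, reward), ...]
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(P: MDP, v: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Action value q(s, a): expected immediate reward plus discounted next-state value."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def greedy_policy(P: MDP, v: Dict[str, float], gamma: float = 0.9) -> Dict[str, str]:
    """Pick, in every non-terminal state, the action with the largest q(s, a)."""
    return {
        s: max(actions, key=lambda a: q_value(P, v, s, a, gamma))
        for s, actions in P.items()
        if actions  # terminal states have no actions to choose from
    }


# Hypothetical example: A can stay or move to B; B moves to terminal T for reward 1.
EXAMPLE: MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"go": [(1.0, "T", 1.0)]},
    "T": {},
}
V_STAR = {"A": 0.9, "B": 1.0, "T": 0.0}  # optimal values for this example with gamma = 0.9

if __name__ == "__main__":
    print(greedy_policy(EXAMPLE, V_STAR))
```

In state A, "go" scores 0.9 × 1.0 = 0.9 while "stay" scores 0.9 × 0.9 = 0.81, so the greedy policy moves toward B.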
