q_from_v.m
From "Algorithms Implementing Markov Decision Process Models" · M code · 12 lines
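function Q = Q_from_V(V, T, R, discount_factor)
% Q(s,a) = R(s,a) + sum_s' T(s,a,s') * gamma * V(s')
% V is an S-by-1 column vector, T is S-by-A-by-S, R is S-by-A.

S = size(T,1);
A = size(T,2);
Q = zeros(S,A);
for a=1:A
  % Transition matrix for action a (S-by-S) times the discounted values.
  Q(:,a) = R(:,a) + squeeze(T(:,a,:))*discount_factor*V;
end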
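A small worked example may help check the formula in the comment above. The sketch below is not part of the original file: the two-state, two-action MDP and the names `Q_loop` and `Q_vec` are made up for illustration. It repeats the loop from `Q_from_V` inline so the script runs on its own, then compares it with a loop-free variant based on `reshape`.

```matlab
% Hypothetical toy MDP: 2 states, 2 actions (all values invented for illustration).
S = 2; A = 2;
T = zeros(S, A, S);
T(1,1,:) = [0.9 0.1];   % from state 1, action 1
T(1,2,:) = [0.2 0.8];   % from state 1, action 2
T(2,1,:) = [0.5 0.5];   % from state 2, action 1
T(2,2,:) = [0.0 1.0];   % from state 2, action 2
R = [1 0; 0 2];         % R(s,a)
V = [10; 20];           % V(s'), column vector
discount_factor = 0.5;

% Same computation as the loop in Q_from_V.
Q_loop = zeros(S, A);
for a = 1:A
  Q_loop(:,a) = R(:,a) + squeeze(T(:,a,:)) * discount_factor * V;
end

% Loop-free variant: flatten (s,a) into rows, multiply by V, fold back to S-by-A.
Q_vec = R + discount_factor * reshape(reshape(T, S*A, S) * V, S, A);

% Worked by hand: Q(1,1) = 1 + 0.5*(0.9*10 + 0.1*20) = 6.5, and so on.
disp(Q_loop);   % approximately [6.5 9; 7.5 12]
disp(max(abs(Q_loop(:) - Q_vec(:))));
```

With `Q_from_V.m` on the path, calling `Q_from_V(V, T, R, discount_factor)` on the same inputs should give the same matrix as `Q_loop`.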