📄 value_determination.m

📁 Algorithms implementing Markov decision process (MDP) models

💻 M

function V = value_determination(p, T, R, discount_factor)
% VALUE_DETERMINATION Solve Bellman's equation for a fixed policy
% V = value_determination(p, T, R, discount_factor)

S = size(T,1);
A = size(T,2);

% Extract the part of T and R which is specific to this policy
Tp = zeros(S,S); % Tp(s,s') = T(s, p(s), s')
Rp = zeros(S,1); % Rp(s) = R(s, p(s))
for a=1:A % avoid looping over S
  ind = find(p==a); % the rows that use action a
  if ~isempty(ind)
    Tp(ind,:) = reshape(T(ind,a,:), length(ind), S);
    Rp(ind) = R(ind,a);
  end
end

% V = R + gTV  => (I-gT)V = R  => V = inv(I-gT)*R
V = (eye(S) - discount_factor*Tp) \ Rp;
%V = pinv(eye(S) - discount_factor*Tp) * Rp;
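
% Usage sketch (not part of the original file). The two-state, two-action
% MDP below is hypothetical and only illustrates the argument shapes the
% code above appears to assume: T is S x A x S, R is S x A, and p is a
% length-S vector of action indices.
%
%   T = zeros(2,2,2);
%   T(1,1,:) = [0 1];   % state 1, action 1: move to state 2
%   T(1,2,:) = [1 0];   % state 1, action 2: stay in state 1
%   T(2,1,:) = [1 0];   % state 2, action 1: move to state 1
%   T(2,2,:) = [0 1];   % state 2, action 2: stay in state 2
%   R = [1 0; 0 2];     % R(s,a): reward for taking action a in state s
%   p = [1; 2];         % policy: action 1 in state 1, action 2 in state 2
%   V = value_determination(p, T, R, 0.9);
%
% Under this policy Tp = [0 1; 0 1] and Rp = [1; 2], so solving
% (I - 0.9*Tp) V = Rp by hand gives V(2) = 2/(1 - 0.9) = 20 and
% V(1) = 1 + 0.9*20 = 19, up to floating-point rounding.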
