
📄 doubleint_mdp.m

📁 approximate reinforcement learning
function [xplus, rplus] = doubleint_mdp(m, x, u)
% Implements the discrete-time dynamics of a double integrator with
% bounded position and velocity, controlled in acceleration.
%  [XPLUS, RPLUS] = DOUBLEINT_MDP(M, X, U)
%
% This function conforms to the specifications established by SAMPLE_MDP.
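%
% Example (illustrative only: the field layout of the model struct M is
% inferred from the code below; the authoritative spec is SAMPLE_MDP):
%   m.phys.maxu = 1; m.phys.maxx = [1; 1]; m.phys.b = 0;
%   m.disc.method = 3; m.disc.Ts = 0.05;
%   m.goal.Q = eye(2); m.goal.R = 0.01;
%   m.goal.zeroband = [0.1; 0.1]; m.goal.zeroreward = 0;
%   [xplus, rplus] = doubleint_mdp(m, [0.5; 0], 1);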

% 
% Saturate the control (acceleration) to the interval [-maxu, maxu]
u = max(-m.phys.maxu, min(m.phys.maxu, u));

if m.disc.method == 1        % ODE
    odet = m.disc.odet;      % we only care about the solution at the final time
    [odet, odey] = feval(m.disc.odesolver, @doubleint_trans, odet, x, m.disc.odeopt, m.phys, u);
    xplus = odey(end, :)';   % pick up the state at t + Ts

elseif m.disc.method == 2    % discrete ODE
    odet = m.disc.odet;
    odey = feval(m.disc.odesolver, @doubleint_trans, odet, x, m.phys, u);
    xplus = odey(end, :)';   % pick up the state at t + Ts

elseif m.disc.method == 3    % Euler, x(k+1) = x(k) + Ts * f(x(k), u(k))
    xplus = x + m.disc.Ts * doubleint_trans(0, x, m.phys, u);
end

% Bound the state
xplus = max(-m.phys.maxx, min(m.phys.maxx, xplus));

% Reward - QR component, plus optionally the band reward
rplus = -xplus' * m.goal.Q * xplus - u * m.goal.R * u + ...
        m.goal.zeroreward * all(abs(x) <= m.goal.zeroband);

% END FUNCTION -----


function xdot = doubleint_trans(t, x, p, u)
%  Implements the transition function of the double integrator
%
%  Parameters:
%   T           - the current time. Ignored, added for compatibility with
%               the ode function style.
%   X           - the current state
%   P           - the physical parameters
%   U           - the command
%
%  Returns:
%   XDOT        - the derivative of the state
%

% x(1) = position, x(2) = velocity; p.b is a viscous damping coefficient
xdot = [x(2); -p.b*x(2) + u];
