📄 doubleint_problem.m

📁 approximate reinforcement learning

💻 M

function out = doubleint_problem(what)
% Double integrator problem setup.
%   OUT = DOUBLEINT_PROBLEM(WHAT)
% This function conforms to the specifications established by SAMPLE_PROBLEM.

maxx = [10; 2];
maxu = .5;
xgrids = {[-maxx(1):3:-4 -2 -1:0.5:1 2 4:3:maxx(1)], ...
         -maxx(2):1:maxx(2)};
ugrids = {maxu * [-1 0 1]};
x0 = (2*rand(2, 1)) .* maxx - maxx;     % default state for replays
gamma = 0.98;

switch what

    case 'model'
        phys.maxx = maxx;
        phys.maxu = maxu;
        phys.b = 0.02;

        % reward specification: QR component
        goal.Q = diag([1 0.5]);
        goal.R = 0.05;
        % and zeroreward if state within zeroband
        % (to provide extra incentive of reaching zero)
%         goal.zeroband = 0;
%         goal.zeroreward = 0;
        goal.zeroband = [0.05; 0.05];
        goal.zeroreward = 10;

        disc.method = 'ode';                    % 'ode', 'fixed-ode', 'euler'
        disc.odesolver = 'ode45';               % solver function
        disc.Ts = .2;                           % [sec] sample time
        disc.odesteps = 1;                      % # of steps per sample time

        % derived variables for discretization
        switch(disc.method)
            case 'ode'
                disc.method = 1;
                disc.odet = [0 disc.Ts/2 disc.Ts];
                disc.odeopt = odeset;
            case 'fixed-ode'
                disc.method = 2;
                disc.odet = 0 : disc.Ts / disc.odesteps : disc.Ts;
            case 'euler'
                disc.method = 3;
        end;
        fun = @doubleint_mdp; Ts = disc.Ts;
        out = varstostruct('fun', 'Ts', 'phys', 'goal', 'disc');

    case 'tiling'
        tcfg.c = 8;
        tcfg.init = 0;
        tcfg.sigma = 1;             % used for random init only
        tcfg.exact = 0;
        tcfg.delta = [0.5 1];       % tiling max displacement
        cfg.xgrids = xgrids;
        cfg.ugrids = ugrids;
        cfg.tcfg = tcfg;
        cfg.x0 = x0;
        cfg.gamma = gamma;

        out = cfg;

    case 'fuzzy'
        % these are required by the algorithms
        cfg.xgrids = xgrids;
        cfg.ugrids = ugrids;
        cfg.x0 = x0;
        cfg.gamma = gamma;
        out = cfg;

end;

% END doubleint_problem(), RETURNING out ====================================

💿 File size: 64 K

👤 Uploaded by: sky8997991

📂 Category: Mathematical computation

🏷️ Related tags

#reinforcement #approximate #learning
