📄 nav_problem.m

📁 approximate reinforcement learning

💻 M

字号:

function out = nav_problem(what)% Navigation problem setup.%   OUT = NAV_PROBLEM(WHAT)% Navigation problem setup. A point mass has to be steered to the origin in a two-dimensional% continuous state space. There exists friction, and the friction coefficient is dependent on% the position such that "stickier" and "smoother" areas exist into the world.%% This function conforms to the specifications established by SAMPLE_PROBLEM.% damping configuration selector: hardcodeddsel = 'gh';switch dsel,    case 'ct',                      % Damping configuration 1: Constant damping        gh.fun = 'damp_constant';        gh.b = 0.5;        Xgrid = [-5 -2 -0.3 -.1 0 .1 0.3 2 5];  Ygrid = Xgrid;        case 'gh',                      % Damping configuration 1: Varying friction, 2 "hills"        gh.fun = 'damp_gaussianhills';        gh.b0 = 0.5;        gh.a = [8 8];        gh.c = [0 -2.3; 4.7 1]';        gh.sigma = [2.5 1.5; 1.5 2]';        gh.n = length(gh.a);        sigmamult = [-1.5 -.8 0 .8 1.5];        % base grids        Xgrid = [-5 -0.3 0 0.3 5]; Ygrid = Xgrid;        % add significant poins around the hills        for i = 1:gh.n,            Xgrid = [Xgrid gh.c(1,i)+gh.sigma(1,i)*sigmamult];            Ygrid = [Ygrid gh.c(2,i)+gh.sigma(2,i)*sigmamult];        end;        % order, remove duplicates up to .1 difference, bound        Xgrid = sortx(Xgrid, .099, -5, 5);        Ygrid = sortx(Ygrid, .099, -5, 5);end;    % damping config selectionTs = 0.2;speedgrid = [-2 0 2];ugrid = [-1 0 1];xgrids = {Xgrid, Ygrid, speedgrid, speedgrid};ugrids = {ugrid, ugrid};maxpos = max(Xgrid);maxspeed = max(speedgrid);maxu = max(ugrid);x0 = [rand(2, 1) * 2*maxpos - maxpos; 0; 0];gamma = 0.98;switch what    case 'model'               phys.maxx = [maxpos; maxpos; maxspeed; maxspeed];        phys.maxu = [maxu; maxu];               phys.damping = gh;                % reward specification: QR component        goal.Q = diag([0 0 0 0]);%         goal.Q = diag([.1 .1 0 0]);        goal.R = diag([0 0]);        % and zeroreward if state within zeroband        % (to provide extra incentive of reaching zero)        goal.zeroband = 0.2 * [1; 1; 1; 1];        goal.zeroreward = 10;        disc.method = 'ode';                    % 'ode', 'fixed-ode', 'euler'        disc.odesolver = 'ode45';               % solver function        disc.Ts = Ts;                           % [sec] sample time        disc.odesteps = 1;                      % # of steps per sample time            % derived variables for discretization        switch(disc.method)            case 'ode'                disc.method = 1;                disc.odet = [0 disc.Ts/2 disc.Ts];                disc.odeopt = odeset;            case 'fixed-ode'                disc.method = 2;                disc.odet = 0 : disc.Ts / disc.odesteps : disc.Ts;            case 'euler'                disc.method = 3;        end;        model.fun = @nav_mdp; model.plotfun = @nav_plot;        model.Ts = disc.Ts;        model.phys = phys; model.goal = goal; model.disc = disc;                out = model;    case 'fuzzy'        cfg.xgrids = xgrids;        cfg.ugrids = ugrids;        cfg.gamma = gamma;        cfg.x0 = x0;                out = cfg;        end;% END nav_problem(), RETURNING out ====================================

💿 文件大小 64 K

👤 上传用户 sky8997991

📂 所属分类数学计算

🏷️ 相关标签

#reinforcement #approximate #learning

⌨️ 快捷键说明

复制代码 Ctrl + C

搜索代码 Ctrl + F

全屏模式 F11

切换主题 Ctrl + Shift + D

显示快捷键 ?

增大字号 Ctrl + =

减小字号 Ctrl + -