standard interface classes.txt
struct State 
Data structure for state description. 
Synopsis: #include "interface_classes.h"
Link: sarepr.cpp

Public data members: 
static int dimensionality 
Number of state variables. Usage: State::dimensionality. 
double* x 
Array of state variables' values. 
Public methods: 
State() 
Default constructor. Creates an object with State::dimensionality state variables. 
State(int n) 
Construct a "super-state" with n*dimensionality variables. 
State(State& s) 
Copy constructor. 
void operator = (const State& s) 
Assignment operator. 
~State() 
Destructor. 
ostream& operator << (ostream& file, const State& s) 
Overloaded operator for state output to a file or cout.
Usage:       cout << s << endl; 

--------------------------------------------------------------------------------

struct Action 
Data structure for discrete action description. 
Synopsis: #include "interface_classes.h"
Link: sarepr.cpp

Public data members: 
int id 
By default, id is assigned the ordinal number in which the action was added to an action set; in other words, its value coincides with the action's array index in the action set to which it belongs (see the ActionSet declaration below). 
char* description 
An action may be given a character-string description (a name). 
double value 
Numerical value of the action. 
static int count 
Total number of Action objects created. 
Public methods: 
Action() 
Default constructor. 
Action(const char* d) 
General constructor. Creates an object with name d. 
Action(const char* d, double v) 
General constructor. Creates an object with name d and numerical value v. 
Action(Action& a) 
Copy constructor. 
~Action() 
Destructor. 
void operator = (const Action& b) 
Assignment operator. 
ostream& operator << (ostream& file, const Action& a) 
Overloaded operator for action output to a file or cout.
Usage:       cout << a << endl; 

--------------------------------------------------------------------------------

struct ActionSet 
Data structure for an action set consisting of objects of Action type. 
Synopsis: #include "interface_classes.h"
Link: sarepr.cpp

Public data members: 
int size 
Maximum number of actions in the set. 
Action* action 
Array of actions that belong to this set. 
int added 
Indicates how many actions have already been added to the set. 
Public methods: 
ActionSet() 
Default constructor. 
ActionSet(int n) 
General constructor. Creates an action set of maximum size n. Actions should be added later with the addAction method. 
void create(int n) 
Creates an action set of maximum size n; can be called to set a proper maximum size after object is created with the default constructor. 
void addAction(Action& a) 
Add action a to the action set. 
~ActionSet() 
Destructor. 
void operator = (const ActionSet& a) 
Assignment operator. 

--------------------------------------------------------------------------------

class Environment 
Abstract class for the environment of an RL system. 
Synopsis: #include "interface_classes.h"
Link: environment.cpp and random_numbers.cpp

Public methods: 

Environment() 

Default constructor. Seeds and initializes the random number generator used for most random sampling associated with the environment. This constructor is called automatically when objects of the derived classes are created.


virtual void startState(State& start, bool& terminal) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
samples a start state according to a desired start-state distribution and returns it to the caller via the start argument; 
sets CurrentState data member to the sampled start state; 
returns an indication of whether the sampled start state is terminal by assigning boolean true or false value to terminal argument. 



virtual void setState(const State& s, bool& terminal) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
sets the CurrentState data member to state s; 
returns an indication of whether state s is terminal by assigning a boolean true or false value to the terminal argument. 


virtual void transition(const Action& action, State& s_new, double& r_new, bool& terminal) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
implements a transition from CurrentState in response to the action specified by the action argument; 
updates data members CurrentState, CurrentAction and reward; 
returns values to the agent 
s_new - new state; 
r_new - reward after performed transition; 
terminal - an indication of whether the new state s_new is terminal (boolean true or false value). 


virtual bool applicable(const State& s, const Action& a) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Checks if action a is applicable in state s. 


virtual void bound(int i, bool& bounded, double& left, double& right) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
If the ith state variable is bounded, then bounded is assigned the value true and the left and right arguments return the corresponding bounds; otherwise bounded is assigned the value false. 


void getStateSpaceBounds(double* left, double* right) 

Returns bounds on state variables. left - array of left bounds; right - array of right bounds. Both arrays should be of size State::dimensionality. 

virtual void uniformStateSample(State& s) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Implements uniform state space sampling. 


void computeAttributes(Attributes& att, const State& startState, int Steps, int Transitions, const int* n, const ActionSet& as, StateActionFA* fa=NULL) 

Computes off-line global values of the MDP attributes for the state distribution encountered on a trajectory under some policy.
Arguments: 
att - attributes structure to return computed values; 
startState - state from which to start a trajectory; 
Steps - maximum number of steps on the trajectory; 
Transitions - number of sample transitions from each state; 
n - array of size State::dimensionality indicating into how many intervals each state variable should be discretized for the approximate calculation of attributes; 
as - action set for the current RL system; 
fa - pointer to the architecture that contains action value functions for each action; a greedy policy with respect to these value functions will be executed. This is an optional argument: if it is not specified when the function is called, or is explicitly passed a NULL value, a uniformly random policy is followed. 


void computeAttributes(Attributes& att, int SampleSize, int Transitions, const int* n, const ActionSet& as) 

Computes off-line global values of the MDP attributes for the uniform state distribution.
Arguments: 
att - attributes structure to return computed values; 
SampleSize - number of uniformly distributed state samples in which attribute values are computed and then averaged; 
Transitions - number of sample transitions from each state; 
n - array of size State::dimensionality indicating into how many intervals each state variable should be discretized for the approximate calculation of attributes; 
as - action set for the current RL system. 


Protected methods: 


void chooseAction(double epsilon, StateActionFA* fa, const ActionSet& actions, const State& s, Action& a) 

Implements an epsilon-greedy strategy based on action value functions in the architecture pointed to by fa. If this pointer is NULL, the uniformly random policy is implemented.
Arguments: 
epsilon - parameter for the epsilon-greedy strategy; 
fa - pointer to the architecture containing action value functions or NULL for uniformly random policy; 
actions - action set for the current RL system; 
s - state in which to choose action; 
a - returns chosen action. 


Protected data members: 


State CurrentState 
Current state. 
Action CurrentAction 
Last action performed by the agent (the one that led to CurrentState). 
double reward 
Reward after the last transition. 
static long idum 
Variable used by the random number generator implemented in random_numbers.cpp. 
static bool seeded 
Indicates whether the random number generator has been seeded during this program run. It is assigned the value true the first time an object of any derived class, or a pointer to an object of type Environment or of any derived class, is declared. This ensures that the random number generator is seeded once and only once during each program run. 

--------------------------------------------------------------------------------

class Approximator 
Abstract class - base for all function approximation (FA) methods. 
Synopsis: #include "interface_classes.h" 
Public methods: 

virtual int getSize() 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Returns the number of tunable parameters in this architecture. 

virtual void predict(const State& s, double& output) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Computes an output value for input s and returns it via the output argument. 

virtual void learn(const State& s, const double target) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Learns an input-output pair, where s is the input and target is the desired output value for that input. 

virtual void computeGradient(const State& s, double* GradientVector) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Computes the gradient with respect to the architecture's parameters at the current parameter values for input s, and returns it in the array GradientVector. 

virtual void updateParameters(double* delta) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Updates tunable parameters by the amounts in the delta array, possibly multiplied by appropriate learning steps. 

virtual void replaceTraces(const State& s, double replace) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
If Temporal Difference Reinforcement Learning is implemented, this function is expected to replace the traces of the parameters activated by input state s with the value replace. 
In systems where this functionality will not be used, a programmer may leave the implementation of this function empty in the derived classes. 

virtual void decayTraces(double factor) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
If Temporal Difference Reinforcement Learning is implemented, this function is expected to decay (multiply) the traces of all tunable parameters by factor.
In systems where this functionality will not be used, a programmer may leave the implementation of this function empty in the derived classes. 

virtual void accumulateTraces(const State& s, double amount) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
If Temporal Difference Reinforcement Learning is implemented, this function is expected to increment the traces of the tunable parameters activated by input s by amount. 
In systems where this functionality will not be used, a programmer may leave the implementation of this function empty in the derived classes. 

virtual void setArchitectureParameters(int argc, char *argv[]) 

Pure virtual function: must be implemented by a derived class. 
