standard interface classes.txt

virtual void setArchitectureParameters(int argc, char *argv[]) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Loads parameters of the architecture from a text file. Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. E.g., argv may contain one item, which is the name of the file from which the architecture's parameters should be read. In this case argc=1. 

virtual void saveArchitectureParameters(int argc, char *argv[]) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Saves parameters of the architecture into a text file. Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. E.g., argv may contain one item, which is the name of the file to which the architecture's parameters should be saved. In this case argc=1. 

virtual void setLearningParameters(int argc, char *argv[]) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Sets learning parameters (e.g. learning step). Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. 

double getMaxParameterChange() 

Returns the maximum change in the tunable parameters in the course of learning, since the beginning of learning or since the last time this function was called. 

int getNumberParametersChanged() 

Returns the number of tunable parameters affected by learning so far (since the beginning of learning). 

virtual ~Approximator() 
Destructor. 

Protected data members: 
double MaxParameterChange 
Maximum change (in absolute value) in the tunable parameters as a result of learning. Can be updated in the learn function. 
int NumberParametersChanged 
Number of tunable parameters affected by learning so far (since the beginning of learning). Can be updated in the learn function. 
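
For orientation, the sketch below shows how a derived class might implement the functions described above and keep the two protected members up to date inside its own learning code. Everything specific to the sketch (the class name TableApproximator, the lookup-table representation, the learn() signature, and the one-value-per-line file format) is an assumption made for illustration; the real header may declare additional pure virtual functions that a concrete subclass must also provide.

    #include "interface_classes.h"
    #include <cstdio>
    #include <cstdlib>
    #include <cmath>

    // Hypothetical derived class: a plain lookup table with one tunable
    // parameter per discrete input index.
    class TableApproximator : public Approximator {
    public:
        TableApproximator(int n) : size(n), step(0.1) {
            table = new double[size];
            for (int i = 0; i < size; i++) table[i] = 0.0;
        }

        // argv[0] is assumed to hold the name of the parameter file.
        virtual void setArchitectureParameters(int argc, char *argv[]) {
            if (argc < 1) return;
            FILE* f = fopen(argv[0], "r");
            if (f == NULL) return;
            for (int i = 0; i < size; i++) fscanf(f, "%lf", &table[i]);
            fclose(f);
        }

        virtual void saveArchitectureParameters(int argc, char *argv[]) {
            if (argc < 1) return;
            FILE* f = fopen(argv[0], "w");
            if (f == NULL) return;
            for (int i = 0; i < size; i++) fprintf(f, "%f\n", table[i]);
            fclose(f);
        }

        // argv[0] is assumed to hold the learning step as a string.
        virtual void setLearningParameters(int argc, char *argv[]) {
            if (argc >= 1) step = atof(argv[0]);
        }

        // Example of learning code that updates the protected bookkeeping
        // members MaxParameterChange and NumberParametersChanged.
        void learn(int index, double target) {
            double change = step * (target - table[index]);
            table[index] += change;
            if (fabs(change) > MaxParameterChange) MaxParameterChange = fabs(change);
            NumberParametersChanged++;
        }

        virtual ~TableApproximator() { delete [] table; }

    private:
        int size;
        double step;
        double* table;
    };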

--------------------------------------------------------------------------------

class StateActionFA 
Fully implemented class, intended as a holder for several approximators, one for each action in some set of actions. 
Synopsis: #include "interface_classes.h"
Link: safa.cpp


Public methods: 

StateActionFA() 
Default constructor. 

StateActionFA(int n, Approximator** f) 

General constructor. Constructs an object that holds n architectures (of type Approximator). The pointers to those architectures are passed in the array of pointers f. They should be pointers to objects of a class derived from Approximator. Some of the pointers may be NULL, however; in this case, calling other functions for those architectures will result in no action. 
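
A minimal sketch of how the holder might be assembled, reusing the hypothetical TableApproximator sketched above for the Approximator class; the action-set size and the constructor argument are also assumptions.

    #include "interface_classes.h"

    // Sketch: build a StateActionFA for an action set of (assumed) size 3.
    void buildFA()
    {
        const int nActions = 3;
        Approximator** f = new Approximator*[nActions];
        for (int i = 0; i < nActions; i++)
            f[i] = new TableApproximator(100);  // one architecture per action
        // Any entry may instead be left NULL; calls for that action then do nothing.

        StateActionFA safa(nActions, f);
        // ... use safa: predict(), learn(), etc. ...
    }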

int getSize() 

Returns the number of tunable parameters in one of the component architectures (assuming that all of them have the same number of parameters). 

void getMaxParameterChange(double* changes) 

For each component architecture, returns the maximum change in the tunable parameters in the course of learning, since the beginning of learning or since the last time this function was called. The values are returned in the array changes. 

void getNumberParametersChanged(int* changes) 

For each component architecture, returns the number of tunable parameters affected by learning so far (since the beginning of learning). The values are returned in the array changes. 

void predict(const Action& a, const State& s, double& output) 

Computes the output value with the approximator corresponding to the action a. Input to this architecture is provided in s and the output is returned in output. 

void learn(const Action& a, const State& s, double target) 

Learns an input-output pair with the approximator corresponding to the action a. Input to this architecture is provided in s and the desired output in target. 

void computeGradient(const Action& a, const State& s, double* GradientVector) 

Computes the gradient with respect to the tunable parameters for the component architecture corresponding to the action a. The gradient is computed at the current parameter values and input s and returned in the array GradientVector. 

void updateParameters(const Action& a, double* delta) 

Updates the tunable parameters for the component architecture corresponding to the action a by the amounts in the delta array (possibly multiplied by an appropriate learning step). 
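
computeGradient and updateParameters are typically used as a pair: the gradient at the current input is scaled by an error term and a learning step, then fed back. The sketch below shows that pattern; the names target and alpha, and the way the error is obtained, are assumptions rather than part of the interface.

    // Sketch: one gradient step for the architecture of action a.
    // Assumed to be available: StateActionFA safa, State s, Action a,
    // a desired value 'target' and a learning step 'alpha'.
    int n = safa.getSize();                  // number of tunable parameters
    double* grad  = new double[n];
    double* delta = new double[n];

    double output;
    safa.predict(a, s, output);              // current estimate for (s, a)
    double error = target - output;

    safa.computeGradient(a, s, grad);        // gradient w.r.t. the tunable parameters
    for (int i = 0; i < n; i++)
        delta[i] = alpha * error * grad[i];  // scale by learning step and error

    safa.updateParameters(a, delta);         // parameters <- parameters + delta

    delete [] grad;
    delete [] delta;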

void clearTraces(const Action& a, const State& s, double replace) 

If Temporal Difference Reinforcement Learning is implemented, this function clears (sets to replace) the traces of the tunable parameters for those component architectures corresponding to actions that were not taken in state s. The action taken in s is passed in a. 

void replaceTraces(const Action& a, const State& s, double trace) 

If Temporal Difference Reinforcement Learning is implemented, this function replaces the traces of the tunable parameters activated by input s of the component architecture corresponding to action a. 

void decayTraces(double factor) 

If Temporal Difference Reinforcement Learning is implemented, this function decays (multiplies by factor) the traces of all tunable parameters for all component architectures. 

void accumulateTraces(const Action& a, const State& s, double amount) 

If Temporal Difference Reinforcement Learning is implemented, this function increments by amount the traces of the tunable parameters activated by s for the component architecture corresponding to action a. 
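
Taken together, the four trace functions support eligibility-trace updates such as Sarsa(lambda). The call sequence below is one plausible arrangement (replacing traces), not an order prescribed by the library; gamma and lambda stand for the usual discount and trace-decay factors.

    // Sketch of one replacing-traces step; a is the action taken in state s.
    safa.decayTraces(gamma * lambda);   // decay all traces for all architectures
    safa.clearTraces(a, s, 0.0);        // zero the traces for the actions not taken in s
    safa.replaceTraces(a, s, 1.0);      // set to 1 the traces activated by s for action a
    // An accumulating-traces variant would call safa.accumulateTraces(a, s, 1.0) instead.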

void setArchitectureParameters(const Action& a, int argc, char *argv[]) 

Loads parameters of the architecture corresponding to action a from a text file. The arguments argc and argv should have the same meaning as for the similar function in the derived class of component architectures. 

void saveArchitectureParameters(const Action& a, int argc, char *argv[]) 

Saves parameters of the component architecture corresponding to action a to a text file. The arguments argc and argv should have the same meaning as for the similar function in the derived class of component architectures. 

void saveAllArchitectureParameters(char** fileNames) 

Saves parameters of all component architectures to text files with names provided in the array of strings fileNames. 

void setLearningParameters(const Action& a, int argc, char *argv[]) 

Sets learning parameters of the component architecture corresponding to action a. The arguments argc and argv should have the same meaning as for the similar function in the derived class of component architectures. 

void setAllLearningParameters(int argc, char *argv[]) 

Sets (the same) learning parameters for all component architectures. The arguments argc and argv should have the same meaning as for the similar function in the derived class of component architectures. 
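
Because the parameters are passed in command-line style, a call simply packs the values into an array of strings. In the sketch below, the single learning-step argument "0.05" is only an assumption about what the derived Approximator class expects.

    // Sketch: pass one learning parameter (e.g. a learning step) to every
    // component architecture.
    char* args[1];
    args[0] = (char*) "0.05";
    safa.setAllLearningParameters(1, args);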

~StateActionFA() 

Destructor. 

--------------------------------------------------------------------------------

class Agent 
Abstract class for the RL agent. 
Synopsis: #include "interface_classes.h"
Link: agent.cpp


Public methods: 

Agent(double g, const ActionSet& a_s, StateActionFA* const f, Environment* const e) 

General constructor. Constructs an object with the following parameters: 
g - discount factor; 
a_s - action set of the RL system; 
f - pointer to the architecture (a StateActionFA object) representing either action-value functions or a randomized policy; 
e - pointer to the environment in which the agent operates. 

int initTrial(int N, bool learning, bool SaveTrajectory, const State* s, char* fileName = NULL, bool ComputeBellmanError = false) 

Functionality: 
Initiates and calls appropriate functions to conduct a trial (sequence of environment-agent interactions) of maximum length N. 
The argument learning indicates whether the agent should learn during the trial. Use the global constants true and false to pass appropriate values. If learning==true, the actAndLearn() function is called, otherwise the act() function is called. 
The argument SaveTrajectory indicates whether the trajectory should be saved. Use the global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is saved to the text file with the name fileName. The argument fileName is optional, so if the trajectory does not have to be saved, the value for this argument can be omitted (provided that the value for ComputeBellmanError is also omitted). 
If the optional argument s is specified, the environment's current state is set to that state and the trial is started from it. Otherwise, the start state is sampled from the environment's start-state distribution, implemented in the Environment::startState() function. The value for the s argument can be omitted only if the values for both arguments fileName and ComputeBellmanError are also omitted. 
The optional argument ComputeBellmanError indicates whether the estimated Bellman error should be computed. The Bellman error is computed only if learning==false; it is evaluated at the states on the trajectory, and the average over those states is stored in the BellmanError protected data member, assuming that the computation is implemented in the act() function of the derived class. 
Returns the number of steps actually performed during the trial. 
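
A typical experiment alternates learning trials with an occasional evaluation trial; the sketch below shows one such loop. The derived class name MyAgent, the trial length, the number of trials, and the output file name are assumptions.

    #include <cstdio>
    #include "interface_classes.h"

    // Sketch: 'MyAgent' stands for any concrete class derived from Agent;
    // 'actions', 'safa' and 'env' are an ActionSet, a StateActionFA and an
    // Environment assumed to be set up elsewhere.
    void runExperiment(const ActionSet& actions, StateActionFA* safa, Environment* env)
    {
        MyAgent agent(0.99, actions, safa, env);

        for (int trial = 0; trial < 100; trial++) {
            // Learning trial of at most 1000 steps, start state sampled by the environment.
            int steps = agent.initTrial(1000, true, false, NULL);
            printf("trial %d: %d steps, return %f\n", trial, steps, agent.getReturn());
        }

        // Evaluation trial: no learning, save the trajectory, estimate the Bellman error.
        agent.initTrial(1000, false, true, NULL, (char*) "trajectory.txt", true);
        printf("Bellman error estimate: %f\n", agent.getBellmanError());
    }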

double getReturn() 

Returns the return collected by the RL agent during the last trial. 

double getBellmanError() 

Returns the last estimate of the Bellman error. 

virtual void setLearningParameters(int argc, char *argv[])=0 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Sets parameters of the RL learning algorithm (e.g. epsilon, lambda, etc.). Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. 

void setArchitectureParameters(const Action& a, int argc, char *argv[]) 

From a text file, loads parameters of the architecture representing either the value function of action a or the probability distribution of some policy for action a. The arguments argc and argv should have the same meaning as for the similar function in the class used for the function approximation architecture. 

void saveArchitectureParameters(const Action& a, int argc, char *argv[]) 

Saves parameters of the architecture representing either the value function of action a or the probability distribution of some policy for action a to a text file. The arguments argc and argv should have the same meaning as for the similar function in the class used for the function approximation architecture. 

virtual ~Agent() 

Destructor. 

Protected methods: 

virtual int act(int N, bool SaveTrajectory, bool ComputeBellmanError) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Implements at most N successive steps of the trial, stopping earlier if a terminal state is entered by the environment; 
Computes the return collected on this trial; 
The argument SaveTrajectory indicates whether to save the trajectory. Use the global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is stored in a local data structure and then saved to a text file by the function initTrial(). 
The argument ComputeBellmanError indicates whether the (estimated) Bellman error should be computed for the state-action pairs on the trajectory. Use the global constants true and false to pass appropriate values. If ComputeBellmanError==true, the estimate is stored in the data member BellmanError, which can be accessed with the function getBellmanError(). 

virtual int actAndLearn(int N, bool SaveTrajectory) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Implements at most N successive steps of the trial with learning, stopping earlier if a terminal state is entered by the environment; 
Computes the return collected on this trial; 
The argument SaveTrajectory indicates whether to save the trajectory. Use the global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is stored in a local data structure and then saved to a text file by the function initTrial(). 

virtual void chooseAction(const State& s, Action& a) 

Pure virtual function: must be implemented by a derived class. 
Expected functionality: 
Implements the behavior policy: chooses action a in state s. Uses the FA representation, fa, of the action-value functions or a randomized policy. 
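
As one illustration, a derived class could implement an epsilon-greedy behaviour policy on top of the fa member. In the sketch below, the exploration rate, the member nActions, the indexing of actions as 0..nActions-1, and the existence of an Action(int) constructor and Action assignment are all hypothetical; only fa->predict() comes from the documented interface.

    #include <cstdlib>
    #include "interface_classes.h"

    // Sketch of an epsilon-greedy behaviour policy for a hypothetical derived
    // class MyAgent, which is assumed to store the number of actions in nActions.
    void MyAgent::chooseAction(const State& s, Action& a)
    {
        const double epsilon = 0.1;                  // exploration rate (assumed value)

        // Explore: with probability epsilon pick a uniformly random action.
        if ((double) rand() / RAND_MAX < epsilon) {
            a = Action(rand() % nActions);
            return;
        }

        // Exploit: evaluate every action with the FA representation fa and
        // keep the one with the largest predicted value.
        double best = 0.0;
        for (int i = 0; i < nActions; i++) {
            Action candidate(i);
            double q;
            fa->predict(candidate, s, q);
            if (i == 0 || q > best) { best = q; a = candidate; }
        }
    }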

Protected data members: 

State CurrentState 
Current state of the environment. 
Action CurrentAction 
Action chosen in the current state. 
bool terminal 
Indicates if the current state is terminal. 
double CurrentReward 
Reward obtained after the transition. 
const ActionSet& actions 
Action set of the RL system. 
StateActionFA* const fa 
Pointer to an architecture representing either action-value functions or a randomized policy. 
double gamma 
Discount factor. 
double Return 
Return collected during a trial. 
Environment* const env 
Pointer to the environment object. 
int* ApplicableActions 
Array that can be used by the chooseAction() function. 
Trajectory* trajectory 
Data structure to store trajectory if it has to be saved. 
double BellmanError 
Bellman error estimate. 

Component structures: 

struct Trajectory 
Data structure to save a trajectory.


Public data members: 
StageInfo* stage 
Data structure for the information about every stage on the trajectory. 
int length 
Actual length of the recorded trajectory. 
Public methods: 
Trajectory(int n) 
General constructor. Constructs an object with n as the maximal number of stages in the trajectory. 
~Trajectory() 
Destructor. 

struct StageInfo 
Data structure to save information about one stage on a trajectory.


Public data members: 
State state 
Action action 
double reward 
double* Qvalue 
Array of action values for this state. 
double TDerror 
Estimated TD error for this state-action pair. 

Public methods: 
StageInfo() 
Default constructor. Allocates memory for array Qvalue. 
~StageInfo() 
Destructor. 

--------------------------------------------------------------------------------

Globally defined constants: 


Boolean constants: 
Synopsis: #include "interface_classes.h" 
true=1 
false=0 

--------------------------------------------------------------------------------
