standard interface classes
Expected functionality:
Loads parameters of the architecture from a text file. Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. E.g., argv may contain one item, which is the name of the file from which the architecture's parameters should be read. In this case argc=1.
virtual void saveArchitectureParameters(int argc, char *argv[])
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Saves parameters of the architecture into a text file. Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string. E.g., argv may contain one item, which is the name of the file to which the architecture's parameters should be saved. In this case argc=1.
virtual void setLearningParameters(int argc, char *argv[])
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Sets learning parameters (e.g. learning step). Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string.
double getMaxParameterChange()
Returns the maximum change in the tunable parameters in the course of learning, since the beginning of learning or since the last time this function was called.
int getNumberParametersChanged()
Returns the number of tunable parameters affected by learning so far (since the beginning of learning).
virtual ~Approximator()
Destructor.
Protected data members:
double MaxParameterChange
Maximum change (in absolute value) in the tunable parameters as a result of learning. Can be updated in learn function.
int NumberParametersChanged
Number of tunable parameters affected by learning so far (since the beginning of learning). Can be updated in learn function.
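As an illustration, the learn function of a derived class (or a helper called from it) might maintain these two members as sketched below. The class name HypotheticalApproximator, the weight array w, its length size and the update array delta are assumptions made for the example and are not part of the Approximator interface.

#include <math.h>

// Sketch: applying a parameter update inside a hypothetical derived class while
// keeping the protected bookkeeping members of Approximator up to date.
void HypotheticalApproximator::applyUpdate(const double* delta)
{
    for (int i = 0; i < size; i++)
    {
        w[i] += delta[i];
        if (fabs(delta[i]) > MaxParameterChange)
            MaxParameterChange = fabs(delta[i]);     // largest absolute change so far
        if (delta[i] != 0.0)
            NumberParametersChanged++;               // one more tunable parameter affected
    }
}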
--------------------------------------------------------------------------------
class StateActionFA
Fully implemented class - intended as a holder for several approximators - one for each action in some set of actions.
Synopsis: #include "interface_classes.h"
Link: safa.cpp
Public methods:
StateActionFA()
Default constructor.
StateActionFA(int n, Approximator** f)
General constructor. Constructs an object that holds n architectures (of type Approximator). The pointers to those architectures are passed in the array of pointers f. They should point to objects of classes derived from Approximator. Some of the pointers may be NULL, however; in that case, calling other functions for those architectures will result in no action.
int getSize()
Returns the number of tunable parameters in one of the component architectures (assuming that all of them have the same number of parameters).
void getMaxParameterChange(double* changes)
For each component architecture, returns the maximum change in the tunable parameters in the course of learning, since the beginning of learning or since the last time this function was called. The values are returned in the array changes.
void getNumberParametersChanged(int* changes)
For each component architecture, returns the number of tunable parameters affected by learning so far (since the beginning of learning). The values are returned in the array changes.
void predict(const Action& a, const State& s, double& output)
Computes the output value with the approximator corresponding to the action a. Input to this architecture is provided in s and the output is returned in output.
void learn(const Action& a, const State& s, double target)
Learns an input-output pair with the approximator corresponding to the action a. Input to this architecture is provided in s and the desired output value in target.
void computeGradient(const Action& a, const State& s, double* GradientVector)
Computes the gradient with respect to the tunable parameters for the component architecture corresponding to the action a. The gradient is computed at the current parameter values for input s and returned in the array GradientVector.
void updateParameters(const Action& a, double* delta)
Updates the tunable parameters of the component architecture corresponding to the action a by the amounts in the delta array (possibly multiplied by an appropriate learning step).
void clearTraces(const Action& a, const State& s, double replace)
If Temporal Difference Reinforcement Learning is implemented, this function clears (sets to the value of replace) the traces of the tunable parameters for those component architectures corresponding to actions that were not taken in state s. The action taken in s is passed in a.
void replaceTraces(const Action& a, const State& s, double trace)
If Temporal Difference Reinforcement Learning is implemented, this function replaces (sets to the value of trace) the traces of the tunable parameters activated by input s in the component architecture corresponding to action a.
void decayTraces(double factor)
If Temporal Difference Reinforcement Learning is implemented, this function decays (multiplies by factor) the traces of all tunable parameters for all component architectures.
void accumulateTraces(const Action& a, const State& s, double amount)
If Temporal Difference Reinforcement Learning is implemented, this function increments by amount the traces of the tunable parameters activated by s in the component architecture corresponding to action a.
void setArchitectureParameters(const Action& a, int argc, char *argv[])
Loads parameters of the architecture corresponding to action a from a text file. The arguments argc and argv have the same meaning as for the corresponding function in the derived class of the component architectures.
void saveArchitectureParameters(const Action& a, int argc, char *argv[])
Saves parameters of the component architecture corresponding to action a to a text file. The arguments argc and argv have the same meaning as for the corresponding function in the derived class of the component architectures.
void saveAllArchitectureParameters(char** fileNames)
Saves parameters of all component architectures to text files with names provided in the array of strings fileNames.
void setLearningParameters(const Action& a, int argc, char *argv[])
Sets learning parameters of the component architecture corresponding to action a. The arguments argc and argv have the same meaning as for the corresponding function in the derived class of the component architectures.
void setAllLearningParameters(int argc, char *argv[])
Sets (the same) learning parameters for all component architectures. The arguments argc and argv have the same meaning as for the corresponding function in the derived class of the component architectures.
~StateActionFA()
Destructor.
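The following usage sketch shows how the class is intended to be driven. MyApproximator is a hypothetical class derived from Approximator, and the size of the action set, the learning step and the trace-handling order are assumptions made for the example; only the StateActionFA calls themselves come from this interface.

#include "interface_classes.h"

void stateActionFAExample(const Action& a, const State& s,
                          double target, double gamma, double lambda)
{
    const int nActions = 3;
    Approximator* components[nActions];
    for (int i = 0; i < nActions; i++)
        components[i] = new MyApproximator();   // hypothetical derived approximator

    StateActionFA safa(nActions, components);

    // Prediction and direct supervised learning for the architecture of action a.
    double q;
    safa.predict(a, s, q);                      // q receives the output for (s, a)
    safa.learn(a, s, target);                   // one update towards target

    // A manual gradient step, equivalent in spirit to learn().
    int n = safa.getSize();                     // number of tunable parameters
    double* grad = new double[n];
    safa.computeGradient(a, s, grad);           // gradient at current parameters, input s
    double alpha = 0.1;                         // illustrative learning step
    for (int i = 0; i < n; i++) grad[i] *= alpha * (target - q);
    safa.updateParameters(a, grad);             // apply the scaled gradient as the update
    delete [] grad;

    // Typical per-step trace handling if TD(lambda) with replacing traces is used.
    safa.decayTraces(gamma * lambda);           // decay traces of all architectures
    safa.replaceTraces(a, s, 1.0);              // set traces activated by s for action a
    safa.clearTraces(a, s, 0.0);                // clear traces of actions not taken in s
}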
--------------------------------------------------------------------------------
class Agent
Abstract class for the RL agent.
Synopsis: #include "interface_classes.h"
Link: agent.cpp
Public methods:
Agent(double g, const ActionSet& a_s, StateActionFA* const f, Environment* const e)
General constructor. Constructs an object with the following parameters:
g - discount factor;
a_s - action set of the RL system;
f - pointer to a StateActionFA object representing either the action-value functions or a randomized policy;
e - pointer to the environment in which the agent operates.
int initTrial(int N, bool learning, bool SaveTrajectory, const State* s, char* fileName = NULL, bool ComputeBellmanError = false)
Functionality:
Initiates and calls appropriate functions to conduct a trial (sequence of environment-agent interactions) of the maximum length N;
The argument learning indicates whether the agent should learn during the trial or not. Use the global constants true and false to pass appropriate values. If learning==true, the actAndLearn() function is called, otherwise the act() function is called;
The argument SaveTrajectory indicates whether the trajectory should be saved or not. Use the global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is saved to the text file with the name fileName. The argument fileName is optional, so if the trajectory does not have to be saved, the value for this argument can be omitted (provided that the value for ComputeBellmanError is also omitted).
If the optional argument s is specified, the environment's current state is set to that state and the trial is started from it. Otherwise, the start state is sampled from the environment's start state distribution, implemented in the Environment::startState() function. The value for the s argument can be omitted only if the values for both arguments fileName and ComputeBellmanError are also omitted.
The optional argument ComputeBellmanError indicates whether an estimated Bellman Error should be computed. The Bellman Error is computed only if learning==false; it is evaluated at the states on the trajectory, and the average over those states is stored in the protected data member BellmanError, assuming that the computation is implemented in the act() function of the derived class.
Returns the number of steps actually performed during the trial.
double getReturn()
Function returns the RL agent's return collected during the last trial.
double getBellmanError()
Returns the last estimate of the BellmanError.
virtual void setLearningParameters(int argc, char *argv[])=0
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Sets parameters of the RL learning algorithm (e.g. epsilon, lambda, etc.). Command-line-like parameters allow for a flexible argument list. Here, argc is the number of arguments supplied in array argv, where each item of the array is a string.
void setArchitectureParameters(const Action& a, int argc, char *argv[])
From a text file, loads parameters of the architecture representing either the value function of action a or the probability distribution of some policy for action a. The arguments argc and argv have the same meaning as for the corresponding function in the class used for the function approximation architecture.
void saveArchitectureParameters(const Action& a, int argc, char *argv[])
Saves parameters of the architecture representing either the value function of action a or the probability distribution of some policy for action a to a text file. The arguments argc and argv have the same meaning as for the corresponding function in the class used for the function approximation architecture.
virtual ~Agent()
Destructor.
Protected methods:
virtual int act(int N, bool SaveTrajectory, bool ComputeBellmanError)
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Implements at most N successive steps of the trial, stopping earlier if a terminal state is entered by the environment;
Computes return collected on this trial;
The argument SaveTrajectory indicates whether to save trajectory. Use global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is stored in a local data structure and then saved to a text file by function initTrial().
The argument ComputeBellmanError indicates whether the (estimated) Bellman error should be computed for the state-action pairs on the trajectory. Use the global constants true and false to pass appropriate values. If ComputeBellmanError==true, the estimate is stored in the data member BellmanError, which can be accessed with the function getBellmanError().
virtual int actAndLearn(int N, bool SaveTrajectory)
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Implements at most N successive steps of the trial with learning, stopping earlier if a terminal state is entered by the environment;
Computes return collected on this trial;
The argument SaveTrajectory indicates whether to save trajectory. Use global constants true and false to pass appropriate values. If SaveTrajectory==true, the trajectory is stored in a local data structure and then saved to a text file by function initTrial().
virtual void chooseAction(const State& s, Action& a)
Pure virtual function: must be implemented by a derived class.
Expected functionality:
Implements the behavior policy: chooses action a in state s. Uses the FA representation fa of the action-value functions or a randomized policy.
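For illustration, an epsilon-greedy behavior policy in a derived class might look like the sketch below. The indexed access actions[i], the action count numActions, the member epsilon and drand48() as a random source are assumptions about the concrete Agent, Action and ActionSet classes, made only for the example.

#include <stdlib.h>

// Sketch of an epsilon-greedy chooseAction() in a hypothetical class derived from Agent.
void MyAgent::chooseAction(const State& s, Action& a)
{
    int best = 0;
    double bestValue, value;
    fa->predict(actions[0], s, bestValue);      // value of the first action in state s
    for (int i = 1; i < numActions; i++)
    {
        fa->predict(actions[i], s, value);
        if (value > bestValue) { bestValue = value; best = i; }
    }
    if (drand48() < epsilon)                    // explore with probability epsilon
        best = (int)(drand48() * numActions);   // uniformly random action index
    a = actions[best];                          // greedy (or exploratory) choice
}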
Protected data members:
State CurrentState
Current state of the environment.
Action CurrentAction
Action chosen in the current state.
bool terminal
Indicates if the current state is terminal.
double CurrentReward
Reward obtained after the transition.
const ActionSet& actions
Action set of the RL system.
StateActionFA* const fa
Pointer to an architecture representing either action-value functions or a randomized policy.
double gamma
Discount factor.
double Return
Return collected during a trial.
Environment* const env
Pointer to the environment object.
int* ApplicableActions
Array that can be used by chooseAction() function.
Trajectory* trajectory
Data structure to store trajectory if it has to be saved.
double BellmanError
BellmanError estimate.
Component structures:
struct Trajectory
Data structure to save a trajectory.
Public data members:
StageInfo* stage
Data structure for the information about every stage on the trajectory.
int length
Actual length of the recorded trajectory.
Public methods:
Trajectory(int n)
General constructor. Constructs an object with the maximal number n of stages in the trajectory.
~Trajectory()
Destructor.
struct StageInfo
Data structure to save information about one stage on a trajectory.
Public data members:
State state
Action action
double reward
double* Qvalue
Array of action values for this state.
double TDerror
Estimated TD error for this state-action pair.
Public methods:
StageInfo()
Default constructor. Allocates memory for array Qvalue.
~StageInfo()
Destructor.
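Putting the pieces together, a training and evaluation driver might look like the sketch below. MyEnvironment, MyAgent and MyApproximator are hypothetical derived classes, and the discount factor and trial counts are illustrative; only the Agent calls (initTrial, getReturn, getBellmanError) come from this interface.

#include "interface_classes.h"
#include <stdio.h>

int main()
{
    MyEnvironment env;                          // hypothetical derived Environment
    ActionSet actions;                          // assumed to describe the available actions
    const int nActions = 3;
    Approximator* components[nActions];
    for (int i = 0; i < nActions; i++)
        components[i] = new MyApproximator();   // hypothetical derived approximator
    StateActionFA safa(nActions, components);

    MyAgent agent(0.99, actions, &safa, &env);  // gamma = 0.99

    // Learning trials: at most 10000 steps each, trajectory not saved.
    for (int trial = 0; trial < 1000; trial++)
    {
        int steps = agent.initTrial(10000, true, false, NULL);
        printf("trial %d: %d steps, return %f\n", trial, steps, agent.getReturn());
    }

    // One evaluation trial: no learning, save the trajectory, estimate the Bellman error.
    agent.initTrial(10000, false, true, NULL, "trajectory.txt", true);
    printf("estimated Bellman error: %f\n", agent.getBellmanError());
    return 0;
}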
--------------------------------------------------------------------------------
Globally defined constants:
Boolean constants:
Synopsis: #include "interface_classes.h"
true=1
false=0
--------------------------------------------------------------------------------