	All parameters of the given object are added to the parameter set with the given prefix.
	*/
	virtual void addParameters(CParameterObject *parameters, string prefix = "");

	/// Resets all adaptive parameter calculators; this is needed when you want to restart learning.
	/** 
	Calls resetCalculator() for every adaptive parameter in the map.
	*/
	virtual void resetParameterCalculators();

	/// Returns the parameter's value
	/**
	If the specified parameter is not adaptive, the function returns the constant parameter value (see CParameters). Otherwise the value is calculated by the adaptive parameter calculator.
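
A minimal sketch of the lookup described above, not the implementation of this class. SketchCalculator, StepDecay and SketchParameters are hypothetical stand-ins that do not exist in this header, and double stands in for rlt_real.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for CAdaptiveParameterCalculator.
struct SketchCalculator {
    virtual ~SketchCalculator() {}
    virtual double getParameterValue() = 0;
};

// Hypothetical calculator: the value shrinks as the step count grows.
struct StepDecay : SketchCalculator {
    double steps = 0.0;
    double getParameterValue() override { return 1.0 / (steps + 1.0); }
};

// Hypothetical stand-in for the parameter set documented here.
struct SketchParameters {
    std::map<std::string, double> constants;            // constant values
    std::map<std::string, SketchCalculator*> adaptive;  // optional overrides

    double getParameter(const std::string& name) const {
        // An adaptive calculator, if registered, takes precedence.
        auto it = adaptive.find(name);
        if (it != adaptive.end()) return it->second->getParameterValue();
        // Otherwise fall back to the constant value (throws if unknown).
        return constants.at(name);
    }
};
```

Erasing the entry from the adaptive map makes the constant value visible again, which mirrors what removeAdaptiveParameter is documented to do.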
	*/
	virtual rlt_real getParameter(string name);

	/// Add an adaptive parameter calculator for a given parameter
	/**
	 The adaptive parameter calculator is added to the adaptiveParameters map, so getParameter can check whether an adaptive parameter calculator is defined for the specified parameter.
	 Be aware that the adaptive parameter calculator is always defined only for the current object in the parameter object hierarchy. So if you set an adaptive parameter for the parameter "Lambda" in a TD-Learner object, it won't affect the etraces, where the parameter initially belongs. You have to set the adaptive parameter for the etrace object directly. The calculators often have parameters themselves (like the parameter scale or offset); these parameters are also added to the parameter object. When an adaptive parameter calculator is set for a parameter, the parameter's name is used as the prefix for the parameters of the adaptive parameter calculator.
	 */
	virtual void addAdaptiveParameter(string name, CAdaptiveParameterCalculator *paramCalc);
	/// Remove the adaptive parameter again
	/**
	The parameter's constant value is used again.
	*/
	virtual void removeAdaptiveParameter(string name);

	/// returns true if all parameters are the same
	virtual bool operator == (CParameters &parameters);
};

/// Interface for all adaptive Parameter Calculators
/** For each parameter you can specify an adaptive parameter calculator (APC), which calculates the parameter value each time it is retrieved. Each time the parameter's value is requested through getParameter(), the value calculated by the APC is returned instead of the constant rlt_real value from the parameter map. This is useful, for example, for adapting the learning rate or the exploration of a policy. The parameter's value can depend on any other value, such as the number of steps or episodes, or even the current average reward.
Adaptive parameter calculators have parameters of their own; all parameters of the adaptive parameter classes begin with the prefix "AP". When an adaptive parameter calculator is set for a parameter, the parameter's name is used as the prefix for the calculator's parameters. So the parameter "APFunctionKind" becomes "VLearningRateAPFunctionKind" if you specify an APC for the parameter "VLearningRate".
The interface CAdaptiveParameterCalculator already includes the parameter "APFunctionKind"; the function kind determines which function is used to transform the target value into the parameter value. The target value can be the number of learning steps, the number of episodes, the current value of a V-Function or the average reward. See the subclasses for more details. There are 6 different function kinds implemented:
- Linear Function (LINEAR, 1) 
- Square Function (SQUARE, 2)
- Logarithm Function (LOG, 3)
- Fraction (FRACT, 4)
- Squared Fraction (FRACTSQUARE, 5)
- Logarithm Fraction (FRACTLOG, 6)
All these functions are used in a slightly different way for the 2 main subclasses, CAdaptiveParameterBoundedValuesCalculator and CAdaptiveParameterUnBoundedValuesCalculator. For more details see these classes.
Parameters of CAdaptiveParameterCalculator:
"APFunctionKind": Defining the function to transform target value into the parameter value.
@see CAdaptiveParameterUnBoundedValuesCalculator
@see CAdaptiveParameterBoundedValuesCalculator
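
The prefix rule described above amounts to plain string concatenation. The helper below is a hypothetical illustration and not a function of this library.

```cpp
#include <string>

// Hypothetical helper: builds the name under which a calculator's own
// parameter is stored once the calculator is attached to ownerParameter.
inline std::string prefixedCalculatorParameter(const std::string& ownerParameter,
                                               const std::string& calculatorParameter) {
    return ownerParameter + calculatorParameter;
}
```

With the names used in the text, the owner "VLearningRate" and the calculator parameter "APFunctionKind" combine to "VLearningRateAPFunctionKind".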
*/
class CAdaptiveParameterCalculator : virtual public CParameterObject
{
protected:
	/// The targetvalue is stored here
	rlt_real targetValue;

	/// The function kind is stored here
	/**
	 The parameter "APFunctionKind" is not read on every call for performance reasons; functionKind is updated each time the "APFunctionKind" parameter changes (in the function onParametersChanged()).
	 */
	int functionKind;

public:
	CAdaptiveParameterCalculator(int functionKind);
	virtual ~CAdaptiveParameterCalculator();

	/// Returns the calculated parameter value; implemented by the subclasses
	virtual rlt_real getParameterValue() = 0;

	/// Reset the targetValue 
	/**
	This function resets, for example, the number of steps or episodes when learning is restarted (used for parameter evaluation).
	*/
	virtual void resetCalculator() {targetValue = 0.0;};
	/// Updates functionKind according to the parameter "APFunctionKind"
	virtual void onParametersChanged();

};

/// Super class for all classes which use bounded target values
/**
The subclasses of this class use bounded target values, for example the average reward or the value of a V-Function. For bounded target values you can define a minimum and a maximum value of the target (parameters "APTargetMin" and "APTargetMax"). For example, if the reward is supposed to lie between -1 and 0, you can define these values as the minimum and maximum target values for the average reward adaptive parameter calculator. The interval [targetmin, targetmax] of the target value is normalized to the interval [0,1]. This interval can be scaled by the parameter "APTargetScale". After the normalization, the function defined by the parameter "APFunctionKind" is applied. The 6 different functions are calculated the following way:
- LINEAR: f(x) = x
- SQUARE: f(x) = x^2
- LOG: f(x) = log(x * targetScale + 1.0) / log(1.0 + targetScale);
- FRACT: f(x) = (1.0 / (x * targetScale + 1.0) - 1.0 / (targetScale + 1.0)) * (1.0 + targetScale) / targetScale;
- FRACTSQUARE: f(x) = (1.0 / (x^2* targetScale^2 + 1.0) - 1.0 / (targetScale^2 + 1.0)) * (1.0 + targetScale^2) /targetScale^2;
- FRACTLOG : offset = 1.0 / (1.0 + log(1.0 + targetScale));
			 f(x) = (1.0 / (1.0 + log(x * targetScale + 1.0)) - offset) / (1 - offset);
All the functions are scaled so that their values again lie in the interval [0,1]. Scaling the target interval is therefore only useful if the log or fract functions are used (it sets the steepness of the slope of these functions).
The result can be inverted (1 - x) if the parameter "APInvertTargetFunction" is true (1.0). This value is then scaled ("APParamScale") and an offset is added ("APParamOffset"), so the resulting parameter value is calculated with the formula param = param_offset + param_scale * f(normalized_targetvalue), or param = param_offset + param_scale * (1 - f(normalized_targetvalue)) when inverted. This gives you many degrees of freedom in designing your adaptive parameter calculator.
The values of the parameters "APInvertTargetFunction", "APParamScale", "APParamOffset", "APTargetMin" and "APTargetMax" are again stored in their own data elements for performance reasons and updated by the function onParametersChanged().
See the subclasses for the different target values.
Parameters of CAdaptiveParameterBoundedValuesCalculator:
- "APFunctionKind": Defining the function to transform target value into the parameter value.
- "APInvertTargetFunction": Boolean value, whether to invert the target function or not
- "APParamScale": Scale of the parameter value
- "APParamOffset": Parameter Value offset
- "APTargetMin": Minimum value of the target
- "APTargetMax": Maximum value of the target
- "APTargetScale": Scale for the targetValue, so the targetValue is in the interval [0, targetScale].
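
The formulas above can be sketched as stand-alone functions. This is an illustration under assumptions, not the code of this class: double stands in for rlt_real, the names SketchFunctionKind, boundedKindFunction and boundedParameterValue are hypothetical, and clamping the normalized target to [0,1] is an assumption that the description does not state.

```cpp
#include <cmath>

// Function kinds, numbered as in the list of CAdaptiveParameterCalculator.
enum SketchFunctionKind { K_LINEAR = 1, K_SQUARE, K_LOG, K_FRACT, K_FRACTSQUARE, K_FRACTLOG };

// f(x) for a normalized target x in [0,1]. LINEAR, SQUARE and LOG rise
// from 0 to 1; the three FRACT kinds fall from 1 to 0.
inline double boundedKindFunction(int kind, double x, double targetScale) {
    const double s = targetScale;
    switch (kind) {
    case K_SQUARE:
        return x * x;
    case K_LOG:
        return std::log(x * s + 1.0) / std::log(1.0 + s);
    case K_FRACT:
        return (1.0 / (x * s + 1.0) - 1.0 / (s + 1.0)) * (1.0 + s) / s;
    case K_FRACTSQUARE: {
        const double s2 = s * s;
        return (1.0 / (x * x * s2 + 1.0) - 1.0 / (s2 + 1.0)) * (1.0 + s2) / s2;
    }
    case K_FRACTLOG: {
        const double offset = 1.0 / (1.0 + std::log(1.0 + s));
        return (1.0 / (1.0 + std::log(x * s + 1.0)) - offset) / (1.0 - offset);
    }
    default:  // K_LINEAR
        return x;
    }
}

// param = param_offset + param_scale * f(x), or with (1 - f(x)) when inverted.
inline double boundedParameterValue(int kind, double target,
                                    double targetMin, double targetMax, double targetScale,
                                    double paramOffset, double paramScale, bool invert) {
    double x = (target - targetMin) / (targetMax - targetMin);
    // Assumption: targets outside [targetMin, targetMax] are clamped.
    if (x < 0.0) x = 0.0;
    if (x > 1.0) x = 1.0;
    double f = boundedKindFunction(kind, x, targetScale);
    if (invert) f = 1.0 - f;
    return paramOffset + paramScale * f;
}
```

For instance, with an average reward of -0.5 in the target interval [-1, 0], the LINEAR kind, "APParamOffset" 0.1 and "APParamScale" 0.8, the normalized target is 0.5 and the sketch yields 0.1 + 0.8 * 0.5 = 0.5.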
*/
class CAdaptiveParameterBoundedValuesCalculator : public CAdaptiveParameterCalculator
{
protected:
	rlt_real targetMin;
	rlt_real targetMax;

	rlt_real targetScale;

	rlt_real paramOffset;
	rlt_real paramScale;

	bool invertTarget;

public:
	CAdaptiveParameterBoundedValuesCalculator(int functionKind, rlt_real paramOffset, rlt_real paramScale, rlt_real targetMin, rlt_real targetMax);
	virtual ~CAdaptiveParameterBoundedValuesCalculator();

	/// Sets the targetValue to the targetMin value
	virtual void resetCalculator();
	/// Updates all data elements which represent parameters
	virtual void onParametersChanged();

/// Returns the value of the parameter
/**
The value of the parameter is calculated the following way:
- param = param_offset + param_scale * f(normalized_targetvalue)
- param = param_offset + param_scale * (1 - f(normalized_targetvalue)) for inverted function values (APInvertTargetFunction)
For more details see class description.
*/
	virtual rlt_real getParameterValue();

};

/// Super class for all classes which use unbounded target values
/**
The subclasses of this class use unbounded target values, for example the number of steps or the number of learned episodes. For unbounded target values you can define an offset and a scale value for the target (parameters "APTargetOffset" and "APTargetScale"). The target value is then transformed the following way: x = target_offset + target_scale * target. After the transformation, the function defined by the parameter "APFunctionKind" is applied. The 6 different functions are calculated the following way:
- LINEAR: f(x) = x
- SQUARE: f(x) = x^2
- LOG: f(x) = log(x + 1.0)
- FRACT: f(x) = 1.0 / (x + 1.0)
- FRACTSQUARE: f(x) = 1.0 / (x^2 + 1.0)
- FRACTLOG : f(x) = 1.0 / (1.0 + log(x + 1.0));
The result can again be scaled and an offset can be added ("APParamScale", "APParamOffset"), so the resulting parameter value is calculated with the formula param = param_offset + param_scale * f(transformed_targetvalue). This gives you many degrees of freedom in designing your adaptive parameter calculator.
The values of the parameters "APParamScale", "APParamOffset", "APTargetOffset" and "APTargetScale" are again stored in their own data elements for performance reasons and updated by the function onParametersChanged().
See the subclasses for the different target values.
Parameters of CAdaptiveParameterUnBoundedValuesCalculator:
- "APFunctionKind": Defining the function to transform target value into the parameter value.
- "APParamScale": Scale of the parameter value
- "APParamOffset": Parameter Value offset
- "APTargetScale": Scale value of the target
- "APTargetOffset": Offset value of the target
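
The unbounded variant can be sketched the same way. Again this is an illustration and not the code of this class: double stands in for rlt_real, and the names unboundedKindFunction and unboundedParameterValue are hypothetical. The kind argument uses the numbering from CAdaptiveParameterCalculator (1 = LINEAR up to 6 = FRACTLOG).

```cpp
#include <cmath>

// f(x) for the transformed target x = target_offset + target_scale * target.
inline double unboundedKindFunction(int kind, double x) {
    switch (kind) {
    case 2:  return x * x;                            // SQUARE
    case 3:  return std::log(x + 1.0);                // LOG
    case 4:  return 1.0 / (x + 1.0);                  // FRACT
    case 5:  return 1.0 / (x * x + 1.0);              // FRACTSQUARE
    case 6:  return 1.0 / (1.0 + std::log(x + 1.0));  // FRACTLOG
    default: return x;                                // LINEAR
    }
}

// param = param_offset + param_scale * f(target_offset + target_scale * target)
inline double unboundedParameterValue(int kind, double target,
                                      double targetOffset, double targetScale,
                                      double paramOffset, double paramScale) {
    const double x = targetOffset + targetScale * target;
    return paramOffset + paramScale * unboundedKindFunction(kind, x);
}
```

As a usage example, a learning rate that starts at 0.2 and decays with the number of steps could use the FRACT kind with "APParamOffset" 0, "APParamScale" 0.2, "APTargetOffset" 0 and "APTargetScale" 0.01; after 100 steps the sketch gives 0.2 / (1 + 1) = 0.1.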
*/
class CAdaptiveParameterUnBoundedValuesCalculator : public CAdaptiveParameterCalculator
{
protected:
	rlt_real targetOffset;
	rlt_real targetScale;

	rlt_real paramOffset;
	rlt_real paramScale;

	rlt_real paramLimit;

public:
	CAdaptiveParameterUnBoundedValuesCalculator(int functionKind, rlt_real param0, rlt_real paramScale, rlt_real targetOffset, rlt_real targetScale);
	virtual ~CAdaptiveParameterUnBoundedValuesCalculator();

	/// Updates all data elements which represent parameters
	virtual void onParametersChanged();

	/// Returns the value of the parameter
	/**
	The value of the parameter is calculated the following way:
	- param = param_offset + param_scale * f(target_offset + target_scale * target)
	For more details see class description.
	*/
	virtual rlt_real getParameterValue();
};


#endif