// ctheoreticalmodel.h

// Copyright (C) 2003
// Gerhard Neumann (gerhard@igi.tu-graz.ac.at)

//                
// This file is part of RL Toolbox.
// http://www.igi.tugraz.at/ril_toolbox
//
// All rights reserved.
// 
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
// 3. The name of the author may not be used to endorse or promote products
//    derived from this software without specific prior written permission.
// 
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
// OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
// IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
// NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
// THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#ifndef C_DISCRETETHEORETICALMODEL_H
#define C_DISCRETETHEORETICALMODEL_H

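#include <cstdio>	// FILE, used by loadASCII/saveASCII
#include <list>	// std::list, base class of CTransitionList
#include <map>	// std::map, used for the duration factors

#include "cagentlistener.h"
#include "cutility.h"
#include "cvfunction.h"
#include "cqfunction.h"
#include "cdiscretizer.h"
#include "clearndataobject.h"
#include "ril_debug.h"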

#define TRANSITION 1
#define SEMIMDPTRANSITION 2

class CFeatureQFunction;

/// Transition for the Markov case
/**
CTransition represents a transition for the Markov case. It stores the start state, the end state and the probability of the transition.
The type of the transition is TRANSITION. With the type field you can determine whether it is a CTransition or a CSemiMDPTransition.
*/

class CTransition
{
protected:
	int startState;
	int endState;
	rlt_real propability;
	
	int type;
public:
	CTransition(int startState, int endState, rlt_real prop);

	int getStartState();
	int getEndState();

	virtual rlt_real getPropability();
	virtual void setPropability(rlt_real prop);

	virtual void loadASCII(FILE *stream, int fixedState, bool forward);
	virtual void saveASCII(FILE *stream, bool forward);

	virtual bool isType(int Type);
};

/// Transition for the semi-MDP case
/**
CSemiMDPTransition stores a transition for the semi-Markov case. In addition to the transition probability it holds a list of possible durations and a factor (probability) for each duration,
so the duration factor multiplied by the transition probability is the probability of getting from state A to state B in "duration" steps when executing the given action.
The type field of CSemiMDPTransition is set to SEMIMDPTRANSITION.
When adding a duration with a specific factor, all other duration factors are multiplied by 1 - factor, and then the factor is added to the given duration's factor,
or a new duration is added to the list if the duration was not yet a member. When setting a duration, all other duration factors
are multiplied by 1 - (factor - factor_{old}).

*/
class CSemiMDPTransition : public CTransition
{
protected:
/// The list of durations, mapping each duration to its factor
	std::map<int, rlt_real> *durations;
public:
	CSemiMDPTransition(int startState, int endState, rlt_real prop);
	virtual ~CSemiMDPTransition();

	std::map<int, rlt_real> *getDurations();

/// Adds the given factor to the duration's factor
/**
When adding a duration with a specific factor, all other duration factors are multiplied by 1 - factor, and then the factor is added to the given duration's factor,
or a new duration is added to the list if the duration was not yet a member.
*/
	void addDuration(int duration, rlt_real factor);
/// Sets the duration's factor to the given factor
	void setDuration(int duration, rlt_real factor);
	rlt_real getDurationFaktor(int duration);
/// Returns the probability of the transition with the specified duration
/** The probability is getPropability() * getDurationFaktor(duration). */
	rlt_real getDurationPropability(int duration);

	virtual void loadASCII(FILE *stream, int fixedState, bool forward);
	virtual void saveASCII(FILE *stream, bool forward);
/** Returns the factor sum_N gamma^N. */
	rlt_real getSemiMDPFaktor(rlt_real gamma);
};
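
/*
Example: a minimal, self-contained sketch of the duration bookkeeping described
for CSemiMDPTransition. It is not the RL Toolbox implementation. DurationMap and
the ...Sketch functions are hypothetical names, and double stands in for rlt_real.
Two assumptions go beyond the class comments: addDurationSketch scales every
stored factor, including the one of the given duration, because that keeps the
factors summing to 1, and semiMDPFactorSketch weights gamma^N by the factor of
duration N.

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Maps a duration N (in steps) to its factor; the factors are meant to sum to 1.
typedef std::map<int, double> DurationMap;

// Scales every stored factor by (1 - factor), then adds `factor` to the entry
// for `duration`, creating that entry if the duration is not yet a member.
inline void addDurationSketch(DurationMap &durations, int duration, double factor)
{
	assert(factor >= 0.0 && factor <= 1.0);
	for (DurationMap::iterator it = durations.begin(); it != durations.end(); ++it)
	{
		it->second *= 1.0 - factor;
	}
	durations[duration] += factor;	// operator[] value-initializes a new entry to 0
}

// Returns the factor stored for `duration`, or 0 if it is not a member.
inline double durationFactorSketch(const DurationMap &durations, int duration)
{
	DurationMap::const_iterator it = durations.find(duration);
	return it == durations.end() ? 0.0 : it->second;
}

// Probability of the transition taking exactly `duration` steps:
// transition probability times duration factor.
inline double durationProbabilitySketch(double transitionProbability,
	const DurationMap &durations, int duration)
{
	return transitionProbability * durationFactorSketch(durations, duration);
}

// Assumed reading of "sum_N gamma^N": each gamma^N is weighted by the
// factor of duration N.
inline double semiMDPFactorSketch(const DurationMap &durations, double gamma)
{
	double sum = 0.0;
	for (DurationMap::const_iterator it = durations.begin(); it != durations.end(); ++it)
	{
		sum += it->second * std::pow(gamma, it->first);
	}
	return sum;
}
```

Starting from an empty map, addDurationSketch(d, 1, 1.0) followed by
addDurationSketch(d, 2, 0.25) leaves the factors 0.75 for duration 1 and 0.25
for duration 2.
*/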


/// Class for storing transitions
/** The transitions are all stored in a CTransitionList object. The transition list stores whether it is a forward or a backward list.
The transitions are stored in an ordered list; the list is ordered by end state for forward lists and by start state for backward lists.
It provides functions for adding a transition, getting the transition for a given feature index and determining whether a feature index is a member of the transition list.
If the list is a forward list, the search key for getTransition and isMember is the end state of the transitions, otherwise the start state.
*/
class CTransitionList : public std::list<CTransition *>
{
protected:
/// Flag indicating whether this is a forward or a backward list
	bool forwardList;

public:
	CTransitionList(bool forwardList);

/// Returns whether the feature is a member of the transition list
	/** I.e. if the list is a forward list, it returns whether the state "featureIndex" can be reached; if the list is a backward list,
	it returns whether the state "featureIndex" can reach the state the list is assigned to.
	*/
	bool isMember(int featureIndex);
	bool isForwardList();
/// Adds a transition
/** Adds a transition to the sorted list at the right position. */
	void addTransition(CTransition *transition);
/// Returns the transition with the specified feature as end state (forward list) or start state (backward list)
	CTransition *getTransition(int featureIndex);

	CTransitionList::iterator getTransitionIterator(int featureIndex);
/// Clears the List and deletes all CTransition objects
	void clearAndDelete();
};
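
/*
Example: a minimal sketch of a transition list that stays sorted by its search
key, as described for CTransitionList. It is not the RL Toolbox implementation:
TransitionSketch and TransitionListSketch are hypothetical names, elements are
stored by value instead of as CTransition pointers, and double stands in for
rlt_real.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for CTransition: plain data only.
struct TransitionSketch
{
	int startState;
	int endState;
	double probability;
};

// A list kept sorted by end state (forward list) or start state (backward list).
class TransitionListSketch
{
public:
	explicit TransitionListSketch(bool forward) : forward_(forward) {}

	// Inserts before the first element whose key is not smaller,
	// so the list stays sorted by key.
	void add(const TransitionSketch &t)
	{
		std::list<TransitionSketch>::iterator it = items_.begin();
		while (it != items_.end() && key(*it) < key(t))
		{
			++it;
		}
		items_.insert(it, t);
	}

	// Returns the transition whose key equals featureIndex, or NULL if there
	// is none. The scan stops as soon as the keys exceed featureIndex.
	const TransitionSketch *get(int featureIndex) const
	{
		std::list<TransitionSketch>::const_iterator it = items_.begin();
		for (; it != items_.end() && key(*it) <= featureIndex; ++it)
		{
			if (key(*it) == featureIndex)
			{
				return &*it;
			}
		}
		return NULL;
	}

	bool isMember(int featureIndex) const { return get(featureIndex) != NULL; }
	std::size_t size() const { return items_.size(); }

private:
	// Forward lists are searched by end state, backward lists by start state.
	int key(const TransitionSketch &t) const
	{
		return forward_ ? t.endState : t.startState;
	}

	bool forward_;
	std::list<TransitionSketch> items_;
};
```

Keeping the list sorted costs a linear scan on insertion, but it lets a lookup
give up early when the requested feature index cannot be in the list.
*/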

/// Class for storing the backward and forward transitions for a given state-action pair.
/** Saves the forward and the backward transitions in two different transition lists.
*/
class CStateActionTransitions
{
protected:
	CTransitionList *forwardList;
	CTransitionList *backwardList;

public:
	CStateActionTransitions();
	~CStateActionTransitions();

	
	CTransitionList* getForwardTransitions();
	CTransitionList* getBackwardTransitions();
};

/// Interface for all model classes
/** The models are designed only for feature states and discrete (i.e. discretized) states. The class defines the functions for getting the probability of a specific state transition
(so a state-action-state or feature-action-feature tuple is given). In addition, you can retrieve the forward and the backward transitions for a specific state-action pair.
<p>
The interface provides 4 functions, which the subclasses have to implement:
<ul>
<li> getPropability(int oldFeature, int action, int newFeature) has to return the probability P(s'|s,a) </li>
<li> getPropability(int oldFeature, int action, int duration, int newFeature) has to return the probability for the semi-MDP case, i.e. P(s',N|s,a) </li>
<li> CTransitionList* getForwardTransitions(int action, int state) has to return a list of all transitions containing the states which can be reached from the state when executing the action. </li>
<li> CTransitionList* getBackwardTransitions(int action, int state) has to return a list of all transitions containing the states which can reach the given state when executing the action. </li>
</ul>
*/
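
/*
Example: a minimal sketch of a tabular model that answers the Markov-case
queries listed above, P(s'|s,a) plus forward and backward transitions. It is
not the RL Toolbox implementation: ModelSketch and TabularModelSketch are
hypothetical names, double stands in for rlt_real, the semi-MDP query is left
out, and transitions are returned as (state, probability) pairs instead of a
CTransitionList.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical abstract interface with the first required function only.
class ModelSketch
{
public:
	virtual ~ModelSketch() {}
	// Has to return P(s'|s,a).
	virtual double getProbability(int oldFeature, int action, int newFeature) const = 0;
};

// Sparse table: one map from successor state to probability per (state, action).
class TabularModelSketch : public ModelSketch
{
public:
	typedef std::map<int, double> Row;
	typedef std::vector<std::pair<int, double> > Transitions;

	TabularModelSketch(int numStates, int numActions)
		: numStates_(numStates), numActions_(numActions),
		  table_(static_cast<std::size_t>(numStates) * numActions) {}

	// Stores P(newFeature | oldFeature, action); zero entries are not kept.
	void setProbability(int oldFeature, int action, int newFeature, double p)
	{
		assert(p >= 0.0 && p <= 1.0);
		Row &row = table_[index(oldFeature, action)];
		if (p > 0.0) row[newFeature] = p;
		else row.erase(newFeature);
	}

	virtual double getProbability(int oldFeature, int action, int newFeature) const
	{
		const Row &row = table_[index(oldFeature, action)];
		Row::const_iterator it = row.find(newFeature);
		return it == row.end() ? 0.0 : it->second;
	}

	// Successors reachable from `state` under `action`, ordered by end state.
	Transitions forwardTransitions(int action, int state) const
	{
		const Row &row = table_[index(state, action)];
		return Transitions(row.begin(), row.end());
	}

	// Predecessors that can reach `state` under `action`, ordered by start state.
	Transitions backwardTransitions(int action, int state) const
	{
		Transitions result;
		for (int s = 0; s < numStates_; ++s)
		{
			const double p = getProbability(s, action, state);
			if (p > 0.0) result.push_back(std::make_pair(s, p));
		}
		return result;
	}

private:
	std::size_t index(int state, int action) const
	{
		assert(state >= 0 && state < numStates_ && action >= 0 && action < numActions_);
		return static_cast<std::size_t>(state) * numActions_ + action;
	}

	int numStates_;
	int numActions_;
	std::vector<Row> table_;
};
```

The backward query here scans all states. Keeping a separate backward list per
state-action pair, as CStateActionTransitions does, avoids that scan at the
cost of storing each transition twice.
*/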

class CAbstractFeatureStochasticModel : public CActionObject
{
protected:
	unsigned int numFeatures;

public:
/// To create the model you have to provide the model's actions and the number of different states
	CAbstractFeatureStochasticModel(CActionSet *actions, int numStates);
	virtual ~CAbstractFeatureStochasticModel() {};

/// Calls the getPropability function with the action index as argument
	virtual rlt_real getPropability(int oldFeature, CAction *action, int newFeature);
/// Interface function
/**
Has to return the probability P(s'|s,a).
*/
	virtual rlt_real getPropability(int oldFeature, int action, int newFeature) = 0;
/// Interface function
	/** Has to return the probability for the semi-MDP case, i.e. P(s',N|s,a).
*/
