
Source: http://www.cs.wisc.edu/~finton/ibfe.html

What Is Importance-Based Feature Extraction?

Outline

  * The main idea
  * An example from real life
  * A neural net example, with Gaussian detector nodes
  * What's new about importance-based feature extraction?

The main idea

Suppose that an autonomous agent has feature detectors which identify its state, and that it uses reinforcement learning to learn to succeed in its environment. Importance-based feature extraction aims to tune the agent's feature detectors to be most sensitive to states where the agent's choice of action is critical. If the agent's action from a state will have little bearing on the agent's future success, we say that the state is unimportant; we define an important state as one having very different predictions of reinforcement for taking different actions. In terms of Q-learning, this means that an important state is one from which the different actions have very different Q-values.

Important states are not necessarily the most frequently seen. Frequency-based feature extraction can be misled by frequently-occurring "red herring" states, and may miss states which represent "rare opportunities." For example, if the agent frequently finds itself in a particular state where almost any action is equally good, frequency-based feature tuning would cluster detectors around that state; however, those detectors will be of little use to the agent because its choice of action at this state has little bearing on its future reinforcement. Or the agent may find itself in a rarely-seen state where its choice of action is critical for future success; such a state is important, though infrequent. Of course, this view is based on the assumption that the agent's task is to act in a way which optimizes its reinforcement, regardless of its understanding of aspects of the world which have no bearing on its strategy selection.

Furthermore, important states need not be associated with the most extreme reinforcement values. For example, there may be a state from which the agent will fail, no matter what action it takes. Such a state will be strongly associated with failure and, most likely, with extremely negative reinforcement values. But detecting this state is not very helpful to the agent, because when it is in this state there is nothing it can do to prevent failure. The agent would be better off using detectors to identify the state from which it made some critical mistake: a state from which a correct action might have led to success instead of failure. The Q-values at this state might not be as great in absolute magnitude as those associated with a state from which the agent always fails, or a state from which the agent always succeeds. But since some actions from this state lead to success and some to failure, it has a greater span of Q-values associated with the actions; this makes it an important state.
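As a concrete illustration of this "span of Q-values" notion, here is a minimal sketch that scores a few hypothetical states by that span. The Q-table, the states, and the function name are illustrative assumptions rather than anything given on this page; they only show how a large span singles out the state where the choice of action matters, even though another state has more extreme values.

```python
import numpy as np

def importance_span(q_values):
    """Span of the predicted returns across actions; a large span marks an important state."""
    return float(np.max(q_values) - np.min(q_values))

# Hypothetical Q-table: rows are states, columns are actions.
Q = np.array([
    [ 0.9,  0.9,  0.9],   # state 0: every action about equally good -> unimportant
    [-5.0, -5.0, -5.0],   # state 1: failure no matter what -> extreme values, still unimportant
    [ 0.8, -4.0, -4.5],   # state 2: one action rescues the agent -> important
])

for s, q in enumerate(Q):
    print(f"state {s}: span = {importance_span(q):.2f}")
# state 2 has the largest span, even though state 1 has the most extreme Q-values.
```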
An example from real life

Cognitive economy is one of the principles we use to cope with daily life. People tend to use broad classifications whenever possible, because this allows them to apply information learned from a few samples to any of a large collection of objects; they avoid having to re-learn how to handle each new individual object. But this kind of stereotyping is only of use if the classifications relate to our goals and the feedback we receive. When our goals require us to respond to some particular features of an individual, we need to learn to recognize those features which make this individual a special case.

For example, consider the concept of "snow." My concept of snow has to do with whether I can pack it into a snowball ("wet snow") or not ("fluffy snow"). Otherwise, snow is just something pretty that piles up and requires me to get out the shovel. But skiers talk about more varieties of snow, and the distinctions are relevant to them because different kinds of snow will have different effects on their skiing. I may not remember all the varieties of snow which my skier friends have spoken of; this is not necessarily a comment on my memory, but more likely reflects the fact that I do not ski and derive no benefit from knowing the distinctions. Supposedly, Eskimos have words for many different kinds of snow. But to someone who has lived their whole life close to the equator, snow might simply be "snow," some form of white precipitation which they have never seen. In each case, we are allotting cognitive resources to those distinctions which relate to our goals. This is an example of importance-based feature extraction, since we are "tuning" our "feature detectors" to respond to those features which make a difference in the things we have to do, and otherwise falling back on broad stereotypes.

A neural net example, with Gaussian detector nodes

A common architecture for a reinforcement-learning agent is a feed-forward connectionist network which has inputs, a layer of hidden nodes, and an output layer of nodes which control action selection. We can think of the hidden nodes as feature detectors which provide a distributed representation of the current system state. Importance-based feature extraction attempts to tune the feature detectors according to their importance in selecting the agent's actions; a detector is considered important if the links from it to the outputs have very different weights. If the weights were all the same, that detector would contribute the same impulse to each of the competing output nodes, and thus would have no influence on the agent's choice of action. Since the link weights are used to calculate Q-values, a detector with a sizable spread of weights on its outgoing links represents a state from which different actions have very different expectations of reinforcement. In other words, this detector is valuable because it detects a state from which the agent's choice of action will strongly affect the agent's likelihood of success.

In a system having Gaussian detector nodes, importance-based feature extraction tunes their centers in order to maximize each detector's estimate of its importance. I have found it convenient to define the importance of detector i as the variance of the weights on its links to the output nodes; however, alternative definitions are certainly possible.
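The sketch below makes this concrete for a small network of Gaussian detector nodes: each detector's importance is the variance of its outgoing weights, as defined above. The specific center-update rule (pulling a detector's center toward the current input in proportion to its activation and its importance), as well as the sizes and variable names, are illustrative assumptions of mine; the page itself only says that centers are tuned so as to increase the detectors' importance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_detectors, n_actions = 4, 8, 3

centers = rng.normal(size=(n_detectors, n_inputs))   # Gaussian detector centers
width = 1.0                                          # shared Gaussian width
W = rng.normal(size=(n_detectors, n_actions))        # detector-to-output link weights

def activations(x):
    """Gaussian detector responses to input x."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def importance():
    """Importance of each detector: variance of the weights on its outgoing links."""
    return W.var(axis=1)

def q_values(x):
    """Q-value of each action: detector activations weighted by the outgoing links."""
    return activations(x) @ W

def tune_centers(x, lr=0.05):
    """Assumed update: move each center toward x, scaled by its activation and importance."""
    global centers
    a = activations(x)[:, None]
    imp = importance()[:, None]
    centers += lr * imp * a * (x - centers)

x = rng.normal(size=n_inputs)
print("Q(x) before tuning:", q_values(x))
tune_centers(x)
print("Q(x) after tuning: ", q_values(x))
```

In a full agent, the link weights would themselves be trained from the reinforcement signal, so a detector's importance grows as its action predictions separate.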
What's new about importance-based feature extraction?

In reinforcement learning problems, the sparseness of the feedback increases the difficulty of feature extraction. Importance-based feature extraction addresses this problem by relying on bottom-up feature extraction. Other examples of this general approach include bottom-up clustering methods such as Kohonen's Self-Organized Map, Chapman & Kaelbling's use of a "relevance" criterion, and statistical approaches built around principal component analysis.

Bottom-up clustering methods are based on the frequency of states. Kohonen's Self-Organized Map and related clustering methods attempt to distribute the feature detectors according to the probability density function of the states seen by the agent. In contrast, importance-based feature extraction recognizes that, to an autonomous agent, the important states are not necessarily the most frequent, as noted above. What the agent needs is not to detect commonly-seen states, but important states: states which matter in terms of the action decisions the agent must make. The Self-Organized Map was designed for a different type of problem, that of modelling some feature domain and producing a brain-like mapping from inputs to common features. There, there is no reinforcement, and the topological structure of the feature space is what matters. But in a control task, the frequency-based approach is blind toward the reinforcement, and the reinforcement is what makes some states more important than others to the agent.

Chapman & Kaelbling's concept of "relevance" biases feature extraction toward the detection of features which are associated with extreme reinforcement values. As discussed above, extreme reinforcement values do not necessarily indicate an important state, one from which the agent's choice of action really matters. Relevance tuning produces feature detectors which are relevant to predicting the agent's future success, but which may not be relevant to choosing its next action. When the agent detects a feature, if all of its actions will produce equally good outcomes, that feature makes no difference in determining its strategy, even if the feature is relevant to predicting its future success. Relevance tuning cannot tell that such features are unimportant.

Rarely are developments in neural networks unanticipated by the field of statistics, although researchers may not recognize the common threads at first glance. But I am not aware of a concept like importance-based feature extraction in statistics. Principal component analysis can very efficiently give the structure of the feature space, but it is blind toward the reinforcement seen by the agent. Therefore, like the other approaches, it cannot guide feature extraction according to the reinforcements the agent receives for various state/action combinations under its current performance task.

finton@cs.wisc.edu, November 30, 1994.
