<!-- http://www.cs.duke.edu/~mlittman/courses/cps370/ -->
<head><TITLE>CPS 370 - FALL 1996</TITLE><LINK REV="made" HREF="mailto:mlittman@cs.duke.edu"></head>
<body>

<p><center><TABLE BORDER=10 CELLSPACING=5>
<TR> <TD align=center colspan=4> <img src="http://www.cs.duke.edu/~mlittman/courses/cps370/cps-t.gif" alt="CPS370"> </TD> </TR>
<TR></TR>
<TR> <TD></TD> <TD ALIGN=center> <H1>Fall 1996</H1> </TD> <TD align=center></TD> <TD ALIGN=center> <H1>Planning Under Uncertainty</H1> </TD> </TR>
</TABLE></center>

<p><center>[ <A href="#background"> Background </A> | <A href="#grading"> Grading </A> | <A href="#outline"> Outline </A> | <A href="#projects"> Projects </A> ]</center>

<p><center><a href="#now">Jump to current place in outline...</a></center>

<hr>

<p><a href="http://www.cs.duke.edu/~mlittman/courses/cps370/syllabus">Evolving Syllabus...</a>

<p><a name="background"></a><h2> Background </h2>

This seminar will introduce students to an exciting area of research
that draws upon results from operations research (Markov decision
processes), machine learning (reinforcement learning), and traditional
AI (planning) to attack problems with great scientific and commercial
potential. We will read and discuss a handful of recent papers that
will give students an appreciation for the state of the art. Students
will undertake a group research project to solidify their
understanding of the basic concepts and perhaps to contribute to the
field.

<p>I will make a number of presentations to introduce students to the
background material necessary to read the research papers---only
undergraduate-level computer science knowledge and basic probability
theory and calculus will be assumed. I think there are useful
contributions to be made by researchers in AI, algorithms, complexity,
numerical methods, and systems, and I think people in all these areas
would find some useful information in the seminar. Everyone's welcome!

<h3> Instructor </h3>

<A href="http://www.cs.duke.edu/~mlittman">Michael L. Littman</a>
<ul>
<li>Office: D209 LSRC
<li>Phone: 660-6537
<li>Email: <a href="mailto:mlittman@cs.duke.edu">mlittman@cs.duke.edu</a>
<li>Office hours: TBA
</ul>

<h3> Description </h3>

Research in planning, making a sequence of choices to achieve some
goal, has been a mainstay of artificial intelligence (AI) for many
years. Traditionally, the decision-making models that have been
studied admit no uncertainty whatsoever---every aspect of the world
that is relevant to the generation and execution of a plan is known in
advance. In contrast, work in operations research (OR) has focussed
on the uncertainty of actions but uses an impoverished representation
for specifying planning problems.

<p>The purpose of this seminar is to explore some middle ground between
these two well-studied extremes with the hope of understanding how we
might create systems that can reason efficiently about plans in
complex, uncertain worlds. We will review the foundational results
from AI and OR and read a series of papers written over the last few
years that have begun to bridge the gap.

<h3>Philosophy</h3>

There are basically two or three papers on different approaches to the
same basic problem that I'd like people to read and understand. These
papers are quite recent and represent active areas of research that
have been maturing quite quickly over the last few years. As a
result, to get a deep appreciation for this work, we will need to read
a number of papers that introduce the necessary background.
<p>My approach to organizing the seminar will be to try to keep the
assigned reading to a minimum and to ask students to concentrate on
understanding the state of the field and on identifying the important
open research questions.

<h3>Prerequisites</h3>

The seminar should be accessible to any advanced computer science
student. My goal is to introduce critical background material as the
need arises.

<p>Nonetheless, we need some common ground to begin. I will assume that
students are familiar with programming (any language), algorithm
analysis (big O notation), calculus (derivatives of multivariate
functions), and probability theory (conditional probabilities).

<h3>Postrequisites</h3>

In addition to exploring the question of how to create plans that are
effective in uncertain environments, there are a number of other
important topics that students will learn about in this seminar:
students will be exposed to Markov decision processes, dynamic
programming, linear programming, temporal difference learning,
supervised function approximation, gradient descent and neural
networks, STRIPS rules, and partial order planning.

<hr>

<A NAME="grading"></A><h2> Grading </h2>

The grading policy is designed to stimulate students to think about
some of the important issues in this area. Class grade will be based
on:
<UL>
<LI> class participation (25%),
<LI> short homework assignments (25%), and
<LI> a final project (50%).
</UL>

<hr>

<A NAME="outline"></A><h2>Outline</h2>

<h3>Organization Meeting</h3>

On Thursday, September 5th (D344, 2:30-3:00), we'll get together to
discuss the best time to schedule the class. If you can't make the
meeting, please send me email (<a href="mailto:mlittman@cs.duke.edu">mlittman@cs.duke.edu</a>).

<h3>Introduction</h3>

What is planning? What is uncertainty? What are some applications of
planning under uncertainty? I'll lay out the space of issues and
describe the part of the space we'll explore.

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Michael Lederman Littman. Algorithms for
Sequential Decision Making. Ph.D. dissertation and Technical Report
CS-96-09, Brown University, Department of Computer Science,
Providence, RI, March 1996. Chapter 1: Introduction.
(<a href="http://www.cs.duke.edu/~mlittman/courses/cps370/papers/littman-chap1.ps">local postscript</a>,
<a href="http://www.cs.duke.edu/~mlittman/courses/cps370/papers/littman-bib.ps">local bibliography postscript</a>)

<h3>Markov Decision Processes</h3>

I'll introduce the MDP model, which is a formal specification for the
particular problem we will be examining. I'll describe the
fundamental concepts (states, actions, transitions, rewards,
discounting, horizons), results (existence and dominance of optimal
value function, optimal greedy policies), and the algorithms (value
iteration, policy iteration, modified policy iteration, linear
programming); a short code sketch of value iteration appears below.

<p>In a sense, these algorithms completely solve the problem of planning
under uncertainty. The rest of the seminar is concerned with solving
MDPs more efficiently by exploiting additional structure present in
some instances.

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Michael L. Littman, Thomas L. Dean, and Leslie Pack
Kaelbling. On the complexity of solving Markov decision problems. In
<em>Proceedings of the Eleventh Annual Conference on Uncertainty in
Artificial Intelligence (UAI--95)</em>, Montreal, Quebec, Canada,
1995. (<a href="http://www.cs.duke.edu/~mlittman/papers/mdp-complexity.ps">postscript</a>)

<p>Homework: Represent a complex domain as an MDP.

<p>Slides: <a href="http://www.cs.duke.edu/~mlittman/courses/cps370/slides/lect2.fm.ps">postscript</a>
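<p>To make the basic algorithm concrete, here is a minimal value-iteration
sketch. This is a sketch only, not code from the course: Python is my choice
of language, and the tiny two-state MDP, its rewards, and the parameters are
all invented for illustration.

<pre>
# A minimal value-iteration sketch on an invented two-state MDP.
# P[s][a] is a list of (next_state, probability) pairs;
# R[s][a] is the expected immediate reward for taking action a in s.
P = {
    0: {"stay": [(0, 0.9), (1, 0.1)], "go": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)],           "go": [(0, 0.8), (1, 0.2)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0, "go": 0.0},
}
gamma = 0.9   # discount factor
theta = 1e-6  # convergence threshold

V = {s: 0.0 for s in P}  # initial value function
while True:
    delta = 0.0
    for s in P:
        # Bellman backup: best over actions of R(s,a) + gamma * E[V(s')]
        best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:  # stop once no state's value moved much
        break

# the greedy policy with respect to the computed values
policy = {s: max(P[s], key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for s2, p in P[s][a]))
          for s in P}
print(V)
print(policy)
</pre>

<p>Policy iteration and modified policy iteration reuse the same backup; they
differ in how they interleave evaluation of a fixed policy with greedy
improvement.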
<h3>Accelerating Solutions to MDPs</h3>

One class of algorithms for solving MDPs more quickly restricts
value-iteration updates to states that are likely to benefit from
additional computational resources.

<p>Prioritized Sweeping uses a heuristic for measuring when updating the
value of a state is likely to be important for computing an
approximately optimal solution quickly (a code sketch of this idea
appears at the end of this section).

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Andrew W. Moore and Christopher G. Atkeson. Prioritized
sweeping: Reinforcement learning with less data and less real time.
<em>Machine Learning</em>, 13:103, 1993.
(<a href="http://www.cs.cmu.edu/afs/cs/project/reinforcement/papers/moore.prisweep.ps.Z">compressed postscript</a>)

<p>Real-time dynamic programming attempts to find a good approximate
policy quickly by focussing updates on states that are likely to be
visited.

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh.
Learning to act using real-time dynamic programming. <em>Artificial
Intelligence</em>, 72(1):81--138, 1995.
(<a href="ftp://ftp.cs.umass.edu/pub/anw/pub/barto/realtime-dp.ps.Z">compressed postscript</a>)

<p>Another approach is to explicitly produce a good partial policy by
identifying states that are likely to be visited and solving a smaller
MDP.

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Jonathan Tash and Stuart Russell. Control strategies for a
stochastic planner. In <em>Proceedings of the 12th National
Conference on Artificial Intelligence</em>, 1079--1085, 1994.
(<a href="http://HTTP.CS.Berkeley.EDU/~russell/papers/aaai94-mdp.ps">postscript</a>)

<p>Slides: <a href="http://www.cs.duke.edu/~mlittman/courses/cps370/slides/lect3.fm.ps">postscript</a>

<h3>Value Function Approximation</h3>

The above approaches represent states as being completely unrelated
objects. In many domains, states can be described in such a way that
similar states (from a policy or value standpoint) have similar
representations. For example, any attempt to create a transition
function for the game of backgammon would likely make use of a
board-based representation of states.

<p>This insight can be exploited by replacing the classical table-based
method for representing value functions with one that uses a function
approximator (for example, a neural net) to map state description
vectors to values. A wildly successful example of this is TD-Gammon;
this work makes use of several important background ideas, including
gradient descent and temporal difference learning, that we will need to
look at as well.

<p><img src="http://www.cs.duke.edu/~mlittman/courses/cps370/book.gif"> Richard S. Sutton. Learning to predict by the method of
temporal differences. <em>Machine Learning</em>, 3(1):9-44, 1988.
(<a href="ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-88.ps">postscript</a>)
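<p>To give a flavor of the Sutton paper, here is a minimal sketch of TD(0)
prediction with a linear function approximator trained by gradient descent.
The bounded random walk is the illustrative domain from that paper, but the
code is my own toy reconstruction, not the paper's: the one-hot features,
step size, and episode count are invented, and a problem like backgammon
would use a much richer state description vector.

<pre>
# A sketch of TD(0) with a linear value function trained by gradient
# descent, in the spirit of Sutton (1988); not code from the paper.
import random

n_states = 5  # states 0..4; 0 and 4 are terminal

def features(s):
    phi = [0.0] * n_states  # one-hot features stand in for a real
    phi[s] = 1.0            # state description vector
    return phi

w = [0.0] * n_states     # weights of the linear value function
alpha, gamma = 0.1, 1.0  # step size and discount

def value(s):
    return sum(wi * fi for wi, fi in zip(w, features(s)))

for episode in range(2000):
    s = 2  # every walk starts in the middle
    while s not in (0, 4):
        s2 = s + random.choice((-1, 1))  # unbiased random walk
        r = 1.0 if s2 == 4 else 0.0      # reward only at the right end
        target = r if s2 in (0, 4) else r + gamma * value(s2)
        td_error = target - value(s)
        # gradient step: for a linear approximator, the gradient of
        # value(s) with respect to w is just features(s)
        for i, fi in enumerate(features(s)):
            w[i] += alpha * td_error * fi
        s = s2

# the nonterminal values should approach 1/4, 1/2, 3/4
print([round(value(s), 2) for s in range(n_states)])
</pre>

<p>TD-Gammon replaces the linear map with a neural network, so the gradient
step becomes backpropagation, but the TD error that drives learning is the
same.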
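<p>Finally, the sketch promised in the "Accelerating Solutions to MDPs"
section above: a simplified version of the prioritized-sweeping idea. Moore
and Atkeson's full algorithm also learns the transition model from
experience; this stripped-down variant assumes a known model (in the same
made-up format as the value-iteration sketch) and just keeps a priority
queue of states whose backed-up values are likely to change the most.

<pre>
# A simplified prioritized-sweeping sketch: back up the state with the
# largest expected change first, then bump the priority of its
# predecessors. This is not Moore and Atkeson's full algorithm.
import heapq

def prioritized_sweeping(P, R, gamma=0.9, theta=1e-6, max_backups=10000):
    V = {s: 0.0 for s in P}

    # preds[s2]: states with some action that can reach s2
    preds = {s: set() for s in P}
    for s in P:
        for a in P[s]:
            for s2, p in P[s][a]:
                if p > 0:
                    preds[s2].add(s)

    def backup(s):
        return max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])

    # seed the queue with every state (heapq is a min-heap, so negate)
    pq = [(-abs(backup(s) - V[s]), s) for s in P]
    heapq.heapify(pq)

    for _ in range(max_backups):
        if not pq:
            break
        neg_pri, s = heapq.heappop(pq)
        if -neg_pri < theta:
            break  # nothing left that is worth updating
        new = backup(s)
        change = abs(new - V[s])
        V[s] = new
        # a change at s can make its predecessors' values stale;
        # priority = the change scaled by the strongest link into s
        for sp in preds[s]:
            pmax = max(p for a in P[sp] for s2, p in P[sp][a] if s2 == s)
            if change * pmax > theta:
                heapq.heappush(pq, (-change * pmax, sp))
    return V

# e.g., with P and R from the value-iteration sketch:
# V = prioritized_sweeping(P, R)
</pre>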