\documentstyle[12pt]{cmu-art}
\def\BOX#1{\fbox{\tt #1}}

\begin{document}

\title{Homework 3: Neural networks and face images}
\author{15-681: Machine Learning \\
Tom Mitchell \\
Carnegie Mellon University}
\date{Due: October 16, 1997}
\maketitle

\section{Introduction}

This assignment gives you an opportunity to apply neural network learning to
the problem of face recognition.  It is broken into two parts.  For the first
part, which you must do alone, you will experiment with a neural network
program to train a sunglasses recognizer, a face recognizer, and a pose
recognizer.  For the second part, which is optional and for extra credit only,
you have the option of working with a group of 1 or 2 other students to study
some issue of your own choosing.  The face images you will use are faces of
students from earlier Machine Learning classes.

You will not need to do significant amounts of coding for this assignment,
and you should not let the size of this document scare you, but training
your networks will take time.  It is recommended that you read the
assignment in its entirety first, and start early.

\subsection{The face images}

The image data can be found in {\tt /afs/cs/project/theo-8/faceimages/faces}.
This directory contains 20 subdirectories, one for each person, named by
userid.  Each of these directories contains several different face images of
the same person.

You will be interested in the images with the following naming convention:

{\tt <userid>\_<pose>\_<expression>\_<eyes>\_<scale>.pgm}

\begin{itemize}

\item {\tt <userid>} is the user id of the person in the image, and this
field has 20 values: an2i, at33, boland, bpm, ch4f, cheyer, choon, danieln,
glickman, karyadi, kawamura, kk49, megak, mitchell, night, phoebe, saavik,
steffi, sz24, and tammo.

\item {\tt <pose>} is the head position of the person, and this field
has 4 values: straight, left, right, up.

\item {\tt <expression>} is the facial expression of the person, and this
field has 4 values: neutral, happy, sad, angry.

\item {\tt <eyes>} is the eye state of the person, and this field has
2 values: open, sunglasses.

\item {\tt <scale>} is the scale of the image, and this field has 3
values: 1, 2, and 4.  1 indicates a full-resolution image ($128$ columns
$\times$ $120$ rows); 2 indicates a half-resolution image ($64 \times 60$);
4 indicates a quarter-resolution image ($32 \times 30$).  For this
assignment, you will be using the quarter-resolution images for experiments,
to keep training time to a manageable level.

\end{itemize}
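For example, under this convention the file
{\tt glickman\_straight\_happy\_open\_4.pgm} (which also appears in the
{\tt xv} example below) is a quarter-resolution ($32 \times 30$) image of
user {\tt glickman} looking straight ahead, with a happy expression and
eyes open.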
If you've been looking closely in the image directories, you may notice
that some images have a {\tt .bad} suffix rather than the {\tt .pgm}
suffix.  As it turns out, 16 of the 640 images taken have glitches due to
problems with the camera setup; these are the {\tt .bad} images.
Some people had more glitches than others, but everyone who got ``faced''
should have at least 28 good face images (out of the 32 variations
possible, discounting scale).

\subsection{Viewing the face images}

To view the images, you can use the program {\tt xv}.  This is available
as {\tt /usr/local/bin/xv} on Andrew machines, and
{\tt /usr/misc/.X11-others/bin/xv} or {\tt /usr/local/bin/xv}
on CS machines.  {\tt xv} handles a variety of image formats, including
the PGM format in which our face images are stored.  While we won't go
into detail about {\tt xv} in this document, we will quickly describe
the basics you need to know to use it.

To start {\tt xv}, just specify one or more images on the command line,
like this:

{\tt xv /afs/cs/project/theo-8/faceimages/faces/glickman/glickman\_straight\_happy\_open\_4.pgm}

This will bring up an X window displaying the face.  Clicking the right button
in the image window will toggle a control panel with a variety of buttons.
The {\tt Dbl Size} button doubles the displayed size of the image every time
you click on it.  This will be useful for viewing the quarter-resolution
images, as you might imagine.

You can also obtain pixel values by holding down the left button while moving
the pointer in the image window.  A text bar will be displayed, showing you
the image coordinates and brightness value where the pointer is located.

To quit {\tt xv}, just click on the {\tt Quit} button or type {\tt q}
in one of the {\tt xv} windows.

\subsection{The neural network and image access code}

We're supplying C code for a three-layer fully-connected feedforward
neural network which uses the backpropagation algorithm to tune its weights.
To make life as easy as possible, we're also supplying you with an image
package for accessing the face images, as well as the top-level
program for training and testing, as a skeleton for you to modify.
To help explore what the nets actually learn, you'll also find a
utility program for visualizing hidden-unit weights as images.

The code is located in {\tt /afs/cs/project/theo-8/faceimages/code}.  Copy all
of the files in this area to your own directory, and type {\tt make}.
Note: take care to use {\tt cp *} instead of {\tt cp *.*} in order to
ensure that you get the {\tt Makefile}.  When the compilation is done, you
should have one executable program: {\tt facetrain}.  Briefly,
{\tt facetrain} takes lists of image files as input, and uses these as
training and test sets for a neural network.  {\tt facetrain} can be used
for training and/or recognition, and it also has the capability to save
networks to files.

The code has been compiled and tested successfully on CS-side Alphas,
DecStations, Sun SPARC-2s, and IBM PowerPCs, and Andrew-side DecStations and
Sun SPARC-5s.  If you wish to use the code on some other platform, feel free,
but be aware that the code has only been tested on these platforms.

Details of the routines, explanations of the source files, and related
information can be found in Section \ref{docs} of this handout.
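To build some intuition for what ``tuning the weights'' involves, here is a
minimal sketch of the standard sigmoid-unit backpropagation update with a
momentum term.  It is illustrative only: the identifier names and data
layout below are made up for this sketch and are not those of the supplied
code.

\begin{verbatim}
#include <math.h>

/* Standard sigmoid activation used by each unit. */
double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

/* Error term for an output unit with output o and target t;
   the factor o*(1-o) is the derivative of the sigmoid. */
double output_delta(double o, double t)
{
    return o * (1.0 - o) * (t - o);
}

/* One weight update with momentum:
   dw = eta*delta*input + alpha*(previous dw), where eta is
   the learning rate and alpha the momentum term (the
   experiments below use 0.3 for both by default). */
void update_weight(double *w, double *prev_dw,
                   double delta, double input,
                   double eta, double alpha)
{
    double dw = eta * delta * input + alpha * (*prev_dw);
    *w += dw;
    *prev_dw = dw;
}
\end{verbatim}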
\section{The Assignment}

\subsection{Part I (required)}

Turn in a short write-up of your answers to the questions found in the
following sequence of initial experiments.

\begin{enumerate}

\item Issue the following command in your home directory to obtain
the training and test set data for this assignment:

{\tt cp /afs/cs/project/theo-8/faceimages/trainset/*.list .}

\item The code you have been given is currently set up to learn to recognize
the person with userid {\tt glickman}.  Modify this code to implement a
``sunglasses'' recognizer; i.e., train a neural net which, when given an image
as input, indicates whether the face in the image is wearing sunglasses or
not.  Refer to the beginning of Section 3 for an overview of how to
make changes to this code.

\item Train a network using the default learning parameter settings (learning
rate 0.3, momentum 0.3) for 75 epochs, with the following command:

{\tt facetrain -n shades.net -t straightrnd\_train.list -1 straightrnd\_test1.list}
\newline
{\tt -2 straightrnd\_test2.list -e 75}

{\tt facetrain}'s arguments are described in Section \ref{RUNFACE},
but a short description is in order here.  {\tt shades.net} is the name of
the network file which will be saved when training is finished.
{\tt straightrnd\_train.list}, {\tt straightrnd\_test1.list}, and
{\tt straightrnd\_test2.list} are text files which specify the training
set (70 examples) and two test sets (34 and 52 examples), respectively.

This command creates and trains your net on a randomly chosen sample of 70 of
the 156 ``straight'' images, and tests it on the remaining 34 and 52 randomly
chosen images, respectively.  One way to think of this test strategy is that
roughly $\frac{1}{3}$ of the images ({\tt straightrnd\_test2.list}) have been
held over for testing.  The remaining $\frac{2}{3}$ have been used for a train
and cross-validate strategy, in which $\frac{2}{3}$ of these are being used
as the training set ({\tt straightrnd\_train.list}) and $\frac{1}{3}$ are
being used as the validation set to decide when to halt training ({\tt
straightrnd\_test1.list}).

\item What code did you modify?  What was the maximum classification accuracy
achieved on the training set?  How many epochs did it take to reach this
level?  How about for the validation set?  The test set?  Note that if you run
it again on the same system with the same parameters and input, you should get
exactly the same results because, by default, the code uses the same seed for
the random number generator each time.  You will need to read Section
3.1.2 carefully in order to be able to interpret your experiments and
answer these questions.

\item Now, implement a 1--of--20 face recognizer; i.e., implement a neural net
that accepts an image as input, and outputs the userid of the person.  To
do this, you will need to implement a different output encoding (since you
must now be able to distinguish among 20 people).  (Hint: leave learning rate
and momentum at 0.3, and use 20 hidden units.)  A sketch of one such
encoding follows.
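To make the ``different output encoding'' concrete, here is an illustrative
sketch of one common scheme: one output unit per person, a high target for
the unit corresponding to the person actually in the image, a low target for
all others, and classification by picking the most active output.  The names
and the 0.9/0.1 target values below are assumptions made for this sketch,
not necessarily what the supplied code uses.

\begin{verbatim}
#define NUM_PEOPLE 20

/* Fill in a 1-of-20 target vector for the person (0..19)
   shown in the current training image. */
void set_person_target(double target[], int person)
{
    int i;
    for (i = 0; i < NUM_PEOPLE; i++)
        target[i] = 0.1;    /* low target for "wrong" units */
    target[person] = 0.9;   /* high target for "right" unit */
}

/* Interpret the trained net's outputs: the predicted person
   is the output unit with the largest activation. */
int classify_person(const double output[])
{
    int i, best = 0;
    for (i = 1; i < NUM_PEOPLE; i++)
        if (output[i] > output[best])
            best = i;
    return best;
}
\end{verbatim}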
\item As before, train the network, this time for 100 epochs:

{\tt facetrain -n face.net -t straighteven\_train.list -1 straighteven\_test1.list}
\newline
{\tt -2 straighteven\_test2.list -e 100}

You might be wondering why you are only training on samples from a limited
distribution (the ``straight'' images).  The essential reason is training
time.  If you have access to a very fast machine (anything slower than an
Alpha or Sun4 may be too slow), then you are welcome to do these experiments
on the entire set (replace {\tt straight} with {\tt all} in the command
above).  Otherwise, stick to the ``straight'' images.

The difference between the {\tt straightrnd\_*.list} and the {\tt
straighteven\_*.list} sets is that while the former divides the images purely
randomly among the training and test sets, the latter ensures a relatively
even distribution of each individual's images over the sets.  Because we have
only 7 or 8 ``straight'' images per individual, failure to distribute them
evenly would result in testing our network the most on those faces on which it
was trained the least.

\item Which parts of the code was it necessary to modify this time?
How did you encode the outputs?  What was the maximum classification accuracy
achieved on the training set?  How many epochs did it take to reach this
level?  How about for the validation and test sets?

\item Now let's take a closer look at which images the net may have failed
to classify:

{\tt facetrain -n face.net -T -1 straighteven\_test1.list -2 straighteven\_test2.list}

Do there seem to be any particular commonalities between the misclassified
images?

\item Implement a pose recognizer; i.e., implement a neural net which,
when given an image as input, indicates whether the person in the image
is looking straight ahead, up, to the left, or to the right.  You
will also need to implement a different output encoding for this task.
(Hint: leave learning rate and momentum at 0.3, and use 6 hidden units.)

\item Train the network for 100 epochs, this time on samples drawn
from all of the images:

{\tt facetrain -n pose.net -t all\_train.list -1 all\_test1.list}
\newline
{\tt -2 all\_test2.list -e 100}

Since the pose-recognizing network should have substantially fewer weights to
update than the face-recognizing network, even those of you with slow machines
can get in on the fun of using all of the images.  In this case, 260 examples
are in the training set, 140 examples are in test1, and 193 are in test2.

\item How did you encode your outputs this time?
What was the maximum classification accuracy achieved on the training set?
How many epochs did it take to reach this level?  How about for each test set?

\item Now, try taking a look at how backpropagation tuned the weights
of the hidden units with respect to each pixel.  First type {\tt make
hidtopgm} to compile the utility on your system.  Then, to visualize the
