src="Using a Particle Filter for Gesture Recognition.files/img18.gif" width=15
align=middle> refers to the position in the model for the right hand's
trajectory).
<P>In summary, there are 7 parameters that describe each <EM>state</EM>.
<P>
<H4><A name=SECTION00021000000000000000>Initialization</A></H4>
<P>The sample set is initialized with <I>N</I> samples distributed over possible
starting states and each assigned a weight of <IMG height=27
alt=tex2html_wrap_inline184
src="Using a Particle Filter for Gesture Recognition.files/img19.gif" width=12
align=middle> . Specifically, the initial parameters are picked uniformly
according to: <BR><BR><IMG height=24 alt=tex2html_wrap_inline186
src="Using a Particle Filter for Gesture Recognition.files/img20.gif" width=91
align=middle> <BR><IMG height=35 alt=tex2html_wrap_inline188
src="Using a Particle Filter for Gesture Recognition.files/img21.gif" width=74
align=middle> , where <IMG height=20 alt=tex2html_wrap_inline190
src="Using a Particle Filter for Gesture Recognition.files/img22.gif" width=22
align=middle> [0,1] <BR><IMG height=30 alt=tex2html_wrap_inline192
src="Using a Particle Filter for Gesture Recognition.files/img23.gif" width=124
align=middle> <BR><IMG height=30 alt=tex2html_wrap_inline194
src="Using a Particle Filter for Gesture Recognition.files/img24.gif" width=119
align=middle> <BR><BR>In this application, I set the parameters as follows:
<BR><IMG height=14 alt=tex2html_wrap_inline196
src="Using a Particle Filter for Gesture Recognition.files/img25.gif" width=35
align=middle> = 2, since there are two models <BR><IMG height=22
alt=tex2html_wrap_inline198
src="Using a Particle Filter for Gesture Recognition.files/img26.gif" width=157
align=middle> for 50 percent scaling <BR><IMG height=22
alt=tex2html_wrap_inline200
src="Using a Particle Filter for Gesture Recognition.files/img27.gif" width=153
align=middle> for 50 percent scaling <BR>
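<P>As a rough illustration, the following C++ sketch shows one way this
initialization could look. The <TT>Sample</TT> struct, its field names, the
0.5-1.5 scaling ranges, and the squared-uniform draw used to bias the starting
phase toward zero are illustrative assumptions, not the actual contents of the
source files linked below; the full state described above has seven parameters,
while this sketch keeps only a single phase, amplitude, and rate for brevity.
<PRE>
#include <random>
#include <vector>

// Illustrative state for one sample; the real sample.h linked below
// may organize these parameters differently.
struct Sample {
    int    model;   // which gesture model (1 or 2)
    double phase;   // position within the model trajectory
    double alpha;   // amplitude scaling factor
    double rho;     // rate (speed) scaling factor
    double weight;  // sample weight
};

// Initialize N samples spread uniformly over plausible starting states,
// each with weight 1/N.
std::vector<Sample> InitializeSamples(int N, std::mt19937& rng) {
    std::uniform_int_distribution<int>     pick_model(1, 2);   // two models
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_real_distribution<double> scale(0.5, 1.5);    // assumed "50 percent scaling" range

    std::vector<Sample> samples(N);
    for (Sample& s : samples) {
        s.model  = pick_model(rng);
        double r = unit(rng);
        s.phase  = r * r;          // one way to bias the starting phase toward zero
        s.alpha  = scale(rng);
        s.rho    = scale(rng);
        s.weight = 1.0 / N;        // every sample starts with the same weight
    }
    return samples;
}
</PRE>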
<P>
<H4><A name=SECTION00022000000000000000>Prediction</A></H4>
<P>In the prediction step, each parameter of a randomly sampled <IMG height=14
alt=tex2html_wrap_inline136
src="Using a Particle Filter for Gesture Recognition.files/img2.gif" width=11
align=middle> is used to determine <IMG height=15 alt=tex2html_wrap_inline138
src="Using a Particle Filter for Gesture Recognition.files/img3.gif" width=27
align=middle> based on the parameters of that particular <IMG height=14
alt=tex2html_wrap_inline136
src="Using a Particle Filter for Gesture Recognition.files/img2.gif" width=11
align=middle> . Each old state, <IMG height=14 alt=tex2html_wrap_inline136
src="Using a Particle Filter for Gesture Recognition.files/img2.gif" width=11
align=middle> , is randomly chosen from the sample set, based on the weight of
each sample. That is, the weight of each sample determines the probability of
its being chosen. This is done efficiently by creating a cumulative probability
table, choosing a uniform random number on [0,1], and then using binary search
to pull out a sample (see Isard and Blake for details).
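<P>A minimal sketch of that weighted selection, assuming the weights have
already been normalized to sum to one; the function names here are mine, not
those of the linked source files:
<PRE>
#include <algorithm>
#include <random>
#include <vector>

// Build the cumulative-probability table once per time step.
std::vector<double> BuildCumulativeTable(const std::vector<double>& weights) {
    std::vector<double> cumulative(weights.size());
    double running = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        running += weights[i];          // assumes the weights sum to 1
        cumulative[i] = running;
    }
    return cumulative;
}

// Choose the index of one old sample with probability proportional to its
// weight, using a uniform draw on [0,1] and binary search, as in Isard and Blake.
std::size_t PickSampleIndex(const std::vector<double>& cumulative, std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double u = unit(rng);
    auto it = std::lower_bound(cumulative.begin(), cumulative.end(), u);
    if (it == cumulative.end()) --it;   // guard against floating-point round-off
    return static_cast<std::size_t>(it - cumulative.begin());
}
</PRE>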
<P>The following equations are used to choose the new state <BR><BR><IMG
height=15 alt=tex2html_wrap_inline210
src="Using a Particle Filter for Gesture Recognition.files/img28.gif" width=69
align=middle> <BR><IMG height=30 alt=tex2html_wrap_inline212
src="Using a Particle Filter for Gesture Recognition.files/img29.gif" width=169
align=middle> <BR><IMG height=30 alt=tex2html_wrap_inline214
src="Using a Particle Filter for Gesture Recognition.files/img30.gif" width=137
align=middle> <BR><IMG height=30 alt=tex2html_wrap_inline216
src="Using a Particle Filter for Gesture Recognition.files/img31.gif" width=132
align=middle> <BR><BR>where <IMG height=23 alt=tex2html_wrap_inline218
src="Using a Particle Filter for Gesture Recognition.files/img32.gif" width=44
align=middle> refers to a number chosen randomly according to the normal
distribution with standard deviation <IMG height=14 alt=tex2html_wrap_inline220
src="Using a Particle Filter for Gesture Recognition.files/img33.gif" width=13
align=middle> . This adds an element of uncertainty to each prediction, which
keeps the sample set diffuse enough to deal with noisy data. In this application
I set: <IMG height=21 alt=tex2html_wrap_inline222
src="Using a Particle Filter for Gesture Recognition.files/img34.gif" width=131
align=middle> .
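<P>Assuming the equations above follow Black and Jepson's dynamics (the model
index is carried over unchanged, the phase advances by the rate, and zero-mean
Gaussian noise diffuses the phase, amplitude, and rate), a prediction step for
one drawn sample might look like the sketch below. The standard deviations are
passed in rather than hard-coded, since their exact values are given above.
<PRE>
#include <random>

// Same illustrative Sample struct as in the initialization sketch:
struct Sample { int model; double phase, alpha, rho, weight; };

// Illustrative container for the diffusion standard deviations.
struct Sigmas {
    double phase;
    double alpha;
    double rho;
};

// Predict a new state from a previously selected sample by propagating it
// through the assumed dynamics and adding Gaussian diffusion noise.
Sample PredictState(const Sample& old_state, const Sigmas& sigma, std::mt19937& rng) {
    std::normal_distribution<double> n_phase(0.0, sigma.phase);
    std::normal_distribution<double> n_alpha(0.0, sigma.alpha);
    std::normal_distribution<double> n_rho(0.0, sigma.rho);

    Sample s = old_state;
    s.model = old_state.model;                                  // model index is kept
    s.phase = old_state.phase + old_state.rho + n_phase(rng);   // advance phase by the rate
    s.alpha = old_state.alpha + n_alpha(rng);
    s.rho   = old_state.rho   + n_rho(rng);
    return s;
}
</PRE>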
<P>For a given drawn sample, predictions are generated until all of the
parameters are within the accepted range. If, after a set number of attempts, it
is still impossible to generate a valid prediction, a new sample is created
according to the initialization procedure above. In addition, 10 percent of all
samples in the new sample set are initialized randomly as in the initialization
step above (with the exception that rather than having the phase parameter
biased towards zero, it is biased towards the number of observations that have
been made thus far). This ensures that local maxima can't completely take over
the curve; new hypotheses are always given a chance to dominate.
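<P>A rough sketch of that retry-and-reseed logic, reusing the illustrative
<TT>Sample</TT>, <TT>Sigmas</TT>, <TT>InitializeSamples</TT>, and
<TT>PredictState</TT> helpers from the sketches above; the attempt limit and the
accepted ranges are placeholders, and the 10 percent random reseeding of the new
sample set is not shown:
<PRE>
#include <random>

const int kMaxAttempts = 20;   // placeholder for "a set number of attempts"

// Placeholder validity test: every parameter within its accepted range.
bool IsValid(const Sample& s) {
    return s.phase >= 0.0 &&
           s.alpha >= 0.5 && s.alpha <= 1.5 &&
           s.rho   >= 0.5 && s.rho   <= 1.5;
}

// Keep predicting from the drawn sample until the prediction is valid;
// fall back to a freshly initialized sample if that never happens.
Sample PredictOrReinitialize(const Sample& old_state, const Sigmas& sigma,
                             std::mt19937& rng) {
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        Sample s = PredictState(old_state, sigma, rng);
        if (IsValid(s)) return s;
    }
    return InitializeSamples(1, rng).front();   // its weight is reset in the update step
}
</PRE>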
<P>
<H4><A name=SECTION00023000000000000000>Updating</A></H4>
<P>After the Prediction step above, there exists a new set of <I>N</I> predicted
samples which need to be assigned weights. The weight of each sample is a
measure of its likelihood given the observed data <IMG height=23
alt=tex2html_wrap_inline226
src="Using a Particle Filter for Gesture Recognition.files/img35.gif" width=128
align=middle> . I define <IMG height=26 alt=tex2html_wrap_inline228
src="Using a Particle Filter for Gesture Recognition.files/img36.gif" width=163
align=middle> as a sequence of observations for the <I>i</I>th coefficient over
time; specifically, let <IMG height=22 alt=tex2html_wrap_inline232
src="Using a Particle Filter for Gesture Recognition.files/img37.gif" width=131
align=middle> be the sequence of observations of the horizontal velocity of the
left hand, the vertical velocity of the left hand, the horizontal velocity of
the right hand, and the vertical velocity of the right hand respectively.
<P>Extending Black and Jepson, I then calculate the weight by the following
equation:
<P><IMG height=48 alt=equation84
src="Using a Particle Filter for Gesture Recognition.files/img38.gif" width=500
align=bottom>
<P><BR>where <BR>
<P><IMG height=48 alt=displaymath234
src="Using a Particle Filter for Gesture Recognition.files/img39.gif" width=450
align=bottom>
<P>and where <I>w</I> is the size of a temporal window that spans back in time
(here, I take <I>w</I> = 10). Note that <IMG height=11
alt=tex2html_wrap_inline174
src="Using a Particle Filter for Gesture Recognition.files/img15.gif" width=15
align=bottom> , <IMG height=24 alt=tex2html_wrap_inline172
src="Using a Particle Filter for Gesture Recognition.files/img14.gif" width=14
align=middle> , and <IMG height=22 alt=tex2html_wrap_inline176
src="Using a Particle Filter for Gesture Recognition.files/img16.gif" width=14
align=middle> refer to the appropriate parameters of the model for the blob in
question and that <IMG height=37 alt=tex2html_wrap_inline246
src="Using a Particle Filter for Gesture Recognition.files/img40.gif" width=95
align=middle> refers to the value given to the <I>i</I>th coefficient of the
model <IMG height=14 alt=tex2html_wrap_inline160
src="Using a Particle Filter for Gesture Recognition.files/img9.gif" width=8
align=middle> interpolated at time <IMG height=24 alt=tex2html_wrap_inline252
src="Using a Particle Filter for Gesture Recognition.files/img41.gif" width=59
align=middle> and scaled by <IMG height=11 alt=tex2html_wrap_inline174
src="Using a Particle Filter for Gesture Recognition.files/img15.gif" width=15
align=bottom> .
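<P>Assuming the weight takes the Black and Jepson form (a product over the four
velocity coefficients of a Gaussian term whose exponent sums, over the last
<I>w</I> frames, the squared differences between each observed velocity and the
correspondingly scaled, interpolated model value), the computation might be
sketched as follows. The interpolation helper, the per-coefficient standard
deviations, and the use of a single amplitude and rate for all coefficients are
simplifying assumptions; as noted above, the actual parameters are chosen per
blob.
<PRE>
#include <array>
#include <cmath>
#include <vector>

// Same illustrative Sample struct as in the earlier sketches.
struct Sample { int model; double phase, alpha, rho, weight; };

// One gesture model: model[k][i] is the i-th velocity coefficient at model time k.
using Model = std::vector<std::array<double, 4>>;

// Linearly interpolate the i-th coefficient of a model at a fractional model time.
double ModelValue(const Model& m, int i, double t) {
    if (t <= 0.0) return m.front()[i];
    if (t >= m.size() - 1.0) return m.back()[i];
    std::size_t k = static_cast<std::size_t>(t);
    double frac = t - static_cast<double>(k);
    return (1.0 - frac) * m[k][i] + frac * m[k + 1][i];
}

const int    kWindow   = 10;                     // temporal window w
const double kSigma[4] = {1.0, 1.0, 1.0, 1.0};   // placeholder per-coefficient sigmas

// Likelihood of the recent observations given one predicted state.
// observations[j][i] is the i-th velocity coefficient observed j frames ago.
// The constant Gaussian normalization is dropped; weights are renormalized later.
double ComputeWeight(const Sample& s, const Model& model,
                     const std::vector<std::array<double, 4>>& observations) {
    double weight = 1.0;
    for (int i = 0; i < 4; ++i) {                // the four velocity coefficients
        double sum_sq = 0.0;
        for (int j = 0; j < kWindow; ++j) {
            double predicted = s.alpha * ModelValue(model, i, s.phase - j * s.rho);
            double diff = observations[j][i] - predicted;
            sum_sq += diff * diff;
        }
        weight *= std::exp(-sum_sq / (2.0 * kSigma[i] * kSigma[i] * kWindow));
    }
    return weight;
}
</PRE>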
<P>
<H4><A name=SECTION00024000000000000000>Classification</A></H4>
<P>With this algorithm in place, all that remains is to classify the video
sequence as one of the two signs. Since the whole idea of Condensation is that
the most likely hypothesis will dominate by the end, I classify the entire video
sequence according to whichever model is deemed most likely at the end of the
sequence. Determining the probability
assigned to each model is a simple matter of summing the weights of each sample
in the sample set at a given moment whose <EM>state</EM> refers to the model in
question. The following graphs plot the likelihood of each model over time for
an instance of each sign (the first is a sign that is classified as model 1, the
second a sign that is classified as model 2): <BR><IMG alt=tex2html_wrap256
src="Using a Particle Filter for Gesture Recognition.files/img42.gif"
align=bottom> <IMG alt=tex2html_wrap258
src="Using a Particle Filter for Gesture Recognition.files/img43.gif"
align=bottom> <BR>Using this criterion, my system correctly classified 80
percent of the signs it was trained on and 75 percent of novel signs.
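<P>The per-model probability described above reduces to a small loop over the
sample set. A sketch, again using the illustrative <TT>Sample</TT> struct and
assuming the weights are normalized:
<PRE>
#include <array>
#include <vector>

// Same illustrative Sample struct as in the earlier sketches.
struct Sample { int model; double phase, alpha, rho, weight; };

// Probability assigned to each of the two models at the current moment:
// the sum of the (normalized) weights of the samples belonging to that model.
std::array<double, 2> ModelProbabilities(const std::vector<Sample>& samples) {
    std::array<double, 2> prob = {0.0, 0.0};
    for (const Sample& s : samples) {
        prob[s.model - 1] += s.weight;   // models are numbered 1 and 2
    }
    return prob;
}

// The sequence is classified as whichever model is more likely at the final frame.
int ClassifySequence(const std::vector<Sample>& final_samples) {
    std::array<double, 2> prob = ModelProbabilities(final_samples);
    return (prob[0] >= prob[1]) ? 1 : 2;
}
</PRE>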
<H3>Condensation Implementation</H3>The Condensation algorithm was coded in C++
and ran much faster than real time on the Sweet Hall machines (excluding the
image preprocessing). The complete source code is here:
<UL>
<LI>Condensation code
<UL>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/sample.cpp">sample.cpp</A>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/sample.h">sample.h</A>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/model.cpp">model.cpp</A>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/model.h">model.h</A>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/main.cpp">main.cpp</A>
</LI></UL>
<LI>Utilities
<UL>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/random_utils.cpp">random_utils.cpp</A>
(create random Gaussian numbers, etc.)
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/random_utils.h">random_utils.h</A>
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/utility.h">utility.h</A>
<LI>scanner.cpp (string tokenizer)
<LI><A
href="http://www.mit.edu/~alexgru/vision/condensation_source/scanner.h">scanner.h</A>
</LI></UL></LI></UL>
<H1>References </H1>For more background on gesture recognition, see my
literature review here: <A
href="http://www.mit.edu/~alexgru/vision/review.ps">ps</A> or <A
href="http://www.mit.edu/~alexgru/vision/review.pdf">pdf</A><BR><BR>
<DL>
<DT><A name=condensationGesture><STRONG>1</STRONG></A>
<DD>Michael J. Black and Allan D. Jepson. <A
href="http://citeseer.nj.nec.com/black98probabilistic.html">A probabilistic
framework for matching temporal trajectories: Condensation-based recognition of
gestures and expressions.</A> In <EM>Proceedings of the 5th European Conference on
Computer Vision</EM>, volume 1, pages 909-924, 1998.
<P></P>
<DT><A name=condensation><STRONG>2</STRONG></A>
<DD>Michael Isard and Andrew Blake. <A
href="http://citeseer.nj.nec.com/isard96contour.html">Contour tracking by
stochastic propagation of conditional density.</A> In <EM>Proceedings of the
European Conference on Computer Vision</EM>, pages 343-356, 1996.
<P></P>
<DT><A name=condensationSwitching><STRONG>3</STRONG></A>
<DD>Michael Isard and Andrew Blake. <A
href="http://citeseer.nj.nec.com/isard98mixedstate.html">A mixed-state
condensation tracker with automatic model-switching.</A> In <EM>Proceedings of the 6th
International Conference on Computer Vision</EM>, pages 107-112, 1998.
<P></P>
<DT><A name=motionTrajectories><STRONG>4</STRONG></A>
<DD>Ming-Hsuan Yang and Narendra Ahuja. <A
href="http://citeseer.nj.nec.com/yang00recognizing.html">Recognizing hand
gesture using motion trajectories.</A> In <EM>IEEE CS Conference on Computer
Vision and Pattern Recognition</EM>, volume 1, pages 466-472, June 1999.
</DL><BR>
<HR>
<HR>
<ADDRESS><A href="mailto:alexgru@stanford.edu">Alexander Houston
Gruenstein</A></ADDRESS><!-- Created: Fri Feb 22 13:17:58 PST 2002 --><!-- hhmts start -->Last
modified: Tue May 28 01:32:21 PDT 2002 <!-- hhmts end --></DD></BODY></HTML>