http://www.cs.cornell.edu/info/faculty/bsmith/query-by-humming.html
<H2><A NAME="sec:Evaluation">Evaluation</A></H2>
<P>This section describes the results of an experimental evaluation of the system. Our evaluation tested the tolerance of the system with respect to input errors, whether from mistakes in the user's humming or from problems with the pitch-tracking.
<H3><A NAME="subsec:Robustness">Robustness</A></H3>
<P>The effectiveness of this method is directly related to the accuracy with which hummed pitches can be tracked and to the accuracy of the melodic information in the database. Under ideal circumstances we can achieve close to 100% accuracy in tracking humming, where ideal circumstances mean that the user leaves a small amount of space between notes and hits each note strongly; for this reason, humming short notes is encouraged. Better still, the user can aspirate the notes as much as possible, perhaps going so far as to voice a vowel, as in "haaa haaa haaa". We have so far experimented only with male voices.
<P>The evaluation database currently contains a total of 183 songs, each converted from public-domain General MIDI sources. Melodies from different musical genres were included, both classical and popular. A few simple heuristics were used to cut down the amount of irrelevant information in the data; for example, MIDI channel 10 was ignored, as it is reserved for percussion in the General MIDI standard. The database nevertheless still contains a great deal of information unrelated to the main theme of each melody. Even with this limitation, we found that sequences of 10-12 pitch transitions were sufficient to discriminate 90% of the songs.
<P>Because we use a fast approximate string-matching algorithm, search keys can be matched against any portion of the melody, rather than just the beginning.
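<P>To make the notion of a search key of pitch transitions concrete, the following sketch (our illustration, not the system's actual code; the function name is hypothetical) converts a sequence of MIDI note numbers into a contour of relative pitch transitions, written here with a three-letter alphabet of U (up), D (down), and S (same):

```python
def contour(midi_pitches):
    """Map consecutive MIDI note numbers to a string of relative
    pitch transitions: U (higher), D (lower), S (same)."""
    out = []
    for prev, cur in zip(midi_pitches, midi_pitches[1:]):
        if cur > prev:
            out.append("U")
        elif cur < prev:
            out.append("D")
        else:
            out.append("S")
    return "".join(out)

# Opening notes of "Happy Birthday" (G G A G C B):
print(contour([67, 67, 69, 67, 72, 71]))  # -> SUDUD
```

A sequence of n notes yields n-1 transitions, so a hummed query of 11-13 notes supplies the 10-12 transitions reported above as sufficient to discriminate 90% of the songs.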
As the size of the database grows, however, this may cease to be an advantage.
<H5><A HREF="#sec:Evaluation">&lt;-- Evaluation</A></H5>
<H3><A NAME="subsec:Performance">Performance</A></H3>
<P>The version of the pitch-tracker that uses a modified form of autocorrelation (<A HREF="#note:2">**</A>) takes between 20 and 45 seconds on a Sparc 10 workstation to process a typical sequence of hummed notes. A brute-force search of the database unsurprisingly shows linear growth with the size of the database, but remains below 4 seconds for 100 songs on a Sparc 2. Search time is therefore currently limited by the efficiency of the pitch-tracker.
<P>Contour representations for the songs are currently stored in separate files, so opening and closing files becomes a significant overhead. Performance could be improved by packing all the songs into one file, or by using a database manager. We plan to modularize our code to make it independent of any particular database schema.
<H5><A HREF="#sec:Evaluation">&lt;-- Evaluation</A></H5>
<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5>
<HR>
<H2><A NAME="sec:Future">Future Directions and Related Work</A></H2>
<P>We plan to improve the performance, speed, and robustness of the pitch-tracking algorithm by using a cubic-spline wavelet. The cubic-spline wavelet peaks at discontinuities in the signal (i.e., the air impulses). One of the most significant features of this wavelet analysis is that it can be computed in <i>O(n)</i> time. The pitch-tracker is currently the slowest link in our system, so using wavelets for this purpose has obvious advantages.
<P>The pattern-matching algorithm in its present form does not discriminate among the various forms of pattern-matching errors discussed earlier, but only accounts for them collectively.
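<P>As a sketch of what "accounting for errors collectively" can look like (an assumed illustration, not the system's implementation; the function name is hypothetical), a standard dynamic program can count drop-outs, extra notes, and substitutions toward a single error total while letting the key match any portion of a song's contour:

```python
def min_errors(key, song):
    """Minimum number of edit errors over all alignments of a hummed
    contour key against any substring of a song's contour string."""
    # Row 0 is all zeros, so a match may begin anywhere in the song.
    prev = [0] * (len(song) + 1)
    for i, k in enumerate(key, 1):
        cur = [i]
        for j, s in enumerate(song, 1):
            cost = 0 if k == s else 1
            cur.append(min(prev[j - 1] + cost,  # match / substitution
                           prev[j] + 1,        # drop-out (missed note)
                           cur[j - 1] + 1))    # extra note in the song
        prev = cur
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)

# A key occurring mid-song matches with zero errors:
print(min_errors("SUDU", "DDSUDUDDU"))  # -> 0
```

Because all three error branches cost 1, errors are counted collectively; lowering the cost of the drop-out branch alone would be one way to make the matcher more tolerant of dropped notes than of other error types.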
Some forms of error may be more common than others, depending on how people casually hum different tunes. For example, drop-out errors, reflected as dropped notes in tunes, are more common than transposition or duplication errors. Tuning the key search so that it is more tolerant of drop-out errors, for example, may yield better results.
<P>The melodic contours of the source songs are currently generated automatically from MIDI data, which is convenient but not optimal. More accuracy and less redundant information could be obtained by entering the melodic themes for particular songs by hand. From a research standpoint, an interesting question is how to extract melodies from complex audio signals <A HREF="#ref:hawley">[4]</A>.
<P>Finally, we would like to characterize the improvement gained by increasing the resolution of the relative pitch differences, considering query alphabets of three, five, and more possible relationships between adjacent pitches. Early experiments using an alphabet of five relative pitch differences (same, higher, much higher, lower, much lower) verified that changes of this sort are promising. One drawback of introducing more resolution is that the user must be somewhat more accurate in the intervals they hum. We will explore the various tradeoffs involved. An important issue is precisely where to draw the line between notes that are a little higher than the previous note and those that are much higher.
<P>Previous work on efficiently searching a database of melodies by humming appears to be limited. Mike Hawley <A HREF="#ref:hawley">[4]</A> briefly discusses a method of querying a collection of melodic themes by searching for exact matches of sequences of relative pitches input from a MIDI keyboard.
We have extended this by incorporating approximate pattern matching, implementing an actual audio database (of MIDI songs), and, most significantly, allowing queries to be hummed. Kageyama and Takashima <A HREF="#ref:kageyama">[8]</A> published a paper in a Japanese journal on retrieving melodies using a hummed melody, but we were unable to locate a translated version.
<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5>
<HR>
<H2><A NAME="sec:Footnotes">Footnotes</A></H2>
<DL>
<DT><A NAME="note:1">*</A>
<DD>The terms vocal folds and vocal cords are used more or less as synonyms in the literature.
<DT><A NAME="note:2">**</A>
<DD>The modifications include low-pass filtering and center-clipping (as described in Sondhi's paper <A HREF="#ref:sondhi">[13]</A>), which help eliminate the formant structure that generally causes difficulty for autocorrelation-based pitch detectors.
</DL>
<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5>
<HR>
<H2><A NAME="sec:References">References</A></H2>
<DL>
<DT><A NAME="ref:asifa"><B>1</B></A>
<DD>Ricardo A. Baeza-Yates and Chris H. Perleberg. Fast and practical approximate string matching. <em>Combinatorial Pattern Matching, Third Annual Symposium</em>, pages 185-192, 1992.
<DT><A NAME="ref:asifd"><B>2</B></A>
<DD>Ricardo Baeza-Yates and G.H. Gonnet. Fast string matching with mismatches. <em>Information and Computation</em>, 1992.
<DT><A NAME="ref:handel"><B>3</B></A>
<DD>Stephen Handel. <em>Listening: An Introduction to the Perception of Auditory Events</em>. The MIT Press, 1989.
<DT><A NAME="ref:hawley"><B>4</B></A>
<DD>Michael Jerome Hawley. <em>Structure out of Sound</em>. PhD thesis, MIT, September 1993.
<DT><A NAME="ref:hess"><B>5</B></A>
<DD>Wolfgang Hess. <em>Pitch Determination of Speech Signals</em>. Springer-Verlag, Berlin Heidelberg, 1983.
<DT><A NAME="ref:hirano"><B>6</B></A>
<DD>M.
Hirano. Structure and vibratory behavior of the vocal folds. In M. Sawashima and F.S. Cooper, editors, <em>Dynamic Aspects of Speech Production</em>, pages 13-27. University of Tokyo Press, 1976.
<DT><A NAME="ref:autocor"><B>7</B></A>
<DD>L.R. Rabiner, J.J. Dubnowski, and R.W. Schafer. Real-time digital hardware pitch detector. <em>IEEE Transactions on Acoustics, Speech and Signal Processing</em>, ASSP-24(1):2-8, February 1976.
<DT><A NAME="ref:kageyama"><B>8</B></A>
<DD>T. Kageyama and Y. Takashima. A melody retrieval method with hummed melody (in Japanese). <em>Transactions of the Institute of Electronics, Information and Communication Engineers D-II</em>, J77D-II(8):1543-1551, August 1994.
<DT><A NAME="ref:asifb"><B>9</B></A>
<DD>G. Landau and U. Vishkin. Efficient string matching with k mismatches. <em>Theoretical Computer Science</em>, 43:239-249, 1986.
<DT><A NAME="ref:opp"><B>10</B></A>
<DD>A.V. Oppenheim. A speech analysis-synthesis system based on homomorphic filtering. <em>J. Acoustical Society of America</em>, 45:458-465, February 1969.
<DT><A NAME="ref:dtsp"><B>11</B></A>
<DD>Alan V. Oppenheim and Ronald W. Schafer. <em>Discrete-Time Signal Processing</em>. Prentice Hall, Englewood Cliffs, NJ, 1989.
<DT><A NAME="ref:plomp"><B>12</B></A>
<DD>R. Plomp. <em>Aspects of Tone Sensation</em>. Academic Press, London, 1976.
<DT><A NAME="ref:sondhi"><B>13</B></A>
<DD>M.M. Sondhi. New methods of pitch extraction. <em>IEEE Trans. Audio and Electroacoustics (Special Issue on Speech Communication and Processing-Part II)</em>, AU-16:262-266, June 1968.
<DT><A NAME="ref:mlh"><B>14</B></A>
<DD>James D. Wise, James R. Caprio, and Thomas W. Parks. Maximum likelihood pitch estimation. <em>IEEE Trans. Acoustics, Speech, Signal Processing</em>, 24(5):418-423, October 1976.
</DL>
<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5>
<HR>
</BODY>
</HTML>