http://www.cs.cornell.edu/info/faculty/bsmith/query-by-humming.html

MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 24-Nov-96 21:32:38 GMT
Content-Type: text/html
Content-Length: 27263
Last-Modified: Wednesday, 27-Sep-95 23:06:43 GMT

<HTML><HEAD><TITLE>Query By Humming -- Musical Information Retrieval in an Audio Database</TITLE></HEAD><BODY><b>ACM Multimedia 95 - Electronic Proceedings <br> November 5-9, 1995 <br> San Francisco, California</b><HR><H1>Query By Humming -- Musical Information Retrieval in an Audio Database</H1><DL><DT><STRONG><a href="http://www.cs.cornell.edu/Info/People/ghias/home.html">Asif Ghias</a></STRONG><DD>Department of Computer Science<DD>4130 Upson Hall<DD>Cornell University<DD>Ithaca, NY 14853 US<DD>ghias@cs.cornell.edu<P><DT><STRONG><a href="http://www.cs.cornell.edu/Info/People/logan/">Jonathan Logan</a></STRONG><DD>Department of Computer Science<DD>4130 Upson Hall<DD>Cornell University<DD>Ithaca, NY 14853 US<DD>logan@ghs.com<P><DT><STRONG>David Chamberlin</STRONG><DD>School of Electrical Engineering<DD>224 Phillips Hall<DD>Cornell University<DD>Ithaca, NY 14853 US<DD>chamberlin@engr.sgi.com<P><DT><STRONG><a href="http://www.cs.cornell.edu/Info/Faculty/Brian_Smith.html">Brian C. Smith</a></STRONG><DD>Department of Computer Science<DD>4130 Upson Hall<DD>Cornell University<DD>Ithaca, NY 14853 US<DD>bsmith@cs.cornell.edu<P></DL><HR><H4><A HREF="http://www.cs.cornell.edu/Info/Faculty/bsmith/acmcopyright.html">ACM Copyright Notice</A></H4><HR><H2>Abstract</H2>The emergence of audio and video data types in databases will require new information retrieval methods adapted to the specific characteristics and needs of these data types. An effective and natural way of querying a musical audio database is by humming the tune of a song. In this paper, a system for querying an audio database by humming is described along with a scheme for representing the melodic information in a song as relative pitch changes.
Relevant difficulties involved with tracking pitch are enumerated, along with the approach we followed, and the performance results of the system indicating its effectiveness are presented.<P><HR><H2><A NAME="toc">Table of Contents</A></H2><UL><LI><A HREF="#sec:Introduction">Introduction</A><LI><A HREF="#sec:System">System Architecture</A><LI><A HREF="#sec:Tracking">Tracking Pitch in Hummed Queries</A><UL><LI><A HREF="#subsec:Tracking">Tracking pitch</A></UL><LI><A HREF="#sec:Searching">Searching the database</A><LI><A HREF="#sec:Evaluation">Evaluation</A><UL><LI><A HREF="#subsec:Robustness">Robustness</A><LI><A HREF="#subsec:Performance">Performance</A></UL><LI><A HREF="#sec:Future">Future directions and Related Work</A><LI><A HREF="#sec:References">References</A></UL><HR><H2><A NAME="sec:Introduction">Introduction</A></H2><P>Next generation databases will include image, audio and video data in addition to traditional text and numerical data. These data types will require query methods that are more appropriate and natural to the respective type of data. For instance, a natural way to query an image database is to retrieve images based on operations on images or sketches supplied as input.
Similarly, a natural way of querying an audio database (of songs) is to hum the tune of a song.<P>Such a system would be useful in any multimedia database containing musical data by providing an alternative and natural way of querying. One can also imagine widespread use of such a system in the commercial music industry, music radio and TV stations, music stores, and even for one's personal use.<P>In this paper, we address the issue of how to specify a hummed query and report on an efficient query execution implementation using approximate pattern matching. Our approach hinges upon the observation that melodic contour, defined as the sequence of relative differences in pitch between successive notes, can be used to discriminate between melodies. Handel <A HREF="#ref:handel">[3]</A> indicates that melodic contour is one of the most important methods that listeners use to determine similarities between melodies. We currently use an alphabet of three possible relationships between pitches (`U', `D', and `S'), representing the situations where a note is above, below, or the same as the previous note, which can be pitch-tracked quite robustly. With the current implementation of our system we are able to retrieve most songs within 12 notes. Our database currently comprises a collection of all parts (melody and otherwise) from 183 songs, suggesting that three-way discrimination would be useful for finding a particular song in a private music collection, but that higher resolutions will probably be necessary for larger databases.<P>This paper is organized as follows. The first section describes the architecture of the current system. The second section describes what pitch is, why it is important in representing the melodic contents of songs, several techniques for tracking pitch that we tried and discarded, and the method we settled on. Next we discuss pattern matching as it is used in the current implementation of the database.
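<P>To make the approximate-matching idea concrete, the sketch below is an illustration only (not necessarily the algorithm of [1]): a standard dynamic-programming formulation that computes the minimum number of edits needed to align a hummed contour string against any substring of a song's contour string. Songs can then be ranked by this score. The function name <code>min_mismatches</code> is our own, chosen for this example.

```python
def min_mismatches(pattern, text):
    """Minimum edit distance between `pattern` and any substring of
    `text` (classic approximate substring matching via dynamic
    programming; row 0 is all zeros so a match may start anywhere)."""
    m, n = len(pattern), len(text)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [i] + [0] * n          # cur[0]: i pattern chars unmatched
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitute or match
                         prev[j] + 1,         # skip a pattern symbol
                         cur[j - 1] + 1)      # skip a text symbol
        prev = cur
    return min(prev)                 # best score over all end positions

# A hummed query "UUD" appears exactly inside this song contour:
print(min_mismatches("UUD", "SSUUDSS"))  # 0
```

Ranking all songs by this score yields the ordered result list the query engine returns.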
The last two sections describe our evaluation of the current system and specify some future extensions that we are considering incorporating into the existing system.<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5><HR><H2><A NAME="sec:System">System Architecture</A></H2><P>There are three main components to our system: a pitch-tracking module, a melody database, and a query engine. The architecture is illustrated in Figure 1. Operation of the system is straightforward. Queries are hummed into a microphone, digitized, and fed into a pitch-tracking module. The result, a contour representation of the hummed melody, is fed into the query engine, which produces a ranked list of matching melodies.<P><IMG ALT="[IMG]" SRC="http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming/fig1.gif"><BR><STRONG>Figure 1.</STRONG> System Architecture<P>The database of melodies was acquired by processing public domain MIDI songs, and is stored as a flat-file database. Pitch tracking is performed in Matlab, chosen for its built-in audio processing capabilities and the ease of testing a number of algorithms within it. Hummed queries may be recorded in a variety of formats, depending upon the platform-specific audio input capabilities of Matlab. We have experimented with 16-bit, 44 kHz WAV format on a Pentium system, and 8-bit, 8 kHz AU format on a Sun Sparcstation. The query engine uses an approximate pattern matching algorithm <A HREF="#ref:asifa">[1]</A>, described below, in order to tolerate humming errors.<H5><A HREF="#toc">&lt;-- Table of Contents</A></H5><HR><H2><A NAME="sec:Tracking">Tracking Pitch in Hummed Queries</A></H2>This section describes how user input to the system (humming) is converted into a sequence of relative pitch transitions.
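<P>This contour conversion can be sketched in a few lines. The example below is a minimal illustration under the assumption that pitches have already been extracted and are available as MIDI note numbers (in the actual system they come from the pitch-tracking module):

```python
def contour(notes):
    """Convert a pitch sequence (here, MIDI note numbers) into a
    melodic-contour string over the alphabet U/D/S.  The first note
    produces no symbol, since it has no previous pitch."""
    out = []
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            out.append("U")   # higher than the previous note
        elif cur < prev:
            out.append("D")   # lower than the previous note
        else:
            out.append("S")   # same as the previous note
    return "".join(out)

# Opening of Beethoven's 5th: G G G Eb  F F F D
print(contour([67, 67, 67, 63, 65, 65, 65, 62]))  # SSDUSSD
```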
A note in the input is classified in one of three ways: a note is either the same as the previous note (S), higher than the previous note (U), or lower than the previous note (D). Thus, the input is converted into a string with a three-letter alphabet (U, D, S). For example, the introductory theme of Beethoven's 5th Symphony would be converted into the sequence: - S S D U S S D (the first note is ignored, as it has no previous pitch).<P>To accomplish this conversion, a sequence of pitches in the melody must be isolated and tracked. This is not as straightforward as it sounds, however, as there is still considerable controversy over exactly what pitch is. The general concept of pitch is clear: given a note, the pitch is the frequency that most closely matches what we hear. Performing this conversion in a computer can become troublesome because some intricacies of human hearing are still not understood. For instance, if we play the 4th, 5th, and 6th harmonics of some fundamental frequency, we actually hear the fundamental frequency, not the harmonics, even though the fundamental frequency is not present. This phenomenon was first discovered by Schouten in some pioneering investigations carried out from 1938 to 1940. Schouten studied the pitch of periodic sound waves produced by an optical siren in which the fundamental of 200 Hz was canceled completely. The pitch of the complex tone, however, was the same as that prior to the elimination of the fundamental <A HREF="#ref:plomp">[12]</A>.<P>Since we were interested in tracking pitch in humming, we examined methods for automatically tracking pitch in a human voice. Before we can estimate the pitch of an acoustic signal, we must first understand how this signal is created, which requires forming a model of sound production at the source. The vibrations of the vocal cords in voiced
