INTRODUCING MEGAHAL

Jason L. Hutchens
Dept. of E&E Engineering
University of Western Australia
Nedlands W.A. 6907, Australia
hutch@ciips.ee.uwa.edu.au

Michael D. Alder
Dept. of Mathematics
University of Western Australia
Nedlands W.A. 6907, Australia
mike@maths.uwa.edu.au

Abstract

Conversation simulators are computer programs which give the appearance of conversing with a user in natural language. Alan Turing devised a simple test in order to decide whether such programs are intelligent. In 1991, the Cambridge Centre for Behavioural Studies held the first formal instantiation of the Turing Test. In this incarnation the test was known as the Loebner contest, as Dr. Hugh Loebner pledged a $100,000 grand prize for the first computer program to pass the test. In this paper we give a brief background to the contest, before describing in detail the workings of MegaHAL, the primary author's entry to the 1998 Loebner contest.

1. Introduction

Alan Turing was a brilliant British mathematician who played a great role in the development of the computer. The imitation game, nowadays known as the Turing test, was devised by Turing as a method for deciding whether or not a computer program is intelligent.

The Turing test takes place between an interrogator and two subjects. The interrogator communicates with these subjects via a computer terminal, and must decide which is a human being and which is a computer program. The human being helps the interrogator to make the correct identification, while the computer program attempts to trick the interrogator into making the wrong identification. If the latter case occurs, the computer program is said to be exhibiting intelligence (Turing, 1992).

One of the great advantages of the Turing test is that it allows the interrogator to evaluate almost all of the evidence that we would assume to constitute thinking (Moor, 1976).
For instance, the interrogator can pose hypothetical situations in order to ask the subjects how they would react.

Alan Turing died in 1954, a decade before conversation simulators such as ELIZA emerged. It is indeed unfortunate that he did not live to witness his test being performed. One cannot help but think that he would have been disappointed.

2. The Loebner Contest

Apart from a few limited tests performed by programmers of conversation simulators (Colby, 1981), the Turing test was not formally conducted until 1995. Although the inaugural Loebner contest, held in 1991, was touted as the first formal instantiation of the Turing test, it was not until 1995 that it truly satisfied Turing's original specifications (Hutchens, 1996).

The first Loebner contest was held on the 8th of November 1991 in Boston's Computer Museum. Because this was a contest rather than an experiment, six computer programs were accepted as subjects. Four human subjects and ten judges were selected from respondents to a newspaper advertisement; none of them had any special expertise in Computer Science (Epstein, 1992).

The original Turing test involved a binary decision between two subjects by a single judge. With ten subjects and ten judges, the situation was somewhat more complex. After months of deliberation, the prize committee developed a suitable scoring mechanism. Each judge was required to rank the subjects from least human-like to most human-like, and to mark the point at which they believed the subjects switched from computer programs to human beings.

If the median rank of a computer program exceeded the median rank of at least one of the human subjects, then that computer program would win the grand prize of $100,000[1]. If there was no grand prize winner, the computer program with the highest median rank would win the contest with a prize of $2000.

3. Conversation Simulators

Since its inception, the Loebner contest has primarily attracted hobbyist entries which simulate conversation using template matching. Joseph Weizenbaum employed this method in his ELIZA conversation simulator, developed at MIT between 1964 and 1966. Put simply, these programs look for certain patterns of words in the user's input, and reply with a pre-determined output, which may contain blanks to be filled in with details such as the user's name.

Such programs are effective because they exploit the fact that human beings tend to read much more meaning into what is said than is actually there; we are fooled into reading structure into chaos, and we interpret non-sequiturs as whimsical conversation (Shieber, 1994).

Weizenbaum was shocked at the reaction to ELIZA. He noticed three main phenomena which disturbed him greatly (Weizenbaum, 1976):

1. A number of practising psychiatrists believed that ELIZA could grow into an almost completely automatic form of psychotherapy.
2. Users very quickly became emotionally involved; Weizenbaum's secretary, for example, demanded to be left alone with the program.
3. Some people believed that the program demonstrated a general solution to the problem of computer understanding of natural language.

Over three decades have passed since ELIZA was created. Computers have become significantly more powerful, while storage space and memory size have increased exponentially. Yet, at least as far as the entrants of the Loebner contest go, the capabilities of conversation simulators have remained exactly where they were thirty years ago. Indeed, judges in the 1991 contest said that they felt let down after talking to the computer entrants, as they had had their expectations raised when using ELIZA during the selection process.

4. MegaHAL

In 1996 the primary author entered the Loebner contest with an ELIZA variant named HeX, which was written during his spare time in under a month.
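Because HeX was an ELIZA variant, it may help to see what the template matching described in Section 3 looks like in code. The following is only a minimal Python sketch. The rules in `RULES`, the `reply` function and the fallback text are hypothetical examples; they are not taken from ELIZA, HeX or any Loebner entry. Real ELIZA scripts also rank keywords and reflect pronouns, and this sketch omits both.

```python
import re

# Hypothetical ELIZA-style rules: each pairs a word pattern with a
# pre-determined reply whose blank "{0}" is filled from the matched input.
RULES = [
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}."),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmother\b", re.IGNORECASE), "Tell me more about your family."),
]
# Stock reply used when no pattern matches (hypothetical).
DEFAULT_REPLY = "Please go on."


def reply(user_input: str) -> str:
    """Return the reply of the first rule whose pattern matches the input."""
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            # Strip trailing punctuation so "sad." fills the blank as "sad".
            groups = [g.rstrip(".!?") for g in match.groups()]
            return template.format(*groups)
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(reply("My name is Alan"))
    print(reply("I am feeling sad."))
```

Rules are tried in order, so the first matching pattern wins. An input that matches nothing falls through to the stock reply, which is where the "whimsical non-sequitur" effect described above tends to come from.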
Apart from the lure of the prize money, a major motivation for the entry was a desire to illustrate the shortcomings of the contest (Hutchens, 1996). A considerably more powerful program, SEPO, was entered the following year and was placed second. We believe this to be indicative of a gradual improvement in the quality of the contestants.

The program submitted to this year's contest, MegaHAL, simulates conversation using a method significantly different from that of either HeX or SEPO. We dedicate the remainder of this paper to describing its workings.

4.1 Markov Modelling

MegaHAL is able to construct a model of language based on the evidence it encounters while conversing with the user. To begin with, the input received from the user is parsed into an alternating sequence of words and non-words. A word is a series of alphanumeric characters, and a non-word is a series of other characters. This is done to ensure not only that new words are learned, but that the separators between them are learned as well. If the user has a habit of putting a double space after a full stop, for instance, MegaHAL will do just the same.

The resulting string of symbols[2] is used to train two 4th-order Markov models (Jelinek, 1986). One of these models can predict which symbol will follow any sequence of four symbols, while the other can predict which symbol will precede any such sequence. Markov models express their predictions as a probability distribution over all known symbols, and are therefore capable of choosing likely words over unlikely ones.
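The tokenization and the pair of 4th-order models described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not MegaHAL's actual implementation:

- The names `tokenize`, `MarkovModel` and `train_models` are hypothetical.
- "Alphanumeric" is taken to mean ASCII letters and digits.
- Probabilities are plain maximum-likelihood estimates from counts, a detail this section does not specify.
- The backward model is obtained by training an ordinary forward model on the reversed symbol sequence.

```python
import re
from collections import Counter, defaultdict

ORDER = 4  # context length in symbols (4th-order models)


def tokenize(text: str) -> list[str]:
    """Split text into alternating word and non-word symbols.

    Assumption: a word is a run of ASCII letters/digits and a non-word is a
    run of anything else, so separators such as ".  " are kept as symbols.
    """
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]+", text)


class MarkovModel:
    """Counts which symbol follows each context of ORDER symbols."""

    def __init__(self) -> None:
        self.counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, symbols: list[str]) -> None:
        # Slide a window of ORDER symbols and count the symbol after it.
        for i in range(len(symbols) - ORDER):
            context = tuple(symbols[i : i + ORDER])
            self.counts[context][symbols[i + ORDER]] += 1

    def probability(self, context: tuple[str, ...], symbol: str) -> float:
        """Maximum-likelihood estimate of P(symbol | context); 0.0 if unseen."""
        seen = self.counts.get(context)  # .get does not create empty entries
        if not seen:
            return 0.0
        return seen[symbol] / sum(seen.values())


def train_models(text: str) -> tuple[MarkovModel, MarkovModel]:
    """Train a forward model and a backward model on the same input."""
    symbols = tokenize(text)
    forward, backward = MarkovModel(), MarkovModel()
    forward.train(symbols)
    # The backward model sees the sequence reversed, so the symbol it
    # "predicts" is the one that precedes a context in the original text.
    # Its contexts are therefore stored (and must be queried) reversed.
    backward.train(symbols[::-1])
    return forward, backward
```

Here is a worked example on the text "the cat sat on the mat. the cat ran.":

- In the forward model, the context `("the", " ", "cat", " ")` has been followed once by "sat" and once by "ran", so each receives probability 0.5.
- To ask what precedes `(" ", "cat", " ", "sat")`, query the backward model with the reversed context `("sat", " ", "cat", " ")`. The answer is "the", with probability 1.0.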
Models of order 4 were chosen to ensure that the prediction is based on two words. Because words and non-words alternate, a context of four symbols always spans exactly two words. This has been found necessary to produce output resembling natural language (Hutchens, 1994).

4.2 Generating Candidate Replies

Using a Markov model to generate replies is easy; Shannon was doing much the same thing by flipping through books back in 1949 (Shannon, 1949). However, such replies will often be nonsensical, and will bear no relationship to the user's input.

MegaHAL therefore attempts to generate suitable replies by basing them on one or more keywords from the user's input. This explains why two Markov models are necessary. The first model generates a sentence from the keyword onwards, while the second model generates the remainder of the sentence, from the keyword back to the beginning.

Keywords are obtained from the user's input. Frequently occurring words, such as "the", "and" and "what", are discarded, as their presence in the input does not mean they need to be present in the output. The remaining words are transformed if necessary; "my" becomes "your" and "why" becomes "because", for example. What remains is used to seed the output.

4.3 Selecting a Reply

MegaHAL is able to generate many hundreds of candidate replies per second, each of which contains at least one keyword. Once a small time period has elapsed, the program must display a reply to the user. A method is needed for selecting a suitable reply out of the hundreds of candidates.

$$I(w|s) = -\log P(w|s) \qquad \text{(1)}$$

MegaHAL chooses the reply which assigns the keywords the highest information. Equation 1 defines the information of a word as the surprise it causes the Markov model. Hence the most surprising reply is selected, which helps to ensure its originality. Note that P(w|s) is the probability of word w following the symbol sequence s, according to the Markov model.

The algorithm for MegaHAL proceeds as follows: