           the search results. The help information is displayed in the output buffer before
           the search results.
     [2] Many WordNet synsets have a textual gloss which often provides an
           explanation of what the synset represents. The Textual Gloss option controls
           whether glosses are displayed.
     [3] In addition to being viewed in the output buffer, search results may be appended
           to a file. When the Log option is On, search results are appended to the file
           named when the option is displayed. By default this file is wnoutput.log. If the
           WNLOG environment variable is set, the filename is the value of the variable
           with .log appended.
     [4] Display license allows a user to view the WordNet copyright notice, version
           number, and license.
     [5] Selecting Quit exits xwordnet.
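
     The log-file naming rule in item [3] fits in a few lines. The sketch below only illustrates that rule and is not taken from the xwordnet sources; the names log_filename and append_to_log are hypothetical, and WNLOG is treated as "set" whenever the variable exists in the environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_LOG = "wnoutput.log"  # default log file named in item [3]


def log_filename(env: Optional[Mapping[str, str]] = None) -> str:
    """Name of the file that search results are appended to when Log is On.

    Hypothetical helper: it mirrors the rule in the text, not xwordnet's code.
    """
    if env is None:
        env = os.environ
    value = env.get("WNLOG")
    if value is None:          # WNLOG not set: fall back to the default file
        return DEFAULT_LOG
    return value + ".log"      # WNLOG set: its value with .log appended


def append_to_log(results: str, env: Optional[Mapping[str, str]] = None) -> None:
    # Results are appended, never overwritten, as the Log option describes.
    with open(log_filename(env), "a", encoding="utf-8") as log:
        log.write(results)
```

For example, with WNLOG=/tmp/session the results would go to /tmp/session.log.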

Output

     The output of a WordNet search is intended to be self-explanatory, given that the
user knows what type of search was requested. Visual cues, such as indentation to
represent levels in retrieved hierarchies, are relied upon to aid a user in interpreting the
formatted search results. Unfortunately, the complex structure of adjectives makes the
output for retrieved adjective synsets less straightforward. In an attempt to clarify the
display of adjectival information, direct antonyms, which generally occur only in head
synsets, are always displayed together. This allows a user to distinguish head synsets
from satellite synsets, as well as different senses of a head synset.
     The output of a search is displayed in the large buffer below the status line. Both
horizontal and vertical scroll bars are used to view data that exceeds the window's
borders. The output consists of an ordinal sense number (simply indicating position in
the list of senses), followed by a line with the synset that the search string is in, followed
by the search results. Each line of search output is preceded by a marker and the synset
containing the requested information. If a search traverses more than one level of the
tree, then successive lines are indented by spaces corresponding to their level in the
hierarchy. If a search doesn't apply to all senses of the search string, the search results
are headed by a string such as:

      2 of 5 senses of table
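
     To make this layout concrete, the sketch below formats results as described: an optional "n of m senses" header, then for each sense its number, the synset containing the search string, and result lines indented by level. The SenseResult structure, the indentation width of four spaces, and the => marker on every result line are assumptions for illustration; the real WordNet output differs in detail.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

INDENT = 4     # assumed number of spaces per hierarchy level
MARKER = "=>"  # assumed marker preceding each result line


@dataclass
class SenseResult:                # hypothetical container for one sense
    number: int                   # ordinal sense number (position in the sense list)
    synset: List[str]             # synset that contains the search string
    # (level, synset) pairs; level 1 is one step away from the sense itself
    results: List[Tuple[int, List[str]]] = field(default_factory=list)


def format_search(word: str, senses: List[SenseResult], total_senses: int) -> str:
    lines: List[str] = []
    if len(senses) < total_senses:
        # The search does not apply to every sense of the word.
        lines.append(f"{len(senses)} of {total_senses} senses of {word}")
        lines.append("")
    for sense in senses:
        lines.append(f"Sense {sense.number}")
        lines.append(", ".join(sense.synset))
        for level, synset in sense.results:
            # Deeper levels of the retrieved hierarchy are indented further.
            lines.append(" " * (INDENT * level) + MARKER + " " + ", ".join(synset))
    return "\n".join(lines)
```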

     When ``Sample sentences for verb _____'' is selected, verb frames that are
acceptable for all words in a synset are preceded by the string ``*>''. If a frame is
acceptable for the search string only, it is preceded by the string ``=>''.
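
     The choice between the two markers can be sketched as below. Representing each frame as a pair of frame text and the single word it is restricted to (None when it suits every word in the synset) is an assumption, as is skipping frames restricted to other words; the text does not say how those are handled.

```python
from typing import List, Optional, Tuple

# (frame text, word it is restricted to); None means the frame is acceptable
# for every word in the synset. This representation is an assumption.
Frame = Tuple[str, Optional[str]]


def sample_sentence_lines(search_word: str, frames: List[Frame]) -> List[str]:
    lines = []
    for text, only_for in frames:
        if only_for is None:
            lines.append("*> " + text)   # acceptable for all words in the synset
        elif only_for == search_word:
            lines.append("=> " + text)   # acceptable for the search string only
        # Assumption: frames restricted to other words in the synset are skipped.
    return lines
```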
     When an adjective is printed, its direct antonym, if it has one, is also printed in
parentheses. Since an adjective can appear in head synsets, satellite synsets, or both, any
head synsets that the word appears in are printed first, followed by all of the satellite
synsets that the word appears in, with an indication of the head synset that the adjective
is a satellite of. When the search string is in a head synset, all of the head synset's
satellites are also displayed. The position of an adjective in relation to the noun may be
restricted to the prenominal, postnominal, or predicative position. Where present, these
restrictions are noted in parentheses.
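
     The ordering rules above can be sketched as follows. The AdjSynset class, the "(vs. X)" wording for a direct antonym, and the "=>" line naming a satellite's head synset are hypothetical choices made for illustration; only the ordering (head synsets first, each with its satellites, then satellites with their heads) and the use of parentheses come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class AdjSynset:                               # hypothetical adjective synset
    words: List[str]
    antonym: Optional[str] = None              # direct antonym, if any
    position: Optional[str] = None             # e.g. "prenominal", "predicative"
    head: Optional["AdjSynset"] = None         # set only for satellite synsets
    satellites: List["AdjSynset"] = field(default_factory=list)


def describe(s: AdjSynset) -> str:
    text = ", ".join(s.words)
    if s.position:
        text += f" ({s.position})"             # positional restriction in parentheses
    if s.antonym:
        text += f" (vs. {s.antonym})"          # direct antonym in parentheses (assumed wording)
    return text


def adjective_display(found_in: List[AdjSynset]) -> List[str]:
    heads = [s for s in found_in if s.head is None]
    satellites = [s for s in found_in if s.head is not None]
    lines: List[str] = []
    for h in heads:                            # head synsets are printed first,
        lines.append(describe(h))
        for sat in h.satellites:               # each followed by all its satellites
            lines.append("    " + describe(sat))
    for sat in satellites:                     # then satellites, naming their head
        lines.append(describe(sat))
        lines.append("    => " + describe(sat.head))
    return lines
```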
     When an adverb is derived from an adjective, the specific adjectival sense on which
it is based is printed, along with the relevant adjective synset. If the adjective synset
indicated is a satellite synset, then the pertinent head synset is printed following the
satellite synset.

Morphy

     Many dictionaries hang their information on uninflected headwords without
separate listings for inflectional (or many derivational) forms of the word. In a printed
dictionary, that practice causes little trouble; with a few highly irregular exceptions,
morphologically related words are generally similar enough in spelling to the reference
form that the eye, aided by boldface type, quickly picks them up. In an electronic
dictionary, on the other hand, when an inflected form is requested, the response is likely
to be a frustrating announcement that the word is not in the database; users are required
to know the reference form of every word they want to look up. In WordNet, only base
forms of words are generally represented. In order to spare users the trouble of affix
stripping, and to assist with the creation of programs that use WordNet to automatically
process natural language texts, the WordNet software suite includes functions that give
WordNet some intelligence about English morphology. At the present time no
morphological processes are performed on adverbs.
     The WordNet morphological processing functions, collectively called Morphy, handle a
wide range of morphological transformations. Morphy uses two types of processes to try
to convert a word form into a form that is found in the WordNet database. There are lists
of inflectional endings, based on syntactic category, that can be detached from individual
words in an attempt to find a form of the word that is in WordNet. There are also
exception lists for each syntactic category in which a search for an inflected form may be
done. Morphy tries to combine these two processes intelligently to translate the word
form it is passed into a form found in WordNet: it first checks for exceptions, then
applies the rules of detachment.
     The Morphy functions are part of the WordNet library and are used by the retrieval
software and various applications. The primary interface function is passed a string (a
word form or collocation) and a syntactic category. Since some words, such as axes, can
have more than one base form (axe and axis), Morphy is set up to work in the following
manner. The first time that Morphy is called with a specific string, it returns a base form.
For each subsequent lookup of the same string, Morphy returns an alternative base form.
Whenever Morphy cannot perform a transformation, NULL is returned.
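
     The calling convention just described, one base form per call for the same string and NULL once the alternatives run out, can be modeled in Python as follows. This is a sketch of the behavior, not the WordNet C interface: the Morphy class is hypothetical, and the candidates function stands in for the exception-list and suffix-rule processing described in the following sections.

```python
from typing import Callable, List, Optional, Tuple


class Morphy:
    """Hypothetical model of Morphy's one-base-form-per-call behavior."""

    def __init__(self, candidates: Callable[[str, str], List[str]]) -> None:
        # candidates(word, pos) returns every base form for the category.
        self._candidates = candidates
        self._key: Optional[Tuple[str, str]] = None
        self._pending: List[str] = []

    def __call__(self, word: str, pos: str) -> Optional[str]:
        if (word, pos) != self._key:
            # A new string: compute its base forms and start from the first.
            self._key = (word, pos)
            self._pending = list(self._candidates(word, pos))
        if not self._pending:
            return None   # plays the role of NULL in the C library
        return self._pending.pop(0)
```

With a made-up candidate table mapping axes to axe and axis, the first call returns axe, the second axis, and the third None.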

Exception Lists

     There is one exception list for each syntactic category (except adverbs). The
exception lists contain the morphological transformations for words that are not regular
and therefore cannot be processed in an algorithmic manner. Each line of an exception
list contains an inflected form of a word, followed by one or more base forms of the
word. Each list is kept in alphabetical order, and a binary search is used to find words in
these lists.
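
     A binary search over an exception list can be sketched with Python's bisect module. Loading the whole list into memory is a simplification made here for clarity; the ExceptionList class is hypothetical, and the sample lines in the usage are invented, although they follow the format described above (an inflected form followed by one or more base forms).

```python
import bisect
from typing import Iterable, List


class ExceptionList:
    """In-memory exception list searched with binary search (a sketch)."""

    def __init__(self, lines: Iterable[str]) -> None:
        entries = [line.split() for line in lines if line.strip()]
        entries.sort(key=lambda e: e[0])   # alphabetical order is required for bisect
        self._keys = [e[0] for e in entries]
        self._bases = [e[1:] for e in entries]

    def lookup(self, inflected: str) -> List[str]:
        i = bisect.bisect_left(self._keys, inflected)
        if i < len(self._keys) and self._keys[i] == inflected:
            return list(self._bases[i])    # one or more base forms
        return []                          # not an exception: try the suffix rules
```

For example, ExceptionList(["mice mouse", "axes axe axis"]).lookup("axes") returns ["axe", "axis"].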

Single Words

     In general, single words are relatively easy to process. Morphy first looks for the
word form in the exception list. If it is found, then the first base form is returned.
Subsequent lookups for the same word form return alternative base forms, if present. A
NULL is returned when there are no more base forms of the word.
     If the word is not found in the exception list corresponding to the syntactic category,
then an algorithmic process that looks for a matching suffix is applied. If a matching
suffix is found, it is detached and the corresponding ending, if any, is added; the
database is then consulted to see whether the resulting word is in WordNet. Refer to Table 4 for a list of suffixes
and endings for each syntactic category.
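
     The single-word procedure can be sketched as below. The suffix and ending pairs are transcribed from Table 4; the dictionaries standing in for the exception lists and the WordNet database are hypothetical, and unlike Morphy this function returns all candidate base forms at once rather than one per call. No rules are listed for adverbs, so adverbs yield nothing.

```python
from typing import Dict, List, Set, Tuple

# Suffix -> ending pairs from Table 4; "" means the suffix is simply removed.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adjective": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def single_word_bases(word: str, pos: str,
                      exceptions: Dict[str, Dict[str, List[str]]],
                      lexicon: Dict[str, Set[str]]) -> List[str]:
    # 1. The exception list for the syntactic category is consulted first.
    listed = exceptions.get(pos, {}).get(word)
    if listed:
        return list(listed)
    # 2. Otherwise detach each matching suffix, add its ending, and keep
    #    only candidates that are actually present in the database.
    found: List[str] = []
    for suffix, ending in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in lexicon.get(pos, set()) and candidate not in found:
                found.append(candidate)
    return found
```

For instance, hoping yields hope through the ing/e rule, while an irregular form such as tried, which no rule covers, has to come from the verb exception list.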

Collocations

     As opposed to single words, collocations can be quite challenging to transform into
a base form that is present in WordNet. In general, only base forms of words, even those
comprising collocations such as attorney general, are stored in WordNet. Transforming
the collocation attorneys general is then simply a matter of finding the base forms of the
individual words comprising the collocation. This usually works for nouns; nouns for
which it fails, such as customs duty, are presently entered in the noun exception list
(transforming each word results in the base form custom duty, which is not in
WordNet).
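
     The noun collocation handling just described might look like the sketch below: consult the exception list first, otherwise reduce each word to its base form and check the result against the database. The word_base callable (returning a word unchanged when no transformation applies) and the set standing in for WordNet are assumptions.

```python
from typing import Callable, Dict, List, Optional, Set


def noun_collocation_base(phrase: str,
                          word_base: Callable[[str], str],
                          exceptions: Dict[str, List[str]],
                          lexicon: Set[str]) -> Optional[str]:
    # Non-conforming collocations are caught by the exception list first.
    if phrase in exceptions:
        return exceptions[phrase][0]
    # Otherwise take the base form of every word and look up the result.
    candidate = " ".join(word_base(w) for w in phrase.split(" "))
    return candidate if candidate in lexicon else None
```

Without an exception entry, customs duties reduces to custom duty, which the lookup rejects; that is exactly why such collocations are listed in the exception file.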
     Verb collocations that have prepositions, such as stand in line, are more difficult.
As with single words, the exception list is searched first. If the collocation is not found,
special code in Morphy determines whether a verb collocation has a preposition in it. If
it does, the following process is applied to try to find the base form. It is assumed that
the first word in the collocation is a verb and that the last word is a noun. The algorithm
then builds a search string with the base forms of the verb and noun, leaving the remainder 
of the collocation (usually just the preposition, but more words may be involved) in the 
middle. For example, when Morphy is passed standing in lines, the database search is
performed with stand in line, which is found in WordNet and is therefore returned by
Morphy. If a verb
collocation does not contain a preposition, then the base form of each word in the collocation
is found and WordNet is searched for the resulting string.
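
     The verb collocation procedure can be sketched as follows. The PREPOSITIONS set is a small hypothetical stand-in for whatever list Morphy actually uses, the check only looks at the middle words, and reducing every word with verb_base in the no-preposition case is an assumption; the text only says that the base form of each word is found.

```python
from typing import Callable, Dict, List, Optional, Set

# Assumption: a tiny stand-in for Morphy's own list of prepositions.
PREPOSITIONS = {"in", "on", "at", "to", "of", "for", "with", "by", "from"}


def verb_collocation_base(phrase: str,
                          verb_base: Callable[[str], str],
                          noun_base: Callable[[str], str],
                          exceptions: Dict[str, List[str]],
                          lexicon: Set[str]) -> Optional[str]:
    if phrase in exceptions:                  # the exception list is searched first
        return exceptions[phrase][0]
    words = phrase.split(" ")
    if len(words) >= 3 and any(w in PREPOSITIONS for w in words[1:-1]):
        # First word is assumed to be a verb and last a noun; the middle
        # (usually just the preposition) is kept unchanged.
        parts = [verb_base(words[0])] + words[1:-1] + [noun_base(words[-1])]
    else:
        # No preposition: take the base form of each word (assumed: verb_base).
        parts = [verb_base(w) for w in words]
    candidate = " ".join(parts)
    return candidate if candidate in lexicon else None
```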

Hyphenation

     Hyphenation also presents special difficulties when searching WordNet. It is often
a subjective determination whether a word is hyphenated, is closed up, or is a collocation
of several words, and which of the various forms are entered into WordNet. When
Morphy breaks a string into ``words'', it looks for both spaces and hyphens as delimiters.
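
     Treating both spaces and hyphens as delimiters can be sketched with a regular expression whose capture group keeps the delimiters. Rejoining the base forms with the same delimiters that appeared in the input is an assumption made for this sketch; the text only states that both characters split words. The helper names are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def split_words(s: str) -> Tuple[List[str], List[str]]:
    """Split on spaces and hyphens, remembering which delimiter was used."""
    parts = re.split(r"([ -])", s)     # the capture group keeps the delimiters
    return parts[0::2], parts[1::2]


def rejoin(words: List[str], delimiters: List[str]) -> str:
    out = words[0]
    for delim, word in zip(delimiters, words[1:]):
        out += delim + word
    return out


def base_with_hyphens(s: str, word_base: Callable[[str], str]) -> str:
    # Reduce each piece to its base form, then restore the original delimiters.
    words, delims = split_words(s)
    return rejoin([word_base(w) for w in words], delims)
```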

Future Work

     Since many noun collocations contain prepositions, such as line of products, an
algorithm similar to that used for verbs should be written for nouns. In the present
scheme, if Morphy is passed lines of products, the search string becomes line of product,
which is not in WordNet. Morphy should also be able to work in both directions: given
a base form, it should be able to return the inflected forms of the word.

Table 4    Morphy Suffixes and Endings

          Noun          |           Verb           |        Adjective
________________________|__________________________|________________________
   Suffix      Ending   |    Suffix      Ending    |    Suffix      Ending
                        |                          |
   s                    |    s                     |    er
   ses         s        |    ies         y         |    est
   xes         x        |    es          e         |    er          e
   zes         z        |    es                    |    est         e
   ches        ch       |    ed          e         |
   shes        sh       |    ed                    |
                        |    ing         e         |
                        |    ing                   |

A blank Ending means that the suffix is simply removed.

Note

1  This is a revised version of "Implementing a Lexical Network" in CSL Report #43, 
prepared by Randee Tengi. UNIX is a registered trademark of UNIX System 
Laboratories, Inc. Sun, Sun 3 and Sun 4 are trademarks of Sun Microsystems, Inc. 
Macintosh is a trademark of Macintosh Laboratory, Inc. licensed to Apple Computer, Inc. 
NeXT is a trademark of NeXT. Microsoft Windows is a trademark of Microsoft Corporation. 
IBM is a registered trademark of International Business Machines Corporation. 
X Windows is a trademark of the Massachusetts Institute of Technology. 
DECstation is a trademark of Digital Equipment Corporation.