Normal usage involves:
\begin{enumerate}
  \item Starting the program.
  \item Filling out the Connection settings dialog (SQLite version,
    MySQL/PostgreSQL version). This connects to a database using a
    certain configuration.
  \item Opening a query, or writing a query in the input area.
  \item Executing the query with the
    \includegraphics[scale=0.5]{flash.png} {\bf execute} button.
  \item Examining the output in the output area.
  \item Repeating from step 3, or quitting the program.
\end{enumerate}

\section{Configuring the program}

\subsection{}

\subsection{Format of the configuration file}

The configuration file follows many other Unix and Windows
configuration files in that:
\begin{itemize}
  \item Comments are prefixed by \#, and anything from the \# to the
    end of the line is ignored.
  \item Blank lines are ignored.
  \item The rest is a number of "key = value" pairs.
  \item The keys are pre-defined (see below).
  \item The values are either "quote-enclosed strings" (e.g.,
    "C:$\backslash$Emdros$\backslash$mymap.map") or consist of
    letters, numbers, underscores, and/or dots, optionally followed by
    a "quote-enclosed string" (e.g., 'word.surface',
    'word.surface."C:$\backslash$Documents and Settings$\backslash$Administrator$\backslash$teckitmap.map"').
\end{itemize}

When a value has dots that are not enclosed in "quotes", the
strings on either side of the dots are interpreted as subkeys. For
example, the value 'word.surface' represents the subkey "word" with
the value "surface", and the value
'word.surface."/home/myname/Blah.map"' represents the subkey "word"
with the subsubkey "surface", followed by the value
"/home/myname/Blah.map".

Here is a sample configuration file, explained bit by bit:

\subsubsection{Database selection}

\begin{verbatim}
# database
database = mydb
\end{verbatim}

You can specify a database that is always to be used with this
configuration file (unless overridden with the -d switch to
eqtc).

If using SQLite, you may wish to specify a path.
Do so in quotes:

\begin{verbatim}
# database
database = "C:\Program Files\Emdros\EmdrosSQLite-1.2.0.pre173\db\mydb"
\end{verbatim}

\subsubsection{Rasterising unit}

\begin{verbatim}
# rasterising unit
raster_unit = clause
\end{verbatim}

The Emdros Query Tool operates with a notion of "rasterising unit".
This is the unit to be displayed on one line. For example, if your
query returns a number of words, then, with the setting above, all
clauses that contain at least one of the words will be fetched and
displayed.

There can only be one rasterising unit.

\subsubsection{Raster context}

\begin{verbatim}
# raster context
raster_context_before = 10
raster_context_after = 10
\end{verbatim}

Instead of a "raster\_unit", you can specify a number of monads of
context to show before and after each hit. If a raster\_unit is
specified, it takes priority. If no raster\_unit is specified,
then both raster\_context\_before and raster\_context\_after
must be present.

\subsubsection{Data units}

\begin{verbatim}
# data units
data_unit = clause
data_unit = phrase
data_unit = word
data_feature = word.surface
data_feature = word.psp
data_feature = phrase.phrase_type
data_feature = phrase.function # You can have more than one
data_unit_name = clause."Cl"
data_left_boundary = phrase.OPEN_BRACKET # Specifies left boundary marker
data_right_boundary = phrase.CLOSE_BRACKET # Specifies right boundary marker
\end{verbatim}

The data units are the units to be displayed in each rasterising
line. They can be any object type, and need not be words.

You must specify which feature(s) to display for each data unit.
The feature-names must be prefixed with the name of the data unit plus
a dot, as in the example above.

The capitalisation must be exactly the same as the value for the
"data\_unit" key. For example, if you said "data\_unit = phrase", then
you must also say "data\_feature = phrase.phrase\_type", not
"Phrase.phrase\_type".

There can be more than one data unit.
If so, they should be
specified in order from largest to smallest (e.g., clause, phrase,
word). This gives the "output" output style (see below) a hint as
to how to print things in the right order.

You can optionally specify "boundary markers" that will be printed
at the left and right boundaries of a unit respectively. The strings
to be printed can be taken from the following table:

\bigskip

\begin{tabular}{*{2}{|l}|}
\hline
\begin{minipage}[t]{4cm}This string...\end{minipage} & \begin{minipage}[t]{4cm}Is replaced by...\newline(without the quotes)\end{minipage}\\\hline
\begin{minipage}[t]{4cm}SPACE\end{minipage} & \begin{minipage}[t]{4cm}"\ "\end{minipage}\\\hline
\begin{minipage}[t]{4cm}COMMA\end{minipage} & \begin{minipage}[t]{4cm}","\end{minipage}\\\hline
\begin{minipage}[t]{4cm}COMMA\_SPACE\end{minipage} & \begin{minipage}[t]{4cm}",\ "\end{minipage}\\\hline
\begin{minipage}[t]{4cm}COLON\end{minipage} & \begin{minipage}[t]{4cm}":"\end{minipage}\\\hline
\begin{minipage}[t]{4cm}COLON\_SPACE\end{minipage} & \begin{minipage}[t]{4cm}":\ "\end{minipage}\\\hline
\begin{minipage}[t]{4cm}OPEN\_BRACE\end{minipage} & \begin{minipage}[t]{4cm}"\{"\end{minipage}\\\hline
\begin{minipage}[t]{4cm}CLOSE\_BRACE\end{minipage} & \begin{minipage}[t]{4cm}"\}"\end{minipage}\\\hline
\begin{minipage}[t]{4cm}OPEN\_BRACKET\end{minipage} & \begin{minipage}[t]{4cm}"["\end{minipage}\\\hline
\begin{minipage}[t]{4cm}CLOSE\_BRACKET\end{minipage} & \begin{minipage}[t]{4cm}"]"\end{minipage}\\\hline
\begin{minipage}[t]{4cm}OPEN\_PAREN\end{minipage} & \begin{minipage}[t]{4cm}"("\end{minipage}\\\hline
\begin{minipage}[t]{4cm}CLOSE\_PAREN\end{minipage} & \begin{minipage}[t]{4cm}")"\end{minipage}\\\hline
\begin{minipage}[t]{4cm}NEWLINE\end{minipage} & \begin{minipage}[t]{4cm}newline\end{minipage}\\\hline
\begin{minipage}[t]{4cm}NIL\end{minipage} & \begin{minipage}[t]{4cm}""\end{minipage}\\\hline
\end{tabular}

\bigskip

The "data\_unit\_name" key gives, for a given object type, a string
which will appear above all the
other data\_features (if any). In the
above example, the clause unit is given a "Cl" label.

Finally, in the graphical version of the Emdros Query Tool, it is
possible to have an interlinear display. The order of the lines in
the interlinear display is the same as the order of the data\_feature
keys. The number of lines is equal to the number of features for the
data unit for which the most data\_feature keys are given, plus the
number of data\_unit\_name keys for that unit.

\subsubsection{TECkit mappings}

\begin{verbatim}
# surface
data_feature_teckit_mapping = word.surface."e:\TECkit\mymap.map"
data_feature_teckit_in_encoding = word.surface.bytes
data_feature_teckit_out_encoding = word.surface.unicode
# lemma
data_feature_teckit_mapping = word.lemma."e:\TECkit\mymap.map"
data_feature_teckit_in_encoding = word.lemma.bytes
data_feature_teckit_out_encoding = word.lemma.unicode
\end{verbatim}

{\bf TECkit} is a tool made by SIL International. It
converts between encodings, in particular to and from Unicode. The
Emdros Query Tool incorporates TECkit, and you can apply it to any
textual feature of any object type.

TECkit works with a so-called "map file", a text file which you
or someone else writes. More information about writing TECkit
mappings can be found on SIL's website:

\begin{center}
{\bf http://scripts.sil.org/TECkit/}
\end{center}

The Emdros Query Tool needs three pieces of information in
order for TECkit to work on a particular feature:
\begin{enumerate}
  \item The name of the file which holds the mapping. This is given
    with the key "data\_feature\_teckit\_mapping".
  \item The input encoding (encoding of the feature-string): This is
    given with the key "data\_feature\_teckit\_in\_encoding". The
    value can be either "bytes" or "unicode" (without the quotes).
    "bytes" means that TECkit does not convert to UTF-8. "unicode"
    means it is converted to UTF-8 for display. You should use
    whatever is used in the map file for input encoding here.
  \item The output encoding (encoding to transform into): This is
    given with the key "data\_feature\_teckit\_out\_encoding". The
    same meanings and restrictions apply as for the input encoding.
\end{enumerate}

TECkit can not only convert between encodings, but also remove
characters from a string. This can come in handy when you have
characters in your feature-strings which you do not wish to display.
Again, see the TECkit site on SIL's website for information on how to
write a TECkit mapping.

In the value of "data\_feature\_teckit\_mapping", you should give
first the object type, then a dot, then the feature-name, then a dot,
then the full path to the map file. You probably need to enclose the
path in "double quotes".

You can only have one TECkit mapping per feature.

\subsubsection{Reference unit}

\begin{verbatim}
# reference units
reference_unit = verse
reference_feature = verse.book
reference_feature = verse.chapter
reference_feature = verse.verse
reference_sep = SPACE # between book and chapter
reference_sep = COMMA # between chapter and verse
\end{verbatim}

If you have a unit in your database which somehow identifies the
position in the document, or an ID, you can display these units at the
left of each line. The canonical example is the Biblical system of
book-chapter-verse, but in many corpora, there will be a unit
identifying, e.g., which newspaper article something came from.

In the above example, verse is the reference unit, and three
features are fetched, namely book, chapter, and verse. The order in
which they are specified in the configuration file is the order in
which they will be emitted.

If there is more than one reference unit feature, you must specify
the separators to separate them. In the above example "SPACE" will be
emitted between "book" and "chapter", and "COMMA" will be emitted
between the chapter and the verse (again, the order matters).
See the
table above for some possibilities of using special characters.

There can be only one reference unit.

\subsubsection{Output style}

\begin{verbatim}
#output_style = kwic
#output_style = tree
output_style = output
\end{verbatim}

This key specifies which implementation to use for emitting
solutions. Currently, three kinds of output style are implemented:
\begin{itemize}
  \item {\bf output}: A "bracketed" view.
  \item {\bf tree}: A "tree" view.
  \item {\bf kwic}: A "key words in context" view.
\end{itemize}

\subsubsection{Data tree parent}

\begin{verbatim}
# Tree parent feature.
# If output_style = tree, then it is assumed that
# there is a feature on all relevant data units which gives the
# id_d of the parent. That is, each child node in the tree
# must have a feature which provides the id_d of its parent.
# If a data_unit is provided which does not have a data_tree_parent,
# then that data_unit *must* contain the top-most nodes in the tree.
data_tree_parent = clause.parent
data_tree_parent = phrase.parent
data_tree_parent = word.parent
\end{verbatim}

If "output\_style" is set to "tree", then this option specifies, for
each terminal and non-terminal in the tree, what feature gives the
parent of the node. Note that this feature must have type "id\_d", and
the value must point to the id\_d of the parent node.

\subsubsection{Tree terminal unit}

\begin{verbatim}
# Tree terminal unit.
# If output_style = tree, then the Emdros Query Tool needs to know
# which object types are terminals (i.e., leaf nodes in the tree)
# and which object types are non-terminals. This is done by
# designating *one* (1) data_unit to be the data_tree_terminal_unit.
# The rest of the data_units will then be non-terminals.
data_tree_terminal_unit = word
\end{verbatim}

This option tells the tree layout code which data\_unit contains
the terminals. Note that the Emdros Query Tool assumes that terminals
and non-terminals are different object types. There may be more than
one non-terminal object type, but only one terminal object type.
The
non-terminal object types are determined based on the data\_unit
option.

\subsubsection{Hit type}

\begin{verbatim}
# hit type
# hit_type must be one of:
# focus
# innermost
# innermost_focus
# outermost
hit_type = outermost
\end{verbatim}

The hit type determines how the sheaf is interpreted. There are
four available options:
\begin{itemize}
  \item {\bf focus}: Means that an object originating in a block with
    the FOCUS keyword present will result in one "hit".
  \item {\bf innermost}: Means that only the innermost MatchedObjects
    will give rise to hits; one hit per string of blocks in which all
    matched objects have no descendants (i.e., no inner sheaf).
  \item {\bf innermost\_focus}: Like innermost, but only those matched
    objects whose "focus" boolean is set will have their monads
    included.
  \item {\bf outermost}: Means that only the outermost MatchedObjects
    will give rise to hits; one hit per outermost MatchedObject.
\end{itemize}

If no hit type is specified, then "outermost" is
assumed as the default.

\subsubsection{Options}

\begin{verbatim}
# display options
option = apply_focus
option = break_after_raster
option = quiet
option = single_raster_units
\end{verbatim}

You can have these options:

\bigskip

\begin{tabular}{*{2}{|l}|}
\hline
\begin{minipage}[t]{4cm}Option\end{minipage} & \begin{minipage}[t]{4cm}Meaning\end{minipage}\\\hline
\begin{minipage}[t]{4cm}apply\_focus\end{minipage} & \begin{minipage}[t]{4cm}If set, then those data units which had the "focus" keyword in the original query will be surrounded by \{braces\} in the output.\end{minipage}\\\hline
\begin{minipage}[t]{4cm}break\_after\_raster\end{minipage} & \begin{minipage}[t]{4cm}If set, then a newline is emitted after each raster-line. If not set, then the raster-lines are run together.\end{minipage}\\\hline
\begin{minipage}[t]{4cm}quiet\end{minipage} & \begin{minipage}[t]{4cm}If set, then only results will be printed; nothing else. If not set, then things like progress and number of solutions will be printed.
If an error occurs, then that will be printed regardless of the status of this option.\end{minipage}\\\hline
\begin{minipage}[t]{4cm}single\_raster\_units\end{minipage} & \begin{minipage}[t]{4cm}If set, then each raster unit will only ever be printed once. This affects the number of solutions printed: If two solutions each contain the same raster unit, then only one of the solutions will be printed.\end{minipage}\\\hline
\end{tabular}

\bigskip
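
To show how the individual keys fit together, here is a sketch of a
small but complete configuration file. It is only an illustration: it
reuses the keys and values from the fragments above and assumes a
hypothetical database "mydb" whose object types (clause, phrase, word,
verse) and features (surface, phrase\_type, book, chapter, verse)
match those samples. Adjust the names to your own database.

\begin{verbatim}
# Illustrative only: names assume a database like the one in the samples.

# database
database = mydb

# one clause per output line
raster_unit = clause

# data units, from largest to smallest
data_unit = clause
data_unit = phrase
data_unit = word
data_feature = word.surface
data_feature = phrase.phrase_type
data_left_boundary = phrase.OPEN_BRACKET
data_right_boundary = phrase.CLOSE_BRACKET

# reference shown at the left of each line
reference_unit = verse
reference_feature = verse.book
reference_feature = verse.chapter
reference_feature = verse.verse
reference_sep = SPACE # between book and chapter
reference_sep = COMMA # between chapter and verse

# bracketed view, one hit per outermost MatchedObject
output_style = output
hit_type = outermost

# display options
option = break_after_raster
\end{verbatim}

Because a raster\_unit is given, the raster\_context\_before and
raster\_context\_after keys are left out. The tree-related keys
(data\_tree\_parent, data\_tree\_terminal\_unit) are also left out,
since they only apply when "output\_style" is set to "tree".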