The "Commit" menu item saves the chunking changes made in the program
by committing them to the database.

Equivalent toolbar button: \includegraphics[scale=0.5]{commit.png}.

\paragraph{Exit}
The "Exit" menu item quits the program.

Equivalent toolbar button: None.

\paragraph{Edit menu}

\paragraph{Split}
The "Split" menu item splits the current chunk right before the
currently selected box.

Equivalent toolbar button: \includegraphics[scale=0.5]{flash.png}.

\paragraph{Combine}
The "Combine" menu item combines the chunk containing the currently
selected box with the previous chunk.

Equivalent toolbar button: \includegraphics[scale=0.5]{together.png}.

\paragraph{Move left}
The "Move left" menu item moves the chunk containing the currently
selected box one tab-stop to the left.

Equivalent toolbar button: \includegraphics[scale=0.5]{leftarrow.png}.

\paragraph{Move right}
The "Move right" menu item moves the chunk containing the currently
selected box one tab-stop to the right.

Equivalent toolbar button: \includegraphics[scale=0.5]{rightarrow.png}.

\paragraph{Tools menu}

\paragraph{Configure...}
This menu item has not been implemented yet. Sorry.

\paragraph{Help menu}

\paragraph{Help Contents...}
This brings up this help document.

\paragraph{About Emdros Chunking Tool...}
This brings up the "About" box. Press "OK" to dismiss it.

Equivalent toolbar button: None.

\subsubsection{Chunking Area}
The chunking area is in the middle of the program window. This is
where you interact with the program.

Once a database connection has been established, you will see text in
this area.
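
The Split, Combine, Move left, and Move right commands described above
can be pictured as operations on a list of chunks, where each chunk
holds a run of boxes and an indentation level counted in tab-stops.
The following Python code is a minimal, hypothetical model of that
idea, not code from the Emdros Chunking Tool. The class names
\texttt{Chunk} and \texttt{ChunkedText}, the use of a running box
number to stand for the "currently selected box", and two points that
the menu descriptions leave open are all assumptions: a chunk created
by Split keeps the indentation of the chunk it came from, and Move left
stops at the first tab-stop. The last lines split a one-chunk text
before its fourth box and indent the new chunk by one tab-stop.
\begin{verbatim}
# Hypothetical model of the chunking area, not code from the Emdros
# Chunking Tool. Boxes are numbered 0, 1, 2, ... across the whole text.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    boxes: List[str]  # the boxes shown on this chunk's line
    indent: int = 0   # indentation, counted in tab-stops


@dataclass
class ChunkedText:
    chunks: List[Chunk] = field(default_factory=list)

    def locate(self, box: int) -> Tuple[int, int]:
        """Return (chunk index, offset in chunk) for a box number."""
        if box < 0:
            raise IndexError("no such box")
        for ci, chunk in enumerate(self.chunks):
            if box < len(chunk.boxes):
                return ci, box
            box -= len(chunk.boxes)
        raise IndexError("no such box")

    def split(self, box: int) -> None:
        """Split the current chunk right before the selected box."""
        ci, off = self.locate(box)
        if off == 0:
            return  # already the first box of its chunk
        chunk = self.chunks[ci]
        # Assumption: the new chunk inherits the original chunk's indentation.
        self.chunks.insert(ci + 1, Chunk(chunk.boxes[off:], chunk.indent))
        chunk.boxes = chunk.boxes[:off]

    def combine(self, box: int) -> None:
        """Append the selected box's chunk to the previous chunk."""
        ci, _ = self.locate(box)
        if ci > 0:
            self.chunks[ci - 1].boxes.extend(self.chunks.pop(ci).boxes)

    def move(self, box: int, delta: int) -> None:
        """Move the selected box's chunk by delta tab-stops (-1 left, +1 right)."""
        ci, _ = self.locate(box)
        # Assumption: a chunk cannot move left of the first tab-stop.
        self.chunks[ci].indent = max(0, self.chunks[ci].indent + delta)

    def render(self) -> List[str]:
        """One string per chunk: tabs for the indentation, then the boxes."""
        return ["\t" * c.indent + " ".join(c.boxes) for c in self.chunks]


text = ChunkedText([Chunk(["In", "the", "beginning", "was", "the", "Word"])])
text.split(3)     # new chunk starts at box 3, "was"
text.move(4, +1)  # box 4 ("the") is now in the second chunk
print(text.render())
\end{verbatim}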
The text is divided into clickable (and thus selectable) boxes.
You can use the "Split" \includegraphics[scale=0.5]{flash.png} and
"Combine" \includegraphics[scale=0.5]{together.png} buttons to split or
combine the text into "chunks", each on its own line.
These chunks can then be indented with respect to each other with
the "Move left" \includegraphics[scale=0.5]{leftarrow.png} and
"Move right" \includegraphics[scale=0.5]{rightarrow.png} buttons.

\paragraph{Example}
The image below shows an example database that has been
chunked and indented.

\bigskip

\includegraphics[scale=0.5]{MainScreenGreekExample.png}

\bigskip

\section{Configuring the program}

\subsection{Format of the configuration file}
The configuration file follows the same conventions as many other
Unix and Windows configuration files:
\begin{itemize}
  \item Comments are prefixed by \#, and anything from the \# to the
    end of the line is ignored.
  \item Blank lines are ignored.
  \item The rest is a number of "key = value" pairs.
  \item The keys are predefined (see below).
  \item The values are either "quote-enclosed strings" (e.g.,
    "C:$\backslash$Emdros$\backslash$mymap.map") or consist of letters,
    numbers, underscores, and/or dots, optionally followed by a
    "quote-enclosed string" (e.g., 'word.surface', or
    'word.surface."C:$\backslash$Documents and
    Settings$\backslash$Administrator$\backslash$teckitmap.map"').
\end{itemize}
When a value has dots that are not enclosed in "quotes", the strings
on either side of the dots are interpreted as subkeys.
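
The rules above are simple enough to capture in a few lines of code.
The following minimal Python sketch reads such a file into a list of
entries, each with its key, its subkeys, and its final value. It
illustrates the rules as stated here and is not the reader used by the
Emdros Chunking Tool: the names \texttt{parse\_config},
\texttt{parse\_value}, and \texttt{Entry}, the requirement that a dot
separate subkeys from a following quoted string, and the choice to
raise \texttt{ValueError} on malformed input are assumptions. A list is
returned rather than a dictionary because keys such as
\texttt{data\_feature} may occur several times. Following the first
rule literally, the sketch ignores everything after the first \#, even
inside quotes.
\begin{verbatim}
# Hypothetical sketch of a reader for the configuration file format,
# written from the rules above; not the Emdros Chunking Tool's own code.
import re
from typing import List, NamedTuple, Tuple

# Letters, digits, underscores and dots, optionally followed by one
# "quote-enclosed string".
VALUE_RE = re.compile(r'([A-Za-z0-9_.]*)(?:"([^"]*)")?')


class Entry(NamedTuple):
    key: str
    subkeys: Tuple[str, ...]  # e.g. ("word", "surface")
    value: str


def parse_value(raw: str) -> Tuple[Tuple[str, ...], str]:
    """Split a raw value into (subkeys, value) using the dot rule."""
    m = VALUE_RE.fullmatch(raw)
    if not raw or m is None:
        raise ValueError(f"malformed value: {raw!r}")
    bare, quoted = m.group(1), m.group(2)
    if quoted is not None:
        # Assumption: subkeys before a quoted string end with a dot,
        # as in word.surface."/home/myname/Blah.map".
        if bare and not bare.endswith("."):
            raise ValueError(f"missing dot before quotes: {raw!r}")
        parts = bare[:-1].split(".") if bare else []
        value = quoted
    else:
        *parts, value = bare.split(".")
        if not value:
            raise ValueError(f"empty value: {raw!r}")
    if "" in parts:
        raise ValueError(f"empty subkey: {raw!r}")
    return tuple(parts), value


def parse_config(text: str) -> List[Entry]:
    """Parse "key = value" lines; keys may repeat, so return a list."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop the comment, then blanks
        if not line:
            continue
        key, sep, raw = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        subkeys, value = parse_value(raw.strip())
        entries.append(Entry(key.strip(), subkeys, value))
    return entries


sample = """
# data unit
data_unit = word
data_feature = graphical_word
data_feature_teckit_mapping = graphical_word."Amsterdam.map"
"""
for entry in parse_config(sample):
    print(entry)
\end{verbatim}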
To illustrate the subkey rule: the value 'word.surface' represents the
subkey "word" with the value "surface", and the value
'word.surface."/home/myname/Blah.map"' represents the subkey "word"
with the subsubkey "surface", followed by the value
"/home/myname/Blah.map".

Here is a sample configuration file, explained bit by bit:

\subsubsection{Database selection}
\begin{verbatim}
# database
database = mydb
\end{verbatim}
You can specify a database that is always to be used with this
configuration file.

If you use SQLite, you may wish to specify a path. If so, enclose it
in quotes:
\begin{verbatim}
# database
# Put the database name in quotes.
# For SQLite 2 and SQLite 3, you should probably give
# the full path to the file as well.
database = "C:\Program Files\Emdros\Emdros-1.2.0.pre228\db\mydb"
\end{verbatim}

\subsubsection{Data unit}
\begin{verbatim}
# data unit
# There can only be one data unit
# but it can have as many data_features as you like.
# Each data_feature will go on its own interlinear line.
#
data_unit = word
data_feature = graphical_word
data_feature = graphical_lexeme
\end{verbatim}
The data unit is the basic unit that will result in one box in the
chunking area. It can be any object type and need not be a word.
However, you will probably want it to be a word or a word-like object.
Itdepends on how large segments you want to be able to chunk at atime.You must specify which feature(s) to display for the data unit.There can only be one data unit.\subsubsection{TECkit mappings}\begin{verbatim}# TECKit## data_feature_teckit_mapping defines what TECkit map to use# for a given data_feature.## data_feature_teckit_in_encoding specifies the in_encoding ("bytes" # or "unicode") for the given data_feature.## data_feature_teckit_out_encoding specifies the out_encoding ("bytes"# or "unicode") for the given data_feature.# data_feature_teckit_mapping = graphical_word."Amsterdam.map"data_feature_teckit_in_encoding = graphical_word.bytesdata_feature_teckit_out_encoding = graphical_word.unicode\end{verbatim}{\bf TECkit} is a tool made by SIL International. Itconverts between encodings, in particular to and from Unicode. TheEmdros Chunking Tool incorporates TECkit, and you can apply it to anytextual feature of any object type.TECkit works with a so-called "map file" -- a text file which youor someone else writes. More information about writing TECkitmappings can be found on SIL's website:\begin{center}{\bf http://scripts.sil.org/TECkit/}\end{center}The Emdros Chunking Tool needs three pieces of information inorder for TECkit to work on a particular feature:\begin{enumerate} \item The name of the file which holds the maping. This is given with the key "data\_feature\_teckit\_mapping". \item The input encoding (encoding of the feature-string): This is given with the key "data\_feature\_teckit\_in\_encoding". The value can be either "bytes" or "unicode" (without the quotes). "bytes" means that TECkit does not convert to UTF-8. "unicode" means it is converted to UTF-8 for display. You should use whatever is used in the map file for input encoding here. \item The output encoding (encoding to transform into): This is given with the key "data\_feature\_teckit\_out\_encoding". 
    The same meanings and restrictions apply as for the input encoding.
\end{enumerate}
Besides converting between encodings, TECkit can also remove
characters from a string. This can come in handy when your
feature-strings contain characters which you do not wish to display.
Again, see the TECkit pages on SIL's website for information on how
to write a TECkit mapping.

In the value of "data\_feature\_teckit\_mapping", give first the
object type, then a dot, then the feature-name, then a dot, then the
full path to the map file. You will normally need to enclose the path
in "double quotes": unquoted values may contain only letters, numbers,
underscores, and dots, and unquoted dots are read as subkey
separators.

You can have only one TECkit mapping per feature.

\subsubsection{Options}
\begin{verbatim}
# Options
#
# The only option available is 'right_to_left', which, if set,
# will cause the chunking area to run right to left rather than
# left to right.
option = right_to_left
\end{verbatim}

\subsubsection{Display options}
\begin{verbatim}
# Fonts -- chunking area font names.
# If you give more than one chunking_area_font_name,
# they will be assigned to individual data_feature interlinear
# lines, in the same order as the data_feature keys appear.
#
# If you give fewer keys here than you have data_feature keys,
# then the last one will be used for the ones that aren't assigned
# an explicit value.
#
# If you give no values for this key, then some sensible default
# font will be used.
#
chunking_area_font_name = "Ezra SIL"
chunking_area_font_name = "Courier"
chunking_area_font_name = "Ezra SIL"
#
# The magnification (in percent) of the chunking area.
# 100 corresponds approximately to a font size of 12 points.
#
chunking_area_magnification = 120
\end{verbatim}
\end{document}