The SGML summarizer uses an SGML-to-SOIF translation table. Each line
contains two fields: the SGML tag plus modifiers, and a list of SOIF
attributes. This is an example that can be used for simple summarizing
of HTML documents:

        <BODY>          body

The effect of this is that all text which appears between the BODY
tags is placed into the SOIF attribute named 'body'. This works because
data for tags not listed in the table are passed up to the parent tag.

If we know that data appearing between PRE tags is useless to index,
we can leave it out by specifying the SOIF attribute as 'ignore'.

        <BODY>          body
        <PRE>           ignore

The title is not included in the body, so we can add a special
attribute for that:

        <BODY>          body
        <PRE>           ignore
        <TITLE>         title

This will get us pretty far. Now maybe we want to use words in bold
or other special fonts as keywords in the summary:

        <BODY>          body
        <PRE>           ignore
        <TITLE>         title
        <B>             keywords
        <EM>            keywords

One bad effect of this is that words in bold are taken out of 'body'
and put into 'keywords'. This makes the body text somewhat unreadable
and might cause phrase searches on the body to miss. We can have the
data for these bolded words passed up to the parent tag as well by
listing 'parent' as a SOIF attribute:

        <BODY>          body
        <PRE>           ignore
        <TITLE>         title
        <B>             keywords,parent
        <EM>            keywords,parent

So far we have only summarized the content _between_ tags. Perhaps we
would also like to include the URLs of hypertext links in the SOIF data.
These appear as attribute values _within_ an SGML tag. They are
specified in the translation table as <TAG:ATTR>:

        <BODY>          body
        <PRE>           ignore
        <TITLE>         title
        <B>             keywords,parent
        <EM>            keywords,parent
        <A:HREF>        url-references

HTML has a very flexible META tag with which we can write things like

        <META NAME="Author" CONTENT="Duane Wessels">
        <META NAME="Data-Source" CONTENT="Dept of Records">
        <META NAME="Data-Quality" CONTENT="5">
        <META NAME="keywords" CONTENT="skiing winter recreation expensive">

Rather than having all of these appear in a 'meta' SOIF attribute, it
would be nice to have them each appear in their own attribute. This
can be done by using a variable-like notation to specify the SOIF
attribute as the value of one of the SGML attributes, in this case
NAME.

        <BODY>          body
        <PRE>           ignore
        <TITLE>         title
        <B>             keywords,parent
        <EM>            keywords,parent
        <A:HREF>        url-references
        <META:CONTENT>  $NAME

The resulting SOIF would look like:

        author{13}:             Duane Wessels
        data-source{15}:        Dept of Records
        data-quality{1}:        5
        keywords{34}:           expensive recreation skiing winter

The 'Rainbow' project translates MIF/RTF/Interleaf into SGML. Rather
than having "fixed" tag names such as <PRE> and <STRONG>, it has more
generic tags and uses attributes to specify more about the data.
For example, a bold phrase might appear as

        ...the <CLF FONT="bold">Hounds of the Baskervilles</CLF> was...

Similarly, paragraphs appear as

        <PARA PARATYPE="title">How I spent my summer vacation</PARA>

This is accommodated in the SGML summarizer by giving attribute values
in the mapping table:

        <PARA,PARATYPE=title>           title
        <PARA,PARATYPE=heading 1>       headings
        <PARA,PARATYPE=heading 2>       headings
        <PARA>                          body

Note that order is important here. The first match found is accepted.
Less-specific specifications should be listed later.

The bad news here is that it is unclear how the magic words such as
'title' and 'heading 1' are chosen. I suspect that they are
hard-coded into most word processors, but different across versions and
platforms.
FrameMaker probably allows the user to create and name a
custom paragraph type. So in order to really use the SGML summarizer
effectively here, the Harvest admin will need to know something about
the documents being summarized.
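
To make the matching rules described in these notes concrete, here is a
minimal Python sketch of a table lookup. It is not the Harvest
implementation. The names Rule, parse_table, content_targets and
attribute_targets are hypothetical, and the sketch assumes that tag and
attribute names compare case-insensitively, that attribute values in
<TAG,ATTR=value> rules compare exactly, and that a $NAME target is
lower-cased (as the SOIF example in these notes suggests).

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    tag: str                    # SGML tag name, upper-cased
    source_attr: Optional[str]  # set for <TAG:ATTR> rules, else None
    condition: Optional[tuple]  # (ATTR, value) for <TAG,ATTR=value> rules
    targets: list               # SOIF names, 'ignore', 'parent', or '$ATTR'


def parse_table(text):
    """Parse translation-table lines into Rule tuples, keeping file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("<"):
            continue
        # The tag spec may contain spaces ("heading 1"), so split on '>'.
        spec, _, rest = line[1:].partition(">")
        source_attr = condition = None
        if "," in spec:
            spec, _, cond = spec.partition(",")
            name, _, value = cond.partition("=")
            condition = (name.strip().upper(), value.strip())
        elif ":" in spec:
            spec, _, source_attr = spec.partition(":")
            source_attr = source_attr.strip().upper()
        targets = [t.strip() for t in rest.split(",") if t.strip()]
        rules.append(Rule(spec.strip().upper(), source_attr, condition, targets))
    return rules


def content_targets(rules, tag, attrs):
    """Targets for the text between a tag pair; the first matching rule wins.

    attrs maps upper-cased attribute names to their values.
    """
    for rule in rules:
        if rule.tag != tag.upper() or rule.source_attr is not None:
            continue
        if rule.condition and attrs.get(rule.condition[0]) != rule.condition[1]:
            continue
        return rule.targets
    return ["parent"]  # tags not listed pass their data up to the parent


def attribute_targets(rules, tag, attrs):
    """(SOIF attribute, value) pairs produced by <TAG:ATTR> rules."""
    out = []
    for rule in rules:
        if rule.tag != tag.upper() or rule.source_attr not in attrs:
            continue
        for target in rule.targets:
            if target.startswith("$"):
                # Assumption: the SOIF name is the lower-cased attribute value.
                target = attrs.get(target[1:].upper(), "").lower()
            out.append((target, attrs[rule.source_attr]))
    return out


TABLE = """
<PARA,PARATYPE=title>       title
<PARA,PARATYPE=heading 1>   headings
<PARA>                      body
<PRE>                       ignore
<B>                         keywords,parent
<META:CONTENT>              $NAME
"""
RULES = parse_table(TABLE)

print(content_targets(RULES, "PARA", {"PARATYPE": "heading 1"}))  # ['headings']
print(content_targets(RULES, "PARA", {}))                         # ['body']
print(content_targets(RULES, "I", {}))                            # ['parent']
```

Because content_targets returns on the first matching rule, moving the
bare <PARA> line to the top of TABLE would make every paragraph land in
'body', which is why less-specific specifications belong later.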