http:^^www.cs.cornell.edu^info^people^summers^classify.html
MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 01-Dec-96 20:18:01 GMT
Content-Type: text/html
Content-Length: 1457
Last-Modified: Wednesday, 28-Feb-96 20:31:20 GMT
<html><head><title>Kristen Summers -- Document Structure Classification</title></head>
<body>
<h1>Near-Wordless Document Structure Classification</h1>
<p>In <em>Proceedings of the International Conference on Document Analysis and Recognition</em> (ICDAR '95), pp. 462-465, Montréal, August 1995.</p>
<hr>
<p><strong>Abstract</strong><br>
Automatic derivation of logical document structure from generic layout would enable a multiplicity of electronic document manipulation tools of a type that is becoming crucial to users who wish to browse the internet. This problem can be divided into segmentation (dividing the text into a hierarchy of pieces) and classification (categorizing these pieces as particular logical structures). This paper proposes an approach to the classification of logical document structures, according to their distance from prototypes that are primarily geometric. The prototypes consider linguistic information minimally, thus relying minimally on the accuracy of OCR and decreasing language-dependence. Different classes of logical structures and the differences in the requisite information for classifying them are presented. A prototype format is proposed, existing prototypes and a distance measurement are described, and performance results are provided.</p>
<hr>
You can view the <a href="http://www.cs.cornell.edu/Info/People/summers/Papers/classify.ps">full PostScript file</a> or return to <a href="http://www.cs.cornell.edu/Info/People/summers/summers.html">my home page</a>.
</body></html>