<HTML><HEAD><TITLE>World Wide Web Specification Issues</TITLE></HEAD><BODY>
<H1>World Wide Web Specification Issues</H1>
<H3>Steven Fought - 5 May 1995</H3>
<P>
<H2>Sources</H2>
<UL>
<LI> W3O official specifications
<LI> Internet RFCs and drafts
<LI> WWW newsgroups
<LI> Experience
</UL>
<P>
<H2>Personal Experience</H2>
<UL>
<LI> I have been working with Web-related programs for two years
<LI> Webmaster at Caltech from inception in November 1993 until August 1994
<LI> Implemented database search and entry tools using FORMS
<LI> Installed most Web software packages available for UNIX
<LI> Followed Web newsgroups from the beginning
<LI> Currently the "Webmaster" at UW CS
</UL>
<P>
<H2>The Origins of the World Wide Web</H2>
<UL>
<LI> Conceived by Tim Berners-Lee and others at CERN
<LI> Designed to foster communication between High Energy Physicists
<LI> First specification called for a hypertextual system
</UL>
<P> After it was decided that existing tools weren't adequate, Tim Berners-Lee (now with W3O) was asked to design a system that would allow physicists in different parts of the world to collaborate on projects and share information using the Internet.
<P> Berners-Lee decided to use a hypertextual model, and then set out to solve a number of problems posed by that model.
<H2>First problem:</H2>
<P> In any hypertext system you need a way to point to information objects so you can "carry" the pointer instead of the object.
<H2>Solution:</H2>
<P> The Uniform Resource Identifier (URI) specification, a general specification that makes it possible to point to any document, anywhere.
<H2>Uniform Resource Identifiers (URIs)</H2>
<P> The URI specification "defines a way to encapsulate a name in any registered namespace, and label it with the namespace, producing a member of the universal set."
<P> In other words, the URI specification defines a superset of all existing and possible namespaces. Any namespace can be given a label and incorporated into the URI space.
<H2>Properties of URIs</H2>
<DL>
<DT> Extensible
<DD> New naming schemes can be easily added.
<DT> Complete
<DD> It is possible to encode any naming scheme.
<DT> Printable
<DD> URIs are encoded in 7-bit ASCII and are designed to be at least partially human-understandable and communicable.
</DL>
<H2>Parts of the URI specification</H2>
<P> URIs consist of two parts:
<UL>
<LI> A <EM>prefix</EM> that indicates what namespace is being referenced, followed by a colon
<LI> A string whose format is defined as a function of the prefix
</UL>
<P> The extensibility requirement is met by the ability to register new unique prefixes. The completeness requirement is met by the ability to encode any binary information in the string following the prefix (in Base64, for instance). The printability requirement is left to the implementation of specific namespace encodings.
<H2>Special considerations and reserved characters in URIs</H2>
<DL>
<DT> %
<DD> is reserved as an escape character, so non-7-bit ASCII characters and reserved characters can be used in URIs easily
<DT> /
<DD> is reserved as a delimiter of a hierarchical set of substrings
<DT> .
and ..<DD> are reserved when used between / characters, where they indicate the current and parent levels of a hierarchy, respectively
<DT> #
<DD> is reserved to separate a URI from a "fragment identifier"
<DT> ?
<DD> is reserved to delimit the boundary between the URI of a queryable object and the query applied to it
<DT> +
<DD> is reserved within a query string as a shorthand notation for a space, so real + signs must be encoded.
<DT> * and !
<DD> are reserved for use with special significance within specific namespaces.
</DL>
<H2>Relative URIs</H2>
<P> Reserving /, . and .. allowed the specification of relative URIs, which work much like relative paths in a filesystem. When a relative URI is found, the URI of the containing document (the parent) is used as a reference to construct a new full URI following these semantics:
<UL>
<LI> If a partial URI starts with some number of slashes, the parent URI is searched for the first occurrence of the same number of slashes, and the relative URI is substituted for the remaining part of the parent, provided that the remaining part of the parent contains no greater number of consecutive slashes.
<LI> Otherwise, the last part of the parent's path (everything after the rightmost slash) is removed and the partial URI is appended in its place.
<LI> Within the result all occurrences of "xxx/../" or "/./" are recursively removed, where "xxx", "..", and "." are complete path elements.
</UL>
<H2>Examples of relative URI substitutions</H2>
If the parent URI is <TT>http://www/b/c//d/e/f</TT> the following partial URIs result in the listed full URIs:
<DL>
<DT> g <DD> <TT>http://www/b/c//d/e/g</TT>
<DT> /g <DD> <TT>http://www/g</TT>
<DT> //g <DD> <TT>http://g</TT>
<DT> ../g <DD> <TT>http://www/b/c//d/g</TT>
<DT> g:h <DD> <TT>g:h</TT>
</DL>
<P> Note that using the parent URI <TT>http://www/b/c//d/e/</TT> would yield the same results.
<H2>Second problem:</H2> Pointing to the documents we have now.
<P> Now that we have the URI specification, we need to be able to point to existing documents available on the Web.
<H2>Solution:</H2> The Uniform Resource Locator (URL) specifications, one for each supported Internet protocol. Some examples:
<DL>
<DT> ftp: <DD>
<TT>ftp://ftp.cs.wisc.edu/condor/</TT>
<DT> telnet: <DD> <TT>telnet://keeper:notquite@spacely.cs.wisc.edu</TT>
<DT> http: <DD> <TT>http://spacely.cs.wisc.edu:8000/home.html</TT>
</DL>
<H2>Side note: Work on the URN specification</H2>
<P> A working group of the IETF is attempting to define a Uniform Resource Name specification. URNs are meant to be persistent names that remain valid regardless of how machine and server configurations are changed. URNs solve the same problem for URLs that DNS solves for IP numbers.
<H2>Third problem:</H2>
<P> Now that we have pointers to document objects, we need a place to put them.
<H2>Solution:</H2> HTML, the Hypertext Markup Language.
<P> Design features of HTML:
<UL>
<LI> Defined as an SGML Document Type Definition, allowing easy processing of HTML by SGML parsers
<LI> Structural markup
<LI> Simple and quick to render (no lookahead)
<LI> Human readable and editable (no special tools are needed to create HTML documents)
</UL>
<P> HTML is beyond the scope of the talk.
<H2>Side Note: Multimedia and MIME</H2>
<P> The original Web browsers used the extension of a file to determine its type. This method had several disadvantages:
<UL>
<LI> A single extension may be used for more than one kind of file.
<LI> File extensions do not generally carry enough information to allow identification of a file format by a human.
<LI> Not everyone will agree on what file extensions map to what types of files.
</UL>
<P> To fix this problem, parts of the existing MIME (Multipurpose Internet Mail Extensions) system were integrated into Web clients and servers.
<H2>How MIME works</H2>
<P> Before a document is transmitted it is assigned a MIME type by the server or mailer. This assignment is often made based on file extension, but because the assignment is made locally the user can make sure the appropriate type is defined.
The MIME type is a description of the contents of the file.
<P> When the file is received, the browser uses the MIME type to find an appropriate viewer for the file.
<P> MIME features:
<UL>
<LI> New MIME types can be added at any time
<LI> An official organization exists to register and distribute MIME types
<LI> Several implementations of either end of the MIME system exist for many different architectures
</UL>
<H2>Fourth Problem:</H2>
<P> How to transfer documents from the author to the user.
<H2>Solution:</H2>
<P> The Hypertext Transfer Protocol (HTTP).
<P> Any simple summary of the features of HTTP would ignore the serious changes in its role precipitated by changes in other WWW tools. A chronological summary of the changes in HTTP features is more interesting.
<H2>HTTP 0.9: The original features and purpose</H2>
<P> The first version of HTTP to be distributed widely was 0.9. The only request that could be made was "GET (url)", where "url" is an HTTP URL with the prefix stripped. The document pointed to by the URL would be returned to the browser.
<P> HTTP 0.9 was designed to deliver documents with as little overhead as possible. FTP can perform the same function, but it requires a costly login process. HTTP is a stateless protocol. Berners-Lee saw that a document would be transferred and read, and then a link would be followed to another document, possibly not on the same server. There was no advantage to keeping a socket open.
<H2>HTTP 1.0: Document Typing and CGI</H2>
<P> The next version of HTTP was designed to fix a number of problems with the previous version and add new features. The major change was the addition of document typing using MIME-related headers. Other <EM>Methods</EM> were also added alongside the GET method.
Some of these were:
<DL>
<DT> HEAD
<DD> is the same as GET, but only returns the headers
<DT> PUT
<DD> allows data sent to the server to be stored under the supplied URL (not widely used)
<DT> POST
<DD> creates a new object based on the data sent that is linked to the object specified in the supplied URL
<DT> LINK
<DD> links an object to the specified object (not widely used)
<DT> UNLINK
<DD> removes a link or other information from an object
</DL>
<P> The most important of these methods is POST, which is used in conjunction with the Common Gateway Interface.
<H2>The Common Gateway Interface (CGI) and Forms</H2>
<P> Forms: A specification for creating a fill-out form within an HTML document. Each browser that implements Forms is responsible for packing the information into a special format when a form is submitted and sending it to a specified URL.
<P> CGI: A specification for a script on an HTTP server that has its own URL. When the URL is accessed, the script is run and its output is sent to the client. Used in conjunction with Forms, a set of scripts can carry on a "dialogue" with a client.
<BLOCKQUOTE>Interesting note: Because HTTP is stateless, CGI scripts often have to play tricks to ensure that the state of a conversation is stored in the document returned to a client.</BLOCKQUOTE>
<H2>Problems caused by inlined documents</H2>
<P> During the development of Mosaic, one of the programmers (Marc Andreessen) decided he wanted to add support for displaying pictures inside documents. As with every decision made by Andreessen and the new Netscape Communications company since, he designed a quick-and-dirty solution that served his needs and caused significant problems he could blame on other people.
<P> Rather than find a way of encapsulating a picture with a document, he decided on the most general model, which was to have the browser perform an additional request for each picture.
This changed the model that Berners-Lee had originally envisioned and created performance problems caused by the overhead of setting up a TCP socket for each request.
<H2>Proposed solutions to the inlining problem</H2>
<P> There are two proposed solutions to the problem of inlined documents:
<UL>
<LI> Include a multiple GET method in HTTP 1.1, which will require at most two sockets to be created (one for the original document, and the other for the supporting documents).
<LI> HTTP-NG, which is built on top of the <EM>Session Protocol Architecture</EM> and allows multiple low-level virtual connections to be carried over one socket. The socket could be kept open until the browser was finished with the server.
</UL>
<H2>HTTP 1.1: Proposed Additions to the protocol</H2>
<P> Other additions include support for more advanced applications, and for encryption of sensitive data.
<P> Care is being taken to ensure that the protocol will be extensible.
</BODY></HTML>