The <i class="parameter"><tt>login URI</tt></i> is the login page whose successful navigation gives access to the protected space. For example, if the pattern used in testing was "http://www.archive.org/private/*", the <i class="parameter"><tt>login URI</tt></i> might be "http://www.archive.org/private/login.html".</p><p>If the current URI matches a pattern in the <i class="parameter"><tt>login URI pattern</tt></i> list, we pull the <i class="parameter"><tt>login record</tt></i> associated with the matched pattern. If the record's <i class="parameter"><tt>ran login</tt></i> flag has not been set, the <i class="parameter"><tt>login URI</tt></i> is <span class="emphasis"><em>force</em></span> queued. It is force queued in case the URI has already been seen (GET'd). The <i class="parameter"><tt>login URI</tt></i> (somehow) has the <i class="parameter"><tt>login record</tt></i> associated with it, and the presence of the <i class="parameter"><tt>login record</tt></i> is what distinguishes the <i class="parameter"><tt>login URI</tt></i>. The current URI is then requeued (precondition not met). Otherwise, if no pattern matches or the login has already run, the current URI runs through as normal.</p><p>When the <i class="parameter"><tt>login URI</tt></i> becomes the current URI and is being processed by the HTTP fetcher, the presence of a <i class="parameter"><tt>login record</tt></i> whose <i class="parameter"><tt>ran login</tt></i> flag is false signals the HTTP fetcher to run the special login sequence rather than its usual GET. The <i class="parameter"><tt>login record</tt></i> holds everything the HTTP fetcher needs to execute the login. 
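</p><p>The precondition and fetcher steps above can be sketched in Java as follows. This is a minimal, hypothetical illustration rather than Heritrix code: <tt class="classname">LoginRecord</tt>, <tt class="classname">LoginPrecondition</tt> and <tt class="classname">Action</tt> are invented names, a <tt>java.util.regex</tt> pattern stands in for the glob-style <i class="parameter"><tt>login URI pattern</tt></i>, and a plain list stands in for the frontier's force-queue.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical login record: keyed by a URI pattern, holds all the fetcher needs to log in. */
class LoginRecord {
    final Pattern uriPattern;  // protected space, e.g. http://www\.archive\.org/private/.*
    final String loginUri;     // page whose successful navigation grants access
    final boolean post;        // true to POST the login form, false to GET it
    final Map<String, String> formFields = new LinkedHashMap<>(); // login form content
    boolean ranLogin = false;  // set by the fetcher once the login sequence has run

    LoginRecord(String uriRegex, String loginUri, boolean post) {
        this.uriPattern = Pattern.compile(uriRegex);
        this.loginUri = loginUri;
        this.post = post;
    }
}

/** What the (hypothetical) precondition tells the crawler to do with the current URI. */
enum Action { PROCEED, RUN_LOGIN, REQUEUE }

class LoginPrecondition {
    private final List<LoginRecord> records = new ArrayList<>();
    // Stand-in for the frontier: URIs force-queued even if already seen (GET'd).
    final List<String> forceQueued = new ArrayList<>();

    void addRecord(LoginRecord record) {
        records.add(record);
    }

    Action check(String currentUri) {
        for (LoginRecord r : records) {
            if (r.ranLogin) {
                continue; // login already done for this pattern
            }
            if (currentUri.equals(r.loginUri)) {
                return Action.RUN_LOGIN; // fetcher runs the login instead of a plain GET
            }
            if (r.uriPattern.matcher(currentUri).matches()) {
                if (!forceQueued.contains(r.loginUri)) {
                    forceQueued.add(r.loginUri); // force-queue the login URI once
                }
                return Action.REQUEUE; // precondition not met
            }
        }
        return Action.PROCEED; // no pending login applies
    }
}
```

<p>The sketch tests for the <i class="parameter"><tt>login URI</tt></i> before the pattern because, in the example above, the login page itself lies inside the protected space; without that check it would keep being requeued behind its own login.</p><p>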
Upon completion, the <i class="parameter"><tt>ran login</tt></i> flag is set in the <i class="parameter"><tt>login record</tt></i> and the <i class="parameter"><tt>login record</tt></i> is removed from the <i class="parameter"><tt>login URI</tt></i>.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">GET of the login URI</h3><p>What if we haven't already seen the login page? Should the login precondition first force fetch the login URI without the login record loaded, so that it is GET'd once before we run the login?</p></div><p>This implementation cannot guarantee a successful login, nor does it provide for retries. The general assumption is that a single run of the login succeeds and that the resulting success cookie or rewritten URI makes it back to the Heritrix client, giving us access to the protected area.</p><p>Configuration would enable or disable this feature.</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N10202"></a>3.2.1.&nbsp;Login Record</h4></div></div><div></div></div><p>A login record would be keyed by the pattern it applies to and would contain the aforementioned <i class="parameter"><tt>ran login</tt></i> flag and <i class="parameter"><tt>login URI</tt></i>. Tied to the login URI would be a list of key-value pairs holding the login form content, as well as a specification of whether the form is to be POSTed or GETed.</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="commonage"></a>3.3.&nbsp;Commonage</h3></div></div><div></div></div><p>Here we discuss features common to the two authentication scheme implementations above.</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N10215"></a>3.3.1.&nbsp;URI#authority as URI canonical root URL</h4></div></div><div></div></div><p>The proposal is to equate the two. 
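</p><p>To make the two candidate keys concrete, here is a minimal sketch using <tt class="classname">java.net.URI</tt> as a stand-in for the URI class Heritrix actually uses. The <tt class="classname">RootUrls</tt> class and its helper names are hypothetical, and no case or default-port normalization is attempted.</p>

```java
import java.net.URI;

/** Hypothetical helpers contrasting URI#authority with a canonical root URL. */
class RootUrls {
    /** URI#authority: host plus any port, without the scheme. */
    static String authority(String uri) {
        return URI.create(uri).getAuthority();
    }

    /** Host only: the authority minus any port (the "base domain" of section 3.3.4). */
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    /** Canonical root URL, assumed here to be scheme + "://" + authority. */
    static String canonicalRootUrl(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "://" + u.getAuthority();
    }
}
```

<p>Because both <tt class="filename">http://www.example.com</tt> and <tt class="filename">https://www.example.com</tt> have the authority <tt class="filename">www.example.com</tt>, a CrawlServer keyed on URI#authority alone would not tell the two apart; this is the case the HTTPS note below asks to check.</p><p>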
Equating the two means there is no need to change CrawlServer. Currently, CrawlServer is constructed by wrapping the URI#authority portion of a URI, and URI#authority is the <i class="parameter"><tt>URI canonical root URL</tt></i> minus the scheme. Assuming CrawlServer is for http only, it should be safe to make this equation.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">DNS</h3><p>Are there CrawlServer instances made for anything but http schemes?</p></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">HTTPS</h3><p>Check that <i class="parameter"><tt>URI canonical root URL</tt></i>s of <tt class="filename">http://www.example.com</tt> and <tt class="filename">https://www.example.com</tt> result in different <tt class="classname">CrawlServer</tt> instances.</p></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N10237"></a>3.3.2.&nbsp;Population of Domain/VirtualDomain object with Credentials</h4></div></div><div></div></div><p>The proposal is that CrawlServer encapsulate access to the credentials store and read the store upon construction.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N1023C"></a>3.3.3.&nbsp;Caching of Credentials</h4></div></div><div></div></div><p>Once read from the store, we need to cache the credentials in CrawlServer.</p><div class="sect4" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="N10241"></a>3.3.3.1.&nbsp;JAAS Subject, Principal and Credentials [<a href="#jaas" title="[jaas]"><span class="abbrev">jaas</span></a>]</h5></div></div><div></div></div><p>The proposal is that we at least look at selectively exploiting this library for caching credentials. For example, a CrawlServer might hold a javax.security.auth.Subject instance (Subject is a final class, not an interface). 
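</p><p>A minimal sketch of the JAAS idea for RFC2617 credentials follows. Since <tt class="classname">javax.security.auth.Subject</tt> is final, the sketch holds a Subject rather than implementing it. <tt class="classname">RealmPrincipal</tt>, <tt class="classname">Rfc2617Credential</tt> and <tt class="classname">ServerSubject</tt> are hypothetical names, and the realm key (realm plus canonical root URL as one string) is an assumed format.</p>

```java
import java.security.Principal;
import javax.security.auth.Subject;

/** Hypothetical principal naming the realm + canonical root URL a credential applies to. */
final class RealmPrincipal implements Principal {
    private final String realmKey;

    RealmPrincipal(String realmKey) {
        this.realmKey = realmKey;
    }

    @Override
    public String getName() {
        return realmKey;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RealmPrincipal && ((RealmPrincipal) o).realmKey.equals(realmKey);
    }

    @Override
    public int hashCode() {
        return realmKey.hashCode();
    }
}

/** Hypothetical RFC2617 (Basic/Digest) username and password for one realm key. */
final class Rfc2617Credential {
    final String realmKey;
    final String user;
    final char[] password;

    Rfc2617Credential(String realmKey, String user, char[] password) {
        this.realmKey = realmKey;
        this.user = user;
        this.password = password.clone();
    }
}

/** Hypothetical per-server cache: a JAAS Subject holding principals and private credentials. */
class ServerSubject {
    private final Subject subject = new Subject();

    void add(Rfc2617Credential credential) {
        subject.getPrincipals().add(new RealmPrincipal(credential.realmKey));
        subject.getPrivateCredentials().add(credential);
    }

    boolean knowsRealm(String realmKey) {
        return subject.getPrincipals().contains(new RealmPrincipal(realmKey));
    }

    /** Returns the cached credential for the realm key, or null if there is none. */
    Rfc2617Credential find(String realmKey) {
        for (Rfc2617Credential c : subject.getPrivateCredentials(Rfc2617Credential.class)) {
            if (c.realmKey.equals(realmKey)) {
                return c;
            }
        }
        return null;
    }
}
```

<p>The principal/credential pairing fits RFC2617 naturally, since the realm arrives in the server's 401 challenge; HTML login credentials have no such realm.</p><p>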
To this Subject, we'd add implementations of the java.security.Principal interface together with credential objects (this makes sense for carrying RFC2617 credentials, less so for login credentials; TBD).</p></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="store"></a>3.3.4.&nbsp;Credential Stores</h4></div></div><div></div></div><p>The credential store would be on disk.</p><p>For convenience, particularly when listing credentials in a global file store, credentials can be grouped first by host (the base domain, i.e. the domain minus any port number) and then by URI#authority (the domain plus any port number).</p><p>Configuration would allow us to point at a global store of credentials.</p><div class="sect4" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="N10252"></a>3.3.4.1.&nbsp;Layering of Credential Stores</h5></div></div><div></div></div><p>Subsequently, we'd add support for <span class="emphasis"><em>layering</em></span> stores, modeled after Apache's <tt class="filename">.htaccess</tt> mechanism for selectively overriding the main server configuration on a per-directory basis or, closer to home, after the way Heritrix settings can be overridden on a per-host basis. It would be possible to point the store-querying code at a directory whose subdirectories are named for domains, progressing from a root down through the top-level org, com, gov, etc., with subdomains getting progressively more precise: e.g. travel.yahoo.com would be found under the yahoo.com directory, which would be under the com directory. When searching for credentials, we'd search up through the directory structure, from the current domain up to the root, for the <i class="parameter"><tt>realm + canonical root URL</tt></i> key. 
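</p><p>The search order described above can be sketched with in-memory maps standing in for the per-domain directories and the global store. This is a hypothetical illustration: <tt class="classname">LayeredCredentialStore</tt>, the string key (realm plus canonical root URL) and the string-valued credentials are all assumptions; only the order of the search follows the text.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical layered store: per-domain layers searched upward, then a global store. */
class LayeredCredentialStore {
    // domain name (e.g. "yahoo.com") -> (realm + canonical root URL key -> credential)
    private final Map<String, Map<String, String>> domainStores = new HashMap<>();
    private final Map<String, String> globalStore = new HashMap<>();

    void putDomain(String domain, String key, String credential) {
        domainStores.computeIfAbsent(domain, d -> new HashMap<>()).put(key, credential);
    }

    void putGlobal(String key, String credential) {
        globalStore.put(key, credential);
    }

    /** Domains from most to least specific, e.g. travel.yahoo.com, yahoo.com, com. */
    static List<String> searchPath(String host) {
        List<String> path = new ArrayList<>();
        String domain = host;
        while (true) {
            path.add(domain);
            int dot = domain.indexOf('.');
            if (dot < 0) {
                break;
            }
            domain = domain.substring(dot + 1);
        }
        return path;
    }

    /** Searches from the host's own domain up to the top-level domain, then the global store. */
    Optional<String> lookup(String host, String key) {
        for (String domain : searchPath(host)) {
            Map<String, String> layer = domainStores.get(domain);
            if (layer != null && layer.containsKey(key)) {
                return Optional.of(layer.get(key));
            }
        }
        return Optional.ofNullable(globalStore.get(key));
    }
}
```

<p>Returning the first match on the way up mirrors the .htaccess and per-host settings overrides cited above: a credential in the travel.yahoo.com layer shadows one stored under the same key in the yahoo.com layer.</p><p>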
If it is not found in the domain stores, or if no domain store exists, we'd move back up the settings hierarchy until we reach the global store.</p></div><div class="sect4" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="N10262"></a>3.3.4.2.&nbsp;Exploit the settings framework to implement the credentials store</h5></div></div><div></div></div><p>We propose extending or adapting the Heritrix settings framework to have it manage our credentials store, so we can exploit code already written.</p></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N10267"></a>3.3.5.&nbsp;Logging</h4></div></div><div></div></div><p>A new log will trace authentication transactions. The log will list the credentials offered, new cookies, query parameters, and pertinent HTTP headers returned by the submitted authentication and, where possible, report whether authentication succeeded.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N1026C"></a>3.3.6.&nbsp;Debugging tool</h4></div></div><div></div></div><p>A command-line tool that runs a single login, to help debug logins, will aid development and be of use to operators.</p></div></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N10271"></a>4.&nbsp;Design</h2></div></div><div></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N10274"></a>4.1.&nbsp;Configuration</h3></div></div><div></div></div><p>We will add options to the HTTP Fetcher that enable, disable and configure the two supported authentication types.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N10279"></a>4.2.&nbsp;Credential store</h3></div></div><div></div></div><p>Below is a static class model diagram for 
accessing the credential store.</p><div class="mediaobject"><img src="credentials.gif"></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Implementation looks nothing like the above</h3><p>Ignore the above design. The implementation turned out to be something else altogether. The model was effectively inverted (credentials hold domains), and the notion of going through a CredentialManager/CredentialStore for all operations on the store was dropped. While the resulting implementation is not a good object-oriented model, it is amenable to UI manipulation (and sits easily atop the Heritrix settings system).</p></div></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="N10287"></a>5.&nbsp;Future</h2></div></div><div></div></div><p>This section lists issues to be addressed later, probably in version 2.0 of the authentication system.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N1028C"></a>5.1.&nbsp;Same URL, Different Page Content</h3></div></div><div></div></div><p>Heritrix distinguishes pages by URI, but the page seen can differ depending on whether or not we are logged in. We'll need some way to force or suggest that sets of URIs be revisited after a login token is received. This might mean the 'fingerprint' of a URI includes any authentication information used.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N10291"></a>5.2.&nbsp;Integration with the UI</h3></div></div><div></div></div><p>Add, edit and delete credentials via the UI. 
Flag 401s and likely HTML login forms to the operator.</p></div></div><div class="bibliography" id="N10296"><div class="titlepage"><div><div><h2 class="title"><a name="N10296"></a>Bibliography</h2></div></div><div></div></div><div class="biblioentry"><a name="heritrix"></a><p>[<span class="abbrev">heritrix</span>] <span class="title"><i><a href="http://crawler.archive.org" target="_top">Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project</a></i>. </span></p></div><div class="biblioentry"><a name="httpclient"></a><p>[<span class="abbrev">httpclient</span>] <span class="title"><i>Apache Jakarta Commons HTTPClient <a href="http://jakarta.apache.org/commons/httpclient/authentication.html" target="_top">Authentication Guide</a></i>. </span><span class="edition">Commons HTTPClient version 2.0. </span></p></div><div class="biblioentry"><a name="jaas"></a><p>[<span class="abbrev">jaas</span>] <span class="title"><i><a href="http://java.sun.com/products/jaas/index.jsp" target="_top">Java Authentication and Authorization Service (JAAS)</a></i>. </span></p></div><div class="biblioentry"><a name="ntlm"></a><p>[<span class="abbrev">ntlm</span>] <span class="title"><i>The <a href="http://davenport.sourceforge.net/ntlm.html" target="_top">NTLM Authentication Protocol</a></i>. </span></p></div><div class="biblioentry"><a name="rfc2617"></a><p>[<span class="abbrev">rfc2617</span>] <span class="title"><i>RFC2617 <a href="http://ftp.ics.uci.edu/pub/ietf/http/rfc2617.txt" target="_top">HTTP Authentication: Basic and Digest Access Authentication</a></i>. </span></p></div></div></div></body></html>
