📄 heritrix.html

Written in Java. It was left over from some experiments; I meant to delete it, but I am uploading it here so everyone can share it.
        String selfTestName = null;
        CommandLineParser clp = new CommandLineParser(args, Heritrix.out,
            Heritrix.getVersion());
        List arguments = clp.getCommandLineArguments();
        Option [] options = clp.getCommandLineOptions();

        // Check passed argument.  Only one argument, the ORDER_FILE is allowed.
        // If one argument, make sure exists and xml suffix.
        if (arguments.size() > 1) {
            clp.usage(1);
        } else if (arguments.size() == 1) {
            crawlOrderFile = (String)arguments.get(0);
            if (!(new File(crawlOrderFile).exists())) {
                clp.usage("ORDER.XML <" + crawlOrderFile +
                    "> specified does not exist.", 1);
            }
            // Must end with '.xml'
            if (crawlOrderFile.length() > 4 &&
                    !crawlOrderFile.substring(crawlOrderFile.length() - 4).
                        equalsIgnoreCase(".xml")) {
                clp.usage("ORDER.XML <" + crawlOrderFile +
                    "> does not have required '.xml' suffix.", 1);
            }
        }

        // Now look at options passed.
        for (int i = 0; i < options.length; i++) {
            switch(options[i].getId()) {
                case 'h':
                    clp.usage();
                    break;

                case 'a':
                    adminLoginPassword = options[i].getValue();
                    break;

                case 'n':
                    if (crawlOrderFile == null) {
                        clp.usage("You must specify an ORDER_FILE with" +
                            " '--nowui' option.", 1);
                    }
                    Heritrix.gui = false;
                    break;

                case 'b':
                    Heritrix.guiHosts = parseHosts(options[i].getValue());
                    break;

                case 'p':
                    try {
                        Heritrix.guiPort =
                            Integer.parseInt(options[i].getValue());
                    } catch (NumberFormatException e) {
                        clp.usage("Failed parse of port number: " +
                            options[i].getValue(), 1);
                    }
                    if (Heritrix.guiPort <= 0) {
                        clp.usage("Nonsensical port number: " +
                            options[i].getValue(), 1);
                    }
                    break;

                case 'r':
                    runMode = true;
                    break;

                case 's':
                    selfTestName = options[i].getValue();
                    selfTest = true;
                    break;

                default:
                    assert false: options[i].getId();
            }
        }

        // Ok, we should now have everything to launch the program.
        String status = null;
        if (selfTest) {
            // If more than just '--selftest' and '--port' passed, then
            // there is confusion on what is being asked of us.  Print usage
            // rather than proceed.
            for (int i = 0; i < options.length; i++) {
                if (options[i].getId() != 'p' && options[i].getId() != 's') {
                    clp.usage(1);
                }
            }

            if (arguments.size() > 0) {
                // No arguments accepted by selftest.
                clp.usage(1);
            }
            status = selftest(selfTestName, Heritrix.guiPort);
        } else {
            if (!isValidLoginPasswordString(adminLoginPassword)) {
                clp.usage("Invalid admin login:password value, or none "
                        + "specified. ", 1);
            }

            if (!Heritrix.gui) {
                if (options.length > 1) {
                    // If more than just '--nowui' passed, then there is
                    // confusion on what is being asked of us. Print usage
                    // rather than proceed.
                    clp.usage(1);
                }
                Heritrix h = new Heritrix(true);
                status = h.doOneCrawl(crawlOrderFile);
            } else {
                status = startEmbeddedWebserver(
                        Heritrix.guiHosts, Heritrix.guiPort,
                        adminLoginPassword);
                Heritrix h = new Heritrix(true);

                String tmp = h.launch(crawlOrderFile, runMode);
                if (tmp != null) {
                    status += ('\n' + tmp);
                }
            }
        }
        return status;
    }

    /**
     * @return The file we dump stdout and stderr into.
     */
    public static String getHeritrixOut() {
        String tmp = System.getProperty("heritrix.out");
        if (tmp == null || tmp.length() == 0) {
            tmp = Heritrix.DEFAULT_HERITRIX_OUT;
        }
        return tmp;
    }

    /**
     * Exploit <code>-Dheritrix.home</code> if available to us.
     * Is current working dir if no heritrix.home property supplied.
     * @return Heritrix home directory.
     * @throws IOException
     */
    protected static File getHeritrixHome()
    throws IOException {
        File heritrixHome = null;
        String home = System.getProperty("heritrix.home");
        if (home != null &&
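
The argument checks in the listing above (the '.xml' suffix test, the port parse, and the system-property fallback in getHeritrixOut) can be tried in isolation. The following is a minimal sketch, not Heritrix code: the class `CrawlArgs`, its method names, the upper port bound of 65535, and the fallback string passed in `main` are all assumptions made for illustration. One difference is deliberate: the listing only applies the suffix test to names longer than four characters, so a shorter name passes unchecked, whereas this sketch rejects it.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Heritrix. */
final class CrawlArgs {
    private CrawlArgs() {}

    /** True when the name ends with ".xml", ignoring case. */
    static boolean hasXmlSuffix(String fileName) {
        return fileName != null
            && fileName.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    /**
     * Parses a port number. Non-numeric input and values <= 0 are rejected,
     * as in the listing; the upper bound of 65535 is an added assumption.
     */
    static int parsePort(String value) {
        int port;
        try {
            port = Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Failed parse of port number: " + value, e);
        }
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException(
                "Nonsensical port number: " + value);
        }
        return port;
    }

    /** Returns the system property, or the fallback when unset or empty. */
    static String propertyOrDefault(String key, String fallback) {
        String value = System.getProperty(key);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        System.out.println(hasXmlSuffix("order.XML"));
        System.out.println(parsePort("8080"));
        // The fallback file name here is a placeholder, not the real default.
        System.out.println(propertyOrDefault("heritrix.out", "placeholder.log"));
    }
}
```

Throwing `IllegalArgumentException` instead of calling `clp.usage(message, 1)` keeps the helpers testable; a caller could catch the exception and print usage in one place.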
