📄 parserutils.java

📁 HTML parsing code
💻 JAVA
     * <BR>you obtain a string &quot;Trim all spaces but not the ones inside the string&quot; as output (all the spaces inside the string are preserved).
     * @param input The input string.
     * @param charsToBeRemoved The characters to be removed from the beginning and the end of the string.
     * @return The trimmed string.
     */
    public static String trimCharsBeginEnd (String input, String charsToBeRemoved)
    {
        String output = new String();
        int begin=0;
        int end=input.length()-1;
        boolean charFound=false;
        boolean ok=true;
        // find the first character that is not in charsToBeRemoved
        for (int index=begin; (index<input.length()) && ok; index++)
        {
            charFound=false;
            for (int charsCount=0; charsCount<charsToBeRemoved.length(); charsCount++)
                if (charsToBeRemoved.charAt(charsCount)==input.charAt(index))
                    charFound=true;
            if (!(charFound))
            {
                begin=index;
                ok=false;
            }
        }
        // every character is in charsToBeRemoved (or the input is empty):
        // nothing is left after trimming
        if (ok)
            return "";
        ok=true;
        // find the last character that is not in charsToBeRemoved
        for (int index=end; (index>=0) && ok; index--)
        {
            charFound=false;
            for (int charsCount=0; charsCount<charsToBeRemoved.length(); charsCount++)
                if (charsToBeRemoved.charAt(charsCount)==input.charAt(index))
                    charFound=true;
            if (!(charFound))
            {
                end=index;
                ok=false;
            }
        }
        output=input.substring(begin,end+1);
        return output;
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * @see ParserUtils#splitTags (String input, String[] tags, boolean recursive, boolean insideTag).
     */
    public static String[] splitTags (String input, String[] tags)
        throws ParserException, UnsupportedEncodingException
    {
        return splitTags (input, tags, true, true);
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * <BR>For example, calling splitTags(&quot;Begin &lt;DIV&gt;&lt;DIV&gt;  +12.5 &lt;/DIV&gt;&lt;/DIV&gt; ALL OK&quot;, new String[] {&quot;DIV&quot;})
     * <BR>returns the string array {&quot;Begin &quot;, &quot; ALL OK&quot;} (the &lt;DIV&gt; tags and their content are removed, recursively).
     * <BR>Calling splitTags(&quot;Begin &lt;DIV&gt;&lt;DIV&gt;  +12.5 &lt;/DIV&gt;&lt;/DIV&gt; ALL OK&quot;, new String[] {&quot;DIV&quot;}, false, false)
     * <BR>returns the string array {&quot;Begin &quot;, &quot;&lt;DIV&gt;  +12.5 &lt;/DIV&gt;&quot;, &quot; ALL OK&quot;} (only the outermost &lt;DIV&gt; tags are removed, not recursively; their content is kept).
     * <BR>Calling splitTags(&quot;Begin &lt;DIV&gt;&lt;DIV&gt;  +12.5 &lt;/DIV&gt;&lt;/DIV&gt; ALL OK&quot;, new String[] {&quot;DIV&quot;}, true, false)
     * <BR>returns the string array {&quot;Begin &quot;, &quot;  +12.5 &quot;, &quot; ALL OK&quot;} (all the &lt;DIV&gt; tags are removed recursively; their content is kept).
     * <BR>Calling splitTags(&quot;Begin &lt;DIV&gt;&lt;DIV&gt;  +12.5 &lt;/DIV&gt;&lt;/DIV&gt; ALL OK&quot;, new String[] {&quot;DIV&quot;}, false, true)
     * <BR>returns the string array {&quot;Begin &quot;, &quot; ALL OK&quot;} (the outermost &lt;DIV&gt; tags and their content are removed).
     * @param input The input string.
     * @param tags The tags to be used as splitting delimiters.
     * @param recursive Optional parameter (true if omitted); if true, all the matching tags are removed recursively, including nested ones.
     * @param insideTag Optional parameter (true if omitted); if true, the content of the tags is removed as well.
     * @return The string array containing the strings delimited by the tags.
     */
    public static String[] splitTags (String input, String[] tags, boolean recursive, boolean insideTag)
        throws ParserException, UnsupportedEncodingException
    {
        ArrayList outputArrayList = new ArrayList();
        int minCapacity = 0;
        String output = new String();
        String inputModified = new String(input);
        String[] outputStr = new String[] {};

        String dummyString = createDummyString (' ', input.length());

        // loop inside the different tags to be trimmed
        for (int i=0; i<tags.length; i++)
        {
            // loop inside the tags of the same type
            NodeList links = getLinks (inputModified, tags[i], recursive);
            for (int j=0; j<links.size(); j++)
            {
                CompositeTag beginTag = (CompositeTag)links.elementAt(j);
                Tag endTag = beginTag.getEndTag();

                // positions of begin and end tags
                int beginTagBegin = beginTag.getStartPosition ();
                int endTagBegin = beginTag.getEndPosition ();
                int beginTagEnd = endTag.getStartPosition ();
                int endTagEnd = endTag.getEndPosition ();

                if (insideTag)
                {
                    dummyString = modifyDummyString (new String(dummyString), beginTagBegin, endTagEnd);
                }
                else
                {
                    dummyString = modifyDummyString (new String(dummyString), beginTagBegin, endTagBegin);
                    dummyString = modifyDummyString (new String(dummyString), beginTagEnd, endTagEnd);
                }
            }
            for (int k=dummyString.indexOf(' '); (k<dummyString.length()) && (k!=-1);)
            {
                int kNew = dummyString.indexOf('*',k);
                if (kNew!=-1)
                {
                    output = inputModified.substring(k,kNew);
                    k = dummyString.indexOf(' ',kNew);

                    minCapacity++;
                    outputArrayList.ensureCapacity(minCapacity);
                    if (outputArrayList.add(output))
                        output = new String();
                    else
                        minCapacity--;
                }
                else
                {
                    output = inputModified.substring(k,dummyString.length());
                    k = kNew;

                    minCapacity++;
                    outputArrayList.ensureCapacity(minCapacity);
                    if (outputArrayList.add(output))
                        output = new String();
                    else
                        minCapacity--;
                }
            }
            StringBuffer outputStringBuffer = new StringBuffer();
            outputArrayList.trimToSize();
            Object[] outputObj = outputArrayList.toArray();
            outputStr = new String[outputArrayList.size()];
            for (int j=0; j<outputArrayList.size(); j++)
            {
                outputStr[j] = new String((String) outputObj[j]);
                outputStringBuffer.append(outputStr[j]);
            }
            outputArrayList = new ArrayList();
            inputModified = new String(outputStringBuffer.toString());
            dummyString = createDummyString (' ', inputModified.length());
        }

        return outputStr;
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * <BR>Takes a node Class as input parameter
     * instead of the tags[] string array.
     * @see ParserUtils#splitTags (String input, String[] tags, boolean recursive, boolean insideTag).
     */
    public static String[] splitTags (String input, Class nodeType)
        throws ParserException, UnsupportedEncodingException
    {
        return splitTags (input, new NodeClassFilter (nodeType), true, true);
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * <BR>Takes a node Class as input parameter
     * instead of the tags[] string array.
     * @see ParserUtils#splitTags (String input, String[] tags, boolean recursive, boolean insideTag).
     */
    public static String[] splitTags (String input, Class nodeType, boolean recursive, boolean insideTag)
        throws ParserException, UnsupportedEncodingException
    {
        return splitTags (input, new NodeClassFilter (nodeType), recursive, insideTag);
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * <BR>Takes a NodeFilter as input parameter
     * instead of the tags[] string array.
     * @see ParserUtils#splitTags (String input, String[] tags, boolean recursive, boolean insideTag).
     */
    public static String[] splitTags (String input, NodeFilter filter)
        throws ParserException, UnsupportedEncodingException
    {
        return splitTags (input, filter, true, true);
    }

    /**
     * Split the input string into a string array,
     * using the tags as delimiters.
     * <BR>Takes a NodeFilter as input parameter
     * instead of the tags[] string array.
     * @see ParserUtils#splitTags (String input, String[] tags, boolean recursive, boolean insideTag).
     */
    public static String[] splitTags (String input, NodeFilter filter, boolean recursive, boolean insideTag)
        throws ParserException, UnsupportedEncodingException
    {
        ArrayList outputArrayList = new ArrayList();
        int minCapacity = 0;
        String output = new String();

        String dummyString = createDummyString (' ', input.length());

        // loop inside the tags of the same type
        NodeList links = getLinks (input, filter, recursive);
        for (int j=0; j<links.size(); j++)
        {
            CompositeTag beginTag = (CompositeTag)links.elementAt(j);
            Tag endTag = beginTag.getEndTag();

            // positions of begin and end tags
            int beginTagBegin = beginTag.getStartPosition ();
            int endTagBegin = beginTag.getEndPosition ();
            int beginTagEnd = endTag.getStartPosition ();
            int endTagEnd = endTag.getEndPosition ();

            if (insideTag)
            {
                dummyString = modifyDummyString (new String(dummyString), beginTagBegin, endTagEnd);
            }
            else
            {
                dummyString = modifyDummyString (new String(dummyString), beginTagBegin, endTagBegin);
                dummyString = modifyDummyString (new String(dummyString), beginTagEnd, endTagEnd);
            }
        }
        for (int k=dummyString.indexOf(' '); (k<dummyString.length()) && (k!=-1);)
        {
            int kNew = dummyString.indexOf('*',k);
            if (kNew!=-1)
            {
                output = input.substring(k,kNew);
                k = dummyString.indexOf(' ',kNew);

                minCapacity++;
                outputArrayList.ensureCapacity(minCapacity);
                if (outputArrayList.add(output))
                    output = new String();
                else
                    minCapacity--;
            }
            else
            {
                output = input.substring(k,dummyString.length());
                k = kNew;
                minCapacity++;
                outputArrayList.ensureCapacity(minCapacity);
                if (outputArrayList.add(output))
                    output = new String();
                else
                    minCapacity--;
            }
        }

        outputArrayList.trimToSize();
        Object[] outputObj = outputArrayList.toArray();
        String[] outputStr = new String[outputArrayList.size()];
        for (int i=0; i<outputArrayList.size(); i++)
            outputStr[i] = new String((String) outputObj[i]);
        return outputStr;
    }

    /**
     * Trim the input string, removing all the tags it contains.
     * <BR>The method removes every substring of the form
     * &quot;&lt;XXX&gt;&quot;, where XXX can be any string.
     * <BR>If the inside parameter is true, the method also deletes the YYY string in an input such as
     * &quot;&lt;XXX&gt;YYY&lt;ZZZ&gt;&quot;; note that ZZZ is not necessarily the closing tag of XXX.
     * @param input The input string.
     * @param inside If true, also delete everything between the first '&lt;' and the last '&gt;', not only the tags themselves.
     * @return The string without tags.
     */
    public static String trimAllTags (String input, boolean inside)
    {
        StringBuffer output = new StringBuffer();
        if (inside) {
            if ((input.indexOf('<')==-1) || (input.lastIndexOf('>')==-1) || (input.lastIndexOf('>')<input.indexOf('<'))) {
                output.append(input);
            } else {
                output.append(input.substring(0, input.indexOf('<')));
                output.append(input.substring(input.lastIndexOf('>')+1, input.length()));
            }
        } else {
            boolean write = true;
            for (int index=0; index<input.length(); index++)
            {
                if (input.charAt(index)=='<' && write)
                    write = false;
                if (write)
                    output.append(input.charAt(index));
                if (input.charAt(index)=='>' && (!write))
                    write = true;
            }
        }
        return output.toString();
    }
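Of the methods in this listing, trimCharsBeginEnd and trimAllTags depend only on the JDK, so their behaviour can be tried outside the parser library. The class below is a condensed re-implementation written for illustration, not the library's own code; the class name TrimSketch is hypothetical. It tests set membership with String.indexOf, and it returns the empty string when every character of the input is removable.

```java
// Hypothetical standalone sketch (not the library source) of the behaviour
// of trimCharsBeginEnd and trimAllTags.
class TrimSketch {

    // Strip every character contained in charsToBeRemoved from both ends of
    // input; characters in the middle are left untouched.
    static String trimCharsBeginEnd(String input, String charsToBeRemoved) {
        int begin = 0;
        int end = input.length() - 1;
        while (begin <= end && charsToBeRemoved.indexOf(input.charAt(begin)) != -1)
            begin++;
        while (end >= begin && charsToBeRemoved.indexOf(input.charAt(end)) != -1)
            end--;
        // If every character was removable, begin == end + 1 and this is "".
        return input.substring(begin, end + 1);
    }

    // inside == false: drop each "<...>" run and keep the text between tags.
    // inside == true: drop everything from the first '<' to the last '>'.
    static String trimAllTags(String input, boolean inside) {
        if (inside) {
            int first = input.indexOf('<');
            int last = input.lastIndexOf('>');
            if (first == -1 || last == -1 || last < first)
                return input;
            return input.substring(0, first) + input.substring(last + 1);
        }
        StringBuilder out = new StringBuilder();
        boolean write = true;
        for (char c : input.toCharArray()) {
            if (c == '<') write = false;   // entering a tag
            if (write) out.append(c);
            if (c == '>') write = true;    // leaving a tag
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimCharsBeginEnd("  Trim all spaces  ", " ") + "]");
        System.out.println("[" + trimCharsBeginEnd("   ", " ") + "]");
        System.out.println(trimAllTags("a<b>c</b>d", false));
        System.out.println(trimAllTags("a<b>c</b>d", true));
    }
}
```

Running main prints `[Trim all spaces]`, `[]`, `acd` and `ad`: with inside set to true, the text `c` between the two tags is dropped along with the tags.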

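Both splitTags implementations work on a mask: createDummyString builds a string of spaces as long as the input, modifyDummyString marks each tag range (or, when insideTag is true, the whole range from the start tag to the end tag), and the final loop copies out every run of input characters whose mask position is still a space. Judging from that loop, which searches for '*', marked positions hold '*'. The sketch below reproduces the idea with a char array. Since the two helpers are not part of this listing, it assumes each range is half-open [start, end), and it takes the ranges as int pairs rather than computing them from CompositeTag positions; MaskSplitSketch and splitByMask are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the mask ("dummy string") splitting used by splitTags.
class MaskSplitSketch {

    // Each span is {start, end}; characters in [start, end) are delimiters
    // (a tag, or a tag together with its content) and are dropped.
    static List<String> splitByMask(String input, int[][] spans) {
        char[] mask = new char[input.length()];
        Arrays.fill(mask, ' ');
        for (int[] span : spans)
            for (int i = span[0]; i < span[1]; i++)
                mask[i] = '*';

        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < mask.length) {
            while (i < mask.length && mask[i] == '*') i++;   // skip a masked run
            int start = i;
            while (i < mask.length && mask[i] == ' ') i++;   // collect a kept run
            if (i > start) parts.add(input.substring(start, i));
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK";
        int open1 = s.indexOf("<DIV>");
        int open2 = s.indexOf("<DIV>", open1 + 1);
        int close1 = s.indexOf("</DIV>");
        int close2 = s.indexOf("</DIV>", close1 + 1);
        int openLen = "<DIV>".length();
        int closeLen = "</DIV>".length();

        // insideTag = true: drop the outer tag pair and everything inside it
        System.out.println(splitByMask(s, new int[][] {{open1, close2 + closeLen}}));
        // recursive = true, insideTag = false: drop every tag, keep the text
        System.out.println(splitByMask(s, new int[][] {
            {open1, open1 + openLen}, {open2, open2 + openLen},
            {close1, close1 + closeLen}, {close2, close2 + closeLen}}));
        // recursive = false, insideTag = false: drop only the outer tag pair
        System.out.println(splitByMask(s, new int[][] {
            {open1, open1 + openLen}, {close2, close2 + closeLen}}));
    }
}
```

The three span sets in main mirror the recursive/insideTag combinations described in the splitTags Javadoc for the same sample string. The String[] tags overload additionally repeats this pass once per tag name, re-parsing the concatenated pieces each time.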