// checklinks.java
//
// Focused crawler: (1) crawls pages on a chosen topic; (2) can write a log text file.
import java.awt.*;
import javax.swing.*;
import java.net.*;
import java.io.*;

/**
 * Main class: the Swing GUI front end for the spider.
 */
public class CheckLinks extends javax.swing.JFrame implements
             Runnable,ISpiderReportable {

  /**
   * Builds the window and lays out all of its controls.
   */
  public CheckLinks()
  {
    //{{ Initialize the UI
    // Main window settings
    setTitle("Web Crawler");
    getContentPane().setLayout(null);
    setSize(805,320);
    setVisible(false);
    setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    
    // Start URL input field
    label1.setText("Start URL:");
    getContentPane().add(label1);
    label1.setBounds(12,12,84,12);
    url.setText("http://www.scut.edu.cn/");
    getContentPane().add(url);
    url.setBounds(12,36,288,24);
    
    // Topic training sites input area
    labelKeyWord.setText("Topic training sites (separate multiple URLs with ;):");
    getContentPane().add(labelKeyWord);
    labelKeyWord.setBounds(300,12,840,12);
    getContentPane().add(key1);
    key1.setBounds(420,36,300,240);
    key1.setText("http://money.163.com/07/0531/04/3FPUCI8R002524SK.html;http://money.163.com/07/0530/17/3FOQHGN300251OGL.html;http://money.163.com/07/0531/03/3FPQLA7H002524SO.html;http://money.163.com/07/0531/06/3FQ5PBOI002524SJ.html;http://money.163.com/07/0531/06/3FQ5PBOI002524SJ.html;http://money.163.com/07/0530/23/3FPDKOI2002524SS.html;");
    key1.setLineWrap(true);
    
    // Start button
    begin.setText("Start");
    begin.setActionCommand("Begin");
    getContentPane().add(begin);
    begin.setBounds(270,280,84,24);
    
    // Polite-crawling options
    getContentPane().add(optionCheckRobots);
    optionCheckRobots.setBounds(28,280,110,24);
    getContentPane().add(optionCheckMetaTag);
    optionCheckMetaTag.setBounds(140,280,110,24);    
    
    // Scroll pane for the bad-link report
    errorScroll.setAutoscrolls(true);
    errorScroll.setHorizontalScrollBarPolicy(javax.swing.
                ScrollPaneConstants.HORIZONTAL_SCROLLBAR_ALWAYS);
    errorScroll.setVerticalScrollBarPolicy(javax.swing.
                ScrollPaneConstants.VERTICAL_SCROLLBAR_ALWAYS);
    errorScroll.setOpaque(true);
    getContentPane().add(errorScroll);
    errorScroll.setBounds(12,120,384,156);
    errors.setEditable(false);
    errorScroll.getViewport().add(errors);
    errors.setBounds(0,0,366,138);
    
    // Label showing the link currently being processed
    current.setText("Processing: ");
    getContentPane().add(current);
    current.setBounds(12,72,500,12);
    
    // Count of links retrieved
    goodLinksLabel.setText("Links found: 0");
    getContentPane().add(goodLinksLabel);
    goodLinksLabel.setBounds(12,96,192,12);
    
    // Count of dead links
    badLinksLabel.setText("Bad links: 0");
    getContentPane().add(badLinksLabel);
    badLinksLabel.setBounds(216,96,96,12);
    //}}

    //{{ Register listeners
    SymAction lSymAction = new SymAction();
    begin.addActionListener(lSymAction);
    //}}
  }

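  /**
   * Program entry point. The window is created on the event dispatch
   * thread, as Swing requires.
   */
  public static void main(String[] args)
  {
    SwingUtilities.invokeLater(new Runnable()
    {
      public void run()
      {
        new CheckLinks().setVisible(true);
      }
    });
  }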

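  /**
   * Enlarges the frame once, when it is first made displayable, so that
   * the requested size applies to the content area rather than the
   * whole window (insets and any menu bar are added on).
   */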
  public void addNotify()
  {
    // Record the size of the window prior to calling parent's
    // addNotify.
    Dimension size = getSize();

    super.addNotify();

    if ( frameSizeAdjusted )
      return;
    frameSizeAdjusted = true;

    // Adjust size of frame according to the insets and menu bar
    Insets insets = getInsets();
    javax.swing.JMenuBar menuBar = getRootPane().getJMenuBar();
    int menuBarHeight = 0;
    if ( menuBar != null )
      menuBarHeight = menuBar.getPreferredSize().height;
    setSize(insets.left + insets.right + size.width, insets.top +
                          insets.bottom + size.height + 
                          menuBarHeight);
  }

  // Used by addNotify
  boolean frameSizeAdjusted = false;

  //{{DECLARE_CONTROLS
  javax.swing.JLabel label1 = new javax.swing.JLabel();

  /**
   * The begin or cancel button
   */
  javax.swing.JButton begin = new javax.swing.JButton();

  /**
   * The URL being processed
   */
  javax.swing.JTextField url = new javax.swing.JTextField();
  
  /**
   * Training-site URLs used to define the topic, and their label
   */
  javax.swing.JTextArea key1 = new javax.swing.JTextArea();
  javax.swing.JLabel labelKeyWord = new javax.swing.JLabel();

  /**
   * Polite-crawling option check boxes
   */
  javax.swing.JLabel optionCheckBoxLabel = new javax.swing.JLabel();
  javax.swing.JCheckBox optionCheckRobots = new javax.swing.JCheckBox("Check robots.txt",true);
  javax.swing.JCheckBox optionCheckMetaTag = new javax.swing.JCheckBox("Check meta tags",true);
    
  /**
   * Scroll the errors.
   */
  javax.swing.JScrollPane errorScroll =
        new javax.swing.JScrollPane();

  /**
   * A place to store the errors created
   */
  javax.swing.JTextArea errors = new javax.swing.JTextArea();
  javax.swing.JLabel current = new javax.swing.JLabel();
  javax.swing.JLabel goodLinksLabel = new javax.swing.JLabel();
  javax.swing.JLabel badLinksLabel = new javax.swing.JLabel();
  //}}

  //{{DECLARE_MENUS
  //}}

  /**
   * The background spider thread
   */
  protected Thread backgroundThread;

  /**
   * The spider object being used
   */
  protected Spider spider;

  /**
   * The URL that the spider began with
   */
  protected URL base;

  /**
   * How many bad links have been found
   */
  protected int badLinksCount = 0;

  /**
   * How many good links have been found
   */
  protected int goodLinksCount = 0; 


  /**
   * Internal class used to dispatch events
   * 
   */
  class SymAction implements java.awt.event.ActionListener {
    public void actionPerformed(java.awt.event.ActionEvent event)
    {
      Object object = event.getSource();
      if ( object == begin )
        begin_actionPerformed(event);
    }
  }

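  /**
   * Called when the begin button is clicked. Starts a crawl if none is
   * running; otherwise asks the running spider to stop.
   * 
   * @param event The event associated with the button.
   */
  void begin_actionPerformed(java.awt.event.ActionEvent event)
  {
    if ( backgroundThread==null ) {
      begin.setText("Stop");
      // Reset the counters before the thread starts so the crawl
      // never begins with stale values.
      goodLinksCount=0;
      badLinksCount=0;
      backgroundThread = new Thread(this);
      backgroundThread.start();
    } else if ( spider!=null ) {
      spider.cancel();
    }
  }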

  /**
   * Body of the background thread: configures the spider from the
   * form fields and runs the crawl to completion.
   */
  public void run()
  {
    try {
      errors.setText("");
      spider = new Spider(this);
      spider.clear();
      base = new URL(url.getText());
      
      spider.addURL(base);
      //spider.setKeyWord(key1.getText());
      spider.setCheckRobots(optionCheckRobots.isSelected());
      spider.setCheckMetaTag(optionCheckMetaTag.isSelected());
      spider.setTrainingUrl(key1.getText());
      spider.begin();
    } catch ( MalformedURLException e ) {
      UpdateErrors err = new UpdateErrors();
      err.msg = "Invalid URL.\n";
      SwingUtilities.invokeLater(err);
    } finally {
      // Always re-arm the button, even when the start URL was invalid;
      // otherwise the next click would try to cancel a finished crawl.
      backgroundThread=null;
      Runnable doLater = new Runnable()
      {
        public void run()
        {
          begin.setText("Start");
        }
      };
      SwingUtilities.invokeLater(doLater);
    }
  }

  /**
   * Called by the spider when a URL is found. It is here
   * that links are validated.
   * 
   * @param base The page that the link was found on.
   * @param url The actual link address.
   */
  public boolean spiderFoundURL(URL base,URL url)
  {
    UpdateCurrentStats cs = new UpdateCurrentStats();
    cs.msg = url.toString();
    SwingUtilities.invokeLater(cs);

    if ( !checkLink(url) ) {
      UpdateErrors err = new UpdateErrors();
      err.msg = "[Bad link] "+url+" (on page " + base + ")\n";
      SwingUtilities.invokeLater(err);
      badLinksCount++;
      return false;
    }

    goodLinksCount++;
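    // Originally only URLs on a different host than base were read
    // (open question from the author: should this be all URLs?).
    // The host check, url.getHost().equalsIgnoreCase(base.getHost()),
    // is disabled, so every reachable link is accepted.
    return true;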
  }

  /**
   * Called when a URL error is found
   * 
   * @param url The URL that resulted in an error.
   */
  public void spiderURLError(URL url)
  {
    // Error reporting is currently disabled:
    // badLinksCount++;
    // UpdateErrors err = new UpdateErrors();
    // err.msg = url+"(connection error)\n";
  }

  /**
   * Called by the spider to report the score computed for a page.
   * Author: Kelven.JU
   */
  public void spiderOutputPageScore(URL url, double score)
  {
    UpdateErrors psc = new UpdateErrors();
    psc.msg = "[Done] (score "+score+") -> "+url+" (on page " + base + ")\n";
    SwingUtilities.invokeLater(psc);
  }
 	
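  /**
   * Opens a connection to the link to decide whether it is good.
   * Only the ability to connect is tested; the HTTP status code is
   * not examined.
   */
  protected boolean checkLink(URL url)
  {
    URLConnection connection = null;
    try {
      connection = url.openConnection();
      connection.connect();
      return true;
    } catch ( IOException e ) {
      return false;
    } finally {
      // Release the socket once the check is done.
      if ( connection instanceof HttpURLConnection )
        ((HttpURLConnection)connection).disconnect();
    }
  }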

  /**
   * Called when the spider finds an e-mail address
   * 
   * @param email The email address the spider found.
   */
  public void spiderFoundEMail(String email)
  {
  }
  /**
   * Internal class used to update the error information
   * in a thread-safe way (it runs on the event dispatch thread).
   */
  class UpdateErrors implements Runnable {
    public String msg;
    public void run()
    {
      errors.append(""+msg);
    }
  }
  /**
   * Used to update the current status information
   * in a thread-safe way (it runs on the event dispatch thread).
   */
  class UpdateCurrentStats implements Runnable {
    public String msg;
    public void run()
    {
      current.setText("Processing: " + msg );
      goodLinksLabel.setText("Links found: " + goodLinksCount);
      badLinksLabel.setText("Bad links: " + badLinksCount);
    }
  }
}
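
CheckLinks implements `ISpiderReportable`, but that interface is not shown on this page. The sketch below is only a guess at its shape, inferred from the four callback methods CheckLinks defines; the real declaration may differ. `RecordingReporter` and its `parse` helper are hypothetical names added for illustration: a headless reporter that records events in a list instead of updating Swing labels.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the callback interface, inferred only from the
// methods that CheckLinks implements. Not the original declaration.
interface ISpiderReportable {
  boolean spiderFoundURL(URL base, URL url);
  void spiderURLError(URL url);
  void spiderFoundEMail(String email);
  void spiderOutputPageScore(URL url, double score);
}

// Hypothetical reporter that keeps an in-memory log, useful for
// exercising the callbacks without a GUI.
class RecordingReporter implements ISpiderReportable {
  final List<String> log = new ArrayList<>();
  int good = 0;
  int bad = 0;

  @Override
  public boolean spiderFoundURL(URL base, URL url) {
    good++;
    log.add("found " + url + " (on page " + base + ")");
    return true; // accept every link, as CheckLinks does for good links
  }

  @Override
  public void spiderURLError(URL url) {
    bad++;
    log.add("error " + url);
  }

  @Override
  public void spiderFoundEMail(String email) {
    log.add("email " + email);
  }

  @Override
  public void spiderOutputPageScore(URL url, double score) {
    log.add("score " + score + " " + url);
  }

  // Helper that turns a string into a URL without a checked exception.
  static URL parse(String address) {
    try {
      return URI.create(address).toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

A reporter like this could be handed to the spider in place of the frame, assuming the `Spider` constructor accepts any `ISpiderReportable`, which the call `new Spider(this)` suggests but does not prove.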

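The `checkLink` method above treats a link as good whenever a connection can be opened, so a page that answers with HTTP 404 still counts as good. A stricter check might look like the following sketch. It is not part of the original program: the class name `LinkProbe`, the 5-second timeout, and the rule "status 200 to 399 is good" are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper; not part of the original CheckLinks code.
class LinkProbe {
  // Assumed timeout in milliseconds; the original sets none.
  static final int TIMEOUT_MS = 5000;

  // Returns true if the address can be opened and, for HTTP(S),
  // answers a HEAD request with a status in the range 200..399.
  static boolean isReachable(String address) {
    URLConnection connection = null;
    try {
      URL url = URI.create(address).toURL();
      connection = url.openConnection();
      connection.setConnectTimeout(TIMEOUT_MS);
      connection.setReadTimeout(TIMEOUT_MS);

      if (connection instanceof HttpURLConnection) {
        HttpURLConnection http = (HttpURLConnection) connection;
        http.setRequestMethod("HEAD"); // headers only, no body
        int status = http.getResponseCode();
        return status >= 200 && status < 400;
      }

      // Non-HTTP schemes (file:, ftp:, ...): opening the stream is
      // the only test available; close it straight away.
      try (InputStream in = connection.getInputStream()) {
        return true;
      }
    } catch (IOException | IllegalArgumentException e) {
      // Malformed address, unreachable host, missing file, timeout.
      return false;
    } finally {
      if (connection instanceof HttpURLConnection) {
        ((HttpURLConnection) connection).disconnect();
      }
    }
  }
}
```

Some servers reject HEAD requests, so a production checker would probably fall back to GET before declaring a link bad. Schemes that offer no input stream, such as `mailto:`, are reported as unreachable by this sketch.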