The Webalizer - A web server log file analysis tool

Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)

Distributed under the GNU GPL.  See the files "COPYING" and
"Copyright" supplied with the distribution for additional info.

What is The Webalizer?
----------------------

The Webalizer is a web server log file analysis program which produces
usage statistics in HTML format for viewing with a browser.  The results
are presented in both columnar and graphical format, which facilitates
interpretation.  Yearly, monthly, daily and hourly usage statistics are
presented, along with the ability to display usage by site, URL, referrer,
user agent (browser), search string, entry/exit page, username and country
(some information is only available if supported and present in the log
files being processed).  Processed data may also be exported into most
database and spreadsheet programs that support tab delimited data formats.

The Webalizer supports CLF (common log format) log files, as well as
Combined log formats as defined by NCSA and others, and variations
of these which it attempts to handle intelligently.  In addition, wu-ftpd
xferlog formatted logs and squid proxy logs are supported.

Gzip compressed logs may now be used as input directly.  Any log filename
that ends with a '.gz' extension will be assumed to be in gzip format and
uncompressed on the fly as it is being read.  In addition, the Webalizer
also supports DNS lookup capabilities if enabled at compile time.  See
the file DNS.README for additional information.

This documentation applies to The Webalizer Version 2.01

Running the Webalizer
---------------------

The Webalizer was designed to be run from a Unix command line prompt or
as a cron job.  There are several command line options which will modify
the results it produces, and configuration files can be used as well.
The format of the command line is:

    webalizer [options ...] [log-file]

Where 'options' can be one or more of the supported command line
switches described below.
'log-file' is the name of the log file
to process (see below for more detailed information).  If a dash
("-") is specified for the log-file name, STDIN will be used.

Once executed, the general flow of the program follows:

o A default configuration file is scanned for.  A file named
  'webalizer.conf' is searched for in the current directory, and if
  found, its configuration data is parsed.  If the file is not
  present in the current directory, the file '/etc/webalizer.conf'
  is searched for and, if found, is used instead.

o Any command line arguments given to the program are parsed.  This
  may include the specification of a configuration file, which is
  processed at the time it is encountered.

o If a log file was specified, it is opened and made ready for
  processing.  If no log file was given, or the filename '-' is
  specified on the command line, STDIN is used for input.

o If an output directory was specified, the program does a 'chdir' to
  that directory in preparation for generating output.  If no output
  directory was given, the current directory is used.

o If a non-zero number of DNS Children processes were specified, they
  will be started, and the specified log file will be processed,
  either creating or updating the specified DNS cache file.

o If no hostname was given, the program attempts to get the hostname
  using a uname system call.  If that fails, 'localhost' is used.

o A history file is searched for.  This file keeps previous month
  totals used on the main index.html page.  The default file is
  named 'webalizer.hist' and is kept in the specified output directory,
  but it may be changed using the "HistoryName" configuration file
  keyword.

o If incremental processing was specified, a data file is searched for
  and loaded if found, containing the 'internal state' data of the
  program at the end of a previous run.
  The default file is named
  'webalizer.current' and is kept in the specified output directory,
  but it may be changed using the "IncrementalName" configuration file
  keyword.

o Main processing begins on the log file.  If the log spans multiple
  months, a separate HTML document is created for each month.

o After main processing, the main 'index.html' page is created, which
  has totals by month and links to each month's HTML document.

o A new history file is saved to disk, which includes totals generated
  by The Webalizer during the current run.

o If incremental processing was specified, a data file is written that
  contains the 'internal state' data at the end of this run.

Incremental Processing
----------------------

Version 1.2x of The Webalizer adds incremental run capability.  Simply
put, this allows processing large log files by breaking them up into
smaller pieces, and processing these pieces instead.  What this means
in real terms is that you can now rotate your log files as often as you
want, and still be able to produce monthly usage statistics without the
loss of any detail.  This is accomplished by saving and restoring all
relevant internal data to a disk file between runs.  Doing so allows the
program to 'start where it left off' so to speak, and allows the
preservation of detail from one run to the next.

Some special precautions need to be taken when using the incremental
run capability of The Webalizer.  Configuration options should not be
changed between runs, as that could cause corruption of the internal
stored data.  For example, changing the MangleAgents level will cause
different representations of user agents to be stored, producing invalid
results in the user agents section of the report.
If you need to change
configuration options, do it at the end of the month after normal
processing of the previous month and before processing the current month.
You may also want to delete the 'webalizer.current' file (or
whatever name was specified using the "IncrementalName" configuration
option).

The Webalizer also attempts to prevent data duplication by keeping
track of the timestamp of the last record processed.  This timestamp
is then compared to current records being processed, and any records
that were logged before that timestamp are ignored.  This, in
theory, should allow you to re-process logs that have already been
processed, or process logs that contain a mix of processed/not yet
processed records, and not produce duplication of statistics.  The
only time this may break is if you have duplicate timestamps in two
separate log files... any records in the second log file that have
the same timestamp as the last record in the previous log file processed
will be discarded as if they had already been processed.  There are
lots of ways to prevent this however; for example, stopping the web
server before rotating logs will prevent this situation.  This setup
also necessitates that you always process logs in chronological order,
otherwise data loss will occur as a result of the timestamp compare.

Output Produced
---------------

The Webalizer produces several reports (html) and graphics for each
month processed.  In addition, a summary page is generated for the
current and previous months (up to 12), a history file is created
and, if incremental mode is used, the current month's processed data
is saved.  The exact location and names of these files can be changed
using configuration files and command line options.
The files produced (default names) are:

index.html              - Main summary page (extension may be changed)
usage.png               - Yearly graph displayed on the main index page
usage_YYYYMM.html       - Monthly summary page (extension may be changed)
usage_YYYYMM.png        - Monthly usage graph for specified month/year
daily_usage_YYYYMM.png  - Daily usage graph for specified month/year
hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year
site_YYYYMM.html        - All sites listing (if enabled)
url_YYYYMM.html         - All urls listing (if enabled)
ref_YYYYMM.html         - All referrers listing (if enabled)
agent_YYYYMM.html       - All user agents listing (if enabled)
search_YYYYMM.html      - All search strings listing (if enabled)
webalizer.hist          - Previous month history (may be changed)
webalizer.current       - Incremental Data (may be changed)
site_YYYYMM.tab         - tab delimited sites file
url_YYYYMM.tab          - tab delimited urls file
ref_YYYYMM.tab          - tab delimited referrers file
agent_YYYYMM.tab        - tab delimited user agents file
user_YYYYMM.tab         - tab delimited usernames file
search_YYYYMM.tab       - tab delimited search string file

The yearly (index) report shows statistics for a 12 month period, and
links to each month.  The monthly report has detailed statistics for
that month with additional links to any URLs and referrers found.
The various totals shown are explained below.

Hits  Any request made to the server which is logged is considered a 'hit'.
The requests can be for anything... html pages, graphic images, audio
files, CGI scripts, etc...  Each valid line in the server log is
counted as a hit.  This number represents the total number of requests
that were made to the server during the specified report period.

Files  Some requests made to the server require that the server then send
something back to the requesting client, such as an HTML page or graphic
image.
When this happens, it is considered a 'file' and the files
total is incremented.  The relationship between 'hits' and 'files' can
be thought of as 'incoming requests' and 'outgoing responses'.

Pages  Pages are, well, pages!  Generally, any HTML document, or anything
that generates an HTML document, would be considered a page.  This
does not include the other stuff that goes into a document, such as
graphic images, audio clips, etc...  This number represents the number
of 'pages' requested only, and does not include the other 'stuff' that
is in the page.  What actually constitutes a 'page' can vary from
server to server.  The default action is to treat anything with the
extension '.htm', '.html' or '.cgi' as a page.  A lot of sites will
probably define other extensions, such as '.phtml', '.php3' and '.pl'
as pages as well.  Some people consider this number as the number of
'pure' hits... I'm not sure if I totally agree with that viewpoint.
Some other programs (and people :) refer to this as 'Pageviews'.

Sites  Each request made to the server comes from a unique 'site', which can
be referenced by a name or ultimately, an IP address.  The 'sites'
number shows how many unique IP addresses made requests to the server
during the reporting time period.  This DOES NOT mean the number of
unique individual users (real people) that visited, which is impossible
to determine using just logs and the HTTP protocol (however, this
number might be about as close as you will get).

Visits  Whenever a request is made to the server from a given IP address
(site), the amount of time since a previous request by the address
is calculated (if any).  If the time difference is greater than a
pre-configured 'visit timeout' value (or the address has never made a
request before), it is considered a 'new visit', and this total is
incremented (both for the site, and the IP address).
The default timeout value is 30
minutes (can be changed), so if a user visits your site at 1:00 in
the afternoon, and then returns at 3:00, two visits would be registered.
Note: in the 'Top Sites' table, the visits total should be discounted
on 'Grouped' records, and thought of as the "Minimum number of visits"
that came from that grouping instead.  Note: Visits only occur on
PageType requests, that is, for any request whose URL is one of the
'page' types defined with the PageType option.  Due to the limitations
of the HTTP protocol, log rotations and other factors, this number
should not be taken as absolutely accurate; rather, it should be
considered a pretty close "guess".

KBytes  The KBytes (kilobytes) value shows the amount of data, in KB, that
was sent out by the server during the specified reporting period.  This
value is generated directly from the log file, so it is up to the
web server to produce accurate numbers in the logs (some web servers
do stupid things when it comes to reporting the number of bytes).  In
general, this should be a fairly accurate representation of the amount
of outgoing traffic the server had, regardless of the web server's
reporting quirks.
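
As a rough illustration of how the totals described above relate to one
another, here is a small Python sketch.  It is not The Webalizer's actual
code (the program itself is written in C), and several things in it are
assumptions made for the example: the 'Record' type and 'summarize'
function are hypothetical names, log lines are assumed to be already
parsed and in chronological order, a response code of 200 is used as a
stand-in for "the server sent something back", pages are recognized only
by the default extensions listed above, and the visit timeout is measured
between page requests from the same address.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PAGE_EXTENSIONS = (".htm", ".html", ".cgi")  # defaults named in the text
VISIT_TIMEOUT = timedelta(minutes=30)        # default named in the text


@dataclass
class Record:
    """Hypothetical, already-parsed log line (not a Webalizer structure)."""
    host: str       # requesting IP address or hostname
    when: datetime  # time of the request
    url: str        # requested URL
    status: int     # HTTP response code
    size: int       # bytes sent in the response


def is_page(url):
    """True if the URL path ends in one of the default page extensions."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(PAGE_EXTENSIONS)


def summarize(records):
    """Compute the totals for records given in chronological order."""
    totals = {"hits": 0, "files": 0, "pages": 0, "visits": 0}
    total_bytes = 0
    hosts = set()
    last_page_time = {}  # host -> time of its previous page request

    for rec in records:
        totals["hits"] += 1            # every valid log line is a hit
        total_bytes += rec.size
        hosts.add(rec.host)

        if rec.status == 200:          # assumption: 200 means a file was sent
            totals["files"] += 1

        if is_page(rec.url):
            totals["pages"] += 1
            previous = last_page_time.get(rec.host)
            # New visit: first page request, or timeout exceeded.
            if previous is None or rec.when - previous > VISIT_TIMEOUT:
                totals["visits"] += 1
            last_page_time[rec.host] = rec.when

    totals["sites"] = len(hosts)       # unique addresses seen
    totals["kbytes"] = total_bytes / 1024
    return totals


# The 1:00 / 3:00 example from the text: one address, two visits.
sample = [
    Record("10.0.0.1", datetime(2000, 1, 1, 13, 0, 0), "/index.html", 200, 2048),
    Record("10.0.0.1", datetime(2000, 1, 1, 13, 0, 5), "/logo.png", 200, 1024),
    Record("10.0.0.1", datetime(2000, 1, 1, 15, 0, 0), "/index.html", 304, 0),
]
print(summarize(sample))
```

With the sample above, the three log lines give 3 hits but only 2 files
(the 304 response sends nothing back), 2 pages, 1 site, 2 visits and
3.0 KBytes.  The image request counts as a hit and a file, but it does
not affect the visit total, in keeping with the note that visits only
occur on PageType requests.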