TODO:

Get someone to make spiffy graphics and a logo for the web page. :-)

Do whatever else is needed to make it usable as an apache logging
coprocess?

Profiling
=========

From gprof, significant time is being spent in:

5.7%		adns__timeouts()
3.8%		find() (mostly less(), thus strcmp())
2.3%		sendto()
1.0%		read_ipaddr()
1.0%		domptr()
1.0%		memchr() in fgetln()

Cache efficiency
================

On an arbitrarily chosen day, a reports system reporting on 81
different web servers got a 64% DB file cache hit rate.  That number
is higher than might be expected.  The common use of web proxies like
AOL's may be responsible for the high success rate, though that's just
a guess.

It would be helpful to do a histogram of the age of the cache entries
that were used, to help determine the optimum cache entry lifetime for
expire-ip-db.  dns-terror would need more instrumentation (probably
#ifdef'd) to track that.  Is the distribution normal, or fairly flat
with no dropoff?

Parallelism
===========

dns-terror -oz and analog each fully utilize a single CPU; rcp does
not use much CPU.  How can we take advantage of N CPUs?

The process of generating a single report is not easy to parallelize
without incurring enough coordination and system call overhead to
possibly counteract the speed gain.  If multiple reports are being
generated, though, there could be several processes reading from a
work queue.  This could be approached in two main ways.

One way would be to have each reader process go through the whole
cycle of programs to produce a report: rcp, dns-terror, optionally
getdominfo, analog.  dns-terror and getdominfo would use file locking
(flock; see man DB_File) on the DB files to prevent corruption from
multiple simultaneous writers.  There could be more reader processes
than CPUs, to try to get more CPU utilization when several rcp's
happen to be occurring simultaneously, by increasing the chance that
there will be another process running dns-terror or analog.
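The flock-based protection of the DB files could look roughly like the
following Python sketch.  This is illustrative only: the real tools use
Perl's DB_File, and the function names and the sidecar ".lock" file are
my own invention.

```python
import dbm
import fcntl


def locked_db_update(db_path, key, value):
    """Update one record in a DB file while holding an exclusive
    flock() on a sidecar lock file, so that several simultaneous
    writers (e.g. parallel dns-terror/getdominfo runs) can't corrupt
    the database.  (Hypothetical sketch, not the real implementation.)"""
    with open(db_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until we own the lock
        try:
            with dbm.open(db_path, "c") as db:
                db[key.encode()] = value.encode()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def locked_db_read(db_path, key):
    """Read one record under a shared lock, so readers don't see a
    half-written database but can still run concurrently."""
    with open(db_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)   # shared lock for readers
        try:
            with dbm.open(db_path, "r") as db:
                return db[key.encode()].decode()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Because flock locks are advisory, this only works if every writer goes
through the same locking discipline.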
There would be some redundant fetching of external data if multiple
dns-terror or getdominfo processes ran simultaneously.  The increase
in parallelism would probably more than offset that wasted time.

The other way would be to have each process do only one stage of
report generation.  There might be a first-stage control process that
generates the initial queue of log files to rcp, from a database.  One
process loops rcp'ing log files.  As it finishes each one, it adds it
to a queue that is read by another process that runs dns-terror on
each log file.  When that finishes one, it adds the log file name to a
queue that is read by a process that repeatedly runs analog.  With
this approach, the queues could even be text files that are written
with line buffering; since each queue has only one reader, read
buffering won't be a problem.

But this approach doesn't parallelize beyond 2-4 CPUs.  A refinement
of it is to have multiple processes at each stage, with DB file
locking, making sure that queue items are only written and read at
record boundaries.  There would be the same inefficiency as with the
first approach: some redundant fetching of data by multiple dns-terror
and getdominfo processes.  Having multiple processes reading and/or
writing log files simultaneously shouldn't be a problem, as the I/O
bandwidth of an ultra-wide SCSI RAID can handle that.

Another issue is how and when gzip is run.  A log file can be zipped
before being fetched from the remote machine, or at the start, middle,
or end of the report generation process, or the log file can be
discarded without ever zipping it (though we wouldn't do that).
Zipping it last, for archiving, minimizes the amount of CPU time spent
zipping and unzipping, at the cost of more disk I/O.
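The single-reader, line-buffered text-file queues described above can
be sketched as follows.  This is a minimal sketch: the function names
are hypothetical, and the real queue items would be log file names.

```python
def enqueue(queue_path, item):
    """Writer side: append one record as a whole line.  Opening with
    buffering=1 (line buffered) means each record is flushed to the
    file as a complete line, so the single reader never consumes a
    torn record."""
    with open(queue_path, "a", buffering=1) as q:
        q.write(item + "\n")


def drain(queue_path, offset):
    """Reader side: return (new_items, new_offset), consuming only
    complete lines past `offset`.  A trailing line that has no newline
    yet is left in place to be picked up on the next call."""
    items = []
    with open(queue_path, "r") as q:
        q.seek(offset)
        while True:
            line = q.readline()
            if not line.endswith("\n"):
                break          # EOF or a half-written record; try later
            items.append(line.rstrip("\n"))
            offset = q.tell()
    return items, offset
```

The reader polls by remembering its own offset, which is what makes a
plain text file workable as a queue when there is exactly one reader.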
If a log file is not zipped on the remote machine prior to the rcp, it
could be transferred with rsync -z or scp -C to reduce transfer times.
However, that effectively gzips the file on the remote machine, then
unzips and rezips it locally.  If rsync or scp had an option to gzip
the file on the fly and keep it in gzipped form on the destination
machine, that might be desirable: the time when the log files are
ready to be fetched would be the same, without having to account for
gzipping time that depends on the log file sizes.

On multiprocessor machines, running gzip in a pipe as a separate
process might be a win over using zlib, unless the system call
overhead outweighs the gain from multiprocessing.  If we are using the
pipelining approaches outlined above, we might want to use zlib
anyway.

Possible ways to parallelize dns-terror: parallelize and partition the
work with fork() to increase CPU utilization from 25-50% to closer to
100%, so we can be doing processing while waiting for I/O (e.g., a
cache lookup).  Or, more simply, we could start several copies at
once, processing different log files.  But they would have to either
use different DB files (and hence duplicate effort) or else use DB
syncing and locking.

Here's a parallel design to consider.  For each N-line chunk of logs
(which could be the whole file), the parent stores an in-core map
(key=ipaddr, value=exists) to build a list of the distinct IP
addresses it has read.  When it has done N (or all) lines, it hands
off 1/C of them to each child it has forked, via perhaps shared memory
or Unix domain sockets, and signals.  The children first look up
whether each address is already resolved in the on-disk DB file.  They
write either to that file, by locking it, or to their own DB files,
which are all combined by the parent at the end.  Or they could just
append the results to a stack or socket in memory, and the parent
writes them out to the DB file.  But remember, most of the addresses
we've already seen, and never go to DNS for.
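The parent's side of that fork() design, the in-core distinct-address
map and the 1/C hand-off, might be sketched like this.  The interleaved
slicing is one arbitrary way to split the work, and the actual hand-off
over shared memory or Unix domain sockets is elided.

```python
def distinct_ips(log_lines):
    """Scan a chunk of log lines and collect the distinct client IP
    addresses in first-seen order (the in-core key=ipaddr map from the
    design above).  Assumes the IP address is the first
    whitespace-separated field, as in common web log formats."""
    seen = {}
    for line in log_lines:
        parts = line.split(None, 1)
        if parts:
            seen.setdefault(parts[0], True)
    return list(seen)


def partition(ips, num_children):
    """Deal out roughly 1/C of the distinct addresses to each of C
    children.  Here they are just returned as C lists; a real
    implementation would push each slice to a forked child and signal
    it."""
    return [ips[i::num_children] for i in range(num_children)]
```

Deduplicating before the hand-off matters because, as noted above, most
addresses have been seen before and never need a DNS query at all.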
So those DB lookups are probably what we need to parallelize the most.

Here's another parallelizing idea: split the work into N buckets, each
handled by one process, according to the last octet of the IP address.
Taking the last octet modulo the number of processes should work, I
think.  For -o, getting the lines output in order would require some
sort of coordination--shared memory or semaphores, perhaps.

Are TCP DNS connections faster than UDP?
========================================

It's hard to know.  Try adns_qf_usevc (TCP) in the query flags.
Unfortunately, after a few dozen queries, I get this:

adns warning: TCP connection lost: read: Connection reset by peer (NS=127.0.0.1)

And then nothing happens....

Sample IIS 4 log
================

#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 1999-08-16 00:02:07
#Fields: date time c-ip cs-username s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken s-port cs-version cs(User-Agent) cs(Cookie) cs(Referer)
1999-08-16 00:02:07 208.206.40.191 - W3SVC3 FLEXNET17 208.192.104.93 HEAD /default.htm - 200 0 280 19 0 80 HTTP/1.0 - - -
1999-08-16 00:07:06 208.206.40.191 - W3SVC3 FLEXNET17 208.192.104.93 HEAD /default.htm - 200 0 280 19 0 80 HTTP/1.0 - - -
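The sample above is in W3C extended log format: the "#Fields:"
directive names the columns for the data lines that follow, and other
"#" lines are metadata.  A minimal parsing sketch (the function name is
my own):

```python
def parse_w3c_log(lines):
    """Parse W3C extended log format, as in the IIS 4 sample above.
    '#Fields:' names the columns; each data line is split on
    whitespace and zipped with those column names into a dict."""
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line:
            continue          # #Software, #Version, #Date, blank lines
        else:
            records.append(dict(zip(fields, line.split())))
    return records
```

Note that this simple whitespace split relies on the fields themselves
containing no spaces, which holds for the sample above but not for all
IIS configurations.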
