<!doctype linuxdoc system>

<article>
<title>Squid v1.2 Programmers Guide</title>
<author>Duane Wessels, Squid Developers
<abstract>
Squid is a WWW Cache application developed by the National Laboratory
for Applied Network Research and members of the Web Caching community.
Squid is implemented as a single, non-blocking process based around
a BSD select() loop. This document describes the operation of the Squid
source code and is intended to be used by others who wish to customize
or improve it.
</abstract>

<toc>

<!-- %%%% Chapter : INTRODUCTION %%%% -->
<sect>Introduction

<P>
The Squid source code has evolved more from empirical observation and
tinkering than from a solid design process. It carries a legacy of
being ``touched'' by numerous individuals, each with somewhat different
techniques and terminology.

<P>
Squid is a single-process proxy server. Every request is handled by
the main process, with the exception of FTP. However, Squid does not
use a ``threads package'' such as Pthreads. While threads might be
easier to code, they suffer from portability and performance problems.
Instead, Squid maintains data structures and state information for
each active request.

<P>
The code is often difficult to follow because there are no explicit
state variables for the active requests. Instead, thread execution
progresses as a sequence of ``callback functions'' which get executed
when I/O is ready to occur, or when some other event has happened. As
a callback function completes, it is responsible for registering the
next callback function for subsequent I/O.

<P>
Note that the naming scheme is only pseudo-consistent. In most cases
functions are named like <tt/moduleFooBar()/. However, there
are also some functions named like <tt/module_foo_bar()/.

<P>
Note that the Squid source changes rapidly, and some parts of this
document may become out-of-date.
If you find any inconsistencies, please
feel free to notify
<url url="mailto:squid-dev@nlanr.net"
name="the Squid Developers">.

<sect1>Conventions

<P>
Function names and file names will be written in a courier font, such
as <tt/store.c/ and <tt/storeRegister()/. Data structures and their
members will be written in an italicized font, such as <em/StoreEntry/.

<sect1>The Big Picture

<P>
Squid consists of the following major components:

<sect2>Client Side
<P><em/Files:/
	<tt/client_side.c/

<P>
	Here new client connections are accepted, parsed, and
	processed. This is where we determine if the request is
	a cache HIT, REFRESH, MISS, etc. With HTTP/1.1 we may have
	multiple requests from a single TCP connection. Per-connection
	state information is held in a data structure called
	<em/ConnStateData/. Per-request state information is stored
	in the <em/clientHttpRequest/ structure.

<sect2>Server Side
<P><em/Files:/
	<tt/proto.c/,
	<tt/http.c/,
	<tt/ftp.c/,
	<tt/gopher.c/,
	<tt/wais.c/,
	<tt/ssl.c/,
	<tt/pass.c/

<P>
	These routines are responsible for forwarding cache misses
	to other servers, depending on the protocol. Cache misses
	may be forwarded to either origin servers or other proxy
	caches. Note that all requests (FTP, Gopher) to other
	proxies are sent as HTTP requests. <tt/gopher.c/ is
	somewhat complex and gross because it must convert from
	the Gopher protocol to HTTP. WAIS and Gopher don't receive
	much attention because they comprise a relatively
	insignificant portion of Internet traffic.

<P>
	<tt/ssl.c/ handles SSL requests (the CONNECT method) and
	<tt/pass.c/ (``passthrough'') handles uncacheable requests
	which the cache doesn't really care about. These two
	modules basically pass bits back and forth between client
	and server. Note that they do not use a <em/StoreEntry/ to
	do so.
	About the only difference between the two is that
	the SSL module sends a special ``connection established''
	message.

<sect2>Storage Manager
<P><em/Files:/
	<tt/store_clean.c/,
	<tt/store_client.c/,
	<tt/store_dir.c/,
	<tt/store_key_md5.c/,
	<tt/store_log.c/,
	<tt/store_rebuild.c/,
	<tt/store_swapin.c/,
	<tt/store_swapmeta.c/,
	<tt/store_swapout.c/,
	<tt/store.c/

<P>
	The Storage Manager is the glue between the client and server
	sides. Every object saved in the cache is allocated a
	<em/StoreEntry/ structure. While the object is being
	accessed, it also has a <em/MemObject/ structure.

<P>
	Squid can quickly locate cached objects because it keeps
	(in memory) a hash table of all <em/StoreEntry/ structures.
	The keys for the hash table are MD5 checksums of the
	object's URI. In addition, there is a doubly-linked list
	of <em/StoreEntry/ structures used for the LRU replacement
	algorithm. When an entry is accessed, it is moved to the
	head of the LRU list. When Squid needs to replace cached
	objects, it takes objects from the tail of the LRU list.

<P>
	Objects are saved to disk in a two-level directory structure.
	For each object, the <em/StoreEntry/ includes a 4-byte
	<em/fileno/ field. This file number is converted to a disk
	pathname by a simple algorithm which evenly distributes
	the files across all cache directories. A cache swap file
	consists of two parts: the cache metadata and the object
	data. Note that the object data includes the full HTTP
	reply (headers and body). The HTTP reply headers are not
	the same as the cache metadata.

<P>
	Client-side requests register themselves with a <em/StoreEntry/
	to be notified when new data arrives. Multiple clients
	may receive data via a single <em/StoreEntry/. For POST
	and PUT requests, this process works in reverse.
	Server-side functions are notified when additional data is
	read from the client.

<sect2>Peer Selection
<P><em/Files:/
	<tt/peer_select.c/

<P>
	These functions are responsible for selecting one (or none)
	of the neighbor caches as the appropriate forwarding
	location.

<sect2>Access Control
<P><em/Files:/
	<tt/acl.c/

<P>
	These functions are responsible for allowing or denying a
	request, based on a number of different parameters. These
	parameters include the client's IP address, the hostname
	of the requested resource, the request method, etc. Some
	of the necessary information may not be immediately
	available, for example the origin server's IP address. In
	these cases, the ACL routines initiate lookups for the
	necessary information and continue the access control
	checks when the information is available.

<sect2>Network Communication
<P><em/Files:/
	<tt/comm.c/

<P>
	These are the routines for communicating over TCP and UDP
	network sockets. Here is where sockets are opened, closed,
	read, and written. In addition, note that the heart of
	Squid (<tt/comm_select()/ or <tt/comm_poll()/) exists here,
	even though it handles all file descriptors, not just
	network sockets. These routines do not support queuing
	multiple blocks of data for writing. Consequently, a
	callback occurs for every write request.

<sect2>File/Disk I/O
<P><em/Files:/
	<tt/disk.c/

<P>
	Routines for reading and writing disk files (and FIFOs).
	Network and disk I/O functions are separate partly for
	historical reasons, and partly because they behave
	differently. For example, we don't worry about getting a
	``No space left on device'' error for network sockets.
	The disk I/O routines support queuing of multiple blocks
	for writing. In some cases, it is possible to merge
	multiple blocks into a single write request. The write
	callback does not necessarily occur for every write request.

<sect2>Neighbors
<P><em/Files:/
	<tt/neighbors.c/

<P>
	Maintains the list of neighbor caches. Sends and receives
	ICP messages to neighbors.
	Decides which neighbors to query for a given request.

<sect2>IP/FQDN Cache
<P><em/Files:/
	<tt/ipcache.c/,
	<tt/fqdncache.c/

<P>
	A cache of name-to-address and address-to-name lookups.
	These are hash tables keyed on the names and addresses.
	<tt/ipcache_nbgethostbyname()/ and
	<tt/fqdncache_nbgethostbyaddr()/ implement the non-blocking
	lookups.

<sect2>Cache Manager
<P><em/Files:/
	<tt/objcache.c/,
	<tt/stat.c/

<P>
	This provides access to certain information needed by the
	cache administrator. A companion program, <em/cachemgr.cgi/,
	can be used to make this information available via a Web
	browser. Cache manager requests to Squid are made with a
	special URL of the form
<verb>
	cache_object://hostname/operation
</verb>
	The cache manager provides essentially ``read-only'' access
	to information. It does not provide a method for configuring
	Squid while it is running.

<sect2>Network Measurement Database
<P><em/Files:/
	<tt/net_db.c/

<P>
	In a number of situations, Squid finds it useful to know the
	estimated network round-trip time (RTT) between itself and
	origin servers. A particularly useful example is
	the peer selection algorithm. By making RTT measurements, a
	Squid cache will know if it, or one of its neighbors, is
	closest to a given origin server. The actual measurements
	are made with the <em/pinger/ program, described below.
	The measured values are stored in a database indexed under
	two keys. The primary index field is the /24 prefix of
	the origin server's IP address. The secondary index is a
	hash table of fully-qualified host names, whose entries
	link to the appropriate network entry. This allows Squid
	to quickly look up measurements when given either an IP
	address or a host name. The /24 prefix aggregation is
	used to reduce the overall database size.

<sect2>Redirectors
<P><em/Files:/
	<tt/redirect.c/

<P>
	Squid has the ability to rewrite requests from clients.
	After checking the access controls, but before checking for
	cache hits, requested URLs may optionally be written to an
	external <em/redirector/ process. This program, which can
	be highly customized, may return a new URL to replace the
	original request. Common applications for this feature
	are extended access controls and local mirroring.

<sect2>Autonomous System Numbers
<P><em/Files:/
	<tt/asn.c/

<P>
	Squid supports Autonomous System (AS) numbers as another
	access control element. The routines in <tt/asn.c/
	query databases which map AS numbers into lists of CIDR
	prefixes. These results are stored in a radix tree which
	allows fast searching of the AS number for a given IP
	address.

<sect2>Asynchronous I/O Operations
<P><em/Files:/
	<tt/async_io.c/,
	<tt/aiops.c/

<P>
	The routines in <tt/async_io.c/ and <tt/aiops.c/
	implement blocking disk operations in a set of thread
	(child) processes.

<sect2>Configuration File Parsing
<P><em/Files:/
	<tt/cf.data.pre/,
	<tt/cf_gen.c/,
	<tt/cf_parser.c/,
	<tt/cache_cf.c/

<P>
	The primary configuration file specification is in the file
	<tt/cf.data.pre/. A simple utility program, <tt/cf_gen/,
	reads the <tt/cf.data.pre/ file and generates
	<tt/cf_parser.c/ and <tt/squid.conf/. <tt/cf_parser.c/
	is included directly into <tt/cache_cf.c/ at compile time.

<sect2>Callback Data Database
<P><em/Files:/
	<tt/cbdata.c/

<P>
	Squid's extensive use of callback functions makes it very
	susceptible to memory access errors. Care must be taken
	so that the <tt/callback_data/ memory is still valid when
	the callback function is executed. The routines in
	<tt/cbdata.c/ provide a uniform method for managing callback
	data memory, canceling callbacks, and preventing erroneous
	memory accesses.

<sect2>Debugging
<P><em/Files:/
	<tt/debug.c/

<P>
	Squid includes extensive debugging statements to assist in
	tracking down bugs and strange behavior. Every debug
	statement is assigned a section and level.
	Usually, every debug statement in the same source file has
	the same section. Levels are chosen depending on how much
	output will be generated, or how useful the provided
	information will be. The <em/debug_options/ line in the
	configuration file determines which debug statements will
	be shown and which will not. The <em/debug_options/ line
	assigns a maximum level for every section. If a given
	debug statement has a level less than or equal to the
	configured level for that section, it will be shown. This
	description probably sounds more complicated than it
	really is. Note that <tt/debug()/ itself is a macro.

<sect2>Error Generation
<P><em/Files:/
	<tt/errorpage.c/

<P>
	The routines in <tt/errorpage.c/ generate error messages
	from a template file and specific request parameters.
	This allows for customized error messages and multilingual
	support.

<sect2>Event Queue
<P><em/Files:/
	<tt/event.c/

<P>
	The routines in <tt/event.c/ maintain a linked-list event
	queue for functions to be executed at a future time. The
	event queue is used for periodic functions such as
	performing cache replacement, cleaning swap directories,
	as well as one-time