<!doctype linuxdoc system>
<article>
<title>Squid v1.2 Programmers Guide</title>
<author>Duane Wessels, Squid Developers
<abstract>
Squid is a WWW Cache application developed by the National Laboratory
for Applied Network Research and members of the Web Caching community.
Squid is implemented as a single, non-blocking process based around
a BSD select() loop.  This document describes the operation of the Squid
source code and is intended to be used by others who wish to customize
or improve it.
</abstract>

<toc>

<!-- %%%% Chapter : INTRODUCTION %%%% -->

<sect>Introduction

<P>
The Squid source code has evolved more from empirical observation and
tinkering than from a solid design process.  It carries a legacy of
being ``touched'' by numerous individuals, each with somewhat different
techniques and terminology.

<P>
Squid is a single-process proxy server.  Every request is handled by
the main process, with the exception of FTP.  However, Squid does not
use a ``threads package'' such as Pthreads.  While this might be easier
to code, it suffers from portability and performance problems.
Instead, Squid maintains data structures and state information for
each active request.

<P>
The code is often difficult to follow because there are no explicit
state variables for the active requests.  Instead, thread execution
progresses as a sequence of ``callback functions'' which get executed
when I/O is ready to occur, or some other event has happened.  As
a callback function completes, it is responsible for registering the
next callback function for subsequent I/O.

<P>
Note there is only a pseudo-consistent naming scheme.  In most cases
functions are named like <tt/moduleFooBar()/.  However, there
are also some functions named like <tt/module_foo_bar()/.

<P>
Note that the Squid source changes rapidly, and some parts of this
document may become out-of-date.
If you find any inconsistencies, please
feel free to notify
<url url="mailto:squid-dev@nlanr.net" name="the Squid Developers">.

<sect1>Conventions

<P>
Function names and file names will be written in a courier font, such
as <tt/store.c/ and <tt/storeRegister()/.  Data structures and their
members will be written in an italicized font, such as <em/StoreEntry/.

<sect1>The Big Picture

<P>
Squid consists of the following major components:

<sect2>Client Side

<P>
<em/Files:/ <tt/client_side.c/

<P>
    Here new client connections are accepted, parsed, and processed.
    This is where we determine if the request is a cache HIT,
    REFRESH, MISS, etc.  With HTTP/1.1 we may have multiple requests
    from a single TCP connection.  Per-connection state information
    is held in a data structure called <em/ConnStateData/.  Per-request
    state information is stored in the <em/clientHttpRequest/ structure.

<sect2>Server Side

<P>
<em/Files:/
    <tt/proto.c/,
    <tt/http.c/,
    <tt/ftp.c/,
    <tt/gopher.c/,
    <tt/wais.c/,
    <tt/ssl.c/,
    <tt/pass.c/

<P>
    These routines are responsible for forwarding cache misses
    to other servers, depending on the protocol.  Cache misses
    may be forwarded to either origin servers or other proxy caches.
    Note that all requests to other proxies, including FTP and Gopher
    requests, are sent as HTTP requests.
    <tt/gopher.c/ is somewhat complex and gross because it must
    convert from the Gopher protocol to HTTP.  WAIS and Gopher don't
    receive much attention because they comprise a relatively insignificant
    portion of Internet traffic.

<P>
    <tt/ssl.c/ handles SSL requests (the CONNECT method) and
    <tt/pass.c/ (``passthrough'') handles uncacheable requests which
    the cache doesn't really care about.  These two modules basically
    pass bits back and forth between client and server.  Note they do
    not use a <em/StoreEntry/ to do so.
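<P>
The byte relaying done by <tt/ssl.c/ and <tt/pass.c/ can be pictured with the
C sketch below.  It is not taken from the Squid source.  <tt/relaySketchOnce()/
is a hypothetical helper.  For brevity it uses blocking <tt/read()/ and
<tt/write()/ calls, whereas Squid drives each direction from its non-blocking
select() loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: relay one chunk of bytes from descriptor `from` to
 * descriptor `to`.  Returns the number of bytes relayed, 0 on end-of-file,
 * or -1 on error.  Blocking calls keep the sketch short; an event-driven
 * server would instead register read/write handlers with its select() loop. */
ssize_t relaySketchOnce(int from, int to, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    ssize_t n = read(from, buf, size);
    if (n <= 0)
        return n;               /* EOF (0) or read error (-1) */
    ssize_t off = 0;
    while (off < n) {           /* keep writing until the whole chunk is out */
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        off += w;
    }
    return n;
}
```

Calling such a helper once per readable event, in each direction, moves data
between client and server without involving a <em/StoreEntry/.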
    About the only difference between the two is that the SSL module
    sends a special ``connection established'' message.

<sect2>Storage Manager

<P>
<em/Files:/
    <tt/store_clean.c/,
    <tt/store_client.c/,
    <tt/store_dir.c/,
    <tt/store_key_md5.c/,
    <tt/store_log.c/,
    <tt/store_rebuild.c/,
    <tt/store_swapin.c/,
    <tt/store_swapmeta.c/,
    <tt/store_swapout.c/,
    <tt/store.c/

<P>
    The Storage Manager is the glue between client and server sides.
    Every object saved in the cache is allocated a <em/StoreEntry/
    structure.  While the object is being accessed, it also has a
    <em/MemObject/ structure.

<P>
    Squid can quickly locate cached objects because it keeps (in memory) a hash
    table of all <em/StoreEntry/ structures.  The keys for the hash
    table are MD5 checksums of the object's URI.  In addition, there is
    a doubly-linked list of <em/StoreEntry/ structures used for the LRU
    replacement algorithm.  When an entry is accessed, it is moved to
    the head of the LRU list.  When Squid needs to replace cached objects,
    it takes objects from the tail of the LRU list.

<P>
    Objects are saved to disk in a two-level directory structure.  For
    each object the <em/StoreEntry/ includes a 4-byte <em/fileno/
    field.  This file number is converted to a disk pathname by a
    simple algorithm which evenly distributes the files across all
    cache directories.  A cache swap file consists of two parts:
    the cache metadata and the object data.  Note that the object
    data includes the full HTTP reply, both headers and body.  The HTTP
    reply headers are not the same as the cache metadata.

<P>
    Client-side requests register themselves with a <em/StoreEntry/
    to be notified when new data arrives.  Multiple clients may
    receive data via a single <em/StoreEntry/.  For POST and
    PUT requests, this process works in reverse.
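<P>
The GET direction of this registration and notification can be modelled with
the simplified, push-style C sketch below.  The names <tt/StoreEntrySketch/,
<tt/storeSketchRegister()/ and <tt/storeSketchAppend()/ are hypothetical.  The
fixed-size arrays are an assumption made for brevity, and Squid's real store
client interface differs in detail.  The sketch only shows that the object data
is stored once in the entry, while each registered client keeps its own
delivery offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CLIENTS 8     /* hypothetical fixed limits */
#define SKETCH_MAX_DATA 4096

/* Hypothetical notification hook: receives bytes this client has not seen. */
typedef void STCBSketch(void *data, const char *buf, size_t len);

typedef struct {
    STCBSketch *callback;
    void *data;          /* per-client state handed back to the callback */
    size_t seen;         /* offset of the first byte not yet delivered */
} StoreClientSketch;

typedef struct {
    char mem[SKETCH_MAX_DATA];   /* object data received so far, stored once */
    size_t len;
    StoreClientSketch clients[SKETCH_MAX_CLIENTS];
    int nclients;
} StoreEntrySketch;

/* Client side: ask to be notified when new data arrives. */
int storeSketchRegister(StoreEntrySketch *e, STCBSketch *cb, void *data)
{
    if (e->nclients >= SKETCH_MAX_CLIENTS)
        return -1;
    StoreClientSketch *sc = &e->clients[e->nclients++];
    sc->callback = cb;
    sc->data = data;
    sc->seen = 0;        /* a late joiner is sent everything from offset 0 */
    return 0;
}

/* Server side: append the new bytes once, then hand every registered client
 * the part it has not seen yet. */
void storeSketchAppend(StoreEntrySketch *e, const char *buf, size_t len)
{
    assert(e->len + len <= SKETCH_MAX_DATA);
    memcpy(e->mem + e->len, buf, len);
    e->len += len;
    for (int i = 0; i < e->nclients; i++) {
        StoreClientSketch *sc = &e->clients[i];
        sc->callback(sc->data, e->mem + sc->seen, e->len - sc->seen);
        sc->seen = e->len;
    }
}

/* Example callback: count the bytes delivered to one client. */
void storeSketchCountBytes(void *data, const char *buf, size_t len)
{
    (void)buf;
    *(size_t *)data += len;
}
```

The reverse direction, used for POST and PUT bodies, follows the same pattern
with the roles of the two sides exchanged.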
    Server-side functions are notified when additional data is read
    from the client.

<sect2>Peer Selection

<P>
<em/Files:/ <tt/peer_select.c/

<P>
    These functions are responsible for selecting
    one (or none) of the neighbor caches as the appropriate forwarding
    location.

<sect2>Access Control

<P>
<em/Files:/ <tt/acl.c/

<P>
    These functions are responsible for allowing
    or denying a request, based on a number of different parameters.
    These parameters include the client's IP address, the hostname
    of the requested resource, the request method, etc.
    Some of the necessary information may not be immediately available,
    for example the origin server's IP address.  In these cases,
    the ACL routines initiate lookups for the necessary information and
    continue the access control checks when the information is
    available.

<sect2>Network Communication

<P>
<em/Files:/ <tt/comm.c/

<P>
    These are the routines for communicating over
    TCP and UDP network sockets.  Here is where sockets are opened,
    closed, read, and written.  In addition, note that the heart of
    Squid (<tt/comm_select()/ or <tt/comm_poll()/) exists here, even
    though it handles all file descriptors, not just network sockets.
    These routines do not support queuing multiple
    blocks of data for writing.  Consequently, a callback occurs
    for every write request.

<sect2>File/Disk I/O

<P>
<em/Files:/ <tt/disk.c/

<P>
    These are the routines for reading and writing disk files (and FIFOs).
    The reasons for separating network and disk I/O functions are
    partly historical, and partly due to their different behaviors.
    For example, we don't worry about getting a
    ``No space left on device'' error for network sockets.  The disk
    I/O routines support queuing of multiple blocks for writing.
    In some cases, it is possible to merge multiple blocks into
    a single write request.
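<P>
A minimal C sketch of such a write queue follows.  It illustrates the idea
under a few assumptions and is not the code in <tt/disk.c/.  The names
<tt/WriteQueueSketch/, <tt/wqSketchAppend()/ and <tt/wqSketchFlush()/ are
hypothetical.  Every queued block is merged into one buffer before a single
<tt/write()/, and the completion callback runs once per flush rather than once
per queued block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical completion callback: one call per flush, not per block. */
typedef void WQCBSketch(void *data, ssize_t written);

typedef struct WQBlockSketch {
    char *buf;
    size_t len;
    struct WQBlockSketch *next;
} WQBlockSketch;

typedef struct {
    int fd;
    WQBlockSketch *head, *tail;
    size_t pending;              /* total bytes waiting in the queue */
} WriteQueueSketch;

/* Queue a private copy of buf; nothing is written yet.  Returns 0 or -1. */
int wqSketchAppend(WriteQueueSketch *q, const char *buf, size_t len)
{
    if (len == 0)
        return 0;
    WQBlockSketch *b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    if ((b->buf = malloc(len)) == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->buf, buf, len);
    b->len = len;
    b->next = NULL;
    if (q->tail != NULL)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    q->pending += len;
    return 0;
}

/* Merge all queued blocks into one buffer and issue a single write().
 * A short or failed write is only reported to the callback here; a real
 * implementation would requeue the unwritten remainder. */
ssize_t wqSketchFlush(WriteQueueSketch *q, WQCBSketch *cb, void *data)
{
    if (q->head == NULL)
        return 0;
    char *merged = malloc(q->pending);
    if (merged == NULL)
        return -1;
    size_t off = 0;
    for (WQBlockSketch *b = q->head, *next; b != NULL; b = next) {
        next = b->next;
        memcpy(merged + off, b->buf, b->len);
        off += b->len;
        free(b->buf);
        free(b);
    }
    assert(off == q->pending);
    q->head = q->tail = NULL;
    q->pending = 0;
    ssize_t n = write(q->fd, merged, off);
    free(merged);
    if (cb != NULL)
        cb(data, n);
    return n;
}

/* Example callback: count how many completion callbacks were issued. */
void wqSketchCountCalls(void *data, ssize_t written)
{
    (void)written;
    (*(int *)data)++;
}
```

Here two queued blocks produce one <tt/write()/ and one callback.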
    The write callback does not necessarily
    occur for every write request.

<sect2>Neighbors

<P>
<em/Files:/ <tt/neighbors.c/

<P>
    Maintains the list of neighbor caches.  Sends ICP messages to,
    and receives them from, neighbors.  Decides which neighbors to
    query for a given request.

<sect2>IP/FQDN Cache

<P>
<em/Files:/ <tt/ipcache.c/, <tt/fqdncache.c/

<P>
    A cache of name-to-address and address-to-name lookups.  These are
    hash tables keyed on the names and addresses.
    <tt/ipcache_nbgethostbyname()/ and <tt/fqdncache_nbgethostbyaddr()/
    implement the non-blocking lookups.

<sect2>Cache Manager

<P>
<em/Files:/ <tt/objcache.c/, <tt/stat.c/

<P>
	This provides access to certain information needed by the
	cache administrator.  A companion program, <em/cachemgr.cgi/,
	can be used to make this information available via a Web
	browser.  Cache manager requests to Squid are made with a
	special URL of the form
<verb>
	cache_object://hostname/operation
</verb>
	The cache manager provides essentially ``read-only'' access
	to information.  It does not provide a method for configuring
	Squid while it is running.

<sect2>Network Measurement Database

<P>
<em/Files:/ <tt/net_db.c/

<P>
	In a number of situations, Squid finds it useful to know the
	estimated network round-trip time (RTT) between itself and
	origin servers.  A particularly useful example is
	the peer selection algorithm.  By making RTT measurements, a
	Squid cache will know if it, or one of its neighbors, is closest
	to a given origin server.  The actual measurements are made
	with the <em/pinger/ program, described below.  The measured
	values are stored in a database indexed under two keys.  The
	primary index field is the /24 prefix of the origin server's
	IP address.  The secondary index is a hash table of fully-qualified
	host names, whose entries link to the appropriate network entry.
	This allows Squid to quickly look up measurements
	when given either an IP address or a host name.  The /24 prefix
	aggregation is used to reduce the overall database size.

<sect2>Redirectors

<P>
<em/Files:/ <tt/redirect.c/

<P>
	Squid has the ability to rewrite requests from clients.  After
	checking the access controls, but before checking for cache hits,
	requested URLs may optionally be written to an external
	<em/redirector/ process.  This program, which can be highly
	customized, may return a new URL to replace the original request.
	Common applications for this feature are extended access controls
	and local mirroring.

<sect2>Autonomous System Numbers

<P>
<em/Files:/ <tt/asn.c/

<P>
	Squid supports Autonomous System (AS) numbers as another
	access control element.  The routines in <tt/asn.c/
	query databases which map AS numbers to lists of CIDR
	prefixes.  These results are stored in a radix tree which
	allows fast searching of the AS number for a given IP address.

<sect2>Asynchronous I/O Operations

<P>
<em/Files:/ <tt/async_io.c/, <tt/aiops.c/

<P>
	The routines in <tt/async_io.c/ and <tt/aiops.c/
	perform blocking disk operations in a set of separate threads
	(child processes), so that the main Squid process is not blocked.

<sect2>Configuration File Parsing

<P>
<em/Files:/
	<tt/cf.data.pre/,
	<tt/cf_gen.c/,
	<tt/cf_parser.c/,
	<tt/cache_cf.c/

<P>
	The primary configuration file specification is in the file
	<tt/cf.data.pre/.  A simple utility program, <tt/cf_gen/,
	reads the <tt/cf.data.pre/ file and generates <tt/cf_parser.c/
	and <tt/squid.conf/.  <tt/cf_parser.c/ is included directly
	into <tt/cache_cf.c/ at compile time.

<sect2>Callback Data Database

<P>
<em/Files:/ <tt/cbdata.c/

<P>
	Squid's extensive use of callback functions makes it very
	susceptible to memory access errors.  Care must be taken
	so that the <tt/callback_data/ memory is still valid when
	the callback function is executed.
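<P>
One common way to meet that requirement is to wrap the callback data in a small
record. The record counts pending callbacks and remembers whether the owner has
already freed the data.  The C sketch below shows that idea.  The names
<tt/CbDataSketch/, <tt/cbSketchLock()/, <tt/cbSketchFree()/ and
<tt/cbSketchInvoke()/ are hypothetical and are not the <tt/cbdata.c/ interface.

```c
#include <assert.h>
#include <stdlib.h>

typedef void CBSketch(void *data);

/* Hypothetical wrapper around one piece of callback data. */
typedef struct {
    int locks;     /* pending callbacks that still reference the data */
    int valid;     /* cleared when the owner frees the data */
    void *data;    /* the owner's data; the wrapper does not free it */
} CbDataSketch;

CbDataSketch *cbSketchNew(void *data)
{
    CbDataSketch *c = calloc(1, sizeof *c);
    if (c != NULL) {
        c->valid = 1;
        c->data = data;
    }
    return c;
}

/* Release the wrapper once it is both invalid and unreferenced. */
static void cbSketchMaybeRelease(CbDataSketch *c)
{
    if (!c->valid && c->locks == 0)
        free(c);
}

/* Take a reference when a callback is registered with this data. */
void cbSketchLock(CbDataSketch *c)
{
    c->locks++;
}

/* Owner no longer needs the data: mark it invalid.  The wrapper may be
 * freed here, so the caller must not use c afterwards. */
void cbSketchFree(CbDataSketch *c)
{
    c->valid = 0;
    cbSketchMaybeRelease(c);
}

/* Pending operation completed: run cb only if the data is still valid, then
 * drop the reference (c may be freed).  Returns 1 if cb ran, 0 if skipped. */
int cbSketchInvoke(CbDataSketch *c, CBSketch *cb)
{
    assert(c->locks > 0);
    int ran = 0;
    if (c->valid) {
        cb(c->data);
        ran = 1;
    }
    c->locks--;
    cbSketchMaybeRelease(c);
    return ran;
}

/* Example callback: increment an int counter. */
void cbSketchIncrement(void *data)
{
    (*(int *)data)++;
}
```

With this scheme, a callback whose data has been freed is skipped and never
touches released memory.  The wrapper itself is released only after the last
pending callback has finished.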
	The routines in <tt/cbdata.c/
	provide a uniform method for managing callback data memory,
	canceling callbacks, and preventing erroneous memory accesses.

<sect2>Debugging

<P>
<em/Files:/ <tt/debug.c/

<P>
	Squid includes extensive debugging statements to assist in
	tracking down bugs and strange behavior.  Every debug statement
	is assigned a section and a level.  Usually, every debug statement
	in the same source file has the same section.  Levels are chosen
	depending on how much output will be generated, or how useful the
	provided information will be.  The <em/debug_options/ line
	in the configuration file determines which debug statements will
	be shown and which will not.  The <em/debug_options/ line
	assigns a maximum level for every section.  If a given debug
	statement has a level less than or equal to the configured
	level for that section, it will be shown.  This description
	probably sounds more complicated than it really is.
	Note that <tt/debug()/ itself is a macro.

<sect2>Error Generation

<P>
<em/Files:/ <tt/errorpage.c/

<P>
	The routines in <tt/errorpage.c/ generate error messages from
	a template file and specific request parameters.  This allows
	for customized error messages and multilingual support.

<sect2>Event Queue

<P>
<em/Files:/ <tt/event.c/

<P>
	The routines in <tt/event.c/ maintain a linked-list event
	queue for functions to be executed at a future time.  The
	event queue is used for periodic functions such as performing
	cache replacement, cleaning swap directories, as well as one-time