dtrace_impl.h
 * offset and the _wrapped_ offset.  If a request is made to reserve some
 * amount of data, and the buffer has wrapped, the wrapped offset is
 * incremented until the wrapped offset minus the current offset is greater
 * than or equal to the reserve request.  This is done by repeatedly looking
 * up the ECB corresponding to the EPID at the current wrapped offset, and
 * incrementing the wrapped offset by the size of the data payload
 * corresponding to that ECB.  If this offset is greater than or equal to the
 * limit of the data buffer, the wrapped offset is set to 0.  Thus, the
 * current offset effectively "chases" the wrapped offset around the buffer.
 * Schematically:
 *
 *      base of data buffer --->  +------+--------------------+------+
 *                                | EPID | data               | EPID |
 *                                +------+--------+------+----+------+
 *                                | data          | EPID | data      |
 *                                +---------------+------+-----------+
 *                                | data, cont.                      |
 *                                +------+---------------------------+
 *                                | EPID | data                      |
 *           current offset --->  +------+---------------------------+
 *                                | invalid data                     |
 *           wrapped offset --->  +------+--------------------+------+
 *                                | EPID | data               | EPID |
 *                                +------+--------+------+----+------+
 *                                | data          | EPID | data      |
 *                                +---------------+------+-----------+
 *                                :                                  :
 *                                .                                  .
 *                                .        ... valid data ...        .
 *                                .                                  .
 *                                :                                  :
 *                                +------+-------------+------+------+
 *                                | EPID | data        | EPID | data |
 *                                +------+------------++------+------+
 *                                | data, cont.       | leftover     |
 *     limit of data buffer --->  +-------------------+--------------+
 *
 * If the amount of requested buffer space exceeds the amount of space
 * available between the current offset and the end of the buffer:
 *
 *  (1)  all words in the data buffer between the current offset and the
 *       limit of the data buffer (marked "leftover", above) are set to
 *       DTRACE_EPIDNONE
 *
 *  (2)  the wrapped offset is set to zero
 *
 *  (3)  the iteration process described above occurs until the wrapped
 *       offset is greater than the amount of desired space.
 *
 * The wrapped offset is implemented by (re-)using the inactive offset.
 * In a "switch" buffer policy, the inactive offset stores the offset in
 * the inactive buffer; in a "ring" buffer policy, it stores the wrapped
 * offset.
 *
 * DTrace Scratch Buffering
 *
 * Some ECBs may wish to allocate dynamically-sized temporary scratch memory.
 * To accommodate such requests easily, scratch memory may be allocated in
 * the buffer beyond the current offset plus the needed memory of the current
 * ECB.  If there isn't sufficient room in the buffer for the requested amount
 * of scratch space, the allocation fails and an error is generated.  Scratch
 * memory is tracked in the dtrace_mstate_t and is automatically freed when
 * the ECB ceases processing.  Note that ring buffers cannot allocate their
 * scratch from the principal buffer -- lest they needlessly overwrite older,
 * valid data.  Ring buffers therefore have their own dedicated scratch buffer
 * from which scratch is allocated.
 */
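/*
 * The following is an illustrative sketch only -- not the actual reservation
 * code -- of the wrapped-offset "chase" described above.  The
 * dtrace_epid_payload_size() helper is a hypothetical stand-in for looking
 * up the ECB registered under an EPID and returning the size of its data
 * payload.  The sketch assumes offs <= wrapped < limit on entry; if the
 * chase reaches the limit of the buffer, the caller is assumed to apply
 * steps (1) through (3) above and retry.
 */
static uint64_t
dtrace_buffer_chase_wrapped(caddr_t base, uint64_t limit, uint64_t offs,
    uint64_t wrapped, uint64_t needed)
{
        /*
         * Skip over whole records, EPID by EPID, until at least "needed"
         * bytes lie between the current offset and the wrapped offset.
         */
        while (wrapped - offs < needed && wrapped < limit) {
                uint32_t epid = *(uint32_t *)(base + wrapped);

                if (epid == DTRACE_EPIDNONE)
                        wrapped += sizeof (uint32_t);   /* filler word */
                else
                        wrapped += dtrace_epid_payload_size(epid);
        }

        return (wrapped);
}
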
#define DTRACEBUF_RING          0x0001          /* bufpolicy set to "ring" */
#define DTRACEBUF_FILL          0x0002          /* bufpolicy set to "fill" */
#define DTRACEBUF_NOSWITCH      0x0004          /* do not switch buffer */
#define DTRACEBUF_WRAPPED       0x0008          /* ring buffer has wrapped */
#define DTRACEBUF_DROPPED       0x0010          /* drops occurred */
#define DTRACEBUF_ERROR         0x0020          /* errors occurred */
#define DTRACEBUF_FULL          0x0040          /* "fill" buffer is full */
#define DTRACEBUF_CONSUMED      0x0080          /* buffer has been consumed */

typedef struct dtrace_buffer {
        uint64_t dtb_offset;                    /* current offset in buffer */
        uint64_t dtb_size;                      /* size of buffer */
        uint32_t dtb_flags;                     /* flags */
        uint32_t dtb_drops;                     /* number of drops */
        caddr_t dtb_tomax;                      /* active buffer */
        caddr_t dtb_xamot;                      /* inactive buffer */
        uint32_t dtb_xamot_flags;               /* inactive flags */
        uint32_t dtb_xamot_drops;               /* drops in inactive buffer */
        uint64_t dtb_xamot_offset;              /* offset in inactive buffer */
        uint32_t dtb_errors;                    /* number of errors */
        uint32_t dtb_xamot_errors;              /* errors in inactive buffer */
#ifndef _LP64
        uint64_t dtb_pad1;
#endif
} dtrace_buffer_t;

/*
 * DTrace Aggregation Buffers
 *
 * Aggregation buffers use much of the same mechanism as described above
 * ("DTrace Buffers").  However, because an aggregation is fundamentally a
 * hash, there exists dynamic metadata associated with an aggregation buffer
 * that is not associated with other kinds of buffers.  This aggregation
 * metadata is _only_ relevant for the in-kernel implementation of
 * aggregations; it is not relevant to user-level consumers.  To keep this
 * metadata out of the copied-out data, we allocate the dynamic aggregation
 * metadata (hash keys and hash buckets) starting below the _limit_ of the
 * buffer, and we allocate data from the _base_ of the buffer.  When the
 * aggregation buffer is copied out, _only_ the data is copied out; the
 * metadata is simply discarded.  Schematically, aggregation buffers look
 * like:
 *
 *      base of data buffer --->  +-------+------+-----------+-------+
 *                                | aggid | key  | value     | aggid |
 *                                +-------+------+-----------+-------+
 *                                | key                               |
 *                                +-------+-------+-----+------------+
 *                                | value | aggid | key | value      |
 *                                +-------+------++-----+------+-----+
 *                                | aggid | key  | value       |     |
 *                                +-------+------+-------------+     |
 *                                |                ||                |
 *                                |                ||                |
 *                                |                \/                |
 *                                :                                  :
 *                                .                                  .
 *                                .                                  .
 *                                .                                  .
 *                                :                                  :
 *                                |                /\                |
 *                                |                ||   +------------+
 *                                |                ||   |            |
 *                                +---------------------+            |
 *                                |           hash keys              |
 *                                |    (dtrace_aggkey structures)    |
 *                                |                                  |
 *                                +----------------------------------+
 *                                |          hash buckets            |
 *                                |    (dtrace_aggbuffer structure)  |
 *                                |                                  |
 *     limit of data buffer --->  +----------------------------------+
 *
 * As implied above, just as we assure that ECBs always store a constant
 * amount of data, we assure that a given aggregation -- identified by its
 * aggregation ID -- always stores data of a constant quantity and type.
 * As with EPIDs, this allows the aggregation ID to serve as the metadata
 * for a given record.
 *
 * Note that the size of the dtrace_aggkey structure must be sizeof
 * (uintptr_t) aligned.  (If the structure changes such that this becomes
 * false, an assertion will fail in dtrace_aggregate().)
 */
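/*
 * Illustrative sketch only -- not the actual aggregation code:  one way the
 * aggregation metadata could be carved from the top of an aggregation
 * buffer, using the dtrace_aggkey_t and dtrace_aggbuffer_t structures
 * defined immediately below, and leaving the region from the base upward
 * for the data that is actually copied out.  The bucket count is left to
 * the caller here, and the alignment adjustments that the real code
 * performs are omitted.
 */
static dtrace_aggbuffer_t *
dtrace_aggbuffer_carve(caddr_t base, size_t size, size_t nbuckets)
{
        uintptr_t limit = (uintptr_t)base + size;
        dtrace_aggbuffer_t *agb;
        size_t i;

        /* The dtrace_aggbuffer_t itself sits just below the limit... */
        agb = (dtrace_aggbuffer_t *)(limit - sizeof (dtrace_aggbuffer_t));

        /* ...the hash buckets sit immediately below it... */
        agb->dtagb_hashsize = nbuckets;
        agb->dtagb_hash = (dtrace_aggkey_t **)((uintptr_t)agb -
            nbuckets * sizeof (dtrace_aggkey_t *));

        /*
         * ...and dtrace_aggkey_t structures are allocated downward from
         * there, toward the data growing up from the base.
         */
        agb->dtagb_free = (uintptr_t)agb->dtagb_hash;

        for (i = 0; i < nbuckets; i++)
                agb->dtagb_hash[i] = NULL;

        return (agb);
}
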
typedef struct dtrace_aggkey {
        uint32_t dtak_hashval;                  /* hash value */
        uint32_t dtak_action:4;                 /* action -- 4 bits */
        uint32_t dtak_size:28;                  /* size -- 28 bits */
        caddr_t dtak_data;                      /* data pointer */
        struct dtrace_aggkey *dtak_next;        /* next in hash chain */
} dtrace_aggkey_t;

typedef struct dtrace_aggbuffer {
        uintptr_t dtagb_hashsize;               /* number of buckets */
        uintptr_t dtagb_free;                   /* free list of keys */
        dtrace_aggkey_t **dtagb_hash;           /* hash table */
} dtrace_aggbuffer_t;

/*
 * DTrace Speculations
 *
 * Speculations have a per-CPU buffer and a global state.  Once a speculation
 * buffer has been committed or discarded, it cannot be reused until all CPUs
 * have taken the same action (commit or discard) on their respective
 * speculative buffer.  However, because DTrace probes may execute in
 * arbitrary context, other CPUs cannot simply be cross-called at probe firing
 * time to perform the necessary commit or discard.  The speculation states
 * thus optimize for the case that a speculative buffer is only active on one
 * CPU at the time of a commit() or discard() -- for if this is the case,
 * other CPUs need not take action, and the speculation is immediately
 * available for reuse.  If the speculation is active on multiple CPUs, it
 * must be asynchronously cleaned -- potentially leading to a higher rate of
 * dirty speculative drops.  The speculation states are as follows:
 *
 * DTRACESPEC_INACTIVE       <= Initial state; inactive speculation
 * DTRACESPEC_ACTIVE         <= Allocated, but not yet speculatively traced to
 * DTRACESPEC_ACTIVEONE      <= Speculatively traced to on one CPU
 * DTRACESPEC_ACTIVEMANY     <= Speculatively traced to on more than one CPU
 * DTRACESPEC_COMMITTING     <= Currently being committed on one CPU
 * DTRACESPEC_COMMITTINGMANY <= Currently being committed on many CPUs
 * DTRACESPEC_DISCARDING     <= Currently being discarded on many CPUs
 *
 * The state transition diagram is as follows:
 *
 *     +----------------------------------------------------------+
 *     |                                                          |
 *     |                     +------------+                       |
 *     |  +------------------| COMMITTING |<-------------------+  |
 *     |  |                  +------------+                    |  |
 *     |  | copied spec.            ^       commit() on        |  |  discard() on
 *     |  | into principal          |       active CPU         |  |  active CPU
 *     |  |                         | commit()                 |  |
 *     V  V                         |                          |  |
 * +----------+                 +--------+                +-----------+
 * | INACTIVE |---------------->| ACTIVE |--------------->| ACTIVEONE |
 * +----------+  speculation()  +--------+  speculate()   +-----------+
 *     ^  ^                         |                          |  |
 *     |  |                         | discard()                |  |
 *     |  | asynchronously          |         discard() on     |  |  speculate()
 *     |  | cleaned                 V         inactive CPU     |  |  on inactive
 *     |  |                  +------------+                    |  |  CPU
 *     |  +------------------| DISCARDING |<-------------------+  |
 *     |                     +------------+                       |
 *     | asynchronously             ^                             |
 *     | copied spec.               | discard()                   |
 *     | into principal             +------------------------+    |
 *     |                                                     |    V
 * +----------------+  commit()                           +------------+
 * | COMMITTINGMANY |<------------------------------------| ACTIVEMANY |
 * +----------------+                                     +------------+
 */

typedef enum dtrace_speculation_state {
        DTRACESPEC_INACTIVE = 0,
        DTRACESPEC_ACTIVE,
        DTRACESPEC_ACTIVEONE,
        DTRACESPEC_ACTIVEMANY,
        DTRACESPEC_COMMITTING,
        DTRACESPEC_COMMITTINGMANY,
        DTRACESPEC_DISCARDING
} dtrace_speculation_state_t;

typedef struct dtrace_speculation {
        dtrace_speculation_state_t dtsp_state;  /* current speculation state */
        int dtsp_cleaning;                      /* non-zero if being cleaned */
        dtrace_buffer_t *dtsp_buffer;           /* speculative buffer */
} dtrace_speculation_t;
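/*
 * Illustrative sketch only -- not the actual dtrace_probe() code:  how the
 * transitions in the diagram above might look on the speculative-tracing
 * path.  The cas32() primitive (assumed to atomically compare-and-swap a
 * 32-bit value and return its previous contents) and the cpu_already_traced
 * flag (in the real implementation this is determined by examining the
 * per-CPU speculative buffer) are assumptions made for the sketch.
 */
static int
speculation_begin_trace(dtrace_speculation_t *spec, int cpu_already_traced)
{
        dtrace_speculation_state_t current, new;

        do {
                current = spec->dtsp_state;

                switch (current) {
                case DTRACESPEC_ACTIVE:
                        new = DTRACESPEC_ACTIVEONE;     /* first CPU to trace */
                        break;
                case DTRACESPEC_ACTIVEONE:
                        new = cpu_already_traced ?
                            DTRACESPEC_ACTIVEONE : DTRACESPEC_ACTIVEMANY;
                        break;
                case DTRACESPEC_ACTIVEMANY:
                        new = DTRACESPEC_ACTIVEMANY;
                        break;
                default:
                        return (0);     /* state does not allow tracing */
                }
        } while (cas32((uint32_t *)&spec->dtsp_state, current, new) != current);

        return (1);
}
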
/*
 * DTrace Dynamic Variables
 *
 * The dynamic variable problem is obviously decomposed into two subproblems:
 * allocating new dynamic storage, and freeing old dynamic storage.  The
 * presence of the second problem makes the first much more complicated -- or
 * rather, the absence of the second renders the first trivial.  This is the
 * case with aggregations, for which there is effectively no deallocation of
 * dynamic storage.  (Or more accurately, all dynamic storage is deallocated
 * when a snapshot is taken of the aggregation.)  As DTrace dynamic variables
 * allow for both dynamic allocation and dynamic deallocation, the
 * implementation of dynamic variables is quite a bit more complicated than
 * that of their aggregation kin.
 *
 * We observe that allocating new dynamic storage is tricky only because the
 * size can vary -- the allocation problem is much easier if allocation sizes
 * are uniform.  We further observe that in D, the size of dynamic variables
 * is actually _not_ dynamic -- dynamic variable sizes may be determined by
 * static analysis of DIF text.  (This is true even of putatively
 * dynamically-sized objects like strings and stacks, the sizes of which are
 * dictated by the "stringsize" and "stackframes" variables, respectively.)
 * We exploit this by performing this analysis on all DIF before enabling any
 * probes.  For each dynamic load or store, we calculate the
 * dynamically-allocated size plus the size of the dtrace_dynvar structure
 * plus the storage required to key the data.  For all DIF, we take the
 * largest value and dub it the _chunksize_.  We then divide dynamic memory
 * into two parts:  a hash table that is wide enough to have every chunk in
 * its own bucket, and a larger region of equal chunksize units.  Whenever we
 * wish to dynamically allocate a variable, we always allocate a single chunk
 * of memory.  Depending on the uniformity of allocation, this will waste
 * some amount of memory -- but it eliminates the non-determinism inherent in
 * traditional heap fragmentation.
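 *
 * As a rough illustration of the chunksize computation described above (the
 * nsites, dvar_key_size() and dvar_data_size() names are hypothetical
 * stand-ins for the per-site values derived from the static analysis of the
 * DIF, and dtrace_dynvar_t stands for the dtrace_dynvar structure):
 *
 *	size_t chunksize = 0, sz, i;
 *
 *	for (i = 0; i < nsites; i++) {
 *		sz = sizeof (dtrace_dynvar_t) +
 *		    dvar_key_size(i) + dvar_data_size(i);
 *
 *		if (sz > chunksize)
 *			chunksize = sz;
 *	}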
 *
 * Dynamic objects are allocated by storing a non-zero value to them; they
 * are deallocated by storing a zero value to them.  Dynamic variables are
 * complicated enormously by being shared between CPUs.  In particular,
 * consider the following scenario:
 *
 *               CPU A                                 CPU B
 * +---------------------------------+   +---------------------------------+
 * |                                 |   |                                 |
 * | allocates dynamic object a[123] |   |                                 |
 * | by storing the value 345 to it  |   |                                 |
 * |                               --------->                              |
 * |                                 |   | wishing to load from object     |
 * |                                 |   | a[123], performs lookup in      |
 * |                                 |   | dynamic variable space          |
 * |                               <---------                              |
 * | deallocates object a[123] by    |   |                                 |
 * | storing 0 to it                 |   |                                 |
 * |                                 |   |                                 |
 * | allocates dynamic object b[567] |   | performs load from a[123]       |
 * | by storing the value 789 to it  |   |                                 |
 * :                                 :   :                                 :
 * .                                 .   .                                 .
 *
 * This is obviously a race in the D program, but there are nonetheless only
 * two valid values for CPU B's load from a[123]:  345 or 0.  Most
 * importantly, CPU B may _not_ see the value 789 for a[123].
 *
 * There are essentially two ways to deal with this:
 *
 *  (1)  Explicitly spin-lock variables.  That is, if CPU B wishes to load
 *       from a[123], it needs to lock a[123] and hold the lock for the
 *       duration that it wishes to manipulate it.
 *
 *  (2)  Avoid reusing freed chunks until it is known that no CPU is
 *       referring to them.
 *
 * The implementation of (1) is rife with complexity, because it requires the
 * user of a dynamic variable to explicitly decree when they are done using
 * it.  Were all variables by value, this perhaps wouldn't be debilitating --
 * but dynamic variables of non-scalar types are tracked by reference.  That
 * is, if a dynamic variable is, say, a string, and that variable is to be
 * traced to, say, the principal buffer, the DIF emulation code returns to
 * the main dtrace_probe() loop a pointer to the underlying storage, not the
 * contents of the storage.  Further, code calling on DIF emulation would
 * have to be aware that the DIF emulation has returned a reference to a
 * dynamic variable that has been potentially locked.  The variable would
 * have to be unlocked after the main dtrace_probe() loop is finished with
 * the variable, and the main