dtrace_impl.h
 * offset and the _wrapped_ offset.  If a request is made to reserve some
 * amount of data, and the buffer has wrapped, the wrapped offset is
 * incremented until the wrapped offset minus the current offset is greater
 * than or equal to the reserve request.  This is done by repeatedly looking
 * up the ECB corresponding to the EPID at the current wrapped offset, and
 * incrementing the wrapped offset by the size of the data payload
 * corresponding to that ECB.  If this offset is greater than or equal to the
 * limit of the data buffer, the wrapped offset is set to 0.  Thus, the
 * current offset effectively "chases" the wrapped offset around the buffer.
 * Schematically:
 *
 *      base of data buffer --->  +------+--------------------+------+
 *                                | EPID | data               | EPID |
 *                                +------+--------+------+----+------+
 *                                | data          | EPID | data      |
 *                                +---------------+------+-----------+
 *                                | data, cont.                      |
 *                                +------+---------------------------+
 *                                | EPID | data                      |
 *           current offset --->  +------+---------------------------+
 *                                | invalid data                     |
 *           wrapped offset --->  +------+--------------------+------+
 *                                | EPID | data               | EPID |
 *                                +------+--------+------+----+------+
 *                                | data          | EPID | data      |
 *                                +---------------+------+-----------+
 *                                :                                  :
 *                                .                                  .
 *                                .        ... valid data ...        .
 *                                .                                  .
 *                                :                                  :
 *                                +------+-------------+------+------+
 *                                | EPID | data        | EPID | data |
 *                                +------+------------++------+------+
 *                                | data, cont.       | leftover     |
 *     limit of data buffer --->  +-------------------+--------------+
 *
 * If the amount of requested buffer space exceeds the amount of space
 * available between the current offset and the end of the buffer:
 *
 *  (1)  all words in the data buffer between the current offset and the
 *       limit of the data buffer (marked "leftover", above) are set to
 *       DTRACE_EPIDNONE
 *
 *  (2)  the wrapped offset is set to zero
 *
 *  (3)  the iteration process described above occurs until the wrapped
 *       offset is greater than the amount of desired space.
 *
 * The wrapped offset is implemented by (re-)using the inactive offset.
 * In a "switch" buffer policy, the inactive offset stores the offset in
 * the inactive buffer; in a "ring" buffer policy, it stores the wrapped
 * offset.
 *
 * DTrace Scratch Buffering
 *
 * Some ECBs may wish to allocate dynamically-sized temporary scratch memory.
 * To accommodate such requests easily, scratch memory may be allocated in
 * the buffer beyond the current offset plus the needed memory of the current
 * ECB.  If there isn't sufficient room in the buffer for the requested amount
 * of scratch space, the allocation fails and an error is generated.  Scratch
 * memory is tracked in the dtrace_mstate_t and is automatically freed when
 * the ECB ceases processing.  Note that ring buffers cannot allocate their
 * scratch from the principal buffer -- lest they needlessly overwrite older,
 * valid data.  Ring buffers therefore have their own dedicated scratch buffer
 * from which scratch is allocated.
 */
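/*
 * The following is an illustrative sketch only -- not the actual reservation
 * code -- of the wrapped-offset "chase" described above.  The
 * dtrace_epid_payload_size() helper is a hypothetical stand-in for looking
 * up the ECB registered under an EPID and returning the size of its data
 * payload.  The sketch assumes offs <= wrapped < limit on entry; if the
 * chase reaches the limit of the buffer, the caller is assumed to apply
 * steps (1) through (3) above and retry.
 */
static uint64_t
dtrace_buffer_chase_wrapped(caddr_t base, uint64_t limit, uint64_t offs,
    uint64_t wrapped, uint64_t needed)
{
        /*
         * Skip over whole records, EPID by EPID, until at least "needed"
         * bytes lie between the current offset and the wrapped offset.
         */
        while (wrapped - offs < needed && wrapped < limit) {
                uint32_t epid = *(uint32_t *)(base + wrapped);

                if (epid == DTRACE_EPIDNONE)
                        wrapped += sizeof (uint32_t);   /* filler word */
                else
                        wrapped += dtrace_epid_payload_size(epid);
        }

        return (wrapped);
}
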
#define DTRACEBUF_RING          0x0001          /* bufpolicy set to "ring" */
#define DTRACEBUF_FILL          0x0002          /* bufpolicy set to "fill" */
#define DTRACEBUF_NOSWITCH      0x0004          /* do not switch buffer */
#define DTRACEBUF_WRAPPED       0x0008          /* ring buffer has wrapped */
#define DTRACEBUF_DROPPED       0x0010          /* drops occurred */
#define DTRACEBUF_ERROR         0x0020          /* errors occurred */
#define DTRACEBUF_FULL          0x0040          /* "fill" buffer is full */
#define DTRACEBUF_CONSUMED      0x0080          /* buffer has been consumed */

typedef struct dtrace_buffer {
        uint64_t dtb_offset;                    /* current offset in buffer */
        uint64_t dtb_size;                      /* size of buffer */
        uint32_t dtb_flags;                     /* flags */
        uint32_t dtb_drops;                     /* number of drops */
        caddr_t dtb_tomax;                      /* active buffer */
        caddr_t dtb_xamot;                      /* inactive buffer */
        uint32_t dtb_xamot_flags;               /* inactive flags */
        uint32_t dtb_xamot_drops;               /* drops in inactive buffer */
        uint64_t dtb_xamot_offset;              /* offset in inactive buffer */
        uint32_t dtb_errors;                    /* number of errors */
        uint32_t dtb_xamot_errors;              /* errors in inactive buffer */
#ifndef _LP64
        uint64_t dtb_pad1;
#endif
} dtrace_buffer_t;

/*
 * DTrace Aggregation Buffers
 *
 * Aggregation buffers use much of the same mechanism as described above
 * ("DTrace Buffers").  However, because an aggregation is fundamentally a
 * hash, there exists dynamic metadata associated with an aggregation buffer
 * that is not associated with other kinds of buffers.  This aggregation
 * metadata is _only_ relevant for the in-kernel implementation of
 * aggregations; it is not relevant to user-level consumers.  To keep this
 * metadata out of the copied-out data, we allocate the dynamic aggregation
 * metadata (hash keys and hash buckets) starting below the _limit_ of the
 * buffer, and we allocate data from the _base_ of the buffer.  When the
 * aggregation buffer is copied out, _only_ the data is copied out; the
 * metadata is simply discarded.  Schematically, aggregation buffers look
 * like:
 *
 *      base of data buffer --->  +-------+------+-----------+-------+
 *                                | aggid | key  | value     | aggid |
 *                                +-------+------+-----------+-------+
 *                                | key                               |
 *                                +-------+-------+-----+------------+
 *                                | value | aggid | key | value      |
 *                                +-------+------++-----+------+-----+
 *                                | aggid | key  | value       |     |
 *                                +-------+------+-------------+     |
 *                                |                ||                |
 *                                |                ||                |
 *                                |                \/                |
 *                                :                                  :
 *                                .                                  .
 *                                .                                  .
 *                                .                                  .
 *                                :                                  :
 *                                |                /\                |
 *                                |                ||   +------------+
 *                                |                ||   |            |
 *                                +---------------------+            |
 *                                |           hash keys              |
 *                                |    (dtrace_aggkey structures)    |
 *                                |                                  |
 *                                +----------------------------------+
 *                                |          hash buckets            |
 *                                |    (dtrace_aggbuffer structure)  |
 *                                |                                  |
 *     limit of data buffer --->  +----------------------------------+
 *
 * As implied above, just as we assure that ECBs always store a constant
 * amount of data, we assure that a given aggregation -- identified by its
 * aggregation ID -- always stores data of a constant quantity and type.
 * As with EPIDs, this allows the aggregation ID to serve as the metadata
 * for a given record.
 *
 * Note that the size of the dtrace_aggkey structure must be sizeof
 * (uintptr_t) aligned.  (If the structure changes such that this becomes
 * false, an assertion will fail in dtrace_aggregate().)
 */
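/*
 * Illustrative sketch only -- not the actual aggregation code:  one way the
 * aggregation metadata could be carved from the top of an aggregation
 * buffer, using the dtrace_aggkey_t and dtrace_aggbuffer_t structures
 * defined immediately below, and leaving the region from the base upward
 * for the data that is actually copied out.  The bucket count is left to
 * the caller here, and the alignment adjustments that the real code
 * performs are omitted.
 */
static dtrace_aggbuffer_t *
dtrace_aggbuffer_carve(caddr_t base, size_t size, size_t nbuckets)
{
        uintptr_t limit = (uintptr_t)base + size;
        dtrace_aggbuffer_t *agb;
        size_t i;

        /* The dtrace_aggbuffer_t itself sits just below the limit... */
        agb = (dtrace_aggbuffer_t *)(limit - sizeof (dtrace_aggbuffer_t));

        /* ...the hash buckets sit immediately below it... */
        agb->dtagb_hashsize = nbuckets;
        agb->dtagb_hash = (dtrace_aggkey_t **)((uintptr_t)agb -
            nbuckets * sizeof (dtrace_aggkey_t *));

        /*
         * ...and dtrace_aggkey_t structures are allocated downward from
         * there, toward the data growing up from the base.
         */
        agb->dtagb_free = (uintptr_t)agb->dtagb_hash;

        for (i = 0; i < nbuckets; i++)
                agb->dtagb_hash[i] = NULL;

        return (agb);
}
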
typedef struct dtrace_aggkey {
        uint32_t dtak_hashval;                  /* hash value */
        uint32_t dtak_action:4;                 /* action -- 4 bits */
        uint32_t dtak_size:28;                  /* size -- 28 bits */
        caddr_t dtak_data;                      /* data pointer */
        struct dtrace_aggkey *dtak_next;        /* next in hash chain */
} dtrace_aggkey_t;

typedef struct dtrace_aggbuffer {
        uintptr_t dtagb_hashsize;               /* number of buckets */
        uintptr_t dtagb_free;                   /* free list of keys */
        dtrace_aggkey_t **dtagb_hash;           /* hash table */
} dtrace_aggbuffer_t;

/*
 * DTrace Speculations
 *
 * Speculations have a per-CPU buffer and a global state.  Once a speculation
 * buffer has been committed or discarded, it cannot be reused until all CPUs
 * have taken the same action (commit or discard) on their respective
 * speculative buffer.  However, because DTrace probes may execute in
 * arbitrary context, other CPUs cannot simply be cross-called at probe firing
 * time to perform the necessary commit or discard.  The speculation states
 * thus optimize for the case that a speculative buffer is only active on one
 * CPU at the time of a commit() or discard() -- for if this is the case,
 * other CPUs need not take action, and the speculation is immediately
 * available for reuse.  If the speculation is active on multiple CPUs, it
 * must be asynchronously cleaned -- potentially leading to a higher rate of
 * dirty speculative drops.  The speculation states are as follows:
 *
 * DTRACESPEC_INACTIVE       <= Initial state; inactive speculation
 * DTRACESPEC_ACTIVE         <= Allocated, but not yet speculatively traced to
 * DTRACESPEC_ACTIVEONE      <= Speculatively traced to on one CPU
 * DTRACESPEC_ACTIVEMANY     <= Speculatively traced to on more than one CPU
 * DTRACESPEC_COMMITTING     <= Currently being committed on one CPU
 * DTRACESPEC_COMMITTINGMANY <= Currently being committed on many CPUs
 * DTRACESPEC_DISCARDING     <= Currently being discarded on many CPUs
 *
 * The state transition diagram is as follows:
 *
 *     +----------------------------------------------------------+
 *     |                                                          |
 *     |                     +------------+                       |
 *     |  +------------------| COMMITTING |<-------------------+  |
 *     |  |                  +------------+                    |  |
 *     |  | copied spec.            ^       commit() on        |  |  discard() on
 *     |  | into principal          |       active CPU         |  |  active CPU
 *     |  |                         | commit()                 |  |
 *     V  V                         |                          |  |
 * +----------+                 +--------+                +-----------+
 * | INACTIVE |---------------->| ACTIVE |--------------->| ACTIVEONE |
 * +----------+  speculation()  +--------+  speculate()   +-----------+
 *     ^  ^                         |                          |  |
 *     |  |                         | discard()                |  |
 *     |  | asynchronously          |         discard() on     |  |  speculate()
 *     |  | cleaned                 V         inactive CPU     |  |  on inactive
 *     |  |                  +------------+                    |  |  CPU
 *     |  +------------------| DISCARDING |<-------------------+  |
 *     |                     +------------+                       |
 *     | asynchronously             ^                             |
 *     | copied spec.               | discard()                   |
 *     | into principal             +------------------------+    |
 *     |                                                     |    V
 * +----------------+  commit()                           +------------+
 * | COMMITTINGMANY |<------------------------------------| ACTIVEMANY |
 * +----------------+                                     +------------+
 */

typedef enum dtrace_speculation_state {
        DTRACESPEC_INACTIVE = 0,
        DTRACESPEC_ACTIVE,
        DTRACESPEC_ACTIVEONE,
        DTRACESPEC_ACTIVEMANY,
        DTRACESPEC_COMMITTING,
        DTRACESPEC_COMMITTINGMANY,
        DTRACESPEC_DISCARDING
} dtrace_speculation_state_t;

typedef struct dtrace_speculation {
        dtrace_speculation_state_t dtsp_state;  /* current speculation state */
        int dtsp_cleaning;                      /* non-zero if being cleaned */
        dtrace_buffer_t *dtsp_buffer;           /* speculative buffer */
} dtrace_speculation_t;
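/*
 * Illustrative sketch only -- not the actual dtrace_probe() code:  how the
 * transitions in the diagram above might look on the speculative-tracing
 * path.  The cas32() primitive (assumed to atomically compare-and-swap a
 * 32-bit value and return its previous contents) and the cpu_already_traced
 * flag (in the real implementation this is determined by examining the
 * per-CPU speculative buffer) are assumptions made for the sketch.
 */
static int
speculation_begin_trace(dtrace_speculation_t *spec, int cpu_already_traced)
{
        dtrace_speculation_state_t current, new;

        do {
                current = spec->dtsp_state;

                switch (current) {
                case DTRACESPEC_ACTIVE:
                        new = DTRACESPEC_ACTIVEONE;     /* first CPU to trace */
                        break;
                case DTRACESPEC_ACTIVEONE:
                        new = cpu_already_traced ?
                            DTRACESPEC_ACTIVEONE : DTRACESPEC_ACTIVEMANY;
                        break;
                case DTRACESPEC_ACTIVEMANY:
                        new = DTRACESPEC_ACTIVEMANY;
                        break;
                default:
                        return (0);     /* state does not allow tracing */
                }
        } while (cas32((uint32_t *)&spec->dtsp_state, current, new) != current);

        return (1);
}
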
/*
 * DTrace Dynamic Variables
 *
 * The dynamic variable problem is obviously decomposed into two subproblems:
 * allocating new dynamic storage, and freeing old dynamic storage.  The
 * presence of the second problem makes the first much more complicated -- or
 * rather, the absence of the second renders the first trivial.  This is the
 * case with aggregations, for which there is effectively no deallocation of
 * dynamic storage.  (Or more accurately, all dynamic storage is deallocated
 * when a snapshot is taken of the aggregation.)  As DTrace dynamic variables
 * allow for both dynamic allocation and dynamic deallocation, the
 * implementation of dynamic variables is quite a bit more complicated than
 * that of their aggregation kin.
 *
 * We observe that allocating new dynamic storage is tricky only because the
 * size can vary -- the allocation problem is much easier if allocation sizes
 * are uniform.  We further observe that in D, the size of dynamic variables
 * is actually _not_ dynamic -- dynamic variable sizes may be determined by
 * static analysis of DIF text.  (This is true even of putatively
 * dynamically-sized objects like strings and stacks, the sizes of which are
 * dictated by the "stringsize" and "stackframes" variables, respectively.)
 * We exploit this by performing this analysis on all DIF before enabling any
 * probes.  For each dynamic load or store, we calculate the
 * dynamically-allocated size plus the size of the dtrace_dynvar structure
 * plus the storage required to key the data.  For all DIF, we take the
 * largest value and dub it the _chunksize_.  We then divide dynamic memory
 * into two parts:  a hash table that is wide enough to have every chunk in
 * its own bucket, and a larger region of equal chunksize units.  Whenever we
 * wish to dynamically allocate a variable, we always allocate a single chunk
 * of memory.  Depending on the uniformity of allocation, this will waste
 * some amount of memory -- but it eliminates the non-determinism inherent in
 * traditional heap fragmentation.
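 *
 * As a rough illustration of the chunksize computation described above (the
 * nsites, dvar_key_size() and dvar_data_size() names are hypothetical
 * stand-ins for the per-site values derived from the static analysis of the
 * DIF, and dtrace_dynvar_t stands for the dtrace_dynvar structure):
 *
 *	size_t chunksize = 0, sz, i;
 *
 *	for (i = 0; i < nsites; i++) {
 *		sz = sizeof (dtrace_dynvar_t) +
 *		    dvar_key_size(i) + dvar_data_size(i);
 *
 *		if (sz > chunksize)
 *			chunksize = sz;
 *	}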
 *
 * Dynamic objects are allocated by storing a non-zero value to them; they
 * are deallocated by storing a zero value to them.  Dynamic variables are
 * complicated enormously by being shared between CPUs.  In particular,
 * consider the following scenario:
 *
 *               CPU A                                 CPU B
 * +---------------------------------+   +---------------------------------+
 * |                                 |   |                                 |
 * | allocates dynamic object a[123] |   |                                 |
 * | by storing the value 345 to it  |   |                                 |
 * |                               --------->                              |
 * |                                 |   | wishing to load from object     |
 * |                                 |   | a[123], performs lookup in      |
 * |                                 |   | dynamic variable space          |
 * |                               <---------                              |
 * | deallocates object a[123] by    |   |                                 |
 * | storing 0 to it                 |   |                                 |
 * |                                 |   |                                 |
 * | allocates dynamic object b[567] |   | performs load from a[123]       |
 * | by storing the value 789 to it  |   |                                 |
 * :                                 :   :                                 :
 * .                                 .   .                                 .
 *
 * This is obviously a race in the D program, but there are nonetheless only
 * two valid values for CPU B's load from a[123]:  345 or 0.  Most
 * importantly, CPU B may _not_ see the value 789 for a[123].
 *
 * There are essentially two ways to deal with this:
 *
 *  (1)  Explicitly spin-lock variables.  That is, if CPU B wishes to load
 *       from a[123], it needs to lock a[123] and hold the lock for the
 *       duration that it wishes to manipulate it.
 *
 *  (2)  Avoid reusing freed chunks until it is known that no CPU is
 *       referring to them.
 *
 * The implementation of (1) is rife with complexity, because it requires the
 * user of a dynamic variable to explicitly decree when they are done using
 * it.  Were all variables by value, this perhaps wouldn't be debilitating --
 * but dynamic variables of non-scalar types are tracked by reference.  That
 * is, if a dynamic variable is, say, a string, and that variable is to be
 * traced to, say, the principal buffer, the DIF emulation code returns to
 * the main dtrace_probe() loop a pointer to the underlying storage, not the
 * contents of the storage.  Further, code calling on DIF emulation would
 * have to be aware that the DIF emulation has returned a reference to a
 * dynamic variable that has been potentially locked.  The variable would
 * have to be unlocked after the main dtrace_probe() loop is finished with
 * the variable, and the main