dtrace_impl.h
 * dtrace_probe() loop would have to be careful to not call any further DIF
 * emulation while the variable is locked to avoid deadlock.  More generally,
 * if one were to implement (1), DIF emulation code dealing with dynamic
 * variables could only deal with one dynamic variable at a time (lest
 * deadlock result).  To sum, (1) exports too much subtlety to the users of
 * dynamic variables -- increasing maintenance burden and imposing serious
 * constraints on future DTrace development.
 *
 * The implementation of (2) is also complex, but the complexity is more
 * manageable.  We need to be sure that when a variable is deallocated, it is
 * not placed on a traditional free list, but rather on a _dirty_ list.  Once
 * a variable is on a dirty list, it cannot be found by CPUs performing a
 * subsequent lookup of the variable -- but it may still be in use by other
 * CPUs.  To assure that all CPUs that may be seeing the old variable have
 * cleared out of probe context, a dtrace_sync() can be issued.  Once the
 * dtrace_sync() has completed, it can be known that all CPUs are done
 * manipulating the dynamic variable -- the dirty list can be atomically
 * appended to the free list.  Unfortunately, there's a slight hiccup in this
 * mechanism:  dtrace_sync() may not be issued from probe context.  The
 * dtrace_sync() must therefore be issued asynchronously from non-probe
 * context.  For this we rely on the DTrace cleaner, a cyclic that runs at
 * the "cleanrate" frequency.  To ease this implementation, we define several
 * chunk lists:
 *
 *   - Dirty.  Deallocated chunks, not yet cleaned.  Not available.
 *
 *   - Rinsing.  Formerly dirty chunks that are currently being asynchronously
 *     cleaned.  Not available, but will be shortly.  Dynamic variable
 *     allocation may not spin or block for availability, however.
 *
 *   - Clean.  Clean chunks, ready for allocation -- but not on the free list.
 *
 *   - Free.  Available for allocation.
 *
 * Moreover, to avoid absurd contention, _each_ of these lists is implemented
 * on a per-CPU basis.  This is only for performance, not correctness; chunks
 * may be allocated from another CPU's free list.  The algorithm for
 * allocation then is this:
 *
 *   (1)  Attempt to atomically allocate from current CPU's free list.  If
 *        list is non-empty and allocation is successful, allocation is
 *        complete.
 *
 *   (2)  If the clean list is non-empty, atomically move it to the free
 *        list, and reattempt (1).
 *
 *   (3)  If the dynamic variable space is in the CLEAN state, look for free
 *        and clean lists on other CPUs by setting the current CPU to the
 *        next CPU, and reattempting (1).  If the next CPU is the current CPU
 *        (that is, if all CPUs have been checked), atomically switch the
 *        state of the dynamic variable space based on the following:
 *
 *        - If no free chunks were found and no dirty chunks were found,
 *          atomically set the state to EMPTY.
 *
 *        - If dirty chunks were found, atomically set the state to DIRTY.
 *
 *        - If rinsing chunks were found, atomically set the state to RINSING.
 *
 *   (4)  Based on the state of the dynamic variable space, increment the
 *        appropriate counter to indicate dynamic drops (if in EMPTY state)
 *        vs. dynamic dirty drops (if in DIRTY state) vs. dynamic rinsing
 *        drops (if in RINSING state).  Fail the allocation.
 *
 * The cleaning cyclic operates with the following algorithm:  for all CPUs
 * with a non-empty dirty list, atomically move the dirty list to the rinsing
 * list.  Perform a dtrace_sync().  For all CPUs with a non-empty rinsing
 * list, atomically move the rinsing list to the clean list.  Perform another
 * dtrace_sync().  By this point, all CPUs have seen the new clean list; the
 * state of the dynamic variable space can be restored to CLEAN.
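 *
 * What follows is a minimal userland sketch -- not DTrace's actual
 * implementation -- of steps (1) and (2) of the allocation algorithm above,
 * using C11 atomics.  The chunk and per-CPU list types are simplified
 * stand-ins for the dtrace_dynvar_t and dtrace_dstate_percpu_t structures
 * defined below, and a production version would also have to contend with
 * the ABA problem on the free-list pop.
 */

#include <stdatomic.h>
#include <stddef.h>

typedef struct chunk {
        struct chunk *chk_next;                 /* next chunk on list */
} chunk_t;

typedef struct percpu_lists {
        _Atomic(chunk_t *) pcl_free;            /* free list for this CPU */
        _Atomic(chunk_t *) pcl_clean;           /* clean list for this CPU */
} percpu_lists_t;

static chunk_t *
chunk_alloc(percpu_lists_t *pcpu)
{
        chunk_t *head, *clean, *rest;

        /* (1) Attempt to atomically pop the head of this CPU's free list. */
        head = atomic_load(&pcpu->pcl_free);

        while (head != NULL) {
                if (atomic_compare_exchange_weak(&pcpu->pcl_free, &head,
                    head->chk_next))
                        return (head);          /* allocation complete */
                /* The failed CAS reloaded 'head'; retry the pop. */
        }

        /* (2) The free list is empty: atomically claim the clean list. */
        clean = atomic_exchange(&pcpu->pcl_clean, NULL);

        if (clean == NULL)
                return (NULL);          /* caller moves on to step (3) */

        /*
         * Keep the head chunk and push the remaining clean chunks back onto
         * the free list one at a time -- a simplification of "atomically
         * move it to the free list, and reattempt (1)".
         */
        head = clean;
        rest = clean->chk_next;

        while (rest != NULL) {
                chunk_t *next = rest->chk_next;
                chunk_t *old = atomic_load(&pcpu->pcl_free);

                do {
                        rest->chk_next = old;
                } while (!atomic_compare_exchange_weak(&pcpu->pcl_free,
                    &old, rest));

                rest = next;
        }

        return (head);
}

/*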
 * There exist two final races that merit explanation.  The first is a simple
 * allocation race:
 *
 *              CPU A                                CPU B
 *  +---------------------------------+  +---------------------------------+
 *  |                                 |  |                                 |
 *  | allocates dynamic object a[123] |  | allocates dynamic object a[123] |
 *  | by storing the value 345 to it  |  | by storing the value 567 to it  |
 *  |                                 |  |                                 |
 *  :                                 :  :                                 :
 *  .                                 .  .                                 .
 *
 * Again, this is a race in the D program.  It can be resolved by having
 * a[123] hold the value 345 or a[123] hold the value 567 -- but it must be
 * true that a[123] have only _one_ of these values.  (That is, the racing
 * CPUs may not put the same element twice on the same hash chain.)  This is
 * resolved simply:  before the allocation is undertaken, the start of the
 * new chunk's hash chain is noted.  Later, after the allocation is complete,
 * the hash chain is atomically switched to point to the new element.  If
 * this fails (because of either concurrent allocations or an allocation
 * concurrent with a deletion), the newly allocated chunk is deallocated to
 * the dirty list, and the whole process of looking up (and potentially
 * allocating) the dynamic variable is reattempted.
 *
 * The final race is a simple deallocation race:
 *
 *              CPU A                                CPU B
 *  +---------------------------------+  +---------------------------------+
 *  |                                 |  |                                 |
 *  | deallocates dynamic object      |  | deallocates dynamic object      |
 *  | a[123] by storing the value 0   |  | a[123] by storing the value 0   |
 *  | to it                           |  | to it                           |
 *  |                                 |  |                                 |
 *  :                                 :  :                                 :
 *  .                                 .  .                                 .
 *
 * Once again, this is a race in the D program, but it is one that we must
 * handle without corrupting the underlying data structures.  Because
 * deallocations require the deletion of a chunk from the middle of a hash
 * chain, we cannot use a single-word atomic operation to remove it.  For
 * this, we add a spin lock to the hash buckets that is _only_ used for
 * deallocations (allocation races are handled as above).  Further, this
 * spin lock is _only_ held for the duration of the delete; before control
 * is returned to the DIF emulation code, the hash bucket is unlocked.
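 *
 * Below is a hedged C11 sketch of both resolutions: a compare-and-swap of
 * the bucket's chain head for allocation, and a per-bucket spin lock (the
 * analogue of dtdh_lock, defined below) held only across the unlink for
 * deallocation.  The type and function names here are illustrative
 * stand-ins, not DTrace's actual code.
 */

#include <stdatomic.h>
#include <stddef.h>

typedef struct dynvar {
        struct dynvar *dv_next;                 /* next on hash chain */
        unsigned long long dv_value;            /* stand-in for tuple/data */
} dynvar_t;

typedef struct bucket {
        _Atomic(dynvar_t *) bkt_chain;          /* hash chain head */
        atomic_flag bkt_lock;                   /* deallocation lock only */
} bucket_t;

/*
 * Allocation: the caller noted the chain head before allocating; we link
 * the new element ahead of it and atomically switch the head.  A zero
 * return means a concurrent allocation or deletion intervened; the caller
 * then deallocates 'newdv' to the dirty list and reattempts the entire
 * lookup, as described above.
 */
static int
dynvar_tryinsert(bucket_t *b, dynvar_t *newdv)
{
        dynvar_t *start = atomic_load(&b->bkt_chain);

        newdv->dv_next = start;
        return (atomic_compare_exchange_strong(&b->bkt_chain, &start, newdv));
}

/*
 * Deallocation: removal from mid-chain cannot be a single-word atomic
 * operation, so the bucket is locked -- but only for the duration of the
 * unlink, and by spinning rather than blocking.
 */
static void
dynvar_delete(bucket_t *b, dynvar_t *dv)
{
        dynvar_t *cur, *prev;

        while (atomic_flag_test_and_set_explicit(&b->bkt_lock,
            memory_order_acquire))
                continue;                       /* spin; never block */

        cur = atomic_load(&b->bkt_chain);

        if (cur == dv &&
            atomic_compare_exchange_strong(&b->bkt_chain, &cur, dv->dv_next))
                goto out;               /* unlinked from the head */

        /*
         * 'dv' is mid-chain (or a concurrent insertion just displaced it
         * from the head, updating 'cur'); walk to its predecessor.  Only
         * insertions at the head can race with us, and they cannot touch
         * the predecessor's link.
         */
        for (prev = cur; prev->dv_next != dv; prev = prev->dv_next)
                continue;
        prev->dv_next = dv->dv_next;
out:
        /* Unlock before control returns to the DIF emulation code. */
        atomic_flag_clear_explicit(&b->bkt_lock, memory_order_release);
}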

typedef struct dtrace_key {
        uint64_t dttk_value;                    /* data value or data pointer */
        uint64_t dttk_size;                     /* 0 if by-val, >0 if by-ref */
} dtrace_key_t;

typedef struct dtrace_tuple {
        uint32_t dtt_nkeys;                     /* number of keys in tuple */
        uint32_t dtt_pad;                       /* padding */
        dtrace_key_t dtt_key[1];                /* array of tuple keys */
} dtrace_tuple_t;

typedef struct dtrace_dynvar {
        uint64_t dtdv_hashval;                  /* hash value -- 0 if free */
        struct dtrace_dynvar *dtdv_next;        /* next on list or hash chain */
        void *dtdv_data;                        /* pointer to data */
        dtrace_tuple_t dtdv_tuple;              /* tuple key */
} dtrace_dynvar_t;

typedef enum dtrace_dynvar_op {
        DTRACE_DYNVAR_ALLOC,
        DTRACE_DYNVAR_NOALLOC,
        DTRACE_DYNVAR_DEALLOC
} dtrace_dynvar_op_t;

typedef struct dtrace_dynhash {
        dtrace_dynvar_t *dtdh_chain;            /* hash chain for this bucket */
        uintptr_t dtdh_lock;                    /* deallocation lock */
#ifdef _LP64
        uintptr_t dtdh_pad[6];                  /* pad to avoid false sharing */
#else
        uintptr_t dtdh_pad[14];                 /* pad to avoid false sharing */
#endif
} dtrace_dynhash_t;

typedef struct dtrace_dstate_percpu {
        dtrace_dynvar_t *dtdsc_free;            /* free list for this CPU */
        dtrace_dynvar_t *dtdsc_dirty;           /* dirty list for this CPU */
        dtrace_dynvar_t *dtdsc_rinsing;         /* rinsing list for this CPU */
        dtrace_dynvar_t *dtdsc_clean;           /* clean list for this CPU */
        uint64_t dtdsc_drops;                   /* number of capacity drops */
        uint64_t dtdsc_dirty_drops;             /* number of dirty drops */
        uint64_t dtdsc_rinsing_drops;           /* number of rinsing drops */
#ifdef _LP64
        uint64_t dtdsc_pad;                     /* pad to avoid false sharing */
#else
        uint64_t dtdsc_pad[2];                  /* pad to avoid false sharing */
#endif
} dtrace_dstate_percpu_t;

typedef enum dtrace_dstate_state {
        DTRACE_DSTATE_CLEAN = 0,
        DTRACE_DSTATE_EMPTY,
        DTRACE_DSTATE_DIRTY,
        DTRACE_DSTATE_RINSING
} dtrace_dstate_state_t;

typedef struct dtrace_dstate {
        void *dtds_base;                        /* base of dynamic var. space */
        size_t dtds_size;                       /* size of dynamic var. space */
        size_t dtds_hashsize;                   /* number of buckets in hash */
        size_t dtds_chunksize;                  /* size of each chunk */
        dtrace_dynhash_t *dtds_hash;            /* pointer to hash table */
        dtrace_dstate_state_t dtds_state;       /* current dynamic var. state */
        dtrace_dstate_percpu_t *dtds_percpu;    /* per-CPU dyn. var. state */
} dtrace_dstate_t;
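
/*
 * Hedged sketch of a lookup against the structures above.  DTrace's actual
 * lookup-and-allocate logic is dtrace_dynvar() in dtrace.c, and the hash
 * function below is an arbitrary stand-in.  The chain walk compares the
 * cached dtdv_hashval first, so most non-matching chunks are rejected
 * without a full tuple comparison; the nonzero fixup honors the convention
 * that dtdv_hashval == 0 identifies a free chunk.
 */
static dtrace_dynvar_t *
dtrace_dynvar_lookup_sketch(dtrace_dstate_t *dstate, uint32_t nkeys,
    const dtrace_key_t *keys)
{
        uint64_t hashval = 0;
        uint32_t i;
        dtrace_dynvar_t *dvar;

        for (i = 0; i < nkeys; i++)
                hashval = (hashval << 5) + hashval + keys[i].dttk_value;

        if (hashval == 0)
                hashval = 1;                    /* 0 means "free" */

        dvar = dstate->dtds_hash[hashval % dstate->dtds_hashsize].dtdh_chain;

        for (; dvar != NULL; dvar = dvar->dtdv_next) {
                if (dvar->dtdv_hashval != hashval)
                        continue;               /* cheap rejection */

                if (dvar->dtdv_tuple.dtt_nkeys != nkeys)
                        continue;

                /* (full per-key comparison of dtt_key[] elided here) */
                return (dvar);
        }

        return (NULL);          /* the DTRACE_DYNVAR_NOALLOC outcome */
}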

/*
 * DTrace Variable State
 *
 * The DTrace variable state tracks user-defined variables in its
 * dtrace_vstate structure.  Each DTrace consumer has exactly one
 * dtrace_vstate structure, but some dtrace_vstate structures may exist
 * without a corresponding DTrace consumer (see "DTrace Helpers", below).
 * As described in <sys/dtrace.h>, user-defined variables can have one of
 * three scopes:
 *
 *	DIFV_SCOPE_GLOBAL	=>	global scope
 *	DIFV_SCOPE_THREAD	=>	thread-local scope (i.e. "self->" variables)
 *	DIFV_SCOPE_LOCAL	=>	clause-local scope (i.e. "this->" variables)
 *
 * The variable state tracks variables by both their scope and their
 * allocation type:
 *
 *   - The dtvs_globals member points to an array of dtrace_globvar
 *     structures.  These structures contain both the variable metadata
 *     (dtrace_difv structures) and the underlying storage for all
 *     statically allocated DIFV_SCOPE_GLOBAL variables.
 *
 *   - The dtvs_tlocals member points to an array of dtrace_difv structures
 *     for DIFV_SCOPE_THREAD variables.  As such, this array tracks _only_
 *     the variable metadata for DIFV_SCOPE_THREAD variables; the underlying
 *     storage is allocated out of the dynamic variable space.
 *
 *   - The dtvs_locals member points to an array of uint64_t's that
 *     represent the underlying storage for DIFV_SCOPE_LOCAL variables.  As
 *     DIFV_SCOPE_LOCAL variables may only be scalars, there is no need to
 *     store any variable metadata other than the number of clause-local
 *     variables.
 *
 *   - The dtvs_dynvars member is the dynamic variable state associated with
 *     the variable state.  The dynamic variable state (described in "DTrace
 *     Dynamic Variables", above) tracks all DIFV_SCOPE_THREAD variables and
 *     all dynamically-allocated DIFV_SCOPE_GLOBAL variables.
 */

typedef struct dtrace_globvar {
        uint64_t dtgv_data;                     /* data or pointer to it */
        int dtgv_refcnt;                        /* reference count */
        dtrace_difv_t dtgv_var;                 /* variable metadata */
} dtrace_globvar_t;

typedef struct dtrace_vstate {
        dtrace_globvar_t **dtvs_globals;        /* statically-allocated glbls */
        int dtvs_nglobals;                      /* number of globals */
        dtrace_difv_t *dtvs_tlocals;            /* thread-local metadata */
        int dtvs_ntlocals;                      /* number of thread-locals */
        uint64_t **dtvs_locals;                 /* clause-local data */
        int dtvs_nlocals;                       /* number of clause-locals */
        dtrace_dstate_t dtvs_dynvars;           /* dynamic variable state */
} dtrace_vstate_t;

/*
 * DTrace Machine State
 *
 * In the process of processing a fired probe, DTrace needs to track and/or
 * cache some per-CPU state associated with that particular firing.  This is
 * state that is always discarded after the probe firing has completed, and
 * much of it is not specific to any DTrace consumer, remaining valid across
 * all ECBs.  This state is tracked in the dtrace_mstate structure.
 */
#define DTRACE_MSTATE_ARGS              0x00000001
#define DTRACE_MSTATE_PROBE             0x00000002
#define DTRACE_MSTATE_EPID              0x00000004
#define DTRACE_MSTATE_TIMESTAMP         0x00000008
#define DTRACE_MSTATE_STACKDEPTH        0x00000010
#define DTRACE_MSTATE_CALLER            0x00000020
#define DTRACE_MSTATE_IPL               0x00000040
#define DTRACE_MSTATE_FLTOFFS           0x00000080
#define DTRACE_MSTATE_WALLTIMESTAMP     0x00000100

typedef struct dtrace_mstate {
        uintptr_t dtms_scratch_base;            /* base of scratch space */
        uintptr_t dtms_scratch_ptr;             /* current scratch pointer */
        size_t dtms_scratch_size;               /* scratch size */
        uint32_t dtms_present;                  /* variables that are present */
        uint64_t dtms_arg[5];                   /* cached arguments */
        dtrace_epid_t dtms_epid;                /* current EPID */
        uint64_t dtms_timestamp;                /* cached timestamp */
        hrtime_t dtms_walltimestamp;            /* cached wall timestamp */
        int dtms_stackdepth;                    /* cached stackdepth */
        struct dtrace_probe *dtms_probe;        /* current probe */
        uintptr_t dtms_caller;                  /* cached caller */
        int dtms_ipl;                           /* cached interrupt pri lev */
        int dtms_fltoffs;                       /* faulting DIFO offset */
} dtrace_mstate_t;

#define DTRACE_COND_OWNER       0x1
#define DTRACE_COND_USERMODE    0x2

#define DTRACE_PROBEKEY_MAXDEPTH        8       /* max glob recursion depth */
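
/*
 * A hedged sketch of how the DTRACE_MSTATE_* bits above are meant to be
 * used: dtms_present records which cached values are valid for the current
 * probe firing, so a value such as `timestamp' is computed at most once per
 * firing.  The wrapper function below is illustrative only (the analogous
 * logic lives in the DIF variable lookup in dtrace.c); dtrace_gethrtime()
 * is the real DTrace time interface assumed here.
 */
extern hrtime_t dtrace_gethrtime(void);

static uint64_t
dtrace_mstate_timestamp_sketch(dtrace_mstate_t *mstate)
{
        if (!(mstate->dtms_present & DTRACE_MSTATE_TIMESTAMP)) {
                /* First use in this firing: read the clock and cache it. */
                mstate->dtms_timestamp = dtrace_gethrtime();
                mstate->dtms_present |= DTRACE_MSTATE_TIMESTAMP;
        }

        return (mstate->dtms_timestamp);
}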

/*
 * DTrace Activity
 *
 * Each DTrace consumer is in one of several states, which (for purposes of
 * avoiding yet-another overloading of the noun "state") we call the current
 * _activity_.  The activity transitions on dtrace_go() (from DTRACIOCGO), on
 * dtrace_stop() (from DTRACIOCSTOP) and on the exit() action.  Activities
 * may only transition in one direction; the activity transition diagram is
 * a directed acyclic graph.  The activity transition diagram is as follows:
 *
 *
 * +----------+                   +--------+                   +--------+
 * | INACTIVE |------------------>| WARMUP |------------------>| ACTIVE |
 * +----------+   dtrace_go(),    +--------+   dtrace_go(),    +--------+
 *                before BEGIN        |       after BEGIN        |  |  |
 *                                    |                          |  |  |
 *                      exit() action |                          |  |  |
 *                     from BEGIN ECB |                          |  |  |
 *                                    |                          |  |  |
 *                                    v                          |  |  |
 *                              +----------+  exit() action      |  |  |
 *                              | DRAINING |<--------------------+  |  |
 *                              +----------+                        |  |
 *                                    |                             |  |
 *                     dtrace_stop(), |                             |  |
 *                       before END   |                             |  |
 *                                    |                             |  |
 *                                    v                             |  |
 *  +---------+                 +----------+                        |  |
 *  | STOPPED |<----------------| COOLDOWN |<-----------------------+  |
 *  +---------+  dtrace_stop(), +----------+    dtrace_stop(),         |
 *                 after END                      before END           |
 *                                                                     |
 *                              +--------+                             |
 *                              | KILLED |<----------------------------+
 *                              +--------+       deadman timeout
 *
 * Note that once a DTrace consumer has stopped tracing, there is no way to