	void synchronize_rcu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			run_on(cpu);
	}

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive, as sketched below.  Of
course, a somewhat less "toy" implementation would restore the affinity
upon completion rather than just leaving all tasks running on the last
CPU, but when I said "toy", I meant -toy-!
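For concreteness, here is one way the toy run_on() might look.  This
sketch assumes the 2.6-era in-kernel sched_setaffinity(pid_t pid,
cpumask_t new_mask); as noted above, it cheerfully discards the
caller's old affinity mask:

	static void run_on(int cpu)
	{
		/* Mask with only the bit for "cpu" set. */
		cpumask_t mask = cpumask_of_cpu(cpu);

		/* Migrate the current task to "cpu", so that it is
		   scheduled (and a context switch occurs) there. */
		sched_setaffinity(current->pid, mask);
	}

So how the heck is this supposed to work???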
Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical
sections.  Once -all- CPUs have executed a context switch, then -all-
preceding RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then
invoke synchronize_rcu().  Once synchronize_rcu() returns, we are
guaranteed that there are no RCU read-side critical sections holding a
reference to that data item, so we can safely reclaim it.

Quick Quiz #2:	Give an example where Classic RCU's read-side
		overhead is -negative-.

Quick Quiz #3:	If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???


6.  ANALOGY WITH READER-WRITER LOCKING

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock();
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock();
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock();
	+	rcu_read_unlock();
		return 0;
	}

	@@ -29,15 +30,16 @@
	{
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	}

Or, for those who prefer a side-by-side listing:

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 spinlock_t listmutex;                8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

 1 int search(long key, int *result)    1 int search(long key, int *result)
 2 {                                    2 {
 3   struct list_head *lp;              3   struct list_head *lp;
 4   struct el *p;                      4   struct el *p;
 5                                      5
 6   read_lock();                       6   rcu_read_lock();
 7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
 8     if (p->key == key) {             8     if (p->key == key) {
 9       *result = p->data;             9       *result = p->data;
10       read_unlock();                10       rcu_read_unlock();
11       return 1;                     11       return 1;
12     }                               12     }
13   }                                 13   }
14   read_unlock();                    14   rcu_read_unlock();
15   return 0;                         15   return 0;
16 }                                   16 }

 1 int delete(long key)                 1 int delete(long key)
 2 {                                    2 {
 3   struct el *p;                      3   struct el *p;
 4                                      4
 5   write_lock(&listmutex);            5   spin_lock(&listmutex);
 6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
 7     if (p->key == key) {             7     if (p->key == key) {
 8       list_del(&p->list);            8       list_del_rcu(&p->list);
 9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                       10       synchronize_rcu();
10       kfree(p);                     11       kfree(p);
11       return 1;                     12       return 1;
12     }                               13     }
13   }                                 14   }
14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
15   return 0;                         16   return 0;
16 }                                   17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves
from a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as a
single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a
callback-based mechanism that never blocks, namely call_rcu(), that
can be used in place of synchronize_rcu(), as sketched below.
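For concreteness, here is one plausible sketch of such a conversion.
The struct rcu_head field and the el_free_rcu() callback are inventions
of this sketch, not part of the example above:

	struct el {
		struct list_head list;
		long key;
		spinlock_t mutex;
		int data;
		struct rcu_head rcu;	/* added for call_rcu() */
	};

	/* Invoked once a grace period has elapsed. */
	static void el_free_rcu(struct rcu_head *rh)
	{
		kfree(container_of(rh, struct el, rcu));
	}

	int delete(long key)
	{
		struct el *p;

		spin_lock(&listmutex);
		list_for_each_entry(p, &head.list, list) {
			if (p->key == key) {
				list_del_rcu(&p->list);
				spin_unlock(&listmutex);
				/* Defer the kfree() instead of blocking. */
				call_rcu(&p->rcu, el_free_rcu);
				return 1;
			}
		}
		spin_unlock(&listmutex);
		return 0;
	}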
7.  FULL LIST OF RCU APIs

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

Markers for RCU read-side critical sections:

	rcu_read_lock
	rcu_read_unlock
	rcu_read_lock_bh
	rcu_read_unlock_bh
	srcu_read_lock
	srcu_read_unlock

RCU pointer/list traversal:

	rcu_dereference
	list_for_each_rcu		(to be deprecated in favor of
					 list_for_each_entry_rcu)
	list_for_each_entry_rcu
	list_for_each_continue_rcu	(to be deprecated in favor of new
					 list_for_each_entry_continue_rcu)
	hlist_for_each_entry_rcu

RCU pointer update:

	rcu_assign_pointer
	list_add_rcu
	list_add_tail_rcu
	list_del_rcu
	list_replace_rcu
	hlist_del_rcu
	hlist_add_head_rcu

RCU grace period:

	synchronize_net
	synchronize_sched
	synchronize_rcu
	synchronize_srcu
	call_rcu
	call_rcu_bh

See the comment headers in the source code (or the docbook generated
from them) for more information.


8.  ANSWERS TO QUICK QUIZZES

Quick Quiz #1:	Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  [Referring to the lock-based "toy" RCU
		algorithm.]

Answer:		Consider the following sequence of events:

		1.	CPU 0 acquires some unrelated lock, call it
			"problematic_lock", disabling irq via
			spin_lock_irqsave().

		2.	CPU 1 enters synchronize_rcu(), write-acquiring
			rcu_gp_mutex.

		3.	CPU 0 enters rcu_read_lock(), but must wait
			because CPU 1 holds rcu_gp_mutex.

		4.	CPU 1 is interrupted, and the irq handler
			attempts to acquire problematic_lock.

		The system is now deadlocked.

		One way to avoid this deadlock is to use an approach
		like that of CONFIG_PREEMPT_RT, where all normal
		spinlocks become blocking locks, and all irq handlers
		execute in the context of special tasks.  In this case,
		in step 4 above, the irq handler would block, allowing
		CPU 1 to release rcu_gp_mutex, avoiding the deadlock.

		Even in the absence of deadlock, this RCU
		implementation allows latency to "bleed" from readers
		to other readers through synchronize_rcu().  To see
		this, consider task A in an RCU read-side critical
		section (thus read-holding rcu_gp_mutex), task B
		blocked attempting to write-acquire rcu_gp_mutex, and
		task C blocked in rcu_read_lock() attempting to
		read-acquire rcu_gp_mutex.  Task A's RCU read-side
		latency is holding up task C, albeit indirectly via
		task B.

		Realtime RCU implementations therefore use a
		counter-based approach where tasks in RCU read-side
		critical sections cannot be blocked by tasks executing
		synchronize_rcu().

Quick Quiz #2:	Give an example where Classic RCU's read-side
		overhead is -negative-.

Answer:		Imagine a single-CPU system with a non-CONFIG_PREEMPT
		kernel where a routing table is used by process-context
		code, but can be updated by irq-context code (for
		example, by an "ICMP REDIRECT" packet).  The usual way
		of handling this would be to have the process-context
		code disable interrupts while searching the routing
		table.  Use of RCU allows such interrupt-disabling to
		be dispensed with.  Thus, without RCU, you pay the
		cost of disabling interrupts, and with RCU you don't.

		One can argue that the overhead of RCU in this
		case is negative with respect to the single-CPU
		interrupt-disabling approach.  Others might argue that
		the overhead of RCU is merely zero, and that replacing
		the positive overhead of the interrupt-disabling
		scheme with the zero-overhead RCU scheme does not
		constitute negative overhead.

		In real life, of course, things are more complex.  But
		even the theoretical possibility of negative overhead
		for a synchronization primitive is a bit unexpected.
		;-)
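		To make the comparison concrete, here is a sketch of
		the two approaches.  Note that struct route, its
		fields, and search_route() are hypothetical, invented
		purely for this illustration:

			struct route {
				struct list_head list;
				long key;
				long gateway;
			};

			/* Hypothetical lookup, e.g. a list scan. */
			struct route *search_route(long key);

			/* Without RCU: lock out irq-context updaters. */
			int get_gateway(long key, long *gw)
			{
				unsigned long flags;
				struct route *rt;
				int ret = 0;

				local_irq_save(flags);	/* the cost RCU removes */
				rt = search_route(key);
				if (rt) {
					*gw = rt->gateway;
					ret = 1;
				}
				local_irq_restore(flags);
				return ret;
			}

			/* With RCU on a single-CPU non-CONFIG_PREEMPT
			   kernel, the read-side markers cost nothing. */
			int get_gateway_rcu(long key, long *gw)
			{
				struct route *rt;
				int ret = 0;

				rcu_read_lock();
				rt = search_route(key);
				if (rt) {
					*gw = rt->gateway;
					ret = 1;
				}
				rcu_read_unlock();
				return ret;
			}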
Quick Quiz #3:	If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???

Answer:		Just as PREEMPT_RT permits preemption of spinlock
		critical sections, it permits preemption of RCU
		read-side critical sections.  It also permits
		spinlocks blocking while in RCU read-side critical
		sections.

		Why the apparent inconsistency?  Because it is
		possible to use priority boosting to keep the RCU
		grace periods short if need be (for example, if
		running short of memory).  In contrast, if blocking
		waiting for (say) network reception, there is no way
		to know what should be boosted.  Especially given that
		the process we need to boost might well be a human
		being who just went out for a pizza or something.  And
		although a computer-operated cattle prod might arouse
		serious interest, it might also provoke serious
		objections.  Besides, how does the computer know what
		pizza parlor the human being went to???


ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.

For more information, see http://www.rdrop.com/users/paulmck/RCU.