	void synchronize_rcu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			run_on(cpu);
	}

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive, as sketched below.  Of
course, a somewhat less "toy" implementation would restore the affinity
upon completion rather than just leaving all tasks running on the last
CPU, but when I said "toy", I meant -toy-!
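For concreteness, here is one way the toy run_on() might look.  This
sketch assumes the 2.6-era in-kernel sched_setaffinity(pid_t pid,
cpumask_t new_mask); as noted above, it cheerfully discards the
caller's old affinity mask:

	static void run_on(int cpu)
	{
		/* Mask with only the bit for "cpu" set. */
		cpumask_t mask = cpumask_of_cpu(cpu);

		/* Migrate the current task to "cpu", so that it is
		   scheduled (and a context switch occurs) there. */
		sched_setaffinity(current->pid, mask);
	}

So how the heck is this supposed to work???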
Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical
sections.  Once -all- CPUs have executed a context switch, then -all-
preceding RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then
invoke synchronize_rcu().  Once synchronize_rcu() returns, we are
guaranteed that there are no RCU read-side critical sections holding a
reference to that data item, so we can safely reclaim it.

Quick Quiz #2:	Give an example where Classic RCU's read-side
		overhead is -negative-.

Quick Quiz #3:	If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???


6.  ANALOGY WITH READER-WRITER LOCKING

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock();
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock();
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock();
	+	rcu_read_unlock();
		return 0;
	}

	@@ -29,15 +30,16 @@
	{
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	}

Or, for those who prefer a side-by-side listing:

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 spinlock_t listmutex;                8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

 1 int search(long key, int *result)    1 int search(long key, int *result)
 2 {                                    2 {
 3   struct list_head *lp;              3   struct list_head *lp;
 4   struct el *p;                      4   struct el *p;
 5                                      5
 6   read_lock();                       6   rcu_read_lock();
 7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
 8     if (p->key == key) {             8     if (p->key == key) {
 9       *result = p->data;             9       *result = p->data;
10       read_unlock();                10       rcu_read_unlock();
11       return 1;                     11       return 1;
12     }                               12     }
13   }                                 13   }
14   read_unlock();                    14   rcu_read_unlock();
15   return 0;                         15   return 0;
16 }                                   16 }

 1 int delete(long key)                 1 int delete(long key)
 2 {                                    2 {
 3   struct el *p;                      3   struct el *p;
 4                                      4
 5   write_lock(&listmutex);            5   spin_lock(&listmutex);
 6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
 7     if (p->key == key) {             7     if (p->key == key) {
 8       list_del(&p->list);            8       list_del_rcu(&p->list);
 9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                       10       synchronize_rcu();
10       kfree(p);                     11       kfree(p);
11       return 1;                     12       return 1;
12     }                               13     }
13   }                                 14   }
14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
15   return 0;                         16   return 0;
16 }                                   17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves
from a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as a
single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a
callback-based mechanism that never blocks, namely call_rcu(), that
can be used in place of synchronize_rcu(), as sketched below.
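For concreteness, here is one plausible sketch of such a conversion.
The struct rcu_head field and the el_free_rcu() callback are inventions
of this sketch, not part of the example above:

	struct el {
		struct list_head list;
		long key;
		spinlock_t mutex;
		int data;
		struct rcu_head rcu;	/* added for call_rcu() */
	};

	/* Invoked once a grace period has elapsed. */
	static void el_free_rcu(struct rcu_head *rh)
	{
		kfree(container_of(rh, struct el, rcu));
	}

	int delete(long key)
	{
		struct el *p;

		spin_lock(&listmutex);
		list_for_each_entry(p, &head.list, list) {
			if (p->key == key) {
				list_del_rcu(&p->list);
				spin_unlock(&listmutex);
				/* Defer the kfree() instead of blocking. */
				call_rcu(&p->rcu, el_free_rcu);
				return 1;
			}
		}
		spin_unlock(&listmutex);
		return 0;
	}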
7.  FULL LIST OF RCU APIs

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

Markers for RCU read-side critical sections:

	rcu_read_lock
	rcu_read_unlock
	rcu_read_lock_bh
	rcu_read_unlock_bh
	srcu_read_lock
	srcu_read_unlock

RCU pointer/list traversal:

	rcu_dereference
	list_for_each_rcu		(to be deprecated in favor of
					 list_for_each_entry_rcu)
	list_for_each_entry_rcu
	list_for_each_continue_rcu	(to be deprecated in favor of new
					 list_for_each_entry_continue_rcu)
	hlist_for_each_entry_rcu

RCU pointer update:

	rcu_assign_pointer
	list_add_rcu
	list_add_tail_rcu
	list_del_rcu
	list_replace_rcu
	hlist_del_rcu
	hlist_add_head_rcu

RCU grace period:

	synchronize_net
	synchronize_sched
	synchronize_rcu
	synchronize_srcu
	call_rcu
	call_rcu_bh

See the comment headers in the source code (or the docbook generated
from them) for more information.


8.  ANSWERS TO QUICK QUIZZES

Quick Quiz #1:	Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  [Referring to the lock-based "toy" RCU
		algorithm.]

Answer:		Consider the following sequence of events:

		1.	CPU 0 acquires some unrelated lock, call it
			"problematic_lock", disabling irq via
			spin_lock_irqsave().

		2.	CPU 1 enters synchronize_rcu(), write-acquiring
			rcu_gp_mutex.

		3.	CPU 0 enters rcu_read_lock(), but must wait
			because CPU 1 holds rcu_gp_mutex.

		4.	CPU 1 is interrupted, and the irq handler
			attempts to acquire problematic_lock.

		The system is now deadlocked.

		One way to avoid this deadlock is to use an approach
		like that of CONFIG_PREEMPT_RT, where all normal
		spinlocks become blocking locks, and all irq handlers
		execute in the context of special tasks.  In this case,
		in step 4 above, the irq handler would block, allowing
		CPU 1 to release rcu_gp_mutex, avoiding the deadlock.

		Even in the absence of deadlock, this RCU
		implementation allows latency to "bleed" from readers
		to other readers through synchronize_rcu().  To see
		this, consider task A in an RCU read-side critical
		section (thus read-holding rcu_gp_mutex), task B
		blocked attempting to write-acquire rcu_gp_mutex, and
		task C blocked in rcu_read_lock() attempting to
		read-acquire rcu_gp_mutex.  Task A's RCU read-side
		latency is holding up task C, albeit indirectly via
		task B.

		Realtime RCU implementations therefore use a
		counter-based approach where tasks in RCU read-side
		critical sections cannot be blocked by tasks executing
		synchronize_rcu().

Quick Quiz #2:	Give an example where Classic RCU's read-side
		overhead is -negative-.

Answer:		Imagine a single-CPU system with a non-CONFIG_PREEMPT
		kernel where a routing table is used by process-context
		code, but can be updated by irq-context code (for
		example, by an "ICMP REDIRECT" packet).  The usual way
		of handling this would be to have the process-context
		code disable interrupts while searching the routing
		table.  Use of RCU allows such interrupt-disabling to
		be dispensed with.  Thus, without RCU, you pay the
		cost of disabling interrupts, and with RCU you don't.

		One can argue that the overhead of RCU in this
		case is negative with respect to the single-CPU
		interrupt-disabling approach.  Others might argue that
		the overhead of RCU is merely zero, and that replacing
		the positive overhead of the interrupt-disabling
		scheme with the zero-overhead RCU scheme does not
		constitute negative overhead.

		In real life, of course, things are more complex.  But
		even the theoretical possibility of negative overhead
		for a synchronization primitive is a bit unexpected.
		;-)
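		To make the comparison concrete, here is a sketch of
		the two approaches.  Note that struct route, its
		fields, and search_route() are hypothetical, invented
		purely for this illustration:

			struct route {
				struct list_head list;
				long key;
				long gateway;
			};

			/* Hypothetical lookup, e.g. a list scan. */
			struct route *search_route(long key);

			/* Without RCU: lock out irq-context updaters. */
			int get_gateway(long key, long *gw)
			{
				unsigned long flags;
				struct route *rt;
				int ret = 0;

				local_irq_save(flags);	/* the cost RCU removes */
				rt = search_route(key);
				if (rt) {
					*gw = rt->gateway;
					ret = 1;
				}
				local_irq_restore(flags);
				return ret;
			}

			/* With RCU on a single-CPU non-CONFIG_PREEMPT
			   kernel, the read-side markers cost nothing. */
			int get_gateway_rcu(long key, long *gw)
			{
				struct route *rt;
				int ret = 0;

				rcu_read_lock();
				rt = search_route(key);
				if (rt) {
					*gw = rt->gateway;
					ret = 1;
				}
				rcu_read_unlock();
				return ret;
			}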
Quick Quiz #3:	If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???

Answer:		Just as PREEMPT_RT permits preemption of spinlock
		critical sections, it permits preemption of RCU
		read-side critical sections.  It also permits
		spinlocks blocking while in RCU read-side critical
		sections.

		Why the apparent inconsistency?  Because it is
		possible to use priority boosting to keep the RCU
		grace periods short if need be (for example, if
		running short of memory).  In contrast, if blocking
		waiting for (say) network reception, there is no way
		to know what should be boosted.  Especially given that
		the process we need to boost might well be a human
		being who just went out for a pizza or something.  And
		although a computer-operated cattle prod might arouse
		serious interest, it might also provoke serious
		objections.  Besides, how does the computer know what
		pizza parlor the human being went to???


ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.

For more information, see http://www.rdrop.com/users/paulmck/RCU.