What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort and that is optimized for read-mostly
situations.  Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge.  Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU.  Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU.  This document provides several
different paths, as follows:

1.      RCU OVERVIEW
2.      WHAT IS RCU'S CORE API?
3.      WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
4.      WHAT IF MY UPDATING THREAD CANNOT BLOCK?
5.      WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
6.      ANALOGY WITH READER-WRITER LOCKING
7.      FULL LIST OF RCU APIs
8.      ANSWERS TO QUICK QUIZZES

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Section 6.  Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)


1.  RCU OVERVIEW

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than
a partially updated reference.  The reclamation phase does the work of
reclaiming (e.g., freeing) the data items removed from the data structure
during the removal phase.  Because reclaiming data items can disrupt any
readers concurrently referencing those data items, the reclamation phase
must not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following
(see also the sketch just after this list):

a.      Remove pointers to a data structure, so that subsequent
        readers cannot gain a reference to it.

b.      Wait for all previous readers to complete their RCU read-side
        critical sections.

c.      At this point, there cannot be any readers who hold references
        to the data structure, so it now may safely be reclaimed
        (e.g., kfree()d).
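
For concreteness, here is a rough sketch of what this sequence can look
like in code.  It is only an illustration: struct foo, the global pointer
gp, the lock serializing updaters, and foo_replace() are made-up names,
and the rcu_assign_pointer() and synchronize_rcu() primitives it uses
are described in Section 2 below.

        #include <linux/rcupdate.h>
        #include <linux/slab.h>
        #include <linux/spinlock.h>

        struct foo {                     /* hypothetical RCU-protected structure */
                int a;
        };

        static struct foo *gp;           /* hypothetical RCU-protected pointer */
        static DEFINE_SPINLOCK(gp_lock); /* serializes updaters only */

        /* Replace the structure that gp points to, then reclaim the old one. */
        void foo_replace(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                if (!new_fp)
                        return;
                new_fp->a = new_a;

                spin_lock(&gp_lock);
                old_fp = gp;
                rcu_assign_pointer(gp, new_fp); /* step (a): unlink the old version */
                spin_unlock(&gp_lock);

                synchronize_rcu();              /* step (b): wait for pre-existing readers */
                kfree(old_fp);                  /* step (c): now safe to reclaim */
        }

Note that the spinlock above excludes only other updaters; readers never
acquire it.
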
Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.


2.  WHAT IS RCU'S CORE API?

The core RCU API is quite small:

a.      rcu_read_lock()
b.      rcu_read_unlock()
c.      synchronize_rcu() / call_rcu()
d.      rcu_assign_pointer()
e.      rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the other 18 will be
enumerated later.  See the kernel docbook documentation for more info,
or look directly at the function header comments.

rcu_read_lock()

        void rcu_read_lock(void);

        Used by a reader to inform the reclaimer that the reader is
        entering an RCU read-side critical section.  It is illegal
        to block while in an RCU read-side critical section, though
        kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side
        critical sections.  Any RCU-protected data structure accessed
        during an RCU read-side critical section is guaranteed to remain
        unreclaimed for the full duration of that critical section.
        Reference counts may be used in conjunction with RCU to maintain
        longer-term references to data structures.

rcu_read_unlock()

        void rcu_read_unlock(void);

        Used by a reader to inform the reclaimer that the reader is
        exiting an RCU read-side critical section.  Note that RCU
        read-side critical sections may be nested and/or overlapping.
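
        As a rough illustration of these two primitives, a read-side
        critical section might look as follows.  The struct foo, the
        RCU-protected pointer gp, and foo_get_a() are hypothetical names
        carried over from the earlier sketch, and rcu_dereference() is
        described later in this section.

                #include <linux/rcupdate.h>

                struct foo {            /* as in the earlier sketch */
                        int a;
                };

                extern struct foo *gp;  /* RCU-protected pointer published by an updater */

                /* Read-side access to the structure that gp currently points to. */
                int foo_get_a(void)
                {
                        struct foo *p;
                        int ret = -1;

                        rcu_read_lock();          /* enter RCU read-side critical section */
                        p = rcu_dereference(gp);  /* fetch the protected pointer (see below) */
                        if (p)
                                ret = p->a;       /* p is guaranteed to remain valid until... */
                        rcu_read_unlock();        /* ...this point */
                        return ret;
                }
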
synchronize_rcu()

        void synchronize_rcu(void);

        Marks the end of updater code and the beginning of reclaimer
        code.  It does this by blocking until all pre-existing RCU
        read-side critical sections on all CPUs have completed.
        Note that synchronize_rcu() will -not- necessarily wait for
        any subsequent RCU read-side critical sections to complete.
        For example, consider the following sequence of events:

                 CPU 0                     CPU 1                    CPU 2
            -----------------   -------------------------   ---------------
        1.  rcu_read_lock()
        2.                      enters synchronize_rcu()
        3.                                                   rcu_read_lock()
        4.  rcu_read_unlock()
        5.                      exits synchronize_rcu()
        6.                                                   rcu_read_unlock()

        To reiterate, synchronize_rcu() waits only for ongoing RCU
        read-side critical sections to complete, not necessarily for
        any that begin after synchronize_rcu() is invoked.

        Of course, synchronize_rcu() does not necessarily return
        -immediately- after the last pre-existing RCU read-side critical
        section completes.  For one thing, there might well be scheduling
        delays.  For another thing, many RCU implementations process
        requests in batches in order to improve efficiency, which can
        further delay synchronize_rcu().

        Since synchronize_rcu() is the API that must figure out when
        readers are done, its implementation is key to RCU.  For RCU
        to be useful in all but the most read-intensive situations,
        synchronize_rcu()'s overhead must also be quite small.

        The call_rcu() API is a callback form of synchronize_rcu(),
        and is described in more detail in a later section.  Instead of
        blocking, it registers a function and argument which are invoked
        after all ongoing RCU read-side critical sections have completed.
        This callback variant is particularly useful in situations where
        it is illegal to block or where update-side performance is
        critically important.

        However, the call_rcu() API should not be used lightly, as use
        of the synchronize_rcu() API generally results in simpler code.
        In addition, the synchronize_rcu() API has the nice property
        of automatically limiting update rate should grace periods be
        delayed.  This property results in system resilience in the face
        of denial-of-service attacks.  Code using call_rcu() should limit
        update rate in order to gain this same sort of resilience.  See
        checklist.txt for some approaches to limiting the update rate.
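
        As a rough sketch of the callback form, suppose that the
        hypothetical struct foo from the earlier examples embeds a
        struct rcu_head.  The names foo_reclaim() and foo_retire() are
        again made up for illustration; call_rcu() itself is covered in
        more detail in a later section.

                #include <linux/rcupdate.h>
                #include <linux/slab.h>

                struct foo {
                        int a;
                        struct rcu_head rcu;    /* needed by call_rcu() */
                };

                /* Reclaimer callback: runs after all pre-existing readers finish. */
                static void foo_reclaim(struct rcu_head *rp)
                {
                        struct foo *fp = container_of(rp, struct foo, rcu);

                        kfree(fp);
                }

                /*
                 * Non-blocking variant of the update sequence: instead of
                 * synchronize_rcu() + kfree(), hand the removed structure
                 * to call_rcu().
                 */
                static void foo_retire(struct foo *old_fp)
                {
                        call_rcu(&old_fp->rcu, foo_reclaim);
                }

        Because the callback is invoked only after all pre-existing RCU
        read-side critical sections have completed, foo_retire() itself
        never blocks.
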
rcu_assign_pointer()

        typeof(p) rcu_assign_pointer(p, typeof(p) v);

        Yes, rcu_assign_pointer() -is- implemented as a macro, though it
        would be cool to be able to declare a function in this manner.
        (Compiler experts will no doubt disagree.)

        The updater uses this function to assign a new value to an
        RCU-protected pointer, in order to safely communicate the change
        in value from the updater to the reader.  This function returns
        the new value, and also executes any memory-barrier instructions
        required for a given CPU architecture.

        Perhaps just as important, it serves to document (1) which
        pointers are protected by RCU and (2) the point at which a
        given structure becomes accessible to other CPUs.  That said,
        rcu_assign_pointer() is most frequently used indirectly, via
        the _rcu list-manipulation primitives such as list_add_rcu().

rcu_dereference()

        typeof(p) rcu_dereference(p);

        Like rcu_assign_pointer(), rcu_dereference() must be implemented
        as a macro.

        The reader uses rcu_dereference() to fetch an RCU-protected
        pointer, which returns a value that may then be safely
        dereferenced.  Note that rcu_dereference() does not actually
        dereference the pointer; instead, it protects the pointer for
        later dereferencing.  It also executes any needed memory-barrier
        instructions for a given CPU architecture.  Currently, only Alpha
        needs memory barriers within rcu_dereference() -- on other CPUs,
        it compiles to nothing, not even a compiler directive.

        Common coding practice uses rcu_dereference() to copy an
        RCU-protected pointer to a local variable, then dereferences
        this local variable, for example as follows:

                p = rcu_dereference(head.next);
                return p->data;

        However, in this case, one could just as easily combine these
        into one statement:

                return rcu_dereference(head.next)->data;

        If you are going to be fetching multiple fields from the
        RCU-protected structure, using the local variable is of course
        preferred.  Repeated rcu_dereference() calls look ugly and incur
        unnecessary overhead on Alpha CPUs.

        Note that the value returned by rcu_dereference() is valid only
        within the enclosing RCU read-side critical section.  For example,
        the following is -not- legal:

                rcu_read_lock();
                p = rcu_dereference(head.next);
                rcu_read_unlock();

                x = p->address;
                rcu_read_lock();
                y = p->data;
                rcu_read_unlock();

        Holding a reference from one RCU read-side critical section to
        another is just as illegal as holding a reference from one
        lock-based critical section to another!  Similarly, using a
        reference outside of the critical section in which it was
        acquired is just as illegal as doing so with normal locking.

        As with rcu_assign_pointer(), an important function of
        rcu_dereference() is to document which pointers are protected
        by RCU, in particular, flagging a pointer that is subject
        to changing at any time, including immediately after the
        rcu_dereference().  And, again like rcu_assign_pointer(),
        rcu_dereference() is typically used indirectly, via the _rcu
        list-manipulation primitives, such as list_for_each_entry_rcu().

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.

            rcu_assign_pointer()
                                    +--------+
            +---------------------->| reader |---------+
            |                       +--------+         |
            |                           |              |
            |                           |              | Protect:
            |                           |              | rcu_read_lock()
            |                           |              | rcu_read_unlock()
            |        rcu_dereference()  |              |
        +---------+                     |              |
        | updater |<--------------------+              |
        +---------+                                     V
            |                                    +-----------+
            +----------------------------------->| reclaimer |
                                                 +-----------+
              Defer:
              synchronize_rcu() & call_rcu()

The RCU infrastructure observes the time sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
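
Finally, recall that the rcu_read_lock() description above noted that
reference counts may be used in conjunction with RCU to maintain
longer-term references to data structures, and that the rcu_dereference()
description showed that simply using a pointer after rcu_read_unlock()
is not legal.  The following is a minimal sketch of the reference-counting
approach; the refcnt field, the gp pointer, foo_get(), and foo_put()
are hypothetical names made up for illustration.

        #include <linux/rcupdate.h>
        #include <linux/atomic.h>
        #include <linux/slab.h>

        struct foo {
                int data;
                atomic_t refcnt;        /* hypothetical reference count */
                struct rcu_head rcu;
        };

        extern struct foo *gp;          /* RCU-protected pointer, as before */

        static void foo_free_rcu(struct rcu_head *rp)
        {
                kfree(container_of(rp, struct foo, rcu));
        }

        /*
         * Drop a reference.  Free only after a grace period, because a
         * concurrent foo_get() might still be examining the refcnt field.
         */
        static void foo_put(struct foo *p)
        {
                if (atomic_dec_and_test(&p->refcnt))
                        call_rcu(&p->rcu, foo_free_rcu);
        }

        /*
         * Acquire a longer-term reference from within a read-side critical
         * section, so that the structure may be used after rcu_read_unlock().
         */
        static struct foo *foo_get(void)
        {
                struct foo *p;

                rcu_read_lock();
                p = rcu_dereference(gp);
                if (p && !atomic_inc_not_zero(&p->refcnt))
                        p = NULL;       /* already on its way to being freed */
                rcu_read_unlock();
                return p;               /* caller must eventually call foo_put() */
        }
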