kernel-locking.tmpl
performance. If you're going so far as to use a softirq, you probably care about scalable performance enough to justify the extra complexity. </para> <para> You'll need to use <function>spin_lock()</function> and <function>spin_unlock()</function> for shared data. </para> </sect2> <sect2 id="lock-softirqs-different"> <title>Different Softirqs</title> <para> You'll need to use <function>spin_lock()</function> and <function>spin_unlock()</function> for shared data, whether it be a timer (which can be running on a different CPU), bottom half, tasklet or the same or another softirq. </para> </sect2> </sect1> </chapter> <chapter id="hardirq-context"> <title>Hard IRQ Context</title> <para> Hardware interrupts usually communicate with a bottom half, tasklet or softirq. Frequently this involves putting work in a queue, which the BH/softirq will take out. </para> <sect1 id="hardirq-softirq"> <title>Locking Between Hard IRQ and Softirqs/Tasklets/BHs</title> <para> If a hardware irq handler shares data with a softirq, you have two concerns. Firstly, the softirq processing can be interrupted by a hardware interrupt, and secondly, the critical region could be entered by a hardware interrupt on another CPU. This is where <function>spin_lock_irq()</function> is used. It is defined to disable interrupts on that cpu, then grab the lock. <function>spin_unlock_irq()</function> does the reverse. </para> <para> This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes <function>local_irq_disable()</function> (<filename class=headerfile>include/asm/smp.h</filename>), which protects you from the softirq/tasklet/BH being run. </para> <para> <function>spin_lock_irqsave()</function> (<filename>include/linux/spinlock.h</filename>) is a variant which saves whether interrupts were on or off in a flags word, which is passed to <function>spin_lock_irqrestore()</function>. 
This means that the same code can be used inside a hard irq handler (where interrupts are already off) and in softirqs (where the irq disabling is required). </para> </sect1> </chapter> <chapter id="common-techniques"> <title>Common Techniques</title> <para> This section lists some common dilemmas and the standard solutions used in the Linux kernel code. If you use these, people will find your code simpler to understand. </para> <para> If I could give you one piece of advice: never sleep with anyone crazier than yourself. But if I had to give you advice on locking: <emphasis>keep it simple</emphasis>. </para> <para> Lock data, not code. </para> <para> Be reluctant to introduce new locks. </para> <para> Strangely enough, this is the exact reverse of my advice when you <emphasis>have</emphasis> slept with someone crazier than yourself. </para> <sect1 id="techniques-no-writers"> <title>No Writers in Interrupt Context</title> <para> There is a fairly common case where an interrupt handler needs access to a critical region, but does not need write access. In this case, you do not need to use <function>read_lock_irq()</function>, but only <function>read_lock()</function> everywhere (since if an interrupt occurs, the irq handler will only try to grab a read lock, which won't deadlock). You will still need to use <function>write_lock_irq()</function>. </para> <para> Similar logic applies to locking between softirqs/tasklets/BHs which never need a write lock, and user context: <function>read_lock()</function> and <function>write_lock_bh()</function>. </para> </sect1> <sect1 id="techniques-deadlocks"> <title>Deadlock: Simple and Advanced</title> <para> There is a coding bug where a piece of code tries to grab a spinlock twice: it will spin forever, waiting for the lock to be released (spinlocks, rwlocks and semaphores are not recursive in Linux). This is trivial to diagnose: not a stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem. 
</para> <para> For a slightly more complex case, imagine you have a region shared by a bottom half and user context. If you use a <function>spin_lock()</function> call to protect it, it is possible that the user context will be interrupted by the bottom half while it holds the lock, and the bottom half will then spin forever trying to get the same lock. </para> <para> Both of these are called deadlock, and as shown above, it can occur even with a single CPU (although not on UP compiles, since spinlocks vanish on kernel compiles with <symbol>CONFIG_SMP</symbol>=n. You'll still get data corruption in the second example). </para> <para> This complete lockup is easy to diagnose: on SMP boxes the watchdog timer or compiling with <symbol>DEBUG_SPINLOCKS</symbol> set (<filename>include/linux/spinlock.h</filename>) will show this up immediately when it happens. </para> <para> A more complex problem is the so-called `deadly embrace', involving two or more locks. Say you have a hash table: each entry in the table is a spinlock, and a chain of hashed objects. Inside a softirq handler, you sometimes want to alter an object from one place in the hash to another: you grab the spinlock of the old hash chain and the spinlock of the new hash chain, and delete the object from the old one, and insert it in the new one. </para> <para> There are two problems here. First, if your code ever tries to move the object to the same chain, it will deadlock with itself as it tries to lock it twice. 
Secondly, if the same softirq on another CPU is trying to move another object in the reverse direction, the following could happen: </para> <table> <title>Consequences</title> <tgroup cols=2 align=left> <thead> <row> <entry>CPU 1</entry> <entry>CPU 2</entry> </row> </thead> <tbody> <row> <entry>Grab lock A -> OK</entry> <entry>Grab lock B -> OK</entry> </row> <row> <entry>Grab lock B -> spin</entry> <entry>Grab lock A -> spin</entry> </row> </tbody> </tgroup> </table> <para> The two CPUs will spin forever, waiting for the other to give up their lock. It will look, smell, and feel like a crash. </para> <sect2 id="techs-deadlock-prevent"> <title>Preventing Deadlock</title> <para> Textbooks will tell you that if you always lock in the same order, you will never get this kind of deadlock. Practice will tell you that this approach doesn't scale: when I create a new lock, I don't understand enough of the kernel to figure out where in the 5000 lock hierarchy it will fit. </para> <para> The best locks are encapsulated: they never get exposed in headers, and are never held around calls to non-trivial functions outside the same file. You can read through this code and see that it will never deadlock, because it never tries to grab another lock while it has that one. People using your code don't even need to know you are using a lock. </para> <para> A classic problem here is when you provide callbacks or hooks: if you call these with the lock held, you risk simple deadlock, or a deadly embrace (who knows what the callback will do?). Remember, the other programmers are out to get you, so don't do this. </para> </sect2> <sect2 id="techs-deadlock-overprevent"> <title>Overzealous Prevention Of Deadlocks</title> <para> Deadlocks are problematic, but not as bad as data corruption. Code which grabs a read lock, searches a list, fails to find what it wants, drops the read lock, grabs a write lock and inserts the object has a race condition. 
</para> <para> If you don't see why, please stay the fuck away from my code. </para> </sect2> </sect1> <sect1 id="per-cpu"> <title>Per-CPU Data</title> <para> A great technique for avoiding locking which is used fairly widely is to duplicate information for each CPU. For example, if you wanted to keep a count of a common condition, you could use a spin lock and a single counter. Nice and simple. </para> <para> If that was too slow [it's probably not], you could instead use a counter for each CPU [don't], then none of them need an exclusive lock [you're wasting your time here]. To make sure the CPUs don't have to synchronize caches all the time, align the counters to cache boundaries by appending `__cacheline_aligned' to the declaration (<filename class=headerfile>include/linux/cache.h</filename>). [Can't you think of anything better to do?] </para> <para> They will need a read lock to access their own counters, however. That way you can use a write lock to grant exclusive access to all of them at once, to tally them up. </para> </sect1> <sect1 id="brlock"> <title>Big Reader Locks</title> <para> A classic example of per-CPU information is Ingo's `big reader' locks (<filename class=headerfile>linux/include/brlock.h</filename>). These use the Per-CPU Data techniques described above to create a lock which is very fast to get a read lock, but agonizingly slow for a write lock. </para> <para> Fortunately, there are a limited number of these locks available: you have to go through a strict interview process to get one. </para> </sect1> <sect1 id="lock-avoidance-rw"> <title>Avoiding Locks: Read And Write Ordering</title> <para> Sometimes it is possible to avoid locking. 
Consider the following case from the 2.2 firewall code, which inserted an element into a singly linked list in user context: </para> <programlisting> new->next = i->next; i->next = new; </programlisting> <para> Here the author (Alan Cox, who knows what he's doing) assumes that the pointer assignments are atomic. This is important, because networking packets would traverse this list on bottom halves without a lock. Depending on their exact timing, they would either see the new element in the list with a valid <structfield>next</structfield> pointer, or it would not be in the list yet. A lock is still required against other CPUs inserting or deleting from the list, of course. </para> <para> Of course, the writes <emphasis>must</emphasis> be in this order, otherwise the new element appears in the list with an invalid <structfield>next</structfield> pointer, and any other CPU iterating at the wrong time will jump through it into garbage. Because modern CPUs reorder writes, Alan's code should actually read as follows: </para> <programlisting> new->next = i->next; wmb(); i->next = new; </programlisting> <para> The <function>wmb()</function> is a write memory barrier (<filename class=headerfile>include/asm/system.h</filename>): neither the compiler nor the CPU will allow any writes to memory after the <function>wmb()</function> to be visible to other hardware before any of the writes before the <function>wmb()</function>. </para> <para> As i386 does not do write reordering, this bug would never show up on that platform. On other SMP platforms, however, it will. </para> <para> There is also <function>rmb()</function> for read ordering: to ensure any previous variable reads occur before following reads. The simple <function>mb()</function> macro combines both <function>rmb()</function> and <function>wmb()</function>. </para> <para> Some atomic operations are defined to act as a memory barrier (ie. as per the <function>mb()</function> macro), but if in doubt, be explicit. 
<!-- Rusty Russell 2 May 2001, 2.4.4 --> Also, spinlock operations act as partial barriers: operations after gaining a spinlock will never be moved to precede the <function>spin_lock()</function> call, and operations before releasing a spinlock will never be moved after the <function>spin_unlock()</function> call. <!-- Manfred Spraul <manfreds@colorfullife.com> 24 May 2000 2.3.99-pre9 --> </para> </sect1> <sect1 id="lock-avoidance-atomic-ops"> <title>Avoiding Locks: Atomic Operations</title> <para> There are a number of atomic operations defined in <filename class=headerfile>include/asm/atomic.h</filename>: these are guaranteed to be seen atomically from all CPUs in the system, thus avoiding races. If your shared data consists of a single counter, say, these operations might be simpler than using spinlocks (although for anything non-trivial using spinlocks is clearer). </para> <para> Note that the atomic operations are defined to act as both read and write barriers on all platforms. </para> </sect1> <sect1 id="ref-counts"> <title>Protecting A Collection of Objects: Reference Counts</title> <para> Locking a collection of objects is fairly easy: you get a single spinlock, and you make sure you grab it before searching, adding or deleting an object. </para> <para> The purpose of this lock is not to protect the individual objects: you might have a separate lock inside each one for that. It is to protect the <emphasis>data structure containing the objects</emphasis> from race conditions. Often the same lock is used to protect the contents of all the objects as well, for simplicity, but they are inherently orthogonal (and many other big words designed to confuse). </para> <para> Changing this to a read-write lock will often help markedly if