<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="LKLockingGuide">
 <bookinfo>
  <title>Unreliable Guide To Locking</title>
  <authorgroup>
   <author>
    <firstname>Rusty</firstname>
    <surname>Russell</surname>
    <affiliation>
     <address>
      <email>rusty@rustcorp.com.au</email>
     </address>
    </affiliation>
   </author>
  </authorgroup>
  <copyright>
   <year>2003</year>
   <holder>Rusty Russell</holder>
  </copyright>
  <legalnotice>
   <para>
     This documentation is free software; you can redistribute
     it and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
   </para>
   <para>
     This program is distributed in the hope that it will be
     useful, but WITHOUT ANY WARRANTY; without even the implied
     warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
     PURPOSE. See the GNU General Public License for more details.
   </para>
   <para>
     You should have received a copy of the GNU General Public
     License along with this program; if not, write to the Free
     Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
     MA 02111-1307 USA
   </para>
   <para>
     For more details see the file COPYING in the source
     distribution of Linux.
   </para>
  </legalnotice>
 </bookinfo>
 <toc></toc>
 <chapter id="intro">
  <title>Introduction</title>
  <para>
    Welcome to Rusty's Remarkably Unreliable Guide to Kernel
    Locking issues. This document describes the locking systems in
    the Linux Kernel in 2.6.
  </para>
  <para>
    With the wide availability of HyperThreading, and
    <firstterm linkend="gloss-preemption">preemption</firstterm>
    in the Linux Kernel, everyone hacking on the kernel needs to
    know the fundamentals of concurrency and locking for
    <firstterm linkend="gloss-smp"><acronym>SMP</acronym></firstterm>.
  </para>
 </chapter>
 <chapter id="races">
  <title>The Problem With Concurrency</title>
  <para>
    (Skip this if you know what a Race Condition is).
  </para>
  <para>
    In a normal program, you can increment a counter like so:
  </para>
  <programlisting>
very_important_count++;
  </programlisting>
  <para>
    This is what you would expect to happen:
  </para>
  <table>
   <title>Expected Results</title>
   <tgroup cols="2" align="left">
    <thead>
     <row>
      <entry>Instance 1</entry>
      <entry>Instance 2</entry>
     </row>
    </thead>
    <tbody>
     <row>
      <entry>read very_important_count (5)</entry>
      <entry></entry>
     </row>
     <row>
      <entry>add 1 (6)</entry>
      <entry></entry>
     </row>
     <row>
      <entry>write very_important_count (6)</entry>
      <entry></entry>
     </row>
     <row>
      <entry></entry>
      <entry>read very_important_count (6)</entry>
     </row>
     <row>
      <entry></entry>
      <entry>add 1 (7)</entry>
     </row>
     <row>
      <entry></entry>
      <entry>write very_important_count (7)</entry>
     </row>
    </tbody>
   </tgroup>
  </table>
  <para>
    This is what might happen:
  </para>
  <table>
   <title>Possible Results</title>
   <tgroup cols="2" align="left">
    <thead>
     <row>
      <entry>Instance 1</entry>
      <entry>Instance 2</entry>
     </row>
    </thead>
    <tbody>
     <row>
      <entry>read very_important_count (5)</entry>
      <entry></entry>
     </row>
     <row>
      <entry></entry>
      <entry>read very_important_count (5)</entry>
     </row>
     <row>
      <entry>add 1 (6)</entry>
      <entry></entry>
     </row>
     <row>
      <entry></entry>
      <entry>add 1 (6)</entry>
     </row>
     <row>
      <entry>write very_important_count (6)</entry>
      <entry></entry>
     </row>
     <row>
      <entry></entry>
      <entry>write very_important_count (6)</entry>
     </row>
    </tbody>
   </tgroup>
  </table>
  <sect1 id="race-condition">
   <title>Race Conditions and Critical Regions</title>
   <para>
     This overlap, where the result depends on the relative timing
     of multiple tasks, is called a
     <firstterm>race condition</firstterm>. The piece of code
     containing the concurrency issue is called a
     <firstterm>critical region</firstterm>. Especially since Linux
     started running on SMP machines, race conditions became one of
     the major issues in kernel design and implementation.
   </para>
   <para>
     Preemption can have the same effect, even if there is only one
     CPU: by preempting one task during the critical region, we
     have exactly the same race condition. In this case the thread
     which preempts might run the critical region itself.
   </para>
   <para>
     The solution is to recognize when these simultaneous accesses
     occur, and use locks to make sure that only one instance can
     enter the critical region at any time. There are many friendly
     primitives in the Linux kernel to help you do this. And then
     there are the unfriendly primitives, but I'll pretend they
     don't exist.
   </para>
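   <para>
     For instance, the counter race above disappears once every
     access takes the same lock. The sketch below is minimal and
     purely illustrative (the lock and function names are made up;
     spinlocks themselves are introduced in the next chapter):
   </para>
   <programlisting>
static DEFINE_SPINLOCK(very_important_lock);
static int very_important_count;

static void increment_count(void)
{
        spin_lock(&amp;very_important_lock);
        /* Only one task at a time can be between lock and unlock,
           so the read-modify-write can no longer interleave. */
        very_important_count++;
        spin_unlock(&amp;very_important_lock);
}
   </programlisting>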
  </sect1>
 </chapter>
 <chapter id="locks">
  <title>Locking in the Linux Kernel</title>
  <para>
    If I could give you one piece of advice: never sleep with
    anyone crazier than yourself. But if I had to give you advice
    on locking: <emphasis>keep it simple</emphasis>.
  </para>
  <para>
    Be reluctant to introduce new locks.
  </para>
  <para>
    Strangely enough, this last one is the exact reverse of my
    advice when you <emphasis>have</emphasis> slept with someone
    crazier than yourself. And you should think about getting a big
    dog.
  </para>
  <sect1 id="lock-intro">
   <title>Three Main Types of Kernel Locks: Spinlocks, Mutexes and
   Semaphores</title>
   <para>
     There are three main types of kernel locks. The fundamental
     type is the spinlock
     (<filename class="headerfile">include/asm/spinlock.h</filename>),
     which is a very simple single-holder lock: if you can't get
     the spinlock, you keep trying (spinning) until you can.
     Spinlocks are very small and fast, and can be used anywhere.
   </para>
   <para>
     The second type is a mutex
     (<filename class="headerfile">include/linux/mutex.h</filename>):
     it is like a spinlock, but you may block holding a mutex. If
     you can't lock a mutex, your task will suspend itself, and be
     woken up when the mutex is released. This means the CPU can do
     something else while you are waiting. There are many cases
     when you simply can't sleep (see
     <xref linkend="sleeping-things"/>), and so have to use a
     spinlock instead.
   </para>
   <para>
     The third type is a semaphore
     (<filename class="headerfile">include/asm/semaphore.h</filename>):
     it can have more than one holder at any time (the number
     decided at initialization time), although it is most commonly
     used as a single-holder lock (a mutex). If you can't get a
     semaphore, your task will be suspended and later on woken up -
     just like for mutexes.
   </para>
   <para>
     None of these locks is recursive: see
     <xref linkend="deadlock"/>.
   </para>
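   <para>
     In code, taking and releasing a mutex looks like the following
     minimal sketch, assuming user context (where sleeping is
     allowed); the mutex and function names are illustrative only:
   </para>
   <programlisting>
#include &lt;linux/mutex.h&gt;

static DEFINE_MUTEX(my_mutex);

static void update_shared_data(void)
{
        mutex_lock(&amp;my_mutex);    /* may sleep until the mutex is free */
        /* ... modify the data the mutex protects ... */
        mutex_unlock(&amp;my_mutex);  /* wakes up one waiter, if any */
}
   </programlisting>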
  </sect1>
  <sect1 id="uniprocessor">
   <title>Locks and Uniprocessor Kernels</title>
   <para>
     For kernels compiled without <symbol>CONFIG_SMP</symbol> and
     without <symbol>CONFIG_PREEMPT</symbol>, spinlocks do not
     exist at all. This is an excellent design decision: when
     no-one else can run at the same time, there is no reason to
     have a lock.
   </para>
   <para>
     If the kernel is compiled without <symbol>CONFIG_SMP</symbol>,
     but <symbol>CONFIG_PREEMPT</symbol> is set, then spinlocks
     simply disable preemption, which is sufficient to prevent any
     races. For most purposes, we can think of preemption as
     equivalent to SMP, and not worry about it separately.
   </para>
   <para>
     You should always test your locking code with
     <symbol>CONFIG_SMP</symbol> and <symbol>CONFIG_PREEMPT</symbol>
     enabled, even if you don't have an SMP test box, because it
     will still catch some kinds of locking bugs.
   </para>
   <para>
     Semaphores still exist, because they are required for
     synchronization between
     <firstterm linkend="gloss-usercontext">user contexts</firstterm>,
     as we will see below.
   </para>
  </sect1>
  <sect1 id="usercontextlocking">
   <title>Locking Only In User Context</title>
   <para>
     If you have a data structure which is only ever accessed from
     user context, then you can use a simple semaphore
     (<filename>include/asm/semaphore.h</filename>) to protect it.
     This is the most trivial case: you initialize the semaphore to
     the number of resources available (usually 1), and call
     <function>down_interruptible()</function> to grab the
     semaphore, and <function>up()</function> to release it. There
     is also a <function>down()</function>, which should be
     avoided, because it will not return if a signal is received.
   </para>
   <para>
     Example: <filename>linux/net/core/netfilter.c</filename>
     allows registration of new <function>setsockopt()</function>
     and <function>getsockopt()</function> calls, with
     <function>nf_register_sockopt()</function>. Registration and
     de-registration are only done on module load and unload (and
     boot time, where there is no concurrency), and the list of
     registrations is only consulted for an unknown
     <function>setsockopt()</function> or
     <function>getsockopt()</function> system call. The
     <varname>nf_sockopt_mutex</varname> is perfect to protect
     this, especially since the setsockopt and getsockopt calls may
     well sleep.
   </para>
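   <para>
     Code following this pattern looks roughly like the sketch
     below, using the 2.6-era semaphore API. The semaphore, list
     and function names are made up for illustration; only the
     calls themselves are real:
   </para>
   <programlisting>
#include &lt;asm/semaphore.h&gt;
#include &lt;linux/list.h&gt;

static DECLARE_MUTEX(my_list_sem);      /* semaphore initialized to 1 */
static LIST_HEAD(my_list);

static int my_register(struct list_head *new)
{
        /* Grab the semaphore; a signal makes this fail gracefully. */
        if (down_interruptible(&amp;my_list_sem) != 0)
                return -EINTR;
        list_add(new, &amp;my_list);
        up(&amp;my_list_sem);
        return 0;
}
   </programlisting>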
  </sect1>
  <sect1 id="lock-user-bh">
   <title>Locking Between User Context and Softirqs</title>
   <para>
     If a <firstterm linkend="gloss-softirq">softirq</firstterm>
     shares data with user context, you have two problems. Firstly,
     the current user context can be interrupted by a softirq, and
     secondly, the critical region could be entered from another
     CPU. This is where <function>spin_lock_bh()</function>
     (<filename class="headerfile">include/linux/spinlock.h</filename>)
     is used. It disables softirqs on that CPU, then grabs the
     lock. <function>spin_unlock_bh()</function> does the reverse.
     (The '_bh' suffix is a historical reference to "Bottom
     Halves", the old name for software interrupts. It should
     really be called 'spin_lock_softirq()' in a perfect world).
   </para>
   <para>
     Note that you can also use <function>spin_lock_irq()</function>
     or <function>spin_lock_irqsave()</function> here, which stop
     hardware interrupts as well: see
     <xref linkend="hardirq-context"/>.
   </para>
   <para>
     This works perfectly for
     <firstterm linkend="gloss-up"><acronym>UP</acronym></firstterm>
     as well: the spin lock vanishes, and this macro simply becomes
     <function>local_bh_disable()</function>
     (<filename class="headerfile">include/linux/interrupt.h</filename>),
     which protects you from the softirq being run.
   </para>
  </sect1>
  <sect1 id="lock-user-tasklet">
   <title>Locking Between User Context and Tasklets</title>
   <para>
     This is exactly the same as above, because
     <firstterm linkend="gloss-tasklet">tasklets</firstterm> are
     actually run from a softirq.
   </para>
  </sect1>
  <sect1 id="lock-user-timers">
   <title>Locking Between User Context and Timers</title>
   <para>
     This, too, is exactly the same as above, because
     <firstterm linkend="gloss-timers">timers</firstterm> are
     actually run from a softirq. From a locking point of view,
     tasklets and timers are identical.
   </para>
  </sect1>
  <sect1 id="lock-tasklets">
   <title>Locking Between Tasklets/Timers</title>
   <para>
     Sometimes a tasklet or timer might want to share data with
     another tasklet or timer.
   </para>
   <sect2 id="lock-tasklets-same">
    <title>The Same Tasklet/Timer</title>
    <para>
      Since a tasklet is never run on two CPUs at once, you don't
      need to worry about your tasklet being reentrant (running
      twice at once), even on SMP.
    </para>
   </sect2>
   <sect2 id="lock-tasklets-different">
    <title>Different Tasklets/Timers</title>
    <para>
      If another tasklet/timer wants to share data with your
      tasklet or timer, you will both need to use
      <function>spin_lock()</function> and
      <function>spin_unlock()</function> calls.
      <function>spin_lock_bh()</function> is unnecessary here, as
      you are already in a tasklet, and none will be run on the
      same CPU.
    </para>
   </sect2>
  </sect1>
  <sect1 id="lock-softirqs">
   <title>Locking Between Softirqs</title>
   <para>
     Often a softirq might want to share data with itself or a
     tasklet/timer.
   </para>
   <sect2 id="lock-softirqs-same">
    <title>The Same Softirq</title>
    <para>
      The same softirq can run on the other CPUs: you can use a
      per-CPU array (see <xref linkend="per-cpu"/>) for better
      performance. If you're going so far as to use a softirq, you
      probably care about scalable performance enough to justify
      the extra complexity.
    </para>
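    <para>
      A per-CPU counter, as used inside a softirq, might look like
      the sketch below. The variable and function names are
      illustrative; <function>__get_cpu_var()</function> is the
      2.6-era accessor for this CPU's copy of a per-CPU variable:
    </para>
    <programlisting>
#include &lt;linux/percpu.h&gt;

static DEFINE_PER_CPU(unsigned long, my_counter);

/* Called from softirq context: a softirq never migrates between
   CPUs while it is running, so each CPU only ever touches its own
   copy and no lock is needed. */
static void count_event(void)
{
        __get_cpu_var(my_counter)++;
}
    </programlisting>
   </sect2>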