<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>
<book id="LKLockingGuide">
 <bookinfo>
  <title>Unreliable Guide To Locking</title>
  <authorgroup>
   <author>
    <firstname>Paul</firstname>
    <othername>Rusty</othername>
    <surname>Russell</surname>
    <affiliation>
     <address>
      <email>rusty@rustcorp.com.au</email>
     </address>
    </affiliation>
   </author>
  </authorgroup>

  <copyright>
   <year>2000</year>
   <holder>Paul Russell</holder>
  </copyright>

  <legalnotice>
   <para>
     This documentation is free software; you can redistribute
     it and/or modify it under the terms of the GNU General Public
     License as published by the Free Software Foundation; either
     version 2 of the License, or (at your option) any later
     version.
   </para>
   <para>
     This program is distributed in the hope that it will be
     useful, but WITHOUT ANY WARRANTY; without even the implied
     warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
     PURPOSE.  See the GNU General Public License for more details.
   </para>
   <para>
     You should have received a copy of the GNU General Public
     License along with this program; if not, write to the Free
     Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
     MA 02111-1307 USA
   </para>
   <para>
     For more details see the file COPYING in the source
     distribution of Linux.
   </para>
  </legalnotice>
 </bookinfo>

 <toc></toc>

 <chapter id="intro">
  <title>Introduction</title>
  <para>
    Welcome to Rusty's Remarkably Unreliable Guide to Kernel
    Locking issues.  This document describes the locking systems in
    the Linux Kernel as we approach 2.4.
  </para>
  <para>
    It looks like <firstterm linkend="gloss-smp"><acronym>SMP</acronym>
    </firstterm> is here to stay; so everyone hacking on the kernel
    these days needs to know the fundamentals of concurrency and
    locking for SMP.
  </para>

  <sect1 id="races">
   <title>The Problem With Concurrency</title>
   <para>
     (Skip this if you know what a Race Condition is).
   </para>
   <para>
     In a normal program, you can increment a counter like so:
   </para>
   <programlisting>
very_important_count++;
   </programlisting>
   <para>
     This is what you would expect to happen:
   </para>

   <table>
    <title>Expected Results</title>
    <tgroup cols=2 align=left>
     <thead>
      <row>
       <entry>Instance 1</entry>
       <entry>Instance 2</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>read very_important_count (5)</entry>
       <entry></entry>
      </row>
      <row>
       <entry>add 1 (6)</entry>
       <entry></entry>
      </row>
      <row>
       <entry>write very_important_count (6)</entry>
       <entry></entry>
      </row>
      <row>
       <entry></entry>
       <entry>read very_important_count (6)</entry>
      </row>
      <row>
       <entry></entry>
       <entry>add 1 (7)</entry>
      </row>
      <row>
       <entry></entry>
       <entry>write very_important_count (7)</entry>
      </row>
     </tbody>
    </tgroup>
   </table>

   <para>
     This is what might happen:
   </para>

   <table>
    <title>Possible Results</title>
    <tgroup cols=2 align=left>
     <thead>
      <row>
       <entry>Instance 1</entry>
       <entry>Instance 2</entry>
      </row>
     </thead>
     <tbody>
      <row>
       <entry>read very_important_count (5)</entry>
       <entry></entry>
      </row>
      <row>
       <entry></entry>
       <entry>read very_important_count (5)</entry>
      </row>
      <row>
       <entry>add 1 (6)</entry>
       <entry></entry>
      </row>
      <row>
       <entry></entry>
       <entry>add 1 (6)</entry>
      </row>
      <row>
       <entry>write very_important_count (6)</entry>
       <entry></entry>
      </row>
      <row>
       <entry></entry>
       <entry>write very_important_count (6)</entry>
      </row>
     </tbody>
    </tgroup>
   </table>

   <para>
     This overlap, where what actually happens depends on the
     relative timing of multiple tasks, is called a race condition.
     The piece of code containing the concurrency issue is called a
     critical region.  Especially since Linux started running on SMP
     machines, race conditions have become one of the major issues in
     kernel design and implementation.
   </para>
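   <para>
     To see why, it helps to break the single C statement above into
     the three steps the tables show (this is only an illustrative
     sketch; the real work is done by the compiler and the CPU):
   </para>
   <programlisting>
int tmp;

tmp = very_important_count;      /* read very_important_count  */
tmp = tmp + 1;                   /* add 1                       */
very_important_count = tmp;      /* write very_important_count */
   </programlisting>
   <para>
     If a second instance runs its read before the first instance has
     done its write, both instances write back the same value, as in
     the second table above.
   </para>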
   <para>
     The solution is to recognize when these simultaneous accesses
     occur, and use locks to make sure that only one instance can
     enter the critical region at any time.  There are many friendly
     primitives in the Linux kernel to help you do this.  And then
     there are the unfriendly primitives, but I'll pretend they don't
     exist.
   </para>
  </sect1>
 </chapter>

 <chapter id="locks">
  <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title>
  <para>
    There are two main types of kernel locks.  The fundamental type is
    the spinlock
    (<filename class=headerfile>include/asm/spinlock.h</filename>),
    which is a very simple single-holder lock: if you can't get the
    spinlock, you keep trying (spinning) until you can.  Spinlocks are
    very small and fast, and can be used anywhere.
  </para>
  <para>
    The second type is a semaphore
    (<filename class=headerfile>include/asm/semaphore.h</filename>): it
    can have more than one holder at any time (the number decided at
    initialization time), although it is most commonly used as a
    single-holder lock (a mutex).  If you can't get a semaphore, your
    task will put itself on the queue, and be woken up when the
    semaphore is released.  This means the CPU will do something else
    while you are waiting, but there are many cases when you simply
    can't sleep (see <xref linkend="sleeping-things">), and so have to
    use a spinlock instead.
  </para>
  <para>
    Neither type of lock is recursive: see
    <xref linkend="techniques-deadlocks">.
  </para>

  <sect1 id="uniprocessor">
   <title>Locks and Uniprocessor Kernels</title>
   <para>
     For kernels compiled without <symbol>CONFIG_SMP</symbol>,
     spinlocks do not exist at all.  This is an excellent design
     decision: when no-one else can run at the same time, there is no
     reason to have a lock at all.
   </para>
   <para>
     You should always test your locking code with
     <symbol>CONFIG_SMP</symbol> enabled, even if you don't have an SMP
     test box, because it will still catch some (simple) kinds of
     deadlock.
   </para>
   <para>
     Semaphores still exist, because they are required for
     synchronization between
     <firstterm linkend="gloss-usercontext">user contexts</firstterm>,
     as we will see below.
   </para>
  </sect1>

  <sect1 id="rwlocks">
   <title>Read/Write Lock Variants</title>
   <para>
     Both spinlocks and semaphores have read/write variants:
     <type>rwlock_t</type> and
     <structname>struct rw_semaphore</structname>.  These divide users
     into two classes: the readers and the writers.  If you are only
     reading the data, you can get a read lock, but to write to the
     data you need the write lock.  Many people can hold a read lock,
     but a writer must be the sole holder.
   </para>
   <para>
     This means much smoother locking if your code divides up neatly
     along reader/writer lines.  All the discussions below also apply
     to read/write variants.
   </para>
  </sect1>

  <sect1 id="usercontextlocking">
   <title>Locking Only In User Context</title>
   <para>
     If you have a data structure which is only ever accessed from user
     context, then you can use a simple semaphore
     (<filename>linux/asm/semaphore.h</filename>) to protect it.  This
     is the most trivial case: you initialize the semaphore to the
     number of resources available (usually 1), and call
     <function>down_interruptible()</function> to grab the semaphore,
     and <function>up()</function> to release it.  There is also a
     <function>down()</function>, which should be avoided, because it
     will not return if a signal is received.
   </para>
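   <para>
     As a minimal sketch (the names here are invented for illustration,
     they are not from real kernel code), protecting a counter which is
     only ever touched from user context might look like this:
   </para>
   <programlisting>
#include &lt;asm/semaphore.h&gt;

static DECLARE_MUTEX(counter_sem);      /* a semaphore initialized to 1 */
static int important_count;

static int bump_count(void)
{
        /* down_interruptible() returns non-zero if a signal arrived
           while we were waiting for the semaphore. */
        if (down_interruptible(&amp;counter_sem))
                return -EINTR;

        important_count++;              /* the critical region */

        up(&amp;counter_sem);
        return 0;
}
   </programlisting>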
   <para>
     Example: <filename>linux/net/core/netfilter.c</filename> allows
     registration of new <function>setsockopt()</function> and
     <function>getsockopt()</function> calls, with
     <function>nf_register_sockopt()</function>.  Registration and
     de-registration are only done on module load and unload (and boot
     time, where there is no concurrency), and the list of
     registrations is only consulted for an unknown
     <function>setsockopt()</function> or
     <function>getsockopt()</function> system call.  The
     <varname>nf_sockopt_mutex</varname> is perfect to protect this,
     especially since the setsockopt and getsockopt calls may well
     sleep.
   </para>
  </sect1>

  <sect1 id="lock-user-bh">
   <title>Locking Between User Context and BHs</title>
   <para>
     If a <firstterm linkend="gloss-bh">bottom half</firstterm> shares
     data with user context, you have two problems.  Firstly, the
     current user context can be interrupted by a bottom half, and
     secondly, the critical region could be entered from another CPU.
     This is where <function>spin_lock_bh()</function>
     (<filename class=headerfile>include/linux/spinlock.h</filename>)
     is used.  It disables bottom halves on that CPU, then grabs the
     lock.  <function>spin_unlock_bh()</function> does the reverse.
   </para>
   <para>
     This works perfectly for <firstterm linkend="gloss-up"><acronym>UP
     </acronym></firstterm> as well: the spin lock vanishes, and this
     macro simply becomes <function>local_bh_disable()</function>
     (<filename class=headerfile>include/asm/softirq.h</filename>),
     which protects you from the bottom half being run.
   </para>
  </sect1>

  <sect1 id="lock-user-tasklet">
   <title>Locking Between User Context and Tasklets/Soft IRQs</title>
   <para>
     This is exactly the same as above, because
     <function>local_bh_disable()</function> actually disables all
     softirqs and
     <firstterm linkend="gloss-tasklet">tasklets</firstterm> on that
     CPU as well.  It should really be called
     `local_softirq_disable()', but the name has been preserved for
     historical reasons.  Similarly,
     <function>spin_lock_bh()</function> would now be called
     spin_lock_softirq() in a perfect world.
   </para>
  </sect1>

  <sect1 id="lock-bh">
   <title>Locking Between Bottom Halves</title>
   <para>
     Sometimes a bottom half might want to share data with another
     bottom half (especially remember that timers are run off a bottom
     half).
   </para>

   <sect2 id="lock-bh-same">
    <title>The Same BH</title>
    <para>
      Since a bottom half is never run on two CPUs at once, you don't
      need to worry about your bottom half being run twice at once,
      even on SMP.
    </para>
   </sect2>

   <sect2 id="lock-bh-different">
    <title>Different BHs</title>
    <para>
      Since only one bottom half ever runs at a time, you don't need
      to worry about race conditions with other bottom halves.  Beware
      that things might change under you, however, if someone changes
      your bottom half to a tasklet.  If you want to make your code
      future-proof, pretend you're already running from a tasklet (see
      below), and do the extra locking.  Of course, if it's five years
      before that happens, you're gonna look like a damn fool.
    </para>
   </sect2>
  </sect1>

  <sect1 id="lock-tasklets">
   <title>Locking Between Tasklets</title>
   <para>
     Sometimes a tasklet might want to share data with another tasklet,
     or a bottom half.
   </para>

   <sect2 id="lock-tasklets-same">
    <title>The Same Tasklet</title>
    <para>
      Since a tasklet is never run on two CPUs at once, you don't need
      to worry about your tasklet being reentrant (running twice at
      once), even on SMP.
    </para>
   </sect2>

   <sect2 id="lock-tasklets-different">
    <title>Different Tasklets</title>
    <para>
      If another tasklet (or bottom half, such as a timer) wants to
      share data with your tasklet, you will both need to use
      <function>spin_lock()</function> and
      <function>spin_unlock()</function> calls.
      <function>spin_lock_bh()</function> is unnecessary here, as you
      are already in a tasklet, and none will be run on the same CPU.
    </para>
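    <para>
      A minimal sketch (again, the names are made up for illustration)
      of two tasklets sharing a list:
    </para>
    <programlisting>
#include &lt;linux/spinlock.h&gt;
#include &lt;linux/list.h&gt;

static spinlock_t shared_list_lock = SPIN_LOCK_UNLOCKED;
static LIST_HEAD(shared_list);

/* Called from either tasklet: plain spin_lock() is enough, since the
   other tasklet cannot interrupt us on this CPU, only run on another
   CPU at the same time. */
static void add_entry(struct list_head *new)
{
        spin_lock(&amp;shared_list_lock);
        list_add(new, &amp;shared_list);
        spin_unlock(&amp;shared_list_lock);
}
    </programlisting>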
   </sect2>
  </sect1>

  <sect1 id="lock-softirqs">
   <title>Locking Between Softirqs</title>
   <para>
     Often a <firstterm linkend="gloss-softirq">softirq</firstterm>
     might want to share data with itself, a tasklet, or a bottom half.
   </para>

   <sect2 id="lock-softirqs-same">
    <title>The Same Softirq</title>
    <para>
      The same softirq can run on the other CPUs: you can use a per-CPU
      array (see <xref linkend="per-cpu">) for better