📄 smp.tex
字号:
From: michael@Physik.Uni-Dortmund.DE (Michael Dirkmann)thanks for your information. Attached is the tex-code of yourSMP-documentation :-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\documentclass[]{article}\parindent0.0cm\parskip0.2cm\begin{document}\begin{center}\LARGE \bfAn Implementation Of Multiprocessor Linux\normalsize\end{center}{ \itThis document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than the Intel MP v1.1 architecture for Pentium and 486 processors.}\hfill Alan Cox, 1995The author wishes to thank Caldera Inc. ( http://www.caldera.com )whose donation of an ASUS dual Pentium board made this project possible, and Thomas Radke, whose initial work on multiprocessor Linux formed the backbone of this project.\section{Background: The Intel MP specification.}Most IBM PC style multiprocessor motherboards combine Intel 486 or Pentium processors and glue chipsets with a hardware/software specification. The specification places much of the onus for hard work on the chipset and hardware rather than the operating system.The Intel Pentium processors have a wide variety of inbuilt facilities for supporting multiprocessing, including hardware cache coherency, built in interprocessor interrupt handling and a set of atomic test and set, exchange and similar operations. The cache coherency in particular makes the operating system's job far easier.The specification defines a detailed configuration structure in ROM that the boot up processor can read to find the full configuration of the processors and buses. It also defines a procedure for starting up the other processors.\section{Mutual Exclusion Within A Single Processor Linux Kernel}For any kernel to function in a sane manner it has to provide internal locking and protection of its own tables to prevent two processes updating them at once and for example allocating the same memory block. There are two strategies for this within current Unix and Unixlike kernels. Traditional Unix systems from the earliest of days use a scheme of 'Coarse Grained Locking' where the entire kernel is protected by a small number of locks only. Some modern systems use fine grained locking. Because fine grained locking has more overhead it is normally used only on multiprocessor kernels and real time kernels. In a real time kernel the fine grained locking reduces the amount of time locks are held and reduces the critical (to real time programming at least) latency times.Within the Linux kernel certain guarantees are made. No process running in kernel mode will be pre-empted by another kernel mode process unless it voluntarily sleeps. This ensures that blocks of kernel code are effectively atomic with respect to other processes and greatly simplifies many operations. Secondly interrupts may pre-empt a kernel running process, but will always return to that process. A process in kernel mode may disable interrupts on the processor and guarantee such an interruption will not occur. The final guarantee is that an interrupt will not be pre-empted by a kernel task. That is interrupts will run to completion or be pre-empted by other interrupts only.The SMP kernel chooses to continue these basic guarantees in order to make initial implementation and deployment easier. A single lock is maintained across all processors. This lock is required to access the kernel space. Any processor may hold it and once it is held may also re-enter the kernel for interrupts and other services whenever it likes until the lock is relinquished. This lock ensures that a kernel mode process will not be pre-empted and ensures that blocking interrupts in kernel mode behaves correctly. This is guaranteed because only the processor holding the lock can be in kernel mode, only kernel mode processes can disable interrupts and only the processor holding the lock may handle an interrupt.Such a choice is however poor for performance. In the longer term it is necessary to move to finer grained parallelism in order to get the best system performance. This can be done hierarchically by gradually refining the locks to cover smaller areas. With the current kernel highly CPU bound process sets perform well but I/O bound task sets can easily degenerate to near single processor performance levels. This refinement will be needed to get the best from Linux/SMP.\subsection{Changes To The Portable Kernel Components}The kernel changes are split into generic SMP support changes and architecture specific changes necessary to accommodate each different processor type Linux is ported to.\subsubsection{Initialisation}The first problem with a multiprocessor kernel is starting the other processors up. Linux/SMP defines that a single processor enters the normal kernel entry point start\_kernel(). Other processors are assumed not to be started or to have been captured elsewhere. The first processor begins the normal Linux initialisation sequences and sets up paging, interrupts and trap handlers. After it has obtained the processor information about the boot CPU, the architecture specific function {\tt \bf{void smp\_store\_cpu\_info(int processor\_id) }}is called to store any information about the processor into a per processor array. This includes things like the bogomips speed ratings.Having completed the kernel initialisation the architecture specific function{\tt \bf void smp\_boot\_cpus(void) }is called and is expected to start up each other processor and cause it to enter start\_kernel() with its paging registers and other control information correctly loaded. Each other processor skips the setup except for calling the trap and irq initialisation functions that are needed on some processors to set each CPU up correctly. These functions will probably need to be modified in existing kernels to cope with this.Each additional CPU then calls the architecture specific function{\tt \bf void smp\_callin(void)}which does any final setup and then spins the processor while the boot up processor forks off enough idle threads for each processor. This is necessary because the scheduler assumes there is always something to run. Having generated these threads and forked init the architecture specific {\tt \bf void smp\_commence(void)}function is invoked. This does any final setup and indicates to the system that multiprocessor mode is now active. All the processors spinning in the smp\_callin() function are now released to run the idle processes, which they will run when they have no real work to process.\subsubsection{Scheduling}The kernel scheduler implements a simple but very effective task scheduler. The basic structure of this scheduler is unchanged in the multiprocessor kernel. A processor field is added to each task, and this maintains the number of the processor executing a given task, or a magic constant (NO\_PROC\_ID) indicating the job is not allocated to a processor. Each processor executes the scheduler itself and will select the next task to run from all runnable processes not allocated to a different processor. The algorithm used by the selection is otherwise unchanged. This is actually inadequate for the final system because there are advantages to keeping a process on the same CPU, especially on processor boards with per processor second level caches.Throughout the kernel the variable 'current' is used as a global for the current process. In Linux/SMP this becomes a macro which expands to current\_set[smp\_processor\_id()]. This enables almost the entire kernel to be unaware of the array of running processors, but still allows the SMP aware kernel modules to see all of the running processes.The fork system call is modified to generate multiple processes with a process id of zero until the SMP kernel starts up properly. This is necessary because process number 1 must be init, and it is desirable that all the system threads are process 0. The final area within the scheduling of processes that does cause problems is the fact the uniprocessor kernel hard codes tests for the idle threads as task[0] and the init process as task[1]. Because there are multiple idle threads it is necessary to replace these with tests that the process id is 0 and a search for process ID 1, respectively.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -