<HTML><HEAD> <TITLE>BBS 水木清华站: Digest Area</TITLE></HEAD><BODY><CENTER><H1>BBS 水木清华站: Digest Area</H1></CENTER>Posted by: clamor (clamor), Board: Linux <BR>
Subject: Linux Kernel Internals-2(Process and IM) <BR>
Posting site: BBS 水木清华站 (Tue Dec 19 21:28:56 2000) <BR>
 <BR>
2. Process and Interrupt Management <BR>
2.1 Task Structure and Process Table <BR>
Every process under Linux is dynamically allocated a 'struct task_struct' structure. The maximum number of processes that can be created on a Linux system is limited only by the amount of physical memory present, and is equal to (see kernel/fork.c:fork_init()): <BR>
 /* <BR>
 * The default maximum number of threads is set to a safe <BR>
 * value: the thread structures can take up at most half <BR>
 * of memory. <BR>
 */ <BR>
 max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2; <BR>
which on the IA32 architecture basically means 'num_physpages/4'. So, for example, on a 512M machine you can create 32k threads, which is a considerable improvement over the 4k-epsilon limit of older (2.2 and earlier) kernels. Moreover, this can be changed at runtime using the KERN_MAX_THREADS sysctl(2), or simply by using the procfs interface to kernel tunables: <BR>
# cat /proc/sys/kernel/threads-max <BR>
32764 <BR>
# echo 100000 &gt; /proc/sys/kernel/threads-max <BR>
# cat /proc/sys/kernel/threads-max <BR>
100000 <BR>
# gdb -q vmlinux /proc/kcore <BR>
Core was generated by `BOOT_IMAGE=240ac18 ro root=306 video=matrox:vesa:0x118'. <BR>
#0 0x0 in ?? () <BR>
(gdb) p max_threads <BR>
$1 = 100000 <BR>
The set of processes on the Linux system is represented as a collection of 'struct task_struct' structures which are linked in two ways: <BR>
1. as a hashtable, hashed by pid <BR>
2. as a circular, doubly-linked list using the p-&gt;next_task and p-&gt;prev_task pointers <BR>
The hashtable is called pidhash[] and is defined in include/linux/sched.h: <BR>
/* PID hashing. (shouldnt this be dynamic?) 
*/ <BR>
#define PIDHASH_SZ (4096 &gt;&gt; 2) <BR>
extern struct task_struct *pidhash[PIDHASH_SZ]; <BR>
#define pid_hashfn(x) ((((x) &gt;&gt; 8) ^ (x)) & (PIDHASH_SZ - 1)) <BR>
The tasks are hashed by their pid value, and the above hashing function is supposed to distribute the elements uniformly in their domain (0 to PID_MAX-1). The hashtable is used to quickly find a task by a given pid, using the find_task_by_pid() inline from include/linux/sched.h: <BR>
static inline struct task_struct *find_task_by_pid(int pid) <BR>
{ <BR>
 struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)]; <BR>
 for(p = *htable; p && p-&gt;pid != pid; p = p-&gt;pidhash_next) <BR>
 ; <BR>
 return p; <BR>
} <BR>
The tasks on each hashlist (i.e. hashed to the same value) are linked by p-&gt;pidhash_next/pidhash_pprev, which are used by hash_pid() and unhash_pid() to insert a given process into the hashtable and to remove it again. These operations are done under the protection of the read-write spinlock called 'tasklist_lock', taken for WRITE. <BR>
The circular doubly-linked list that uses p-&gt;next_task/prev_task is maintained so that one can go through all tasks on the system easily. This is achieved by the for_each_task() macro from include/linux/sched.h: <BR>
#define for_each_task(p) \ <BR>
 for (p = &init_task ; (p = p-&gt;next_task) != &init_task ; ) <BR>
The users of for_each_task() should take tasklist_lock for READ. Note that for_each_task() uses init_task to mark the beginning (and end) of the list; this is safe because the idle task (pid 0) never exits. <BR>
The modifiers of the process hashtable and/or the process table links, notably fork, exit and ptrace, must take tasklist_lock for WRITE. What is more interesting is that the writers must also disable interrupts on the local cpu. The reason for this is not trivial: send_sigio() walks the task list and thus takes tasklist_lock for READ, and it is called from kill_fasync() in the interrupt context. 
This is why writers must disable interrupts while readers don't need to: an interrupt arriving on a cpu that already holds the lock for WRITE could otherwise spin forever trying to take it for READ. <BR>
Now that we understand how the task_struct structures are linked together, let us examine the members of task_struct. They loosely correspond to the members of UNIX 'struct proc' and 'struct user' combined together. <BR>
The other versions of UNIX separated the task state information into a part which should be kept memory-resident at all times (called the 'proc structure', which includes process state, scheduling information etc.) and a part which is only needed when the process is running (called the 'u area', which includes the file descriptor table, disk quota information etc.). The only reason for such an ugly design was that memory was a very scarce resource. Modern operating systems (well, only Linux at the moment, but others, e.g. FreeBSD, seem to be improving in this direction towards Linux) do not need such separation and therefore maintain process state in a kernel memory-resident data structure at all times. <BR>
 <BR>
The task_struct structure is declared in include/linux/sched.h and is currently 1680 bytes in size. <BR>
The state field is declared as: <BR>
volatile long state; /* -1 unrunnable, 0 runnable, &gt;0 stopped */ <BR>
#define TASK_RUNNING 0 <BR>
#define TASK_INTERRUPTIBLE 1 <BR>
#define TASK_UNINTERRUPTIBLE 2 <BR>
#define TASK_ZOMBIE 4 <BR>
#define TASK_STOPPED 8 <BR>
#define TASK_EXCLUSIVE 32 <BR>
Why is TASK_EXCLUSIVE defined as 32 and not 16? Because 16 was used up by TASK_SWAPPING and I forgot to shift TASK_EXCLUSIVE up when I removed all references to TASK_SWAPPING (sometime in 2.3.x). <BR>
The volatile in the p-&gt;state declaration means it can be modified asynchronously (from an interrupt handler): <BR>
1. TASK_RUNNING means the task is "supposed to be" on the run queue. 
The reason it may not yet be on the runqueue is that marking a task as TASK_RUNNING and placing it on the runqueue is not atomic; however, if you look at the queue under the protection of runqueue_lock, then every TASK_RUNNING task is on the runqueue. The converse is not true. Namely, drivers can mark themselves (or rather the process context they run in) as TASK_INTERRUPTIBLE (or TASK_UNINTERRUPTIBLE) and then call schedule(), which removes it from the runqueue (unless there is a pending signal, in which case it is left on the runqueue). speaking not <BR>
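 <BR>
To make the pid hashtable described earlier more concrete, here is a minimal userspace sketch of chained hashing with pidhash_next/pidhash_pprev links. It is an illustration under assumptions, not the kernel source: 'struct task' is a hypothetical stand-in for 'struct task_struct', the bodies of hash_pid() and unhash_pid() are one plausible way to implement the insert and remove operations the text mentions, and the tasklist_lock protection is left out entirely. <BR>

```c
#include <assert.h>
#include <stddef.h>

#define PIDHASH_SZ (4096 >> 2)
#define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))

/* Hypothetical stand-in for struct task_struct: only the hash fields. */
struct task {
    int pid;
    struct task *pidhash_next;   /* next task on the same hash chain */
    struct task **pidhash_pprev; /* address of the pointer that points to us */
};

struct task *pidhash[PIDHASH_SZ];

/* Insert p at the head of its hash chain (assumed implementation). */
void hash_pid(struct task *p)
{
    struct task **htable = &pidhash[pid_hashfn(p->pid)];

    p->pidhash_next = *htable;
    if (p->pidhash_next != NULL)
        p->pidhash_next->pidhash_pprev = &p->pidhash_next;
    *htable = p;
    p->pidhash_pprev = htable;
}

/* Remove p from its chain; no search is needed thanks to pidhash_pprev. */
void unhash_pid(struct task *p)
{
    assert(p->pidhash_pprev != NULL); /* p must currently be hashed */
    if (p->pidhash_next != NULL)
        p->pidhash_next->pidhash_pprev = p->pidhash_pprev;
    *p->pidhash_pprev = p->pidhash_next;
    p->pidhash_next = NULL;
    p->pidhash_pprev = NULL;
}

/* Walk the chain for this pid's bucket; returns NULL if not found. */
struct task *find_task_by_pid(int pid)
{
    struct task *p;

    for (p = pidhash[pid_hashfn(pid)]; p && p->pid != pid; p = p->pidhash_next)
        ;
    return p;
}
```

For instance, pids 1 and 1029 both hash to bucket 1 with this function, so after hashing pid 1 and then pid 1029, a lookup of pid 1 walks past the entry for pid 1029 first. Keeping a back-pointer to the previous link, rather than to the previous task, is what lets removal work the same way for the head of a chain and for entries in the middle. <BR>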