true because setting state=TASK_RUNNING and placing the task on the runq by wake_up_process() is not atomic, so you can see (very briefly) TASK_RUNNING tasks not yet on the runq.
TASK_INTERRUPTIBLE: the task is sleeping but can be woken up by a signal or by expiry of a timer.
TASK_UNINTERRUPTIBLE: same as TASK_INTERRUPTIBLE, except that it cannot be woken up by a signal.
TASK_ZOMBIE: the task has terminated but has not had its status collected (wait()-ed for) by the parent (natural or by adoption).
TASK_STOPPED: the task was stopped, either due to job control signals or due to ptrace(2).
TASK_EXCLUSIVE: this is not a separate state but can be OR-ed to either one of TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE. It means that when this task is sleeping on a wait queue with many other tasks, it will be woken up alone instead of causing the "thundering herd" problem by waking up all the waiters.

Task flags contain information about the process states which are not mutually exclusive:

unsigned long flags;    /* per process flags, defined below */
/*
 * Per process flags
 */
#define PF_ALIGNWARN    0x00000001  /* Print alignment warning msgs */
                                    /* Not implemented yet, only for 486 */
#define PF_STARTING     0x00000002  /* being created */
#define PF_EXITING      0x00000004  /* getting shut down */
#define PF_FORKNOEXEC   0x00000040  /* forked but didn't exec */
#define PF_SUPERPRIV    0x00000100  /* used super-user privileges */
#define PF_DUMPCORE     0x00000200  /* dumped core */
#define PF_SIGNALED     0x00000400  /* killed by a signal */
#define PF_MEMALLOC     0x00000800  /* Allocating memory */
#define PF_VFORK        0x00001000  /* Wake up parent in mm_release */
#define PF_USEDFPU      0x00100000  /* task used FPU this quantum (SMP) */

The fields p->has_cpu, p->processor, p->counter, p->priority, p->policy and p->rt_priority are related to the scheduler and will be looked at later.
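Because the PF_* flags are independent bits rather than mutually exclusive states, several can be set on one task at the same time. The following user-space sketch illustrates this by decoding a flags word into the names of the bits that are set. The PF_* values are the ones listed above; the helper describe_task_flags() and the pf_names table are hypothetical names introduced only for this illustration and are not part of the kernel.

```c
#include <assert.h>
#include <string.h>

/* Flag values as given in the listing above. */
#define PF_ALIGNWARN    0x00000001
#define PF_STARTING     0x00000002
#define PF_EXITING      0x00000004
#define PF_FORKNOEXEC   0x00000040
#define PF_SUPERPRIV    0x00000100
#define PF_DUMPCORE     0x00000200
#define PF_SIGNALED     0x00000400
#define PF_MEMALLOC     0x00000800
#define PF_VFORK        0x00001000
#define PF_USEDFPU      0x00100000

/* Hypothetical lookup table, in ascending bit order. */
struct pf_name {
    unsigned long bit;
    const char *name;
};

static const struct pf_name pf_names[] = {
    { PF_ALIGNWARN,  "PF_ALIGNWARN"  },
    { PF_STARTING,   "PF_STARTING"   },
    { PF_EXITING,    "PF_EXITING"    },
    { PF_FORKNOEXEC, "PF_FORKNOEXEC" },
    { PF_SUPERPRIV,  "PF_SUPERPRIV"  },
    { PF_DUMPCORE,   "PF_DUMPCORE"   },
    { PF_SIGNALED,   "PF_SIGNALED"   },
    { PF_MEMALLOC,   "PF_MEMALLOC"   },
    { PF_VFORK,      "PF_VFORK"      },
    { PF_USEDFPU,    "PF_USEDFPU"    },
};

/*
 * Hypothetical helper: writes a '|'-separated list of the flag names
 * set in 'flags' into buf (truncating if buf is too small) and returns
 * how many known flags were set.
 */
static int describe_task_flags(unsigned long flags, char *buf, size_t len)
{
    size_t i;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(pf_names) / sizeof(pf_names[0]); i++) {
        if (!(flags & pf_names[i].bit))
            continue;               /* this bit is clear */
        if (count > 0)
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, pf_names[i].name, len - strlen(buf) - 1);
        count++;
    }
    return count;
}
```

For instance, a task that has forked without exec-ing and has also used super-user privileges would carry PF_FORKNOEXEC | PF_SUPERPRIV, and the helper would report both names for that single flags word.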
The fields p->mm and p->active_mm point, respectively, to the process' address space described by the mm_struct structure and to the active address space if the process doesn't have a real one (e.g. kernel threads). This is to minimize TLB flushes on switching address spaces when the task is scheduled out. So, if we are scheduling in a kernel thread (which has no p->mm), then its next->active_mm will be set to the prev->active_mm of the task that was scheduled out, which will be the same as prev->mm if prev->mm != NULL. The address space can be shared between threads if the CLONE_VM flag is passed to the clone(2) system call or by means of the vfork(2) system call.

The fields p->exec_domain and p->personality relate to the personality of the task, i.e. to the way certain system calls behave in order to emulate the "personality" of foreign flavours of UNIX.

The field p->fs contains filesystem information, which under Linux means three pieces of information:

1. root directory's dentry and mountpoint
2. alternate root directory's dentry and mountpoint
3. current working directory's dentry and mountpoint

Also, this structure includes a reference count because it can be shared between cloned tasks when the CLONE_FS flag is passed to the clone(2) system call.

The field p->files contains the file descriptor table. This can also be shared between tasks if CLONE_FILES is specified with the clone(2) system call.

The field p->sig contains signal handlers and can be shared between cloned tasks by means of the CLONE_SIGHAND flag passed to the clone(2) system call.

2.2 Creation and termination of tasks and kernel threads

Different books on operating systems define a "process" in different ways, starting from "instance of a program in execution" and ending with "that which is produced by clone(2) or fork(2) system calls".
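Since what clone(2) produces depends on which of the sharing flags described above are passed, a small user-space model may help. It is only a sketch: every demo_* name is hypothetical, each resource is reduced to a bare reference count, and a resource that is not shared is simply allocated afresh, whereas the kernel would copy the parent's contents. The DEMO_CLONE_* values are assumed to mirror the CLONE_* constants in the kernel's sched.h header.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed to mirror CLONE_VM, CLONE_FS, CLONE_FILES and CLONE_SIGHAND. */
#define DEMO_CLONE_VM       0x00000100UL
#define DEMO_CLONE_FS       0x00000200UL
#define DEMO_CLONE_FILES    0x00000400UL
#define DEMO_CLONE_SIGHAND  0x00000800UL

/* Stand-in for mm_struct, fs_struct, etc.: just a reference count. */
struct demo_res {
    int count;
};

/* Stand-in for the p->mm, p->fs, p->files and p->sig pointers. */
struct demo_task {
    struct demo_res *mm;
    struct demo_res *fs;
    struct demo_res *files;
    struct demo_res *sig;
};

static struct demo_res *demo_res_new(void)
{
    struct demo_res *r = malloc(sizeof(*r));

    if (r != NULL)
        r->count = 1;
    return r;
}

/* Share the parent's resource (bump its count) or give the child its own. */
static struct demo_res *demo_inherit(struct demo_res *parent, int share)
{
    assert(parent != NULL);
    if (share) {
        parent->count++;
        return parent;
    }
    return demo_res_new();
}

/* Gives a task four private resources; returns 0 on success, -1 on failure. */
static int demo_task_init(struct demo_task *t)
{
    t->mm = demo_res_new();
    t->fs = demo_res_new();
    t->files = demo_res_new();
    t->sig = demo_res_new();
    return (t->mm && t->fs && t->files && t->sig) ? 0 : -1;
}

/*
 * Fills in 'child' from 'parent' according to clone_flags.
 * Cleanup on allocation failure is omitted to keep the sketch short.
 */
static int demo_clone(const struct demo_task *parent,
                      unsigned long clone_flags,
                      struct demo_task *child)
{
    child->mm = demo_inherit(parent->mm, (clone_flags & DEMO_CLONE_VM) != 0);
    child->fs = demo_inherit(parent->fs, (clone_flags & DEMO_CLONE_FS) != 0);
    child->files = demo_inherit(parent->files,
                                (clone_flags & DEMO_CLONE_FILES) != 0);
    child->sig = demo_inherit(parent->sig,
                              (clone_flags & DEMO_CLONE_SIGHAND) != 0);
    return (child->mm && child->fs && child->files && child->sig) ? 0 : -1;
}
```

In this model, calling demo_clone() with DEMO_CLONE_VM | DEMO_CLONE_FILES yields a child that points at the parent's mm and files objects (whose counts rise to 2) but has its own fs and sig, while calling it with no sharing flags yields a child with four private resources, which is the fork(2)-like case.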
Under Linux, there are three kinds of processes:

· Idle Thread
· Kernel Threads
· User Tasks

The idle thread is created at compile time for the first CPU and is then "manually" created for each CPU by means of the arch-specific fork_by_hand() in arch/i386/kernel/smpboot.c, which unrolls the fork system call by hand (on some archs). Idle tasks share one init_task structure but have a private TSS structure in the per-CPU array init_tss. Idle tasks all have pid = 0 and no other task can share a pid, i.e. use the CLONE_PID flag to clone(2).

Kernel threads are created using the kernel_thread() function, which invokes the clone system call in kernel mode. Kernel threads usually have no user address space, i.e. p->mm = NULL, because they explicitly do exit_mm(), e.g. via the daemonize() function. Kernel threads can always access the kernel address space directly. They are allocated pid numbers in the low range. Running at the processor's ring 0 implies that the kernel threads enjoy all the I/O privileges and cannot be pre-empted by the scheduler.

User tasks are created by means of the clone(2) or fork(2) system calls, both of which internally invoke kernel/fork.c:do_fork().

Let us understand what happens when a user process makes a fork(2) system call. Although the fork(2) system call is architecture-dependent due to the different ways of passing the user stack and registers, the actual underlying function do_fork() that does the job is portable and is located at kernel/fork.c.

The following steps are done:

1. Local variable retval is set to -ENOMEM, as this is the value errno is set to if fork(2) fails to allocate a new task structure
2. If CLONE_PID is set in clone_flags then return an error (-EPERM) unless the caller is the idle thread (during boot only). So, normal user threads cannot pass CLONE_PID to clone(2) and expect it to succeed.
For fork(2) it is irrelevant as clone_flags is set to SIGCHLD - this is only relevant when do_fork() is invoked from sys_clone(), which passes the clone_flags from the value requested from userspace
3. current->vfork_sem is initialised (it is later cleared in the child). This is used by sys_vfork() (vfork(2) system call, corresponds to clone_flags = CLONE_VFORK|CLONE_VM|SIGCHLD) to make the parent sleep until the child does mm_release(), for example as a result of execing another program or exit(2)-ing
4. A new task structure is allocated using the arch-dependent alloc_task_struct() macro; on x86 it is just a gfp at GFP_KERNEL priority. This is the first reason why the fork(2) system call may sleep. If this allocation fails we return -ENOMEM
5. All the values from the current process' task structure are copied into the new one, using structure assignment *p = *current. Perhaps this should be replaced by a memset? Later on, the fields that should not be inherited by the child are set to the correct values
6. Big kernel lock is taken as the rest of the code would otherwise be non-reentrant
7. If the parent has user resources (a concept of UID, Linux is flexible enough to make it a question rather than a fact), then verify if the user exceeded the RLIMIT_NPROC soft limit - if so, fail with -EAGAIN; if not, increment the count of processes by given uid, p->user->count
8. If the system-wide number of tasks exceeds the value of the tunable max_threads, fail with -EAGAIN
9. If the binary being executed belongs to a modularised execution domain, increment the corresponding module's reference count
10. If the binary being executed belongs to a modularised binary format, inc