todo
directory by default? umask will put some restrictions on what can be seen. Alternatively, have an environment variable that sets the state location. If people want it globally visible they can set it to a global location.

dnotify in monitor

This has been implemented, but I pulled it out because I'm not convinced it is a good idea. Signals into GTK seem to cause some trouble when running from valgrind etc. Polling is not too expensive, and is nice and simple. It also allows easier ways to handle corner cases like cleaning up state files left over after a compiler is terminated.

Could set up dnotify on the state directory so that we don't have to keep polling it. This would slightly reduce our CPU usage when idle, and might allow for faster updates when busy. We still have to scan the whole directory though, so we don't want to do it too often.

I'm not sure how to nicely integrate this into GNOME though. dnotify sends us a signal, which doesn't seem to fit in well with the GNOME system. Perhaps the dummy pipe trick? Or perhaps we can jump out of the signal handler? We can't call GTK code from inside it.

State changes are "committed" by renaming the file, so we'd want to listen for DN_RENAME, I think. We need to make sure not to get into a loop by reacting to our own delete events.

SSH connection hoarding

It might be nice to hold open SSH connections to avoid the network and CPU overhead of opening new ones. However, fsh is far too slow, probably because it is written in Python. Hoarding is only going to work on systems which can pass file descriptors, and therefore needs to be optional; probably it only works on Unix. Building the kernel across the three x2000s seems to make localhost thrash, and only a few jobs get passed out to the other machines. Perhaps for C++, or something else with really large files, fsh would be better, because the cost of starting Python would be amortized across more work. I don't think this needs to be done in distcc.
It can be a completely separate project that just rewrites fsh in C. Indeed, you could even stay compatible with the Python implementation and only write the short-lived client bit in C.

Masquerade

It might be nice to automatically create the directory and symlinks. However, we don't know which compiler names users will want to hook. Probably the best we can do is provide clear instructions for users or package distributors to set this up.

Packaging

Perhaps build RPMs and .debs? Is it easy to build a static (or LSB-compliant?) .rpm on Debian? What about an apt repository?

Statistics

Accumulate statistics on how many jobs are built on various machines. Want to be able to do something like "watch ccache -s". Perhaps just dump files into a status directory where they can be examined? Ignore (or delete) files over ~60s old. This avoids problems with files hanging around from interrupted compilations.

refactor name handling

Add a common function that looks at file extensions and returns information about them: what is the preprocessed form of this extension? Does it need preprocessing? Is this a source file?

check that EINTR is handled in all cases

check that all lengths are unsigned 32-bit

I think this is done, but it's worth checking a bit more.

abort when cpp fails

The same SIGCHLD handling approach used to feed the compiler from a fifo might be used to abort early if the preprocessor fails. This will happen reasonably often, whenever there is a problem with an include, ifdef, comment, etc. It might save waiting for a long connection to complete. One complication is that we know the compiler ought to consume all its input, but we don't know when cpp ought to finish. So the SIGCHLD handler will have to check whether cpp failed or not. If it failed, then abort compilation. If it did not fail, then keep going with the connection or whatever.
This is probably not worthwhile at the moment, because connections generally seem faster than waiting for cpp.

feed compiler from fifo

Probably quite desirable, because it allows the compiler to start work sooner. This was originally removed because of some hitches to do with process termination. I think it can be put back in reliably, but only if this is fixed. Perhaps we need to write to the compiler in nonblocking mode? Perhaps it would be better to talk to both the compiler and the network in nonblocking mode? It is pretty desirable to pull information from the network as soon as possible, so that the TCP windows and buffers can open right up. Check CVS to remember what originally went wrong here.

Events that we need to consider: the client forks; the compiler opens the pipe; the client exits; the server opens the pipe. There are a few possibilities here:

- The client opens the fifo, reads all input, and exits. The normal success case.
- The client never reads from the fifo and just exits. Would happen if the compiler command line was wrong.
- The client reads from the fifo, but not the whole thing, and then exits.

Opening the fifo is a synchronization point: in blocking mode, neither the compiler nor the server can proceed past it until the other one opens it. If the compiler exits, then the server ought to be broken out of it by a SIGCHLD. But there is a race condition here: the SIGCHLD might happen just before the open() call. We need to either jump out of the signal handler and abort the compilation, or use a non-blocking open and a dummy pipe to break the select(). If we jump out with longjmp, this makes the code a bit convoluted. Alternatively, the signal handler could just do a nonblocking open on the pipe, which would allow the open to complete, if it had not already. This was last supported in 0.12. That version doesn't handle the compiler exiting without opening the pipe, though.

streaming input output

We could start sending the preprocessed source out before it is complete.
This would require a protocol that allows us to send little chunks from various streams, followed by an EOF. This can certainly be done -- fsh and ssh do it. However, particularly if we want to allow streaming more than one thing at a time, getting all the timing conditions right to avoid deadlock caused by bubbles of data in TCP pipes is hard. rsync has had trouble with this. It's even more hairy when running over ssh. So on the whole I am very skeptical about doing this. Even when refactored into a general 'distexec', this is more about batch than interactive processing.

assemble on client

May be useful if there is a cross compiler but no cross assembler, as is supposed to be the case for PPC AIX. See the thread by Stuart D Gathman. Would also allow piping output back to the client, if the protocol were changed to support that.

web site

http://user-mode-linux.sourceforge.net/thanks.html

sendfile

Perhaps try sendfile to receive as well, if this works on any platforms.

static linking

cachegrind shows that a large fraction of client runtime is spent in the dynamic linker, which is kind of a waste. In principle, using dietlibc might reduce the fixed overhead of the client. However, the nsswitch functions are always dynamically linked: even if we try to produce a static client, it will include dlopen and eventually indirectly pull in libc, so it's probably not practical.

testing

How to use Debian's make-kpkg with distcc? Does it work with the masquerade feature? http://moin.conectiva.com.br/files/AptRpm/attachments/apt-0.5.5cnc4.1.tar.bz2

coverage

Try running with gcov. May require all tests to be run from the same directory (no chdir) so that the .da files can accumulate properly.

slow networks

Use Linux Traffic Control to simulate compilation across a slow network.

scheduling onto localhost

Where does local execution fit into the picture? Perhaps we could talk to a daemon on localhost to coordinate with other processes, though that's a bit yucky.
However, the client should use the same information and shared state as the daemon when deciding whether it can take on another job. At the moment we just use a fixed number of slots, by default 4, and this seems to work adequately.

make "localhost" less magic

Recognizing this magic string and treating it differently from 127.0.0.1 or the canonical name of the host is perhaps a bit strange. People do seem to get it wrong. I can't think of a better simple solution, though.

blacklist/lock by IP, not by name

Means we need reliable address-to-string conversion for IPv4 and IPv6. Any downside to this? Would fix Zygo's open Debian bug.

DNS multi-A-records

build.foo.com expands to a list of all IP addresses for building. Need to choose an appropriate target that has the right compilers. Probably not a good idea. If we go to using DNS round-robin records, or if people have the same HOSTS set on different machines, then we can't rely on the ordering of hosts. Perhaps we should always shuffle them? ssh is an interesting case, because we probably want to open the connection using the hostname, so that the ssh config's "Host" sections can have the proper effect. Sometimes people use multi-A records for machines with several routeable interfaces. In that case it would be bad to assume the machine can run multiple jobs, and it is better to let the resolver work out which address to use.

DNS SRV records

Can only be updated by the zone administrator -- unless you have dynamic DNS, which is quite possible.

better scheduler

What's the best way to schedule jobs? Multiprocessor machines present a considerable complication, because we ought to schedule onto them even if they're already busy. We don't know how many more jobs will arrive in the future. This might be the first of many, or it might be the last, or all jobs might be sequenced in this stage of compilation. Generic OS scheduling theory suggests (??) that we should schedule a job in the place where it is likely to complete fastest.
In other words, we should put it on the fastest CPU that's not currently busy. We can't control the overall amount of concurrency -- that's down to Make. I think all we really want is to keep roughly the same number of jobs running on each machine. I would rather not require all clients to know the capabilities of the machines they might like to use, but it's probably acceptable. We could also take the current load of the CPUs into account, but I'm not sure if we could get the information back fast enough for it to make a difference. Note that loadavg on Linux includes processes stuck in D state, which are not necessarily using any CPU.

We want to approximate all tasks on the network being in a single queue, from which the servers invite tasks as cycles become available. However, we also want to preserve the classic TCP model of clients opening connections to servers, because this makes the security model straightforward, works over plain TCP, and can also work over SSH. http://www.cs.panam.edu/~meng/Course/CS6354/Notes/meng/master/node4.html Research this more.

We "commit" to using a particular server at the last possible moment: when we start sending a job to it. This is almost certainly preferable to queueing up on a particular server when we don't know that it will be the next one free. One analogy for this is patients waiting in a medical center to see one of several doctors. They all wait in a common waiting room (the queue) until a doctor (server) is free. Normally the doctors would come into the waiting room to say "who's next?", but the constraint of running over TCP means that in our case the doctors cannot initiate the transaction.
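The "fastest CPU that's not currently busy" rule can be sketched in a few lines. This is only an illustration, not distcc's scheduler: the struct, the speed ratings, and the per-host slot counts are all invented here.

```c
#include <stddef.h>

/* Hypothetical per-host state known to the client. */
struct dcc_host {
    const char *name;
    double speed;      /* relative CPU speed; bigger is faster */
    int running;       /* jobs we currently have in flight there */
    int max_jobs;      /* slots on this host (cf. localhost's 4) */
};

/* Pick the host where a new job is likely to finish soonest: the
   fastest host that still has a free slot.  Returns NULL when every
   host is full, i.e. the client should wait for a slot. */
static struct dcc_host *dcc_pick_host(struct dcc_host *hosts, size_t n) {
    struct dcc_host *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (hosts[i].running >= hosts[i].max_jobs)
            continue;          /* busy: keep jobs per machine level */
        if (!best || hosts[i].speed > best->speed)
            best = &hosts[i];
    }
    return best;
}
```

This keeps roughly the same number of jobs on each machine without requiring load feedback from the servers; the commit to a host still happens only at the moment the job is sent.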