ckpt: A process checkpoint library
www.cs.wisc.edu/~zandy/ckpt
Copyright (c) 2002 Victor C. Zandy  zandy@cs.wisc.edu

COPYING contains the distribution terms for ckpt (LGPL).

Contents:
   1  RELEASE NOTES
   2  INSTALLATION
   3  QUICK START
   4  OVERVIEW
   5  DOES IT CHECKPOINT EVERYTHING?
   6  FILES INCLUDED WITH CKPT
   7  ENVIRONMENT VARIABLES
   8  LINKING PROGRAMS WITH THE CHECKPOINT LIBRARY
   9  TRIGGERING A CHECKPOINT
  10  RESTARTING A CHECKPOINT
  11  DIAGNOSTICS
  12  CKPT API
  13  CONTACT
  14  REFERENCES


1. RELEASE NOTES

We support ckpt on x86 Linux 2.4.  It may very well still work on
Linux 2.2.

Note that the ckpt api has changed since 1.3.


2. INSTALLATION

Edit the user options in Makefile to set the compiler, compiler
flags, and installation directories.  Then run 'make install'.


3. QUICK START

Here is a brief, simple session.  We checkpoint the program foo,
which prints the positive integers.  The ckpt libraries
(libckpt.so and librestart.so) are installed in /home/me/lib and
the restart command (restart) is installed in /home/me/bin.  We
use csh shell syntax and prefix annotations with #.

    % setenv LD_PRELOAD /home/me/lib/libckpt.so
    % setenv CKPT_FILENAME foo.ckpt    # checkpoint file
    % foo
    1
    2
    3
    4
    ^Z                                 # send SIGTSTP
    % unsetenv LD_PRELOAD              # these environment variables
    % unsetenv CKPT_FILENAME           # are not needed to restart
    % /home/me/bin/restart foo.ckpt
    5
    6


4. OVERVIEW

ckpt is a set of libraries and programs for user-level process
checkpointing.

A process linked with the ckpt library will checkpoint itself
when it receives a selected signal, SIGTSTP by default.
Depending on options set in the environment, it either writes
the checkpoint to a file or sends it to a checkpoint server.
The process can be restarted from the checkpoint at a later
time, possibly on a different machine of the same
architecture and OS.

Programs do not need to be relinked with the ckpt library ahead
of time, although that is one of the linking options:

  1) compile time via the linker;
  2) the start of process execution via LD_PRELOAD [Preload];
  3) any time during process execution via hijacking [Hijack].

ckpt exports a small API to the program with which it is linked.
The API allows the program to receive a callback when a
checkpoint is triggered or restarted.  Programs do not need to
use the API to get basic checkpointing service.

The techniques used by ckpt are similar to those used by Condor
[Condor], although the code is entirely new and, unlike Condor,
does not require you to relink your program in advance.


5. DOES IT CHECKPOINT EVERYTHING?

No.  ckpt only checkpoints the process address space and signal
state.

The following resources are not checkpointed:

  - open files
  - network connections
  - interprocess communication
  - process identifiers, including process id, process group id,
    user id, or group id
  - thread state

You can combine ckpt with rocks to checkpoint open network
connections [Rocks].

We are developing mechanisms for checkpointing other resources.


6. FILES INCLUDED WITH CKPT

A ckpt installation comprises several files:

libckpt.so:
    The checkpoint library.  It is linked with the process to be
    checkpointed in one of the ways described below.

librestart.so:
    The restart library, a temporarily loaded library that
    assists the restarting of a checkpoint.
    It should not be linked with the process by the user.

restart:
    A command that restarts a process from a checkpoint.

cssrv:
    A network server that manages checkpoints.  It is not
    necessary; checkpoints can be written to ordinary files
    instead.


7. ENVIRONMENT VARIABLES

The ckpt library is controlled by the environment variables of
the process in which it is loaded.  Since environments reside in
the user address space, the environment of a process that
restarts a checkpoint will be replaced with the environment
preserved in the checkpoint.

CKPT_ID
    The checkpoint identifier, a string of ASCII characters.  The
    identifier names the checkpoint; it is not interpreted.  If
    this variable is not set, the ckpt library sets the
    identifier to a random 32-bit integer expressed in ASCII hex.
    The identifier, including the terminating null character,
    cannot be longer than 1024 characters.

CKPT_SERVER
    Enables the use of a checkpoint server.  The value is the
    ASCII server hostname or dotted IP address, optionally
    followed by a colon and an ASCII port number.  The format of
    this variable may change to accommodate future server
    protocols.

CKPT_FILENAME
    The name of the file in which the checkpoint is written.
    Forward slashes are treated as path separators.  Occurrences
    of "%i" are replaced with the checkpoint identifier.  Forward
    slashes in the identifier are interpreted as path separators.
    The default is /tmp/ckpt.  This variable has no effect if
    CKPT_SERVER is set.

CKPT_RESTARTLIB
    The pathname of the checkpoint restart library.  If this
    variable is not set, the restart library must be present
    somewhere in the LD_LIBRARY_PATH.  This variable is only
    significant in the process that restarts a checkpoint.

CKPT_CONTINUE
    After emitting a checkpoint, the ckpt library forces the
    process to exit unless this variable is set (to any value).

CKPT_SIGNAL
    The signal that triggers a checkpoint.  The default is
    SIGTSTP.  The value may be a Unix signal name (e.g., SIGUSR1,
    SIGURG, etc.) or a signal number expressed as an ASCII
    decimal integer.


8. LINKING PROGRAMS WITH THE CHECKPOINT LIBRARY

The simplest way to link libckpt.so with a process is to relink
the program binary, adding the checkpoint library to its library
list.

For example, if the program foo is linked with this line:

    cc -o foo foo.o -lm

Then edit it as follows to include libckpt.so:

    cc -o foo foo.o -lm -L/home/me/lib -lckpt

Note that the -L option directs the linker to include the
specified directory in its search for the library.

Note also that the -L option does not affect the set of
directories searched when the program is started.  You must also
modify the LD_LIBRARY_PATH environment variable to include the
directory containing libckpt.so.

Sometimes it is inconvenient or impossible to relink a program.
You can force a program to load the checkpoint library when it is
executed with the environment variable LD_PRELOAD.  Set this
variable to the pathname of the library before executing the
program.  For example (csh):

    % setenv LD_PRELOAD /home/me/lib/libckpt.so
    % foo

Finally, you can inject the checkpoint library into an already
running process with a process hijacker.  A hijacker is available
at www.cs.wisc.edu/~zandy/p.


9. TRIGGERING A CHECKPOINT

You trigger a checkpoint by sending the CKPT_SIGNAL to the
process.  A process may checkpoint itself.

After the checkpoint, the process exits unless CKPT_CONTINUE is
set.  It calls _exit to avoid executing functions registered with
atexit(3) or on_exit(3).  When CKPT_CONTINUE is set, the process
continues from the point it was interrupted by the checkpoint
signal and it can be checkpointed again.

When CKPT_SERVER is not set, the checkpoint is written to an
ordinary file.  The name of the file is determined by
CKPT_FILENAME.

When CKPT_SERVER is set, the checkpoint is transferred over TCP
to the checkpoint server.  The checkpoint is identified to the
checkpoint server with the checkpoint identifier.
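
This section notes that a process may checkpoint itself by sending
itself the checkpoint signal.  The following is a minimal sketch of
how a program might do that.  It is an illustration only, not code
from ckpt: the helper names ckpt_signal_from_string and
checkpoint_self are hypothetical, the name table lists only a few
signals (ckpt itself may accept more), and treating an empty
CKPT_SIGNAL value like an unset one is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the ckpt API): map a CKPT_SIGNAL
 * value to a signal number.  Returns SIGTSTP, the documented default,
 * when the value is NULL or empty (the empty case is an assumption),
 * and -1 when the value is not recognized. */
int ckpt_signal_from_string(const char *value)
{
    /* Only a few names are listed here for illustration. */
    static const struct { const char *name; int signo; } names[] = {
        { "SIGTSTP", SIGTSTP }, { "SIGUSR1", SIGUSR1 },
        { "SIGUSR2", SIGUSR2 }, { "SIGURG",  SIGURG  },
    };
    char *end;
    long n;
    size_t i;

    if (value == NULL || *value == '\0')
        return SIGTSTP;

    /* First try the value as a signal name. */
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(value, names[i].name) == 0)
            return names[i].signo;

    /* Otherwise it must be a positive decimal integer. */
    if (!isdigit((unsigned char)value[0]))
        return -1;
    n = strtol(value, &end, 10);
    assert(end != value);   /* first character is a digit */
    if (*end != '\0' || n <= 0 || n > INT_MAX)
        return -1;
    return (int)n;
}

/* Hypothetical helper: send the checkpoint signal to the calling
 * process.  Returns 0 if the signal was raised, -1 otherwise. */
int checkpoint_self(void)
{
    int signo = ckpt_signal_from_string(getenv("CKPT_SIGNAL"));

    if (signo < 0)
        return -1;
    return raise(signo) == 0 ? 0 : -1;
}
```

With libckpt.so loaded, calling checkpoint_self() would be expected
to trigger a checkpoint as described above.  Without the library
loaded, the default SIGTSTP simply stops the process, so the call is
only meaningful in a process linked with ckpt.
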
Currently only the checkpoint server cssrv, included in the ckpt
distribution, is supported.


10. RESTARTING A CHECKPOINT

There are two ways to restart a checkpoint.

1. The restart command restarts a checkpoint from a file.  It
   replaces itself with the continuation of the checkpoint.
   Restart does not require libckpt.so and ignores it if it is
   loaded, except when CKPT_SERVER is set.

2. When libckpt.so is loaded and CKPT_SERVER is set, it first
   checks whether the checkpoint server has a checkpoint for
   CKPT_ID, and if so it downloads and restarts it.

A restarted process can be checkpointed again.

The library librestart.so is required to restart a checkpoint.
This library must be present in one of the directories listed in
LD_LIBRARY_PATH or be identified with CKPT_RESTARTLIB.


11. DIAGNOSTICS

ckpt prints warnings and errors to standard error.  It does not
hesitate to abort if it senses danger.

We are happy to assist with problems.


12. CKPT API

Programs linked with the checkpoint library can call these
functions:

void ckpt_on_preckpt(void (*f)(void *), void *arg);

    Register F to be called when a checkpoint is triggered.
    Registered functions are called before the checkpoint begins
    in the order they were registered.  F is passed the ARG
    argument.

void ckpt_on_postckpt(void (*f)(void *), void *arg);

    Register F to be called when a checkpoint is taken and
    CKPT_CONTINUE is set.  Registered functions are called after
    the checkpoint completes in the order they were registered.
    F is passed the ARG argument.

void ckpt_on_restart(void (*f)(void *), void *arg);

    Register F to be called when a process is restarted from a
    checkpoint.  Registered functions are called after the
    checkpoint has been completely restored, just before control
    returns to the program, in the reverse of the order they were
    registered.  F is passed the ARG argument.


13. CONTACT

Victor Zandy wrote and maintains ckpt.  Please report bugs to
zandy@cs.wisc.edu.
Feedback and experience reports are welcome.

The ckpt webpage is http://www.cs.wisc.edu/~zandy/ckpt.


14. REFERENCES

[Condor]   http://www.cs.wisc.edu/condor
[Hijack]   http://www.paradyn.org/papers/index.html#hijack
           http://www.cs.wisc.edu/~zandy/p
[Preload]  See ld.so(8).
[Rocks]    http://www.cs.wisc.edu/~zandy/rocks