📄 ckcplm.txt
  shouldn't but at least one compiler had a bug that made me include this item).
* Don't use triple assignments, like a = b = c = 0; (or quadruple, etc). Some compilers generate bad code for these, or crash, etc (some version of DEC C as I recall).
* Some compilers don't allow structure members to have the same names as other identifiers. Try to give structure members unique names.
* Don't assume anything about order of evaluation in boolean expressions, or that they will stop early if a required condition is not true, e.g.:
    if (i > 0 && p[i-1] == blah)
  can still dump core if i == 0 (hopefully this is not true of any modern compiler, but I would not have said this if it did not actually happen somewhere).
* Don't have a switch() statement with no cases (e.g. because of #ifdefs); this is a fatal error in some compilers.
* Don't put lots of code in a switch case; move it out to a separate function; some compilers run out of memory when presented with a huge switch() statement -- it's not the number of cases that matters; it's the overall amount of code.
* Some compilers might also limit the number of switch() cases, e.g. to 254.
* Don't put anything between "switch() {" and "case:" -- switch blocks are not like other blocks.
* Don't jump into or out of switches.
* Don't make character-string constants longer than about 250 bytes. Longer strings should be broken up into arrays of strings.
* Don't write into character-string constants (obviously), even when you know you are not writing past the end; the compiler or linker might have put them into read-only and/or shared memory, and/or coalesced multiple equal constants so if you change one you change them all.
* Don't depend on '\r' being carriage return.
* Don't depend on '\n' being linefeed or for that matter any SINGLE character.
* Don't depend on '\r' and '\n' being different (e.g. as separate switch() cases).
* In other words, don't use \n or \r to stand for specific characters; use \012 and \015 instead.
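  The items above about '\r' and '\n' can be sketched roughly as follows. This is only an illustration, not code from the C-Kermit source: the names MYCR, MYLF, and eolkind() are hypothetical, and the function is written with an ANSI-style header for brevity (in portable Kermit code it would need the usual #ifdef CK_ANSIC treatment).

```c
/* Hypothetical names, for illustration only (not from the C-Kermit source) */
#define MYLF 012                /* Linefeed as an octal code, whatever '\n' is locally */
#define MYCR 015                /* Carriage return as an octal code, whatever '\r' is */

/*
  Classify a byte read from the communication line.
  Returns 2 for carriage return, 1 for linefeed, 0 for anything else.
*/
int
eolkind(int c)
{
    switch (c & 0377) {         /* Keep only the low 8 bits */
      case MYCR:                /* Numeric codes always differ, so these two */
        return(2);              /* cases can never collide the way '\r' and */
      case MYLF:                /* '\n' might on some platform. */
        return(1);
      default:
        return(0);
    }
}
```

  For example, eolkind(015) yields 2 and eolkind('A') yields 0, no matter how the local compiler defines '\r' and '\n'.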
* Don't code for "buzzword 1.0 compliance", unless "buzzword" is K&R and "1.0" is the first edition.
* Don't use or depend on anything_t (size_t, pid_t, etc), except time_t, without #ifdef protection (time_t is the only one I've found that is accepted everywhere). This is a tough one because the same function might require (say) a size_t arg on one platform, whereas size_t is unheard of on another; or worse, it might require a totally different data type, like int or long or some other typedef'd thing. It has often proved necessary to define a symbol to stand for the type of a particular argument to a particular library or system function to get around this problem.
* Don't use or depend on internationalization ("i18n") features, wchar_t, locales, etc, in portable code; they are not portable. Anyway, locales are not the right model for Kermit's multi-character-set support. Kermit does all character-set conversion itself and does not use any external libraries or functions.
* In particular, don't use any library functions that deal with wide characters or Unicode in any form. These are not only nonportable, but a constantly shifting target (e.g. the ones in glibc).
* Don't make any assumption about signal handler type. It can be void, int, long, or anything else. Always declare signal handlers as SIGTYP (see definition in ckcdeb.h and augment it if necessary) and always use SIGRETURN at exit points from signal handlers.
* Signals should always be re-armed to be used again (this barely scratches the surface -- the differences between BSD/V7 and System V and POSIX signal handling are numerous, and some platforms do not even support signals, alarms, or longjmps correctly or at all -- avoid all of this if you can).
* On the other hand, don't assume that signals are disarmed after being raised. In some platforms you have to re-arm them, in others they stay armed.
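  The signal items above might look roughly like this in practice. This is a minimal sketch under stated assumptions, not the C-Kermit implementation: the fallback definitions of SIGTYP and SIGRETURN are stand-ins for whatever ckcdeb.h really provides on a given platform, and intflag, onint(), armint(), and gotint() are hypothetical names. It also uses sig_atomic_t, which under the anything_t rule would itself need #ifdef protection in portable code.

```c
#include <signal.h>

/* Assumption: stand-ins for the real, platform-dependent ckcdeb.h definitions */
#ifndef SIGTYP
#define SIGTYP void
#endif
#ifndef SIGRETURN
#define SIGRETURN return
#endif

static volatile sig_atomic_t intflag = 0; /* Hypothetical flag: SIGINT was seen */

SIGTYP
onint(int sig)                  /* Hypothetical SIGINT handler */
{
    signal(sig, onint);         /* Re-arm; harmless where handlers stay armed */
    intflag = 1;                /* Only set a flag; no malloc(), no stdio */
    SIGRETURN;
}

int
armint(void)                    /* Install the handler; 0 on success, -1 on failure */
{
    intflag = 0;
    return((signal(SIGINT, onint) == SIG_ERR) ? -1 : 0);
}

int
gotint(void)                    /* Nonzero if SIGINT has arrived since armint() */
{
    return(intflag != 0);
}
```

  The main program would call armint() once and then poll gotint() at convenient points, so all real work happens outside the handler.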
* Don't call malloc() and friends from a signal handler; don't do anything but setting integer global variables in a signal handler.
* malloc() does not initialize allocated memory -- it never said it did. Don't expect it to be all 0's.
* Did You Know: malloc() can succeed and the program can still dump core later when it attempts to use the malloc'd memory? (This happens when allocation is deferred until use and swap space is full.)
* memset(), memmove(), and memcpy() are not portable; don't use them without protecting them in ifdefs (we have USE_MEMCPY for this). bzero()/bcopy() too, except we're guaranteed to have bzero()/bcopy() when using the sockets library (not really). See examples in the source.
* Don't assume that strncpy() stops on the first null byte -- most versions always copy the number of bytes given in arg 3, padding out with 0's and overwriting whatever was there before. Use C-Kermit ckstrncpy() if you want predictable non-padding behavior, guaranteed NUL-termination, and a useful return code.
* DID YOU KNOW.. that some versions of inet_blah() routines return IP addresses in network byte order, while others return them in local machine byte order? So passing them to ntohs() or whatever is not always the right thing to do.
* Don't use ANSI-format function declarations without #ifdef CK_ANSIC, and always provide an #else for the non-ANSI case.
* Use the Kermit _PROTOTYP() macro for declaring function prototypes; it works in both the ANSI and non-ANSI cases.
* Don't depend on any other ANSI preprocessor features like "pasting" -- they are often missing or nonoperational.
* Don't assume any C++ syntax or semantics.
* Don't use // as a comment introducer. C is not C++.
* Don't declare a string as "char foo[]" in one module and "extern char * foo" in another, or vice-versa: this causes core dumps.
* With compiler makers falling all over themselves trying to outdo each other in ANSI strictness, it has become increasingly necessary to cast EVERYTHING.
  This is increasingly true for char vs unsigned char. We need to use unsigned chars if we want to deal with 8-bit character sets, but most character- and string-oriented APIs want (signed) char arguments, so explicit casts are necessary. It would be nice if every compiler had a -funsigned-char option (as gcc does), but they don't.
* a[x], where x is an unsigned char, can produce a wild memory reference if x, when promoted to an int, becomes negative. Cast it to (unsigned), even though it ALREADY IS unsigned.
* Be careful how you declare functions that have char or long arguments; for ANSI compilers you MUST use ANSI declarations to avoid promotion problems, but you can't use ANSI declarations with non-ANSI compilers. Thus declarations of such functions must be hideously entwined in #ifdefs. Example:

    int                         /* Put character in server command buffer */
    #ifdef CK_ANSIC
    putsrv(char c)
    #else
    putsrv(c) char c;
    #endif /* CK_ANSIC */
    /* putsrv */ {
        *srvptr++ = c;
        *srvptr = '\0';         /* Make sure buffer is null-terminated */
        return(0);
    }

* Be careful how you return characters from functions that return int values -- "getc-like functions" -- in the ANSI world. Unless you explicitly cast the return value to (unsigned), it is likely to be "promoted" to an int and have its sign extended.
* At least one compiler (the one on DEC OSF/1 1.3) treats "/*" and "*/" within string constants as comment begin and end. No amount of #ifdefs will get around this one. You simply can't put these sequences in a string constant, e.g. "/usr/local/doc/*.*".
* Avoid putting multiple macro references on a single line, e.g.:
    putchar(BS); putchar(SP); putchar(BS)
  This overflows the CPP output buffer of more than a few C preprocessors (this happened, for example, with SunOS 4.1 cc, which evidently has a 1K macro expansion buffer).

C-Kermit needs constant adjustment to new OS and compiler releases.
Every new OS release shuffles header files or their contents, or prototypes, or data types, or levels of ANSI strictness, etc. Every time you make an adjustment to remove a new compilation error, BE VERY CAREFUL to #ifdef it on a symbol unique to the new configuration so that the previous configuration (and all other configurations on all other platforms) remain as before.

Assume nothing. Don't assume header files are where they are supposed to be, that they contain what you think they contain, that they define specific symbols to have certain values -- or define them at all! Don't assume system header files protect themselves against multiple inclusion. Don't assume that particular system or library calls are available, or that the arguments are what you think they are -- order, data type, passed by reference vs value, etc. Be conservative when attempting to write portable code. Avoid all advanced features.

If you see something that does not make sense, don't assume it's a mistake -- it might be there for a reason, and changing or removing it is likely to cause compilation, linking, or runtime failures sometime, somewhere. Some huge percentage of the code, especially in the platform-dependent modules, is workarounds for compiler, linker, or API bugs.

But finally... feel free to violate any or all of these rules in platform-specific modules for environments in which the rules are certain not to apply. For example, in VMS-specific code, it is OK to use #if, because VAX C, DEC C, and VMS GCC all support it.
________________________________________________________________________

3.1. Memory Leaks

The C language and standard C library are notoriously inadequate and unsafe. Strings are arrays of characters, usually referenced through pointers. There is no native string datatype. Buffers are fixed size, and C provides no runtime bounds checking, thus allowing overwriting of other data or even program code.
With the popularization of the Internet, the "buffer exploit" has become a preferred method for hackers to hijack privileged programs; long data strings are fed to a program in hopes that it uses unsafe C library calls such as strcpy() or sprintf() to copy strings into automatic arrays, thus overwriting the call stack, and therefore the routine's return address. When such a hole is discovered, a "string" can be constructed that contains machine code to hijack the program's privileges and penetrate the system.

This problem is partially addressed by the strn...() routines, which should always be used in preference to their str...() equivalents (except when the copy operation has already been prechecked, or there is a good reason for not using them, e.g. the sometimes undesirable side effect of strncpy() zeroing the remainder of the buffer). The most gaping hole, however, is sprintf(), which performs no length checking on its destination buffer, and is not easy to replace. Although snprintf() routines are starting to appear, they are not yet widespread, and certainly not universal, nor are they especially portable, or even full-featured.

For these reasons, we have started to build up our own little library of C Library replacements, ckclib.[ch]. These are safe and highly portable primitives for memory management and string manipulation, such as:

  ckstrncpy()   Like strncpy but returns a useful value, doesn't zero buffer.
  ckitoa()      Opposite of atoi()
  ckltoa()      Opposite of atol()
  ckctoa()      Returns character as string
  ckmakmsg()    Used with ck?to?() as a safe sprintf() replacement for up to 4 items
  ckmakxmsg()   Like ckmakmsg() but accepts up to 12 items

More about library functions in [32]Section 4.A.
________________________________________________________________________

3.2. The "char" vs "unsigned char" Dilemma

This is one of the most aggravating and vexing characteristics of the C language.
By design, chars (and char *'s) are SIGNED (strictly speaking, the language leaves the signedness of plain char up to the compiler, but signed is what we usually get). In the modern era, however, we need to process characters that can have (or include) 8-bit values, as in the ISO Latin-1, IBM CP 850, or UTF-8 character sets, so this data must be treated as unsigned. But some C compilers (such as those based on the Bell UNIX V7 compiler) do not support "unsigned char" as a data type. Therefore we have the macro or typedef CHAR, which we use when we need chars to be unsigned, but which, unfortunately, resolves itself to "char" on those compilers that don't support "unsigned char".

AND SO... We have to do a lot of fiddling at runtime to avoid sign extension and so forth. Some modern compilers (e.g. IBM, DEC, Microsoft) have options that say "make all chars be unsigned" (e.g. GCC "-funsigned-char") and we use them when they are available. Other compilers don't have this option, and at the same time, are becoming increasingly strict about type mismatches, and spew out torrents of warnings when we use a CHAR where a char is expected, or vice versa. We fix these one by one using casts, and the code becomes increasingly ugly. But there remains a serious problem, namely that certain library and kernel functions have arguments that are declared as signed chars (or pointers to them), whereas our character data is unsigned. Fine, we can use casts here too -- but who knows what happens inside these routines.
________________________________________________________________________

4. MODULES

When C-Kermit is on the far end of a connection, it is said to be in remote mode. When C-Kermit has made a connection to another computer, it is in local mode. (If C-Kermit is "in the middle" of a multihop connection, it is still in local mode.)

On another axis, C-Kermit can be in any of several major states:

Command State
    Reading and writing from the job's controlling terminal or "console".
    In this mode, all i/o is handled by the Group E conxxx() (console i/o) routines.

Protocol State
    Reading and writing from the communications device. In this mode, all i/o is handled by the Group E ttxxx() (terminal i/o)