4.2.3  Type

   The type of an object is usually of interest only to the operating
   system that the object was created on.

   Types are:

          ACAT       access category (Primos)
          CAM        contiguous access method (Primos)
          DAM        direct access method (Primos)
          FIXED      fixed length records (VMS)
          FLAT       `flat file', sequence of bytes (Unix, DOS, default)
          ISAM       indexed-sequential access method (VMS)
          LINK       soft link (Unix)
          MAC        Macintosh file
          SAM        sequential access method (Primos)
          SEGSAM     segmented sequential access method (Primos)
          SEGDAM     segmented direct access method (Primos)
          TEXT       lines of ISO-10646-UTF-1 text ending with CR/LF
          VAR        variable length records (VMS)

4.2.4  Created

   Indicates the creation date of the file.  Dates are in the format
   defined in section 4.3.

4.2.5  Modified

   Indicates the date and time the file was last modified or closed
   after being open for write.

4.2.6  Accessed

   Indicates the date and time the file was last accessed on the
   original file system.

4.2.7  Owner

   The owner directive gives the name or numerical ID of the owner or
   creator of the file.






Costanzo, Robinson & Ullmann                                   [Page 15]

RFC 1505                 Encoding Header Field               August 1993


4.2.8  Group

   The group directive gives the name(s) or numerical IDs of the group
   or groups to which the file belongs.

4.2.9  ACL

   This directive specifies the access control list attribute of an
   object (the ACL attribute may occur more than once within an object).
   The list consists of a series of pairs of IDs and access codes in the
   format:

                user-ID:access-list


   There are four reserved IDs:

                $OWNER  the owner or creator
                $GROUP  a member of the group or groups
                $SYSTEM a system administrator
                $REST   everyone else

   The access list is zero or more single letters:

                A    add (create file)
                D    delete
                L    list (read directory)
                P    change protection
                R    read
                U    use
                W    write
                X    execute
                *    all possible access
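
   As a sketch of how a decoder might interpret one pair, the following
   C function splits an entry at its last colon and folds the access
   letters into a bit mask.  The function name, the ACC_ flag names and
   their bit positions are hypothetical and not part of this
   specification; splitting at the last colon and accepting only upper
   case letters are likewise assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Bit flags for the access letters.  The names and bit positions are
   hypothetical; the specification defines only the letters. */
enum {
    ACC_ADD     = 1 << 0,   /* A */
    ACC_DELETE  = 1 << 1,   /* D */
    ACC_LIST    = 1 << 2,   /* L */
    ACC_PROTECT = 1 << 3,   /* P */
    ACC_READ    = 1 << 4,   /* R */
    ACC_USE     = 1 << 5,   /* U */
    ACC_WRITE   = 1 << 6,   /* W */
    ACC_EXECUTE = 1 << 7,   /* X */
    ACC_ALL     = 0xFF      /* * */
};

/* Split "user-ID:access-list" at the last ':' (an assumption), copy
   the ID into id, and convert the letters to a bit mask.
   Returns the mask (0 means no access), or -1 on a malformed entry. */
int parse_acl_entry(const char *entry, char *id, size_t idsize)
{
    static const char letters[] = "ADLPRUWX";
    const char *colon = strrchr(entry, ':');
    const char *p, *hit;
    size_t idlen;
    int mask = 0;

    if (colon == NULL)
        return -1;
    idlen = (size_t)(colon - entry);
    if (idlen == 0 || idlen >= idsize)
        return -1;
    memcpy(id, entry, idlen);
    id[idlen] = '\0';

    for (p = colon + 1; *p != '\0'; p++) {
        if (*p == '*') {
            mask |= ACC_ALL;
            continue;
        }
        hit = strchr(letters, *p);
        if (hit == NULL)
            return -1;                 /* unknown access letter */
        mask |= 1 << (int)(hit - letters);
    }
    return mask;
}
```

   For example, "$OWNER:RW" yields the ID "$OWNER" and the mask
   ACC_READ | ACC_WRITE, while an empty access list yields 0.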

4.2.10  Password

   The password attribute gives the access password for this object.
   Since the content of the object follows (being the raison d'etre of
   the encoding), the appearance of the password in plain text is not
   considered a security problem.  If the password is actually set by
   the decoder on a created object, the security (or lack of it) is the
   responsibility of the application domain controlling the decoder, as
   is true of ACL and other protections.

4.2.11  Block

   The block attribute gives the block size of the file as a decimal
   number of bytes.



4.2.12  Record

   The record attribute gives the record size of the file as a decimal
   number of bytes.

4.2.13  Application

   This specifies the application that the file was created with or
   belongs to.  This is of particular interest for Macintosh files.

4.3  Date Field

   Various attributes are followed by a date and time associated with
   them.

4.3.1  Syntax

   The syntax of the date field is a combination of date, time, and
   timezone:

       DD Mon YYYY HH:MM:SS.FFFFFF [+-]HHMMSS

       Date :=  DD Mon YYYY      1 or 2 Digits " " 3 Alpha " " 4 Digits
       DD   :=  Day              e.g. "08", " 8", "8"
       Mon  :=  Month            "Jan" | "Feb" | "Mar" | "Apr" |
                                 "May" | "Jun" | "Jul" | "Aug" |
                                 "Sep" | "Oct" | "Nov" | "Dec"
       YYYY :=  Year
       Time :=  HH:MM:SS.FFFFFF  2 Digits ":" 2 Digits [ ":" 2 Digits
                                 ["." 1 to 6 Digits ] ]
                                 e.g. 00:00:00, 23:59:59.999999
       HH   :=  Hours            00 to 23
       MM   :=  Minutes          00 to 59
       SS   :=  Seconds          00 to 60 (60 only during a leap second)
       FFFFFF:= Fraction
       Zone :=  [+-]HHMMSS       "+" | "-" 2 Digits [ 2 Digits
                                 [ 2 Digits ] ]
       HH   :=  Local Hour Offset
       MM   :=  Local Minutes Offset
       SS   :=  Local Seconds Offset

4.3.2  Semantics

   The date information is that which the file system has stored in
   regard to the file system object.  Date information is stored
   differently and with varying degrees of precision by different
   computer file systems.  An encoder must include as much date
   information as it has available concerning the file system object.  A
   decoder which receives an object encoded with a date field containing
   greater precision than its own must disregard the excess
   information.  The Zone field specifies the time zone of the file
   system object as an offset from Co-ordinated Universal Time "UTC"
   (formerly called "Greenwich Mean Time").  It is expressed as a
   signed [+-] two, four or six digit number.

   A file that was created April 15, 1993 at 8:05 p.m.  in Roselle Park,
   New Jersey, U.S.A.  might have a date field which looks like:

   15 Apr 1993 20:05:22.12 -0500
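
   A decoder could read such a field along the following lines.  This
   is a minimal sketch, not a reference parser: the struct and function
   names are hypothetical, the zone is assumed to be present, nothing
   may follow the zone, and the value ranges are not checked.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for a parsed date field. */
struct datefield {
    int  day, month, year;    /* month is 1..12 */
    int  hour, min, sec;
    long usec;                /* fraction, scaled to microseconds */
    long zone_sec;            /* signed offset from UTC, in seconds */
};

/* Parse "DD Mon YYYY HH:MM[:SS[.F]] [+-]HH[MM[SS]]".
   Returns 0 on success, -1 on a syntax error.  Value ranges (day,
   hours 00 to 23, and so on) are not checked in this sketch. */
int parse_datefield(const char *s, struct datefield *d)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    static const long weight[3] = { 3600L, 60L, 1L };
    char mon[4], frac[7] = "", zone[7], sign;
    const char *m;
    size_t zlen, i;
    int n = 0;

    if (sscanf(s, "%2d %3s %4d %2d:%2d%n", &d->day, mon, &d->year,
               &d->hour, &d->min, &n) != 5)
        return -1;
    m = strstr(months, mon);
    if (strlen(mon) != 3 || m == NULL || (m - months) % 3 != 0)
        return -1;
    d->month = (int)(m - months) / 3 + 1;
    s += n;

    d->sec = 0;
    if (*s == ':') {                       /* optional seconds */
        if (sscanf(s, ":%2d%n", &d->sec, &n) != 1)
            return -1;
        s += n;
        if (*s == '.') {                   /* optional fraction */
            if (sscanf(s, ".%6[0-9]%n", frac, &n) != 1)
                return -1;
            s += n;
        }
    }
    d->usec = 0;
    for (i = 0; i < 6; i++)                /* pad fraction to 6 digits */
        d->usec = d->usec * 10 + (frac[i] ? frac[i] - '0' : 0);

    if (sscanf(s, " %c%6[0-9]%n", &sign, zone, &n) != 2 || s[n] != '\0')
        return -1;
    zlen = strlen(zone);
    if ((sign != '+' && sign != '-') ||
        (zlen != 2 && zlen != 4 && zlen != 6))
        return -1;
    d->zone_sec = 0;
    for (i = 0; i < zlen; i += 2)          /* hours, minutes, seconds */
        d->zone_sec += ((zone[i] - '0') * 10 + (zone[i + 1] - '0'))
                       * weight[i / 2];
    if (sign == '-')
        d->zone_sec = -d->zone_sec;
    return 0;
}
```

   Applied to the example above, the sketch reports 22 seconds, a
   fraction of 120000 microseconds, and a zone offset of -18000
   seconds.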

5.  LZJU90:  Compressed Encoding

   LZJU90 is an encoding for a binary or text object to be sent in an
   Internet mail message.  The encoding provides both compression and
   representation in a text format that will successfully survive
   transmission through the many different mailers and gateways that
   comprise the Internet and connected mail networks.

5.1  Overview

   The encoding first compresses the binary object, using a modified
   LZ77 algorithm, called LZJU90.  It then encodes each 6 bits of the
   output of the compression as a text character, using a character set
   chosen to survive any translations between codes, such as ASCII to
   EBCDIC.  The 64 six-bit strings 000000 through 111111 are represented
   by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".
   The output text begins with a line identifying the encoding.  This is
   for visual reference only; the "Encoding:" field in the header
   identifies the section to the user program.  It also names the object
   that was encoded, usually by a file name.
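
   The character mapping can be sketched as a 64-entry table.  The
   sketch assumes that the values 0 through 63 are assigned in the
   order in which the characters are listed above; the table and
   function names are hypothetical.

```c
#include <string.h>

/* The 64 text characters, assuming value 0 is "+", value 1 is "-",
   followed by the digits, the upper case and the lower case letters. */
static const char lzju_chars[] =
    "+-0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/* Map a 6-bit value (0..63) to its text character. */
char sixbit_to_char(unsigned v)
{
    return lzju_chars[v & 63];
}

/* Map a text character back to its 6-bit value.
   Returns -1 for any character outside the set. */
int char_to_sixbit(int c)
{
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(lzju_chars, c);
    return p ? (int)(p - lzju_chars) : -1;
}
```

   None of the 64 characters is a space, control or punctuation
   character other than "+" and "-", so a decoder using such a table
   can recognize line breaks and the "*" lines without ambiguity.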

   The format of this line is:

                * LZJU90 <name>


   where <name> is optional.  For example:

                * LZJU90 vmunix

   This is followed by the compressed and encoded data, broken into
   lines where convenient.  It is recommended that lines be broken every
   78 characters to survive mailers that incorrectly restrict line
   length.  The decoder must accept lines with 1 to 1000 characters on
   each line.  After this, there is one final line that gives the number
   of bytes in the original data and a CRC of the original data.  This
   should match the byte count and CRC found during decompression.

   This line has the format:

                * <count> <CRC>


   where <count> is a decimal number, and <CRC> is 8 hexadecimal digits.
   For example:

                * 4128076 5AC2D50E
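
   Reading this final line might be sketched as follows.  The function
   name is hypothetical, and accepting lower case hexadecimal digits is
   an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the final "* <count> <CRC>" line.
   Returns 0 on success, -1 if the line does not have that form
   (for instance, if it is the "* LZJU90 <name>" start line). */
int parse_trailer(const char *line, unsigned long *count,
                  unsigned long *crc)
{
    char hex[9];

    if (sscanf(line, "* %lu %8[0-9A-Fa-f]", count, hex) != 2)
        return -1;
    if (strlen(hex) != 8)          /* the CRC is exactly 8 hex digits */
        return -1;
    *crc = strtoul(hex, NULL, 16);
    return 0;
}
```

   The decoder would then compare both values with the byte count and
   CRC it accumulated while decompressing.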

   The count used in the Encoding:  field in the message header is the
   total number of lines, including the start and end lines that begin
   with *.  A complete example is given in section 5.3.2.

5.2  Specification of the LZJU90 compression

   The Lempel-Ziv-Storer-Szymanski model of mixing pointers and literal
   characters is used in the compression algorithm.  Repeat occurrences
   of strings of octets are replaced by pointers to the earlier
   occurrence.

   The data compression is defined by the decoding algorithm.  Any
   encoder that emits symbols which cause the decoder to produce the
   original input is defined to be valid.

   There are many possible strategies for the maximal-string matching
   that the encoder does; section 5.3.1 gives the code for one such
   algorithm.  Regardless of which algorithm is used, and what tradeoffs
   are made between compression ratio and execution speed or space, the
   result can always be decoded by the simple decoder.

   The compressed data consists of a mixture of unencoded literal
   characters and copy pointers which point to an earlier occurrence of
   the string to be encoded.

   Compressed data contains two types of codewords:

   LITERAL pass the literal directly to the uncompressed output.

   COPY    length, offset
           go back offset characters in the output and copy length
           characters forward to the current position.
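
   The effect of the two codewords on the output can be sketched as
   follows.  This is an illustration over a flat output array, with
   hypothetical function names, and it assumes that an offset of 1
   refers to the most recently written octet.  The caller is assumed
   to provide a large enough buffer.

```c
#include <stddef.h>

/* LITERAL: append one octet to the output; returns the new length. */
size_t put_literal(unsigned char *out, size_t pos, unsigned char c)
{
    out[pos] = c;
    return pos + 1;
}

/* COPY length, offset: go back offset octets and copy length octets
   forward.  Copying one octet at a time lets the source overlap the
   destination, so a short offset repeats a pattern.
   Returns the new length; an invalid offset copies nothing. */
size_t put_copy(unsigned char *out, size_t pos,
                size_t length, size_t offset)
{
    size_t i;

    if (offset == 0 || offset > pos)
        return pos;
    for (i = 0; i < length; i++, pos++)
        out[pos] = out[pos - offset];
    return pos;
}
```

   After the literals "a", "b", "c", a COPY with length 6 and offset 3
   extends the output to "abcabcabc".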

   To distinguish between codewords, the copy length is used.  A copy
   length of zero indicates that the following codeword is a literal
   codeword.  A copy length greater than zero indicates that the



Costanzo, Robinson & Ullmann                                   [Page 19]

RFC 1505                 Encoding Header Field               August 1993


   following codeword is a copy codeword.

   To improve copy length encoding, a threshold value of 2 has been
   subtracted from the original copy length for copy codewords, because
   the minimum copy length is 3 in this compression scheme.

   The maximum offset value is set at 32255.  Larger offsets offer
   extremely low improvements in compression (less than 1 percent,
   typically).

   No special encoding is done on the LITERAL characters.  However,
   unary encoding is used for the copy length and copy offset values to
   improve compression.  A start-step-stop unary code is used.

   A (start, step, stop) unary code of the integers is defined as
   follows:  The Nth codeword has N ones followed by a zero followed by
   a field of size START + (N * STEP).  If the field width is equal to
   STOP then the preceding zero can be omitted.  The integers are laid
   out sequentially through these codewords.  For example, (0, 1, 4)
   would look like:

             Codeword      Range

             0             0
             10x           1-2
             110xx         3-6
             1110xxx       7-14
             1111xxxx      15-30
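
   Decoding one such codeword can be sketched as below.  For
   readability the bits are taken from a string of "0" and "1"
   characters rather than from the packed data; the function name is
   hypothetical, and STOP - START is assumed to be a multiple of STEP.

```c
/* Decode one (start, step, stop) codeword from a string of '0' and
   '1' characters and advance *pp past it.
   Returns the integer, or -1 if the input ends early or contains
   any other character. */
long sss_decode(const char **pp, int start, int step, int stop)
{
    const char *p = *pp;
    int width = start;
    long base = 0, field = 0;
    int i;

    /* Each leading one skips all integers of the current width.
       No terminating zero is read once the width reaches stop. */
    while (width < stop) {
        if (*p != '0' && *p != '1')
            return -1;
        if (*p++ == '0')
            break;
        base += 1L << width;
        width += step;
    }

    /* The field selects one integer within the chosen range. */
    for (i = 0; i < width; i++) {
        if (*p != '0' && *p != '1')
            return -1;
        field = (field << 1) | (*p++ - '0');
    }

    *pp = p;
    return base + field;
}
```

   With the (0, 1, 4) code above, the bit string "1110111" decodes to
   14 and "11110000" decodes to 15.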

   Following are the actual values used for copy length and copy offset:

   The copy length is encoded with a (0, 1, 7) code leading to a maximum
   copy length of 256 by including the THRESHOLD value of 2.

             Codeword       Range

             0              0
             10x            3-4
             110xx          5-8
             1110xxx        9-16
             11110xxxx      17-32
             111110xxxxx    33-64
             1111110xxxxxx  65-128
             1111111xxxxxxx 129-256

   The copy offset is encoded with a (9, 1, 14) code leading to a
   maximum copy offset of 32255.  Offset 0 is reserved as an end of
   compressed data flag.



             Codeword       Range

             0xxxxxxxxx                0-511
             10xxxxxxxxxx            512-1535
             110xxxxxxxxxxx         1536-3583
             1110xxxxxxxxxxxx       3584-7679
             11110xxxxxxxxxxxxx     7680-15871
             11111xxxxxxxxxxxxxx   15872-32255
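
   The bounds of each range follow from the field widths: every class
   starts where the previous one ended and holds 2 to the power of its
   field width integers.  A small helper, with a hypothetical name, can
   be used to check such a table:

```c
/* Compute the range of integers covered by codeword class n
   (counting from 0) of a start-step-stop code.  The stop value only
   limits how many classes exist, so it is not needed here. */
void sss_range(int start, int step, int n, long *lo, long *hi)
{
    long base = 0;
    int width = start;
    int i;

    for (i = 0; i < n; i++) {      /* skip the earlier classes */
        base += 1L << width;
        width += step;
    }
    *lo = base;
    *hi = base + (1L << width) - 1;
}
```

   With start 9 and step 1, class 0 covers 0 to 511 and class 5 covers
   15872 to 32255, which agrees with the maximum offset of 32255.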

   The 0 has been chosen to signal the start of the field for ease of
   encoding.  (The bit generator can simply encode one more bit than is
   significant in the binary representation of the excess.)

   The stop values are useful in the encoding to prevent out of range
   values for the lengths and offsets, as well as shortening some codes
   by one bit.

   The worst case compression using this scheme is a 1/8 increase in
   size of the encoded data (one zero bit followed by 8 character bits
   for every literal).  After the character encoding, which carries 6
   bits in each 8-bit character, the worst case ratio is 3/2 to the
   original data.

   The minimum copy length of 3 has been chosen because the worst case
   copy length and offset is 3 bits (3) and 19 bits (32255) for a total
   of 22 bits to encode a 3 character string (24 bits).

5.3  The Decoder

   As mentioned previously, the compression is defined by the decoder.
   Any encoder that produces output that is correctly decoded is by
   definition correct.

   The following is an implementation of the decoder, written more for
   clarity and as much portability as possible, rather than for maximum
   speed.

   When optimized for a specific environment, it will run significantly
   faster.

    /* LZJU 90 Decoding program */

    /* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */

    /* This code is NOT COPYRIGHT, not protected. It is in the true
       Public Domain. */

    #include <stdio.h>
    #include <string.h>



    typedef unsigned char uchar;
    typedef unsigned int  uint;

    #define N          32255
    #define THRESHOLD      3

    #define STRTP          9
    #define STEPP          1
    #define STOPP         14