4.2.3 Type
The type of an object is usually of interest only to the operating
system that the object was created on.
Types are:
ACAT    access category (Primos)
CAM     contiguous access method (Primos)
DAM     direct access method (Primos)
FIXED   fixed length records (VMS)
FLAT    `flat file', sequence of bytes (Unix, DOS, default)
ISAM    indexed-sequential access method (VMS)
LINK    soft link (Unix)
MAC     Macintosh file
SAM     sequential access method (Primos)
SEGSAM  segmented sequential access method (Primos)
SEGDAM  segmented direct access method (Primos)
TEXT    lines of ISO-10646-UTF-1 text ending with CR/LF
VAR     variable length records (VMS)
4.2.4 Created
Indicates the creation date of the file. Dates are in the format
defined in section 4.3.
4.2.5 Modified
Indicates the date and time the file was last modified or closed
after being open for write.
4.2.6 Accessed
Indicates the date and time the file was last accessed on the
original file system.
4.2.7 Owner
The owner directive gives the name or numerical ID of the owner or
creator of the file.
Costanzo, Robinson & Ullmann [Page 15]
RFC 1505 Encoding Header Field August 1993
4.2.8 Group
The group directive gives the name(s) or numerical IDs of the group
or groups to which the file belongs.
4.2.9 ACL
This directive specifies the access control list attribute of an
object (the ACL attribute may occur more than once within an object).
The list consists of a series of pairs of IDs and access codes in the
format:
user-ID:access-list
There are four reserved IDs:
$OWNER   the owner or creator
$GROUP   a member of the group or groups
$SYSTEM  a system administrator
$REST    everyone else
The access list is zero or more single letters:
A   add (create file)
D   delete
L   list (read directory)
P   change protection
R   read
U   use
W   write
X   execute
*   all possible access
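As an illustration only, one ACL pair could be split and turned into a
bit mask as sketched below. The function name parse_acl_pair and the
ACC_* flags are hypothetical and not part of this specification. The
sketch also assumes that the pair is split at its last colon, that
letters are upper case, and that no white space appears inside the pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit flags, one per access letter in the list above. */
enum {
    ACC_ADD     = 1 << 0, ACC_DELETE  = 1 << 1,
    ACC_LIST    = 1 << 2, ACC_PROTECT = 1 << 3,
    ACC_READ    = 1 << 4, ACC_USE     = 1 << 5,
    ACC_WRITE   = 1 << 6, ACC_EXECUTE = 1 << 7,
    ACC_ALL     = 0xFF
};

/* Split "user-ID:access-list" at the last colon (an assumption).
   Copies the ID into id and returns the access mask, or -1 on error. */
int parse_acl_pair(const char *pair, char *id, size_t idsize)
{
    static const char letters[] = "ADLPRUWX";  /* same order as the flags */
    const char *colon = strrchr(pair, ':');
    const char *p;
    int mask = 0;

    if (colon == NULL || colon == pair)
        return -1;                      /* no colon, or empty user ID */
    if ((size_t)(colon - pair) >= idsize)
        return -1;                      /* ID does not fit the buffer */
    memcpy(id, pair, (size_t)(colon - pair));
    id[colon - pair] = '\0';

    for (p = colon + 1; *p != '\0'; p++) {
        const char *hit;
        if (*p == '*') {                /* all possible access */
            mask |= ACC_ALL;
            continue;
        }
        hit = strchr(letters, *p);
        if (hit == NULL)
            return -1;                  /* unknown access letter */
        mask |= 1 << (int)(hit - letters);
    }
    return mask;
}
```

An empty access list yields a mask of zero, matching the "zero or more
single letters" rule.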
4.2.10 Password
The password attribute gives the access password for this object.
Since the content of the object follows (being the raison d'etre of
the encoding), the appearance of the password in plain text is not
considered a security problem. If the password is actually set by
the decoder on a created object, the security (or lack of it) is the
responsibility of the application domain controlling the decoder, as
is true of ACL and other protections.
4.2.11 Block
The block attribute gives the block size of the file as a decimal
number of bytes.
4.2.12 Record
The record attribute gives the record size of the file as a decimal
number of bytes.
4.2.13 Application
This specifies the application that the file was created with or
belongs to. This is of particular interest for Macintosh files.
4.3 Date Field
Various attributes are followed by an associated date and time.
4.3.1 Syntax
The syntax of the date field is a combination of date, time, and
timezone:
DD Mon YYYY HH:MM:SS.FFFFFF [+-]HHMMSS
Date   := DD Mon YYYY        1 or 2 Digits " " 3 Alpha " " 4 Digits
DD     := Day                e.g. "08", " 8", "8"
Mon    := Month              "Jan" | "Feb" | "Mar" | "Apr" |
                             "May" | "Jun" | "Jul" | "Aug" |
                             "Sep" | "Oct" | "Nov" | "Dec"
YYYY   := Year
Time   := HH:MM:SS.FFFFFF    2 Digits ":" 2 Digits [ ":" 2 Digits
                             ["." 1 to 6 Digits ] ]
                             e.g. 00:00:00, 23:59:59.999999
HH     := Hours              00 to 23
MM     := Minutes            00 to 59
SS     := Seconds            00 to 60 (60 only during a leap second)
FFFFFF := Fraction
Zone   := [+-]HHMMSS         "+" | "-" 2 Digits [ 2 Digits
                             [ 2 Digits ] ]
HH     := Local Hour Offset
MM     := Local Minutes Offset
SS     := Local Seconds Offset
4.3.2 Semantics
The date information is that which the file system has stored in
regard to the file system object. Date information is stored
differently and with varying degrees of precision by different
computer file systems. An encoder must include as much date
information as it has available concerning the file system object. A
decoder which receives an object encoded with a date field containing
greater precision than its own must disregard the excessive
information. The zone is given relative to Co-ordinated Universal Time
"UTC" (formerly called "Greenwich Mean Time"). The field specifies the
time zone of the file system object as an offset from Universal Time,
expressed as a signed [+-] two, four or six digit number.
A file that was created April 15, 1993 at 8:05 p.m. in Roselle Park,
New Jersey, U.S.A. might have a date field which looks like:
15 Apr 1993 20:05:22.12 -0500
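A date field such as the one above could be parsed roughly as follows.
This is a minimal sketch, not a reference parser: the struct date_field
and the function parse_date_field are hypothetical names, the fraction
is scaled to microseconds as a convenience, and range checks (hours 00
to 23 and so on) are left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed date field (not from the RFC). */
struct date_field {
    int  day, month, year;   /* month is 1..12 */
    int  hour, min, sec;
    long usec;               /* fraction scaled to microseconds */
    long zone_sec;           /* signed offset from UTC, in seconds */
};

/* Convert two decimal digit characters to an integer. */
static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Parse "DD Mon YYYY HH:MM[:SS[.F]] [+-]HH[MM[SS]]".
   Returns 0 on success, -1 on a syntax error. */
int parse_date_field(const char *s, struct date_field *d)
{
    static const char *const names[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    char mon[4], tbuf[16], zone[8], frac[7] = "000000";
    int n = 0, k = 0, i;
    size_t zl;

    if (sscanf(s, "%2d %3s %4d %15s %7s",
               &d->day, mon, &d->year, tbuf, zone) != 5)
        return -1;

    for (d->month = 1; d->month <= 12; d->month++)
        if (strcmp(mon, names[d->month - 1]) == 0)
            break;
    if (d->month > 12)
        return -1;

    d->sec = 0;
    d->usec = 0;
    if (sscanf(tbuf, "%2d:%2d%n", &d->hour, &d->min, &n) != 2)
        return -1;
    if (tbuf[n] == ':') {                     /* optional seconds */
        if (sscanf(tbuf + n, ":%2d%n", &d->sec, &k) != 1)
            return -1;
        n += k;
        if (tbuf[n] == '.') {                 /* optional fraction */
            n++;
            for (i = 0; i < 6 && isdigit((unsigned char)tbuf[n]); i++)
                frac[i] = tbuf[n++];
            if (i == 0)
                return -1;
            d->usec = atol(frac);
        }
    }
    if (tbuf[n] != '\0')
        return -1;

    zl = strlen(zone);                        /* sign plus 2, 4 or 6 digits */
    if ((zone[0] != '+' && zone[0] != '-') ||
        (zl != 3 && zl != 5 && zl != 7))
        return -1;
    for (i = 1; i < (int)zl; i++)
        if (!isdigit((unsigned char)zone[i]))
            return -1;
    d->zone_sec = two_digits(zone + 1) * 3600L;
    if (zl >= 5) d->zone_sec += two_digits(zone + 3) * 60L;
    if (zl == 7) d->zone_sec += two_digits(zone + 5);
    if (zone[0] == '-') d->zone_sec = -d->zone_sec;
    return 0;
}
```

A decoder with less precision than the field carries would simply
ignore the members it cannot store, for example usec.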
5. LZJU90: Compressed Encoding
LZJU90 is an encoding for a binary or text object to be sent in an
Internet mail message. The encoding provides both compression and
representation in a text format that will successfully survive
transmission through the many different mailers and gateways that
comprise the Internet and connected mail networks.
5.1 Overview
The encoding first compresses the binary object, using a modified
LZ77 algorithm, called LZJU90. It then encodes each 6 bits of the
output of the compression as a text character, using a character set
chosen to survive any translations between codes, such as ASCII to
EBCDIC. The 64 six-bit strings 000000 through 111111 are represented
by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".
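The character mapping can be sketched as a 64-entry table. The names
lzju_set, lzju_encode6 and lzju_decode6 are hypothetical, and the
sketch assumes that the characters are assigned to the values 0
through 63 in the order in which they are listed above.

```c
#include <string.h>

/* The 64 characters, assumed to be in ascending order of six-bit value. */
static const char lzju_set[] =
    "+-0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/* Six-bit value (0..63) to text character, or -1 if out of range. */
int lzju_encode6(unsigned v)
{
    return v < 64 ? lzju_set[v] : -1;
}

/* Text character to six-bit value, or -1 if not in the set. */
int lzju_decode6(int c)
{
    const char *p = (c == '\0') ? NULL : strchr(lzju_set, c);
    return p != NULL ? (int)(p - lzju_set) : -1;
}
```

A table lookup avoids any dependence on the numeric codes of the host
character set, which is the point of choosing these characters.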
The output text begins with a line identifying the encoding. This is
for visual reference only; the "Encoding:" field in the header
identifies the section to the user program. The line also names the
object that was encoded, usually by a file name.
The format of this line is:
* LZJU90 <name>
where <name> is optional. For example:
* LZJU90 vmunix
This is followed by the compressed and encoded data, broken into
lines where convenient. It is recommended that lines be broken every
78 characters to survive mailers that incorrectly restrict line
length. The decoder must accept lines with 1 to 1000 characters on
each line. After this, there is one final line that gives the number
of bytes in the original data and a CRC of the original data. This
should match the byte count and CRC found during decompression.
This line has the format:
* <count> <CRC>
where <count> is a decimal number, and <CRC> is 8 hexadecimal digits.
For example:
* 4128076 5AC2D50E
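A decoder might read this final line as sketched below. The function
name parse_trailer is hypothetical, and the sketch assumes that lower
case hexadecimal digits are also acceptable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "* <count> <CRC>". Returns 0 on success, -1 on a syntax error. */
int parse_trailer(const char *line, unsigned long *count,
                  unsigned long *crc)
{
    char hex[9];

    /* The scanset reads at most 8 hexadecimal digits. */
    if (sscanf(line, "* %lu %8[0-9A-Fa-f]", count, hex) != 2)
        return -1;
    if (strlen(hex) != 8)          /* the CRC is exactly 8 digits */
        return -1;
    *crc = strtoul(hex, NULL, 16);
    return 0;
}
```

The two values returned would then be compared with the byte count and
CRC accumulated while decompressing.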
The count used in the Encoding: field in the message header is the
total number of lines, including the start and end lines that begin
with *. A complete example is given in section 5.3.2.
5.2 Specification of the LZJU90 compression
The Lempel-Ziv-Storer-Szymanski model of mixing pointers and literal
characters is used in the compression algorithm. Repeat occurrences
of strings of octets are replaced by pointers to the earlier
occurrence.
The data compression is defined by the decoding algorithm. Any
encoder that emits symbols which cause the decoder to produce the
original input is defined to be valid.
There are many possible strategies for the maximal-string matching
that the encoder does; section 5.3.1 gives the code for one such
algorithm. Regardless of which algorithm is used, and what tradeoffs
are made between compression ratio and execution speed or space, the
result can always be decoded by the simple decoder.
The compressed data consists of a mixture of unencoded literal
characters and copy pointers which point to an earlier occurrence of
the string to be encoded.
Compressed data contains two types of codewords:
LITERAL pass the literal directly to the uncompressed output.
COPY length, offset
go back offset characters in the output and copy length
characters forward to the current position.
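The effect of the two codewords on the output can be sketched as
follows. The helper names emit_literal and emit_copy are hypothetical.
The sketch assumes that an offset of 1 refers to the most recently
written byte, that the copy proceeds byte by byte so that a copy may
overlap the bytes it is producing, and that the caller has made the
output buffer large enough.

```c
#include <stddef.h>

/* LITERAL: append one byte. Returns the new output length. */
size_t emit_literal(unsigned char *out, size_t pos, unsigned char c)
{
    out[pos] = c;
    return pos + 1;
}

/* COPY: go back offset bytes and copy length bytes forward.
   Returns the new output length, or (size_t)-1 if the offset
   reaches before the start of the output. */
size_t emit_copy(unsigned char *out, size_t pos,
                 size_t length, size_t offset)
{
    size_t i;

    if (offset == 0 || offset > pos)
        return (size_t)-1;
    /* Byte-wise copy: when offset < length, bytes written earlier in
       this loop are read again, which repeats the pattern. */
    for (i = 0; i < length; i++)
        out[pos + i] = out[pos - offset + i];
    return pos + length;
}
```

For example, with "abc" already in the output, a copy of length 3 at
offset 3 appends a second "abc".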
To distinguish between codewords, the copy length is used. A copy
length of zero indicates that the following codeword is a literal
codeword. A copy length greater than zero indicates that the
following codeword is a copy codeword.
To improve copy length encoding, a threshold value of 2 has been
subtracted from the original copy length for copy codewords, because
the minimum copy length is 3 in this compression scheme.
The maximum offset value is set at 32255. Larger offsets offer
extremely low improvements in compression (less than 1 percent,
typically).
No special encoding is done on the LITERAL characters. However,
unary encoding is used for the copy length and copy offset values to
improve compression. A start-step-stop unary code is used.
A (start, step, stop) unary code of the integers is defined as
follows: The Nth codeword has N ones followed by a zero followed by
a field of size START + (N * STEP). If the field width is equal to
STOP then the preceding zero can be omitted. The integers are laid
out sequentially through these codewords. For example, (0, 1, 4)
would look like:
Codeword    Range
0           0
10x         1-2
110xx       3-6
1110xxx     7-14
1111xxxx    15-30
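To make the definition concrete, the following sketch decodes one
(start, step, stop) codeword. For readability it reads bits from a
string of '0' and '1' characters rather than from a packed bit stream.
The name sss_decode is hypothetical, and the sketch assumes that the
field is stored most significant bit first.

```c
/* Decode one (start, step, stop) codeword from a string of '0' and
   '1' characters. On success *bits is advanced past the codeword and
   the integer is returned; -1 is returned on malformed input. */
long sss_decode(const char **bits, int start, int step, int stop)
{
    const char *p = *bits;
    int width = start;
    long base = 0, field = 0;
    int i;

    /* Each leading 1 skips a class of 2^width integers. The closing
       0 is absent once the field width has reached stop. */
    while (width < stop) {
        if (*p != '0' && *p != '1')
            return -1;
        if (*p++ == '0')
            break;
        base += 1L << width;
        width += step;
    }

    /* Read the field, most significant bit first (an assumption). */
    for (i = 0; i < width; i++) {
        if (*p != '0' && *p != '1')
            return -1;
        field = (field << 1) | (*p++ - '0');
    }
    *bits = p;
    return base + field;
}
```

With the (0, 1, 4) code above, the string "11011" falls in the
"110xx" class and decodes to 3 + 3 = 6.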
Following are the actual values used for copy length and copy offset:
The copy length is encoded with a (0, 1, 7) code leading to a maximum
copy length of 256 by including the THRESHOLD value of 2.
Codeword          Range
0                 0
10x               3-4
110xx             5-8
1110xxx           9-16
11110xxxx         17-32
111110xxxxx       33-64
1111110xxxxxx     65-128
1111111xxxxxxx    129-256
The copy offset is encoded with a (9, 1, 14) code leading to a
maximum copy offset of 32255. Offset 0 is reserved as an end of
compressed data flag.
Codeword               Range
0xxxxxxxxx             0-511
10xxxxxxxxxx           512-1535
110xxxxxxxxxxx         1536-3583
1110xxxxxxxxxxxx       3584-7679
11110xxxxxxxxxxxxx     7680-15871
11111xxxxxxxxxxxxxx    15872-32255
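The ranges in these tables follow directly from the field widths. As a
check, the bounds of the class with n leading ones can be computed as
in the sketch below. The name sss_range is hypothetical. The caller is
assumed to pass only values of n for which the field width does not
exceed the stop value.

```c
/* Compute the first and last integer covered by the codeword class
   that has n leading ones in a code with the given start and step. */
void sss_range(int start, int step, int n, long *lo, long *hi)
{
    long base = 0;
    int width = start;
    int i;

    /* Every earlier class covers 2^width integers. */
    for (i = 0; i < n; i++) {
        base += 1L << width;
        width += step;
    }
    *lo = base;
    *hi = base + (1L << width) - 1;
}
```

For copy lengths the THRESHOLD value of 2 is added to the non-zero
results, so the last class of the (0, 1, 7) code, 127 to 254, stands
for lengths 129 to 256.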
The 0 has been chosen to signal the start of the field for ease of
encoding. (The bit generator can simply encode one more bit than is
significant in the binary representation of the excess.)
The stop values are useful in the encoding to prevent out of range
values for the lengths and offsets, as well as shortening some codes
by one bit.
The worst case compression using this scheme is a 1/8 increase in
size of the encoded data (one zero bit followed by 8 character
bits). After the character encoding, which expands each 6 bits into
one 8-bit character, the worst case ratio is 3/2 to the original data
(9/8 x 4/3), not counting line breaks.
The minimum copy length of 3 has been chosen because the worst case
copy length and offset is 3 bits (3) and 19 bits (32255) for a total
of 22 bits to encode a 3 character string (24 bits).
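The bit counts quoted here can be reproduced with a small helper that
returns the length of a (start, step, stop) codeword for a given
value. The name sss_bits is hypothetical. The copy length is passed as
the coded value, that is, after the threshold of 2 has been
subtracted.

```c
/* Return the number of bits in the (start, step, stop) codeword that
   represents value. The value must lie within the range of the code. */
int sss_bits(long value, int start, int step, int stop)
{
    int width = start;
    int prefix = 0;
    long base = 0;

    /* Count one leading 1 for each class that value lies beyond. */
    while (width < stop && value >= base + (1L << width)) {
        base += 1L << width;
        width += step;
        prefix++;
    }
    /* The closing 0 is present unless the field has the stop width. */
    if (width < stop)
        prefix++;
    return prefix + width;
}
```

A copy of length 3 (coded value 1) at offset 32255 costs
sss_bits(1, 0, 1, 7) + sss_bits(32255, 9, 1, 14) = 3 + 19 = 22 bits,
while a literal costs sss_bits(0, 0, 1, 7) + 8 = 9 bits.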
5.3 The Decoder
As mentioned previously, the compression is defined by the decoder.
Any encoder that produces output that is correctly decoded is by
definition correct.
The following is an implementation of the decoder, written more for
clarity and as much portability as possible, rather than for maximum
speed.
When optimized for a specific environment, it will run significantly
faster.
/* LZJU 90 Decoding program */
/* Written By Robert Jung and Robert Ullmann, 1990 and 1991. */
/* This code is NOT COPYRIGHT, not protected. It is in the true
Public Domain. */
#include <stdio.h>
#include <string.h>
typedef unsigned char uchar;
typedef unsigned int uint;
#define N 32255
#define THRESHOLD 3
#define STRTP 9
#define STEPP 1
#define STOPP 14