CSV Format Encoder and Decoder
Makes use of the comma-separated values file format.
by Liu Junfeng @ 2008-1-15

The CSV (Comma Separated Values) file format is often used to exchange data between disparate applications.
CSV has much lower overhead, thereby using much less bandwidth and storage than XML.
Many informal documents exist that describe the CSV format. There are differences in how special characters are handled.
Here I propose a solution that most people will agree with.

CSV data contains a list of records and a record contains a list of fields.
Records are not required to have the same number of fields.

Basic rules:
(1) Fields are separated with commas.
(2) Each record occupies just one line.

Extended rules:
(3) Padding spaces can be added ahead of a field.
(4) Fields may always be delimited with double quotes.
(5) The first record may be a record of column names.

Special rules:
(6) If a field value contains leading or trailing spaces, or any of the special characters comma, double quote, or line break, it must be enclosed in double quotes.
(7) Within a double-quoted string, \\, \", \r, \n, and \t are treated as escape sequences.
(8) An empty string is written double-quoted; a null string is not.
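
Rules (6) to (8) can be sketched in Python as follows. This is only an illustration, not the original implementation: the names encode_field and encode_record are hypothetical, None is assumed to stand for the null string, and a leading or trailing tab is assumed to count as a space.

```python
from typing import Iterable, Optional

# Characters that force quoting under rule (6).
_SPECIAL = set(',"\r\n')

# Escape sequences of rule (7). The backslash itself must be escaped
# because it introduces an escape inside a quoted string.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\r": "\\r", "\n": "\\n", "\t": "\\t"}


def encode_field(value: Optional[str]) -> str:
    """Encode one field; None stands for the null string of rule (8)."""
    if value is None:
        return ""            # null: nothing is written
    if value == "":
        return '""'          # empty string: always double-quoted
    needs_quotes = (
        value[0] in " \t"    # assumption: tabs are treated like spaces
        or value[-1] in " \t"
        or any(ch in _SPECIAL for ch in value)
    )
    if not needs_quotes:
        return value
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


def encode_record(fields: Iterable[Optional[str]]) -> str:
    """Join encoded fields with commas and end the record with a line break."""
    return ",".join(encode_field(f) for f in fields) + "\n"
```

With this sketch, encode_record(["a", None, ""]) writes the null field as nothing and the empty field as a pair of double quotes, so a decoder can tell them apart.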

Storage rules:
(9) The text is treated as Unicode and is loaded from and saved to a file using a specific encoding.
Usually UTF-8 or UTF-16 is used. The byte order mark is optional for UTF-8 and required for UTF-16.
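
A minimal sketch of rule (9) on the reading side, using Python's standard codecs module. The function name decode_csv_bytes is hypothetical, and falling back to UTF-8 when no byte order mark is present is an assumption of this sketch, not something the rules require.

```python
import codecs


def decode_csv_bytes(data: bytes) -> str:
    """Decode raw file bytes to text, choosing the encoding from the BOM."""
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with the optional BOM: drop the mark, then decode.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return data.decode("utf-16")
    # No BOM: assume UTF-8 (assumption of this sketch).
    return data.decode("utf-8")
```

On the writing side, text.encode("utf-16") emits the required byte order mark, while text.encode("utf-8") omits the optional one.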

Padding spaces can be used to align fields to the same column.
To stick to the basic rules, special characters must be handled by the special rules.
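
As an illustration of padding, the sketch below right-aligns fields that are already encoded by putting spaces ahead of them, which rule (3) allows. The name align_records is hypothetical, and column widths are counted in code points, which is an assumption that only holds for text whose characters all have the same display width.

```python
from typing import List


def align_records(rows: List[List[str]]) -> str:
    """Pad already-encoded fields with leading spaces so columns line up."""
    widths = {}
    for row in rows:
        for i, field in enumerate(row):
            widths[i] = max(widths.get(i, 0), len(field))
    lines = []
    for row in rows:
        # str.rjust adds the padding ahead of the field, never after it.
        padded = [field.rjust(widths[i]) for i, field in enumerate(row)]
        lines.append(",".join(padded) + "\n")
    return "".join(lines)
```

Because the padding is always leading, a null field padded with spaces still reads back as null.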

Grammar of CSV expressed in PEG:

CsvData <- Record* EndOfFile
Record <- Field (Separator Field)* EndOfLine
Field <- Spacing (QuotedText / UnQuotedText)
UnQuotedText <- (-",\"\r\n")*
QuotedText <- '"' (-"\"\r\n\\" / EscapeSequence)* '"'
EscapeSequence <- '\\\\' / '\\"' / '\\r' / '\\n' / '\\t'
Spacing <- Space*
Space <- ' ' / '\t'
Separator <- ','
EndOfLine <- '\r\n' / '\r' / '\n'
EndOfFile <- <end>

Here ",\"\r\n" means a char set, -",\"\r\n" means a complement char set.
According to this grammar, only leading spaces are ignored, and each record must end with a line break.
Both restrictions simplify the grammar without hurting the formatting style.
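
The grammar can be turned into a small hand-written decoder. The following is a sketch, not the original implementation: parse_csv and _parse_field are hypothetical names, an empty unquoted field is returned as None to represent null, and the quoted form is tried first because an unquoted field can match the empty string.

```python
from typing import List, Optional, Tuple

_UNESCAPE = {"\\": "\\", '"': '"', "r": "\r", "n": "\n", "t": "\t"}


def _parse_field(text: str, pos: int) -> Tuple[Optional[str], int]:
    n = len(text)
    while pos < n and text[pos] in " \t":      # Spacing: skip leading padding
        pos += 1
    if pos < n and text[pos] == '"':           # QuotedText
        pos += 1
        out = []
        while pos < n and text[pos] != '"':
            ch = text[pos]
            if ch == "\\":                     # EscapeSequence
                if pos + 1 >= n or text[pos + 1] not in _UNESCAPE:
                    raise ValueError(f"bad escape at offset {pos}")
                out.append(_UNESCAPE[text[pos + 1]])
                pos += 2
            elif ch in "\r\n":
                raise ValueError(f"raw line break in quoted field at offset {pos}")
            else:
                out.append(ch)
                pos += 1
        if pos >= n:
            raise ValueError("unterminated quoted field")
        return "".join(out), pos + 1           # skip the closing quote
    start = pos                                # UnQuotedText
    while pos < n and text[pos] not in ',"\r\n':
        pos += 1
    value = text[start:pos]
    return (value if value else None), pos     # empty unquoted field is null


def parse_csv(text: str) -> List[List[Optional[str]]]:
    """Parse CSV text into records; every record must end with a line break."""
    records = []
    pos, n = 0, len(text)
    while pos < n:                             # Record* until end of file
        record = []
        while True:                            # Field (Separator Field)*
            value, pos = _parse_field(text, pos)
            record.append(value)
            if pos < n and text[pos] == ",":
                pos += 1
                continue
            break
        if text.startswith("\r\n", pos):       # line break: \r\n, \r or \n
            pos += 2
        elif pos < n and text[pos] in "\r\n":
            pos += 1
        else:
            raise ValueError(f"expected line break at offset {pos}")
        records.append(record)
    return records
```

In this sketch, a trailing space after a quoted field or a missing final line break raises ValueError, which mirrors the two restrictions noted above.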

Other questions:
1. How to encode binary data?
CSV is not a suitable format for storing large blocks of binary data.
Small binary fields can be converted to text using Bin2Hex, Base64, etc.
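
For example, a small binary value could be converted with Python's standard binascii and base64 modules. The helper names bytes_to_field and field_to_bytes and the scheme parameter are hypothetical; this is a sketch of the idea rather than part of the encoder itself.

```python
import base64
import binascii


def bytes_to_field(data: bytes, scheme: str = "base64") -> str:
    """Convert binary data to plain text that needs no CSV quoting."""
    if scheme == "hex":
        return binascii.hexlify(data).decode("ascii")
    if scheme == "base64":
        return base64.b64encode(data).decode("ascii")
    raise ValueError(f"unknown scheme: {scheme}")


def field_to_bytes(text: str, scheme: str = "base64") -> bytes:
    """Reverse of bytes_to_field."""
    if scheme == "hex":
        return binascii.unhexlify(text)
    if scheme == "base64":
        return base64.b64decode(text)
    raise ValueError(f"unknown scheme: {scheme}")
```

Neither alphabet contains a comma, double quote, or line break, so the resulting text can be stored as an unquoted field.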

2. How to encode multiple tables?
A blank line is read as a record with a single field whose value is null.
Normally a table has more than one column, so blank lines can be used to separate tables.
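
A sketch of how a user of the decoder might split decoded records into tables. It assumes records are lists of fields with None standing for null, so a blank line appears as [None]; split_tables is a hypothetical name.

```python
from typing import List, Optional

Record = List[Optional[str]]


def split_tables(records: List[Record]) -> List[List[Record]]:
    """Group records into tables, treating each blank-line record as a separator."""
    tables: List[List[Record]] = []
    current: List[Record] = []
    for record in records:
        if record == [None]:           # a blank line: one field, null value
            if current:                # consecutive blank lines add no empty tables
                tables.append(current)
                current = []
        else:
            current.append(record)
    if current:
        tables.append(current)
    return tables
```

Note that a single-column table holding a null value cannot be told apart from a separator, which is why this only works when tables have more than one column.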

These features can be handled by the user of the CSV encoder/decoder.