algorithm.doc
The compressor achieves an average compression rate of 60% of the original size, which is on par with "gzip". It seems that one cannot do much better for compressing compiled binaries. This means that the break-even point for using compressed images is reached once the uncompressed size approaches 1.5kB. We can stuff more than 12kB into an 8kB EPROM and more than 25kB into a 16kB EPROM. As there is only 32kB of RAM for both the uncompressed image and its BSS area, 32kB EPROMs will hardly ever be required.

The compression algorithm uses a 4kB ring buffer for buffering the uncompressed data. Before compression starts, the ring buffer is filled with spaces (ASCII character 0x20). The algorithm tries to find repeated input sequences of a maximum length of 60 bytes. All 256 different input bytes plus the 58 (60 minus a threshold of 2) possible repeat lengths form a set of 314 symbols. These symbols are adaptively Huffman encoded. The algorithm starts out with a Huffman tree that assigns equal code lengths to each of the 314 symbols (slightly favoring the repeat symbols over symbols for regular input characters), but the tree is updated whenever the frequency of any of the symbols changes. Frequency counts are kept in 16-bit words until the total number of compressed codes reaches 2^15; then all frequency counts are halved (rounding to the bigger number). For unrepeated characters (symbols 0..255) the Huffman code is written to the output stream. For repeated sequences, the Huffman code denoting the length of the repeated sequence is written out, and then the index into the ring buffer is computed. From this index, the algorithm computes the offset relative to the current index into the ring buffer. Thus, for typical input data, one would expect that short to medium range offsets are more frequent than extremely short or medium to long range offsets.
Thus the 12-bit (for a 4kB buffer) offset value is statically Huffman encoded using a precomputed Huffman tree that favors those offset values that are deemed to be more frequent. The Huffman-encoded offset is written to the output data stream, directly following the code that determines the length of the repeated characters.

This algorithm, as implemented in the C example code, looks very good, and its operating parameters are already well optimized. This also explains why it achieves compression ratios comparable with "gzip". Depending on the input data, it sometimes excels considerably beyond what "gzip -9" does, but this phenomenon does not appear to be typical. There are some flaws with the algorithm, such as the limited buffer sizes, the adaptive Huffman tree, which takes very long to adjust if the input characters experience a sudden change in distribution, and the static Huffman tree for encoding offsets into the buffer. The slow changes of the adaptive Huffman tree are partially counteracted by artificially keeping 16-bit precision for the frequency counts, but this does not come into play until 32kB of compressed data has been output, so it does not have any impact on our use for "etherboot", because the boot PROM does not support uncompressed data of more than 32kB (c.f. doc/spec.doc).

Nonetheless, these problems do not seem to affect compression of compiled programs very much. Mixing object code with English text would not work too well, though, and the algorithm should be reset in between. Actually, we might gain a little improvement if text and data segments were compressed individually, but I have not experimented with this option yet.