
algorithm.doc

The compressor achieves an average compression rate of 60% of the original size, which is on par with "gzip". It seems that you cannot do much better for compressing compiled binaries. This means that the break-even point for using compressed images is reached once the uncompressed size approaches 1.5kB. We can stuff more than 12kB into an 8kB EPROM and more than 25kB into a 16kB EPROM. As there is only 32kB of RAM for both the uncompressed image and its BSS area, 32kB EPROMs will hardly ever be required.

The compression algorithm uses a 4kB ring buffer for buffering the uncompressed data. Before compression starts, the ring buffer is filled with spaces (ASCII character 0x20). The algorithm tries to find repeated input sequences of a maximum length of 60 bytes. All 256 different input bytes plus the 58 (60 minus a threshold of 2) possible repeat lengths form a set of 314 symbols. These symbols are adaptively Huffman encoded. The algorithm starts out with a Huffman tree that assigns equal code lengths to each of the 314 symbols (slightly favoring the repeat symbols over the symbols for regular input characters), but the tree is changed whenever the frequency of any of the symbols changes. Frequency counts are kept in 16-bit words until the total number of compressed codes reaches 2^15. Then, all frequency counts are halved (rounding to the bigger number). For unrepeated characters (symbols 0..255) the Huffman code is written to the output stream. For repeated characters the Huffman code, which denotes the length of the repeated character sequence, is written out, and then the index in the ring buffer is computed. From this index, the algorithm computes the offset relative to the current index into the ring buffer. Thus, for typical input data, one would expect that short to medium range offsets are more frequent than extremely short offsets or medium to long range offsets.
Thus the 12-bit (for a 4kB buffer) offset value is statically Huffman encoded using a precomputed Huffman tree that favors those offset values that are deemed to be more frequent. The Huffman-encoded offset is written to the output data stream, directly following the code that determines the length of repeated characters.

This algorithm, as implemented in the C example code, looks very good, and its operating parameters are already well optimized. This also explains why it achieves compression ratios comparable with "gzip". Depending on the input data, it sometimes excels considerably beyond what "gzip -9" does, but this phenomenon does not appear to be typical. There are some flaws with the algorithm, such as the limited buffer sizes, the adaptive Huffman tree, which takes very long to change if the input characters experience a sudden change in distribution, and the static Huffman tree for encoding offsets into the buffer. The slow changes of the adaptive Huffman tree are partially counteracted by artificially keeping 16-bit precision for the frequency counts, but this does not come into play until 32kB of compressed data has been output, so it does not have any impact on our use for "etherboot", because the BOOT Prom does not support uncompressed data of more than 32kB (c.f. doc/spec.doc).

Nonetheless, these problems do not seem to affect compression of compiled programs very much. Mixing object code with English text would not work too well, though, and the algorithm should be reset in between. Actually, we might gain a little improvement if text and data segments were compressed individually, but I have not experimented with this option yet.
