📄 aes-586.pl
#!/usr/bin/env perl
#
# ====================================================================
# Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
# project. Rights for redistribution and usage in source and binary
# forms are granted according to the OpenSSL license.
# ====================================================================
#
# Version 3.6.
#
# You might fail to appreciate this module performance from the first
# try. If compared to "vanilla" linux-ia32-icc target, i.e. considered
# to be *the* best Intel C compiler without -KPIC, performance appears
# to be virtually identical... But try to re-configure with shared
# library support... Aha! Intel compiler "suddenly" lags behind by 30%
# [on P4, more on others]:-) And if compared to position-independent
# code generated by GNU C, this code performs *more* than *twice* as
# fast! Yes, all this buzz about PIC means that unlike other hand-
# coded implementations, this one was explicitly designed to be safe
# to use even in shared library context... This also means that this
# code isn't necessarily absolutely fastest "ever," because in order
# to achieve position independence an extra register has to be
# off-loaded to stack, which affects the benchmark result.
#
# Special note about instruction choice. Do you recall RC4_INT code
# performing poorly on P4? It might be the time to figure out why.
# RC4_INT code implies effective address calculations in base+offset*4
# form. Trouble is that it seems that offset scaling turned to be
# critical path... At least eliminating scaling resulted in 2.8x RC4
# performance improvement [as you might recall]. As AES code is hungry
# for scaling too, I [try to] avoid the latter by favoring off-by-2
# shifts and masking the result with 0xFF<<2 instead of "boring" 0xFF.
#
# As was shown by Dean Gaudet <dean@arctic.org>, the above note turned
# void. Performance improvement with off-by-2 shifts was observed on
# intermediate implementation, which was spilling yet another register
# to stack... Final offset*4 code below runs just a tad faster on P4,
# but exhibits up to 10% improvement on other cores.
#
# Second version is "monolithic" replacement for aes_core.c, which in
# addition to AES_[de|en]crypt implements AES_set_[de|en]cryption_key.
# This made it possible to implement little-endian variant of the
# algorithm without modifying the base C code. Motivating factor for
# the undertaken effort was that it appeared that in tight IA-32
# register window little-endian flavor could achieve slightly higher
# Instruction Level Parallelism, and it indeed resulted in up to 15%
# better performance on most recent