crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction
author Tim Chen <tim.c.chen@linux.intel.com>
Thu, 27 Sep 2012 22:44:22 +0000 (15:44 -0700)
committer Herbert Xu <herbert@gondor.apana.org.au>
Mon, 15 Oct 2012 14:18:24 +0000 (22:18 +0800)
commit 6a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6
tree 3c407b6f40b1dfdf01310348996dc9b939f4e600
parent 35b80920d4f0253fed03a1c3a345df8578dbd057

This patch adds the crc_pcl function, which calculates the CRC32C checksum
using the PCLMULQDQ instruction on processors that support it. This provides
a speedup over using the CRC32 instruction alone.
Using PCLMULQDQ requires calling kernel_fpu_begin and kernel_fpu_end, which
incurs some overhead, so the new crc_pcl function is only invoked for buffers
of 512 bytes or more. Larger buffers can expect a greater speedup. This
feature is best used together with eager_fpu, which reduces the
kernel_fpu_begin/end overhead. For a 1K buffer the speedup is around 1.6x,
and for buffers larger than 4K it is around 3x, compared to the original
implementation in the crc32c-intel module. Testing was performed on a Sandy
Bridge based platform with the CPU set to a constant frequency.
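
The size-based dispatch described above can be sketched in portable C. This is
only an illustration under stated assumptions, not the code in
crc32c-intel_glue.c: the names PCL_THRESHOLD, fpu_begin_stub, fpu_end_stub,
crc32c_update and crc32c_update_bitwise are hypothetical, the stubs stand in
for kernel_fpu_begin/kernel_fpu_end, and a bit-at-a-time loop stands in for
both the CRC32-instruction path and crc_pcl so that the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Break-even size from the commit message; the macro name is hypothetical. */
#define PCL_THRESHOLD 512

/* Counts how often the stubbed FPU section was entered (illustration only). */
unsigned int fpu_sections;

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static void fpu_begin_stub(void) { fpu_sections++; }
static void fpu_end_stub(void) { }

/* Portable bit-at-a-time CRC32C update; stands in for both hardware paths. */
uint32_t crc32c_update_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* Pay the FPU save/restore cost only when the buffer is large enough. */
uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);

    if (len >= PCL_THRESHOLD) {
        fpu_begin_stub();
        crc = crc32c_update_bitwise(crc, p, len); /* crc_pcl would run here */
        fpu_end_stub();
    } else {
        crc = crc32c_update_bitwise(crc, p, len); /* CRC32-instruction path */
    }
    return crc;
}

/* One-shot CRC32C: all-ones initial value, inverted result. */
uint32_t crc32c(const uint8_t *p, size_t len)
{
    return ~crc32c_update(~0u, p, len);
}
```

Both branches must return the same checksum for the same input; only the cost
differs. The threshold exists because the fixed FPU save/restore overhead is
amortized only over sufficiently large buffers.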

A white paper detailing the algorithm can be found here:
http://download.intel.com/design/intarch/papers/323405.pdf

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch/x86/crypto/Makefile
arch/x86/crypto/crc32c-intel_glue.c
arch/x86/crypto/crc32c-pcl-intel-asm_64.S [new file with mode: 0644]
crypto/Kconfig