论文标题
Cryptopt:通过随机程序搜索加密原语(完整版)的验证汇编(完整版)
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version)
论文作者
论文摘要
大多数软件域依靠编译器将高级代码转换为多种不同的机器语言,其性能并不比开发人员具有直接用汇编语言写入的耐心差得多。但是,密码学一直是一个例外,在这种情况下,许多关键性的例程直接写在组装中(有时是通过元编程层)。过去的一些工作已经显示了如何对该组件进行正式验证,而其他工作则显示了如何自动生成C代码以及正式的证明,但其性能惩罚与最著名的组装相比。我们提出了Cryptopt,这是第一个将高级加密功能程序专门用于组装代码的编译管道,其最终定理语句的最终定理功能程序和X86-64组装的操作语义远远超出了GCC或CLANG的产生速度,其最终定理陈述却几乎没有提及其最终定理语句。在优化方面,我们通过汇编程序的空间进行随机搜索,并在目标CPU上重复进行自动基准测试。在正式验证方面,我们连接到菲亚特密码框架(将功能程序转化为类似C样的IR代码),并使用新的正式验证的程序等效性检查器将其扩展,并结合了SMT Solvers和Symbolic-Scecution引擎的适度子集。总体原型非常实用,例如为Curve25519(TLS标准的一部分)和Intel $ 12^{th} $和$ 13^{th} $ Generations生成了曲线25519(TLS标准的一部分)和比特币椭圆曲线SECP256K1的有限场算术的新最新实现。
Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel $12^{th}$ and $13^{th}$ generations.