Emill/rsa-armv7

RSA for 32-bit ARM processors (Cortex-M4, Cortex-M33, Cortex-A7 etc.)

This library implements highly efficient, assembly-optimized RSA for ARMv7E-M processors and later. On Cortex-M33, the processor must include the DSP extension.

Supported operations

The library implements the most common operations from RFC 8017 (PKCS #1 v2.2):

  • Encryption and decryption using RSAES PKCS #1 v1.5 (RSAES-PKCS1-v1_5).
  • Signature generation and verification using RSASSA-PKCS1-v1_5.
  • Signature generation and verification using RSASSA-PSS (limited to MGF1 with the same hash algorithm as used for the message, with salt length equal to hash length). This is compatible with the requirements of TLS 1.3.
  • Generic big-integer modular exponentiation, which makes the library usable in other protocols, e.g. SRP implementations.
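For orientation, RSASSA-PKCS1-v1_5 applies the RSA private key to an encoded message built by EMSA-PKCS1-v1_5 (RFC 8017, Section 9.2): EM = 0x00 || 0x01 || PS || 0x00 || T, where PS is 0xFF padding and T is the DER DigestInfo prefix followed by the hash. The sketch below builds EM for SHA-256 only. The function name is hypothetical and this is not the library's API; it only illustrates the padding layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DER prefix of the SHA-256 DigestInfo (RFC 8017, Section 9.2, Note 1). */
static const uint8_t SHA256_DIGEST_INFO_PREFIX[19] = {
    0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01,
    0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20
};

/* Hypothetical helper: writes EM = 0x00 || 0x01 || PS || 0x00 || T into em,
 * where em_len is the modulus length in bytes and T = prefix || hash.
 * Returns 0 on success, or -1 if em_len < tLen + 11 (PS must be >= 8 bytes). */
int emsa_pkcs1_v15_encode_sha256(const uint8_t hash[32], uint8_t *em, size_t em_len)
{
    const size_t prefix_len = sizeof(SHA256_DIGEST_INFO_PREFIX);
    const size_t t_len = prefix_len + 32;
    if (em_len < t_len + 11) {
        return -1;
    }
    size_t ps_len = em_len - t_len - 3;

    em[0] = 0x00;
    em[1] = 0x01;
    memset(em + 2, 0xFF, ps_len);                 /* padding string PS */
    em[2 + ps_len] = 0x00;                        /* separator */
    memcpy(em + 3 + ps_len, SHA256_DIGEST_INFO_PREFIX, prefix_len);
    memcpy(em + 3 + ps_len + prefix_len, hash, 32);
    return 0;
}
```

For a 2048-bit key (em_len = 256), PS is 202 bytes long and the hash occupies the last 32 bytes.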

RSA key generation functionality is not included.

The library can handle any valid key size, but the application should limit it to, e.g., 4096 or 8192 bits to avoid resource exhaustion, especially on untrusted inputs: with a fixed exponent, the running time of public key operations grows quadratically with the key size.
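The measured public-key numbers are consistent with quadratic growth: going from 2048 to 4096 bits (e=65537) raises the cycle count from 257k to 928k, about 3.6x, close to the 4x a purely quadratic model predicts. A minimal application-side guard, assuming the modulus arrives as a big-endian byte string, could reject oversized keys before any exponentiation starts. The limit macro and both functions are hypothetical and not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application limit; pick it to match your resource budget. */
#define APP_MAX_RSA_MODULUS_BITS 4096u

/* Number of significant bits in a big-endian integer (leading zero bytes ignored).
 * Branching is fine here: the modulus is public. */
static size_t bignum_bit_length_be(const uint8_t *n, size_t n_len)
{
    size_t i = 0;
    while (i < n_len && n[i] == 0) {
        i++;
    }
    if (i == n_len) {
        return 0;
    }
    size_t bits = (n_len - i) * 8;
    uint8_t top = n[i];
    while ((top & 0x80) == 0) {   /* drop leading zero bits of the top byte */
        top = (uint8_t)(top << 1);
        bits--;
    }
    return bits;
}

/* Returns 1 if the modulus is non-zero and within the limit, 0 otherwise. */
int app_modulus_size_ok(const uint8_t *n, size_t n_len)
{
    size_t bits = bignum_bit_length_be(n, n_len);
    return bits > 0 && bits <= APP_MAX_RSA_MODULUS_BITS;
}
```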

Performance

The library is optimized primarily for performance and secondarily for code size. It aims to be the fastest library available for typical key sizes (2048 to 4096 bits) while keeping the code size reasonably small.

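Private key operations use the Chinese Remainder Theorem (CRT) optimization, so the private key must be supplied in CRT form (RFC 8017 defines this as the quintuple (p, q, dP, dQ, qInv); see the header files for the exact format the library expects).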
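The CRT replaces one full-size exponentiation with two exponentiations modulo the half-size primes, followed by a cheap recombination. The toy sketch below shows the math with the classic textbook key p = 61, q = 53 (n = 3233, e = 17, d = 2753, so dP = 53, dQ = 49, qInv = 38). The function names are hypothetical; the library itself works on multi-precision integers, and this square-and-multiply helper is not constant time.

```c
#include <stdint.h>

/* Right-to-left square-and-multiply for toy moduli (m < 2^32, so products
 * fit in 64 bits). Branches on exponent bits: illustration only. */
static uint64_t modpow_u64(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % m;
        }
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* CRT decryption with Garner recombination:
 *   m1 = c^dP mod p,  m2 = c^dQ mod q,
 *   h  = qInv * (m1 - m2) mod p,  m = m2 + h * q. */
uint64_t rsa_crt_decrypt_toy(uint64_t c, uint64_t p, uint64_t q,
                             uint64_t dp, uint64_t dq, uint64_t qinv)
{
    uint64_t m1 = modpow_u64(c, dp, p);
    uint64_t m2 = modpow_u64(c, dq, q);
    /* Add p before subtracting so the unsigned difference cannot wrap. */
    uint64_t diff = (m1 + p - (m2 % p)) % p;
    uint64_t h = (qinv * diff) % p;
    return m2 + h * q;
}
```

With these parameters, encrypting m = 65 gives c = 65^17 mod 3233 = 2790, and rsa_crt_decrypt_toy(2790, 61, 53, 53, 49, 38) recovers 65, the same result as 2790^2753 mod 3233.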

The following numbers were obtained on nRF52 (Cortex-M4) with ICACHE enabled, using GCC with the -O2 optimization level.

| Operation                 | Cycles | Time at 64 MHz |
|---------------------------|--------|----------------|
| Public 2048-bit (e=3)     | 53.4k  | 0.833 ms       |
| Public 2048-bit (e=65537) | 257k   | 4.02 ms        |
| Public 3072-bit (e=65537) | 539k   | 8.42 ms        |
| Public 4096-bit (e=65537) | 928k   | 14.5 ms        |
| Private 2048-bit          | 10.8M  | 169 ms         |
| Private 3072-bit          | 32.8M  | 513 ms         |
| Private 4096-bit          | 73.6M  | 1.15 s         |

Cortex-M33 runs the same code in about the same number of cycles, typically around 2% faster than Cortex-M4.

The numbers above cover only the big-number exponentiation. The overhead of the high-level RSA operations built on top of it (sign/verify/encrypt/decrypt) must be added, but it is typically small. Hashing time is not included either.

The following numbers were obtained on ARM Cortex-A7, using GCC with the -O2 optimization level.

| Operation                 | Time at 1 GHz |
|---------------------------|---------------|
| Public 2048-bit (e=65537) | 0.437 ms      |
| Public 3072-bit (e=65537) | 0.951 ms      |
| Public 4096-bit (e=65537) | 1.661 ms      |
| Private 2048-bit          | 18.3 ms       |
| Private 3072-bit          | 57.4 ms       |
| Private 4096-bit          | 131.7 ms      |

Comparison with other libraries

wolfSSL 5.6.4 benchmarks report 24.738 ms for the RSA-2048 public key operation and 664.5 ms for the RSA-2048 private key operation on an 80 MHz Cortex-M4 processor.

The ocrypto library in nRF Connect SDK 3.0.0 uses around 1.03M cycles for the RSA 2048 public key operation (e=65537) and around 46M cycles for the RSA 2048 private key operation on nRF52 with ICACHE enabled.

OpenSSL 3.3.1 (using NEON SIMD instructions) uses around 0.626 ms for the RSA 2048 public key operation (e=65537) and around 25.5 ms for the RSA 2048 private key operation on the same ARM Cortex-A7 1 GHz system as used for the performance measurement of this library above.

These figures show that this library is several times faster than the alternatives on Cortex-M processors. On Cortex-A7 the advantage over OpenSSL is smaller, around 1.4x for both public and private operations.
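Converting the quoted Cortex-M timings to cycles with N = t · f makes them directly comparable with the nRF52 table above. This worked comparison assumes the wolfSSL public-key figure also uses e = 65537 (the benchmark note does not say) and that cycle counts transfer between the two Cortex-M4 parts, which may differ in flash wait states and caching.

```latex
\begin{aligned}
N_{\text{cycles}} &= t \cdot f_{\text{clk}} \\
N_{\text{wolfSSL, pub}} &= 24.738\,\text{ms} \times 80\,\text{MHz} \approx 1.98 \times 10^{6}
  &&\Rightarrow\ \frac{1.98 \times 10^{6}}{2.57 \times 10^{5}} \approx 7.7 \\
N_{\text{wolfSSL, priv}} &= 664.5\,\text{ms} \times 80\,\text{MHz} \approx 5.32 \times 10^{7}
  &&\Rightarrow\ \frac{5.32 \times 10^{7}}{1.08 \times 10^{7}} \approx 4.9 \\
\text{ocrypto, pub:}\quad & \frac{1.03 \times 10^{6}}{2.57 \times 10^{5}} \approx 4.0 \\
\text{ocrypto, priv:}\quad & \frac{4.6 \times 10^{7}}{1.08 \times 10^{7}} \approx 4.3
\end{aligned}
```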

Code size

The big integer arithmetic functions occupy around 3.1 kB of compiled code when targeting Cortex-M4. The high-level RSA operations add around 1.7 kB (when all features are used).

If only the RSASSA-PKCS1-v1_5 verify operation is used, the library uses 2.4 kB of compiled code in total.

Stack usage

This library does not allocate significant amounts of stack space. Instead, the caller must supply a temporary work space whose required size depends on the key size; the header files define macros for calculating that size. A suggested approach is to limit the allowed key size to, e.g., 4096 bits and allocate the work space on the stack, provided the stack is large enough.
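The suggested pattern (a compile-time maximum key size, a stack buffer sized for it, and a runtime check of the actual key) can be sketched as follows. Everything named HYPOTHETICAL or hypothetical is a placeholder: the real size macros live in the library's headers, and their names and formulas will differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholders, NOT the library's macros or formulas. */
#define APP_MAX_MODULUS_BITS 4096u
#define HYPOTHETICAL_WORK_WORDS(bits) (8u * ((bits) / 32u))

/* Hypothetical stand-in for a library call taking a caller-supplied work area.
 * It only checks the sizes and clears the area. */
static int hypothetical_rsa_op(size_t modulus_bits, uint32_t *work, size_t work_words)
{
    if (modulus_bits > APP_MAX_MODULUS_BITS ||
        work_words < HYPOTHETICAL_WORK_WORDS(modulus_bits)) {
        return -1;
    }
    memset(work, 0, work_words * sizeof(uint32_t));
    return 0;
}

int app_do_rsa(size_t modulus_bits)
{
    /* Fixed-size array (not a VLA): sized at compile time for the largest
     * accepted key and placed on the stack only for the duration of the call. */
    uint32_t work[HYPOTHETICAL_WORK_WORDS(APP_MAX_MODULUS_BITS)];
    return hypothetical_rsa_op(modulus_bits, work, sizeof(work) / sizeof(work[0]));
}
```

Sizing for the maximum once keeps stack usage predictable, which matters more on small Cortex-M stacks than saving a few hundred bytes for smaller keys.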

Security

No branches (control flow) depend on secret data or a message to be encrypted.
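As a generic illustration of the branch-free style (not code taken from the library), a byte-string comparison can visit every byte and fold the result into an accumulator, so its running time depends only on the length and never on where the inputs differ. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a and b are equal over len bytes, 0 otherwise.
 * No early exit: every byte is always processed. */
int ct_bytes_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= (uint8_t)(a[i] ^ b[i]);
    }
    /* acc == 0 -> (0 - 1) >> 8 has bit 0 set -> 1;
     * acc in 1..255 -> (acc - 1) >> 8 == 0 -> 0. */
    return (int)(1u & (((uint32_t)acc - 1u) >> 8));
}
```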

For public key operations, the running time and memory access pattern depend only on the bit pattern of the public key and the size of the operands.

For private key operations, the running time and program execution flow depend only on the byte length of the exponents and the number of significant bits in the modulus components (n, p, q). Whether the memory access pattern may depend on the private key is configurable. If it may, the addresses of some RAM loads depend on the private key. This is typically acceptable on embedded SoCs with Cortex-M4 or Cortex-M33 processors, but not on systems with data caches, such as Cortex-A systems. To use a constant memory access pattern, enable it in the bignum_config.h file, or define CONSTANT_MEMORY_ACCESS_PATTERN to 1 directly when compiling the library.
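A common way to obtain a constant memory access pattern is to read every entry of a lookup table and keep only the wanted one with a mask. In windowed exponentiation, for example, the precomputed power to multiply by is selected by secret exponent bits. The sketch below shows that generic technique; the function names are hypothetical, and the library's actual implementation may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0xFFFFFFFF if x == y, else 0, without branching. */
static uint32_t ct_eq_mask(uint32_t x, uint32_t y)
{
    uint32_t d = x ^ y;
    /* (d | -d) has its top bit set exactly when d != 0. */
    return ((d | ((uint32_t)0 - d)) >> 31) - (uint32_t)1;
}

/* Copies entry `index` of a table holding `count` entries of `words` words
 * each into `out`. All entries are read in the same order for every index,
 * so the sequence of addresses touched does not reveal the (secret) index. */
void ct_table_lookup(uint32_t *out, const uint32_t *table,
                     size_t count, size_t words, uint32_t index)
{
    for (size_t w = 0; w < words; w++) {
        out[w] = 0;
    }
    for (size_t i = 0; i < count; i++) {
        uint32_t mask = ct_eq_mask((uint32_t)i, index);
        for (size_t w = 0; w < words; w++) {
            out[w] |= table[i * words + w] & mask;
        }
    }
}
```

The cost is reading the whole table on every lookup, which is why leaving this option disabled can be worthwhile on cache-less Cortex-M parts.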

Documentation

See the header files for API usage.

License

The library is licensed under the BSD-2-Clause license, which permits redistribution as long as the copyright notice and license text are included.
