[PATCH v5 00/27] Integrate MbedTLS v3.6 LTS with U-Boot
Raymond Mao
raymond.mao at linaro.org
Wed Aug 14 19:49:43 CEST 2024
Hi Tom,
On Wed, 14 Aug 2024 at 11:07, Tom Rini <trini at konsulko.com> wrote:
> On Wed, Aug 14, 2024 at 09:42:08AM -0400, Raymond Mao wrote:
> > Hi Tom,
> >
> > On Fri, 2 Aug 2024 at 11:34, Raymond Mao <raymond.mao at linaro.org> wrote:
> >
> > > Hi Tom,
> > >
> > > On Thu, 1 Aug 2024 at 16:46, Tom Rini <trini at konsulko.com> wrote:
> > >
> > >> On Wed, Jul 31, 2024 at 10:25:10AM -0700, Raymond Mao wrote:
> > >> >
> > >> > Integrate MbedTLS v3.6 LTS (currently v3.6.0) with U-Boot.
> > >> >
> > >> > Motivations:
> > >> > ------------
> > >> >
> > >> > 1. MbedTLS is well maintained with LTS versions.
> > >> > 2. LWIP is integrated with MbedTLS and easily to enable HTTPS.
> > >> > 3. MbedTLS recently switched license back to GPLv2.
> > >> >
> > >> > Prerequisite:
> > >> > -------------
> > >> >
> > >> > This patch series requires mbedtls git repo to be added as a
> > >> > subtree to the main U-Boot repo via:
> > >> > $ git subtree add --prefix lib/mbedtls/external/mbedtls \
> > >> > https://github.com/Mbed-TLS/mbedtls.git \
> > >> > v3.6.0 --squash
> > >> > Moreover, due to the Windows-style files from mbedtls git repo,
> > >> > we need to convert the CRLF endings to LF and do a commit manually:
> > >> > $ git add --renormalize .
> > >> > $ git commit
> > >> >
> > >> > New Kconfig options:
> > >> > --------------------
> > >> >
> > >> > `MBEDTLS_LIB` is for MbedTLS general switch.
> > >> > `MBEDTLS_LIB_CRYPTO` is for replacing original digest and crypto
> libs
> > >> with
> > >> > MbedTLS.
> > >> > `MBEDTLS_LIB_X509` is for replacing original X509, PKCS7, MSCode,
> ASN1,
> > >> > and Pubkey parser with MbedTLS.
> > >> > `LEGACY_CRYPTO` is introduced as a main switch for legacy crypto
> > >> library.
> > >> > `LEGACY_CRYPTO_BASIC` is for the basic crypto functionalities and
> > >> > `LEGACY_CRYPTO_CERT` is for the certificate related functionalities.
> > >> > For each of the algorithm, a pair of `<alg>_LEGACY` and
> `<alg>_MBEDTLS`
> > >> > Kconfig options are introduced. Meanwhile, `SPL_` Kconfig options
> are
> > >> > introduced.
> > >> >
> > >> > In this patch set, MBEDTLS_LIB, MBEDTLS_LIB_CRYPTO and
> MBEDTLS_LIB_X509
> > >> > are by default enabled in qemu_arm64_defconfig and sandbox_defconfig
> > >> > for testing purpose.
> > >> >
> > >> > Patches for external MbedTLS project:
> > >> > -------------------------------------
> > >> >
> > >> > Since U-Boot uses Microsoft Authentication Code to verify PE/COFFs
> > >> > executables which is not supported by MbedTLS at the moment,
> > >> > addtional patches for MbedTLS are created to adapt with the EFI
> loader:
> > >> > 1. Decoding of Microsoft Authentication Code.
> > >> > 2. Decoding of PKCS#9 Authenticate Attributes.
> > >> > 3. Extending MbedTLS PKCS#7 lib to support multiple signer's
> > >> certificates.
> > >> > 4. MbedTLS native test suites for PKCS#7 signer's info.
> > >> >
> > >> > All above 4 patches (tagged with `mbedtls/external`) are submitted
> to
> > >> > MbedTLS project and being reviewed, eventually they should be part
> of
> > >> > MbedTLS LTS release.
> > >> > But before that, please merge them into U-Boot, otherwise the
> building
> > >> > will be broken when MBEDTLS_LIB_X509 is enabled.
> > >> >
> > >> > See below PR link for the reference:
> > >> > https://github.com/Mbed-TLS/mbedtls/pull/9001
> > >> >
> > >> > Miscellaneous:
> > >> > --------------
> > >> >
> > >> > Optimized MbedTLS library size by tailoring the config file
> > >> > and disabling all unnecessary features for EFI loader.
> > >> > From v2, original libs (rsa, asn1_decoder, rsa_helper, md5, sha1,
> > >> sha256,
> > >> > sha512) are completely replaced when MbedTLS is enabled.
> > >> > From v3, the size-growth is slightly reduced by refactoring Hash
> > >> functions.
> > >> >
> > >> > Target(QEMU arm64) size-growth when enabling MbedTLS:
> > >> > v1: 6.03%
> > >> > v2: 4.66%
> > >> > From v3: 4.55%
> > >> >
> > >> > Please see the latest output from buildman for size-growth on QEMU
> > >> arm64,
> > >> > Sandbox and Nanopi A64. [1]
> > >>
> > >> Let us inline the growth on qemu_arm64 for a moment:
> > >> aarch64: (for 1/1 boards) all +6916.0 bss -32.0 data -64.0 rodata
> > >> +200.0 text +6812.0
> > >> qemu_arm64 : all +6916 bss -32 data -64 rodata +200
> text
> > >> +6812
> > >> u-boot: add: 28/-17, grow: 12/-16 bytes: 15492/-8304
> (7188)
> > >> function old
> new
> > >> delta
> > >> mbedtls_internal_sha1_process -
> 4540
> > >> +4540
> > >> mbedtls_internal_md5_process -
> 2928
> > >> +2928
> > >> mbedtls_internal_sha256_process -
> 2052
> > >> +2052
> > >> mbedtls_internal_sha512_process -
> 1056
> > >> +1056
> > >> K -
> 896
> > >> +896
> > >> mbedtls_sha512_finish -
> 556
> > >> +556
> > >> mbedtls_sha256_finish -
> 484
> > >> +484
> > >> mbedtls_sha1_finish -
> 420
> > >> +420
> > >> mbedtls_sha512_starts -
> 340
> > >> +340
> > >> mbedtls_md5_finish -
> 336
> > >> +336
> > >> mbedtls_sha512_update -
> 264
> > >> +264
> > >> mbedtls_sha256_update -
> 252
> > >> +252
> > >> mbedtls_sha1_update -
> 236
> > >> +236
> > >> mbedtls_md5_update -
> 236
> > >> +236
> > >> mbedtls_sha512 -
> 148
> > >> +148
> > >> mbedtls_sha256_starts -
> 124
> > >> +124
> > >> hash_init_sha512 52
> 128
> > >> +76
> > >> hash_init_sha256 52
> 128
> > >> +76
> > >> mbedtls_sha1_starts -
> 72
> > >> +72
> > >> mbedtls_md5_starts -
> 60
> > >> +60
> > >> hash_init_sha1 52
> 112
> > >> +60
> > >> mbedtls_platform_zeroize -
> 56
> > >> +56
> > >> mbedtls_sha512_free -
> 16
> > >> +16
> > >> mbedtls_sha256_free -
> 16
> > >> +16
> > >> mbedtls_sha1_free -
> 16
> > >> +16
> > >> mbedtls_md5_free -
> 16
> > >> +16
> > >> hash_finish_sha512 72
> 88
> > >> +16
> > >> hash_finish_sha256 72
> 88
> > >> +16
> > >> hash_finish_sha1 72
> 88
> > >> +16
> > >> sha512_csum_wd 68
> 80
> > >> +12
> > >> sha256_csum_wd 68
> 80
> > >> +12
> > >> sha1_csum_wd 68
> 80
> > >> +12
> > >> md5_wd 68
> 80
> > >> +12
> > >> mbedtls_sha512_init -
> 12
> > >> +12
> > >> mbedtls_sha256_init -
> 12
> > >> +12
> > >> mbedtls_sha1_init -
> 12
> > >> +12
> > >> mbedtls_md5_init -
> 12
> > >> +12
> > >> memset_func -
> 8
> > >> +8
> > >> sha512_update 4
> 8
> > >> +4
> > >> sha384_update 4
> 8
> > >> +4
> > >> sha256_update 12
> 8
> > >> -4
> > >> sha1_update 12
> 8
> > >> -4
> > >> sha256_process 16
> -
> > >> -16
> > >> sha1_process 16
> -
> > >> -16
> > >> hash_update_sha512 36
> 16
> > >> -20
> > >> hash_update_sha256 36
> 16
> > >> -20
> > >> hash_update_sha1 36
> 16
> > >> -20
> > >> MD5Init 56
> 36
> > >> -20
> > >> sha1_starts 60
> 36
> > >> -24
> > >> hash_update_sha384 36
> -
> > >> -36
> > >> hash_init_sha384 52
> -
> > >> -52
> > >> sha384_csum_wd 68
> 12
> > >> -56
> > >> sha256_starts 104
> 40
> > >> -64
> > >> sha256_padding 64
> -
> > >> -64
> > >> sha1_padding 64
> -
> > >> -64
> > >> hash_finish_sha384 72
> -
> > >> -72
> > >> sha512_finish 152
> 36
> > >> -116
> > >> sha512_starts 168
> 40
> > >> -128
> > >> sha384_starts 168
> 40
> > >> -128
> > >> sha384_finish 152
> 4
> > >> -148
> > >> MD5Final 196
> 44
> > >> -152
> > >> sha512_base_do_finalize 160
> -
> > >> -160
> > >> static.sha256_update 228
> -
> > >> -228
> > >> static.sha1_update 240
> -
> > >> -240
> > >> sha512_base_do_update 244
> -
> > >> -244
> > >> MD5Update 260
> -
> > >> -260
> > >> sha1_finish 300
> 36
> > >> -264
> > >> sha256_finish 404
> 36
> > >> -368
> > >> sha256_armv8_ce_process 428
> -
> > >> -428
> > >> sha1_armv8_ce_process 484
> -
> > >> -484
> > >> sha512_K 640
> -
> > >> -640
> > >> sha512_block_fn 1212
> -
> > >> -1212
> > >> MD5Transform 2552
> -
> > >> -2552
> > >>
> > >> And to start with, that's not bad. In fact, tossing LTO in before
> mbedTLS
> > >> only changes
> > >> the top-line a little:
> > >> aarch64: (for 1/1 boards) all +5120.0 bss -16.0 data -64.0 rodata
> > >> +200.0 text +5000.0
> > >> qemu_arm64 : all +5120 bss -16 data -64 rodata +200
> text
> > >> +5000
> > >> u-boot: add: 19/-18, grow: 11/-7 bytes: 14696/-7884
> (6812)
> > >>
> > >> But, is there something we can do still? mbedTLS is a more robust
> > >> solution and I'm accepting there will be growth. But still the
> > >> process/start/finish is much larger. Is there something configurable
> > >> there?
> > >>
> > >> I have investigated all those MbedTLS native functions with big-size
> > > (_process/_update/_finish).
> > > For MD5 and SHA1, we don't have turnable configs.
> > > For SHA256 and SHA512, there are a few configs:
> > > 1. Performance configs only for Armv8/a64.
> > > I didn't turn that on, which might affect the target size as well.
> > > 2. Smaller implementation with lower size (only for non-Armv8/a64) at
> the
> > > expense of losing
> > > performance.
> > > I didn't enable both, as #1 is more for performance and might
> potentially
> > > increase target size;
> > > #2 compromises the performance and only for non-Armv8/a64.
> > > Looks like that both don't help in reducing the size of qemu_arm64.
> > > But I will try #1 on qemu_arm64 and #2 on sandbox and let you know
> > > the size impact soon.
> > >
> > > The smaller footprint implementation for SHA256/512 can reduce the
> target
> > size significantly on
> > those "<hash>_process()" functions. Please see below output from
> buildman:
> > ```
> > aarch64: (for 2/2 boards) all -1468.0 bss +16.0 data -64.0 rodata
> +200.0
> > text -1620.0
> > qemu_arm64 : all +4608 bss +80 data -64 rodata +200 text
> > +4392
> > u-boot: add: 29/-17, grow: 12/-16 bytes: 13072/-8304
> (4768)
> > nanopi_a64 : all -7544 bss -48 data -64 rodata +200 text
> > -7632
> > u-boot: add: 21/-8, grow: 4/-8 bytes: 10692/-4364 (6328)
> > sandbox: (for 1/1 boards) all +19312.0 data +1440.0 rodata -4128.0
> text
> > +22000.0
> > sandbox : all +19312 data +1440 rodata -4128 text
> +22000
> > u-boot: add: 258/-206, grow: 122/-59 bytes: 90286/-76286
> > (14000)
> > ```
> > Since this is a trade-off between size and performance, I will add one
> more
> > kconfig to
> > allow the user to turn it on/off. What are your thoughts?
> >
> > On the other hand, the "Armv8/a64 only" options depend on NEON
> > instructions, so I
> > will keep them off.
>
> Yes, we should have it as a Kconfig option. Can you perhaps do a
> real-world performance test (so some not-qemu platform) where you sha256
> a big hunk of memory? That might help inform the defaults.
Below are a few tests I did on my Ubuntu PC, leveraging the benchmark tool.
I measured two key factors, throughput and the number of CPU cycles per
byte.
Each process was done with a 1KB data block.
The throughput was measured in an average data process (KiB/s) in 10
seconds.
The number of CPU cycles per byte is an average number of processing 4096
times with the same data block size.
See below result:
Original:
SHA-256 : 87972 KiB/s, 23 cycles/byte
SHA-512 : 107328 KiB/s, 20 cycles/byte
Smaller implementations:
SHA-256 : 65872 KiB/s, 36 cycles/byte
SHA-512 : 94059 KiB/s, 23 cycles/byte
Please note that this is not very precise as it depends on many other
factors,
but we still can take this as a reference.
As we can see here, the "smaller" one is not too bad.
I would put the kconfig for the "smaller" one as a default config when
MbedTLS is
selected for qemu64 and sandbox.
Other platforms can decide what they want depends on the real performance on
each platform.
Is this fine for you? Please feel free to advise.
> Also, sorry,
> I don't understand your comment about NEON instructions. Is it the kind
> of thing where there's too much core variability in terms of who has
> what? If so, it too should be an option, but not enabled by default.
>
> The Armv8/A64 accelerating config depends on some of the NEON instructions
for example, vaddq_u32() and vaddq_u64().
Please see this link for the reference:
https://developer.arm.com/documentation/dht0002/a/Introducing-NEON/Developing-for-NEON/Intrinsics
I think we are missing this in U-Boot.
I would suggest ignoring it at the moment (keep them all off).
Regards,
Raymond
More information about the U-Boot
mailing list