[PATCH v5 00/27] Integrate MbedTLS v3.6 LTS with U-Boot

Tom Rini trini at konsulko.com
Wed Aug 14 20:04:15 CEST 2024


On Wed, Aug 14, 2024 at 01:49:43PM -0400, Raymond Mao wrote:
> Hi Tom,
> 
> On Wed, 14 Aug 2024 at 11:07, Tom Rini <trini at konsulko.com> wrote:
> 
> > On Wed, Aug 14, 2024 at 09:42:08AM -0400, Raymond Mao wrote:
> > > Hi Tom,
> > >
> > > On Fri, 2 Aug 2024 at 11:34, Raymond Mao <raymond.mao at linaro.org> wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > On Thu, 1 Aug 2024 at 16:46, Tom Rini <trini at konsulko.com> wrote:
> > > >
> > > >> On Wed, Jul 31, 2024 at 10:25:10AM -0700, Raymond Mao wrote:
> > > >> >
> > > >> > Integrate MbedTLS v3.6 LTS (currently v3.6.0) with U-Boot.
> > > >> >
> > > >> > Motivations:
> > > >> > ------------
> > > >> >
> > > >> > 1. MbedTLS is well maintained with LTS versions.
> > > >> > 2. LWIP is integrated with MbedTLS and easily to enable HTTPS.
> > > >> > 3. MbedTLS recently switched license back to GPLv2.
> > > >> >
> > > >> > Prerequisite:
> > > >> > -------------
> > > >> >
> > > >> > This patch series requires mbedtls git repo to be added as a
> > > >> > subtree to the main U-Boot repo via:
> > > >> >     $ git subtree add --prefix lib/mbedtls/external/mbedtls \
> > > >> >           https://github.com/Mbed-TLS/mbedtls.git \
> > > >> >           v3.6.0 --squash
> > > >> > Moreover, due to the Windows-style files from mbedtls git repo,
> > > >> > we need to convert the CRLF endings to LF and do a commit manually:
> > > >> >     $ git add --renormalize .
> > > >> >     $ git commit
> > > >> >
> > > >> > New Kconfig options:
> > > >> > --------------------
> > > >> >
> > > >> > `MBEDTLS_LIB` is for MbedTLS general switch.
> > > >> > `MBEDTLS_LIB_CRYPTO` is for replacing original digest and crypto
> > libs
> > > >> with
> > > >> > MbedTLS.
> > > >> > `MBEDTLS_LIB_X509` is for replacing original X509, PKCS7, MSCode,
> > ASN1,
> > > >> > and Pubkey parser with MbedTLS.
> > > >> > `LEGACY_CRYPTO` is introduced as a main switch for legacy crypto
> > > >> library.
> > > >> > `LEGACY_CRYPTO_BASIC` is for the basic crypto functionalities and
> > > >> > `LEGACY_CRYPTO_CERT` is for the certificate related functionalities.
> > > >> > For each of the algorithm, a pair of `<alg>_LEGACY` and
> > `<alg>_MBEDTLS`
> > > >> > Kconfig options are introduced. Meanwhile, `SPL_` Kconfig options
> > are
> > > >> > introduced.
> > > >> >
> > > >> > In this patch set, MBEDTLS_LIB, MBEDTLS_LIB_CRYPTO and
> > MBEDTLS_LIB_X509
> > > >> > are by default enabled in qemu_arm64_defconfig and sandbox_defconfig
> > > >> > for testing purpose.
> > > >> >
> > > >> > Patches for external MbedTLS project:
> > > >> > -------------------------------------
> > > >> >
> > > >> > Since U-Boot uses Microsoft Authentication Code to verify PE/COFFs
> > > >> > executables which is not supported by MbedTLS at the moment,
> > > >> > addtional patches for MbedTLS are created to adapt with the EFI
> > loader:
> > > >> > 1. Decoding of Microsoft Authentication Code.
> > > >> > 2. Decoding of PKCS#9 Authenticate Attributes.
> > > >> > 3. Extending MbedTLS PKCS#7 lib to support multiple signer's
> > > >> certificates.
> > > >> > 4. MbedTLS native test suites for PKCS#7 signer's info.
> > > >> >
> > > >> > All above 4 patches (tagged with `mbedtls/external`) are submitted
> > to
> > > >> > MbedTLS project and being reviewed, eventually they should be part
> > of
> > > >> > MbedTLS LTS release.
> > > >> > But before that, please merge them into U-Boot, otherwise the
> > building
> > > >> > will be broken when MBEDTLS_LIB_X509 is enabled.
> > > >> >
> > > >> > See below PR link for the reference:
> > > >> > https://github.com/Mbed-TLS/mbedtls/pull/9001
> > > >> >
> > > >> > Miscellaneous:
> > > >> > --------------
> > > >> >
> > > >> > Optimized MbedTLS library size by tailoring the config file
> > > >> > and disabling all unnecessary features for EFI loader.
> > > >> > From v2, original libs (rsa, asn1_decoder, rsa_helper, md5, sha1,
> > > >> sha256,
> > > >> > sha512) are completely replaced when MbedTLS is enabled.
> > > >> > From v3, the size-growth is slightly reduced by refactoring Hash
> > > >> functions.
> > > >> >
> > > >> > Target(QEMU arm64) size-growth when enabling MbedTLS:
> > > >> > v1: 6.03%
> > > >> > v2: 4.66%
> > > >> > From v3: 4.55%
> > > >> >
> > > >> > Please see the latest output from buildman for size-growth on QEMU
> > > >> arm64,
> > > >> > Sandbox and Nanopi A64. [1]
> > > >>
> > > >> Let us inline the growth on qemu_arm64 for a moment:
> > > >>    aarch64: (for 1/1 boards) all +6916.0 bss -32.0 data -64.0 rodata
> > > >> +200.0 text +6812.0
> > > >>             qemu_arm64     : all +6916 bss -32 data -64 rodata +200
> > text
> > > >> +6812
> > > >>                u-boot: add: 28/-17, grow: 12/-16 bytes: 15492/-8304
> > (7188)
> > > >>                  function                                   old
> >  new
> > > >>  delta
> > > >>                  mbedtls_internal_sha1_process                -
> > 4540
> > > >>  +4540
> > > >>                  mbedtls_internal_md5_process                 -
> > 2928
> > > >>  +2928
> > > >>                  mbedtls_internal_sha256_process              -
> > 2052
> > > >>  +2052
> > > >>                  mbedtls_internal_sha512_process              -
> > 1056
> > > >>  +1056
> > > >>                  K                                            -
> >  896
> > > >>   +896
> > > >>                  mbedtls_sha512_finish                        -
> >  556
> > > >>   +556
> > > >>                  mbedtls_sha256_finish                        -
> >  484
> > > >>   +484
> > > >>                  mbedtls_sha1_finish                          -
> >  420
> > > >>   +420
> > > >>                  mbedtls_sha512_starts                        -
> >  340
> > > >>   +340
> > > >>                  mbedtls_md5_finish                           -
> >  336
> > > >>   +336
> > > >>                  mbedtls_sha512_update                        -
> >  264
> > > >>   +264
> > > >>                  mbedtls_sha256_update                        -
> >  252
> > > >>   +252
> > > >>                  mbedtls_sha1_update                          -
> >  236
> > > >>   +236
> > > >>                  mbedtls_md5_update                           -
> >  236
> > > >>   +236
> > > >>                  mbedtls_sha512                               -
> >  148
> > > >>   +148
> > > >>                  mbedtls_sha256_starts                        -
> >  124
> > > >>   +124
> > > >>                  hash_init_sha512                            52
> >  128
> > > >>    +76
> > > >>                  hash_init_sha256                            52
> >  128
> > > >>    +76
> > > >>                  mbedtls_sha1_starts                          -
> > 72
> > > >>    +72
> > > >>                  mbedtls_md5_starts                           -
> > 60
> > > >>    +60
> > > >>                  hash_init_sha1                              52
> >  112
> > > >>    +60
> > > >>                  mbedtls_platform_zeroize                     -
> > 56
> > > >>    +56
> > > >>                  mbedtls_sha512_free                          -
> > 16
> > > >>    +16
> > > >>                  mbedtls_sha256_free                          -
> > 16
> > > >>    +16
> > > >>                  mbedtls_sha1_free                            -
> > 16
> > > >>    +16
> > > >>                  mbedtls_md5_free                             -
> > 16
> > > >>    +16
> > > >>                  hash_finish_sha512                          72
> > 88
> > > >>    +16
> > > >>                  hash_finish_sha256                          72
> > 88
> > > >>    +16
> > > >>                  hash_finish_sha1                            72
> > 88
> > > >>    +16
> > > >>                  sha512_csum_wd                              68
> > 80
> > > >>    +12
> > > >>                  sha256_csum_wd                              68
> > 80
> > > >>    +12
> > > >>                  sha1_csum_wd                                68
> > 80
> > > >>    +12
> > > >>                  md5_wd                                      68
> > 80
> > > >>    +12
> > > >>                  mbedtls_sha512_init                          -
> > 12
> > > >>    +12
> > > >>                  mbedtls_sha256_init                          -
> > 12
> > > >>    +12
> > > >>                  mbedtls_sha1_init                            -
> > 12
> > > >>    +12
> > > >>                  mbedtls_md5_init                             -
> > 12
> > > >>    +12
> > > >>                  memset_func                                  -
> >  8
> > > >>     +8
> > > >>                  sha512_update                                4
> >  8
> > > >>     +4
> > > >>                  sha384_update                                4
> >  8
> > > >>     +4
> > > >>                  sha256_update                               12
> >  8
> > > >>     -4
> > > >>                  sha1_update                                 12
> >  8
> > > >>     -4
> > > >>                  sha256_process                              16
> >  -
> > > >>    -16
> > > >>                  sha1_process                                16
> >  -
> > > >>    -16
> > > >>                  hash_update_sha512                          36
> > 16
> > > >>    -20
> > > >>                  hash_update_sha256                          36
> > 16
> > > >>    -20
> > > >>                  hash_update_sha1                            36
> > 16
> > > >>    -20
> > > >>                  MD5Init                                     56
> > 36
> > > >>    -20
> > > >>                  sha1_starts                                 60
> > 36
> > > >>    -24
> > > >>                  hash_update_sha384                          36
> >  -
> > > >>    -36
> > > >>                  hash_init_sha384                            52
> >  -
> > > >>    -52
> > > >>                  sha384_csum_wd                              68
> > 12
> > > >>    -56
> > > >>                  sha256_starts                              104
> > 40
> > > >>    -64
> > > >>                  sha256_padding                              64
> >  -
> > > >>    -64
> > > >>                  sha1_padding                                64
> >  -
> > > >>    -64
> > > >>                  hash_finish_sha384                          72
> >  -
> > > >>    -72
> > > >>                  sha512_finish                              152
> > 36
> > > >>   -116
> > > >>                  sha512_starts                              168
> > 40
> > > >>   -128
> > > >>                  sha384_starts                              168
> > 40
> > > >>   -128
> > > >>                  sha384_finish                              152
> >  4
> > > >>   -148
> > > >>                  MD5Final                                   196
> > 44
> > > >>   -152
> > > >>                  sha512_base_do_finalize                    160
> >  -
> > > >>   -160
> > > >>                  static.sha256_update                       228
> >  -
> > > >>   -228
> > > >>                  static.sha1_update                         240
> >  -
> > > >>   -240
> > > >>                  sha512_base_do_update                      244
> >  -
> > > >>   -244
> > > >>                  MD5Update                                  260
> >  -
> > > >>   -260
> > > >>                  sha1_finish                                300
> > 36
> > > >>   -264
> > > >>                  sha256_finish                              404
> > 36
> > > >>   -368
> > > >>                  sha256_armv8_ce_process                    428
> >  -
> > > >>   -428
> > > >>                  sha1_armv8_ce_process                      484
> >  -
> > > >>   -484
> > > >>                  sha512_K                                   640
> >  -
> > > >>   -640
> > > >>                  sha512_block_fn                           1212
> >  -
> > > >>  -1212
> > > >>                  MD5Transform                              2552
> >  -
> > > >>  -2552
> > > >>
> > > >> And to start with, that's not bad. In fact, tossing LTO in before
> > mbedTLS
> > > >> only changes
> > > >> the top-line a little:
> > > >>    aarch64: (for 1/1 boards) all +5120.0 bss -16.0 data -64.0 rodata
> > > >> +200.0 text +5000.0
> > > >>             qemu_arm64     : all +5120 bss -16 data -64 rodata +200
> > text
> > > >> +5000
> > > >>                u-boot: add: 19/-18, grow: 11/-7 bytes: 14696/-7884
> > (6812)
> > > >>
> > > >> But, is there something we can do still? mbedTLS is a more robust
> > > >> solution and I'm accepting there will be growth. But still the
> > > >> process/start/finish is much larger. Is there something configurable
> > > >> there?
> > > >>
> > > >> I have investigated all those MbedTLS native functions with big-size
> > > > (_process/_update/_finish).
> > > > For MD5 and SHA1, we don't have turnable configs.
> > > > For SHA256 and SHA512, there are a few configs:
> > > > 1. Performance configs only for Armv8/a64.
> > > >     I didn't turn that on, which might affect the target size as well.
> > > > 2. Smaller implementation with lower size (only for non-Armv8/a64) at
> > the
> > > > expense of losing
> > > >     performance.
> > > > I didn't enable both, as #1 is more for performance and might
> > potentially
> > > > increase target size;
> > > > #2 compromises the performance and only for non-Armv8/a64.
> > > > Looks like that both don't help in reducing the size of qemu_arm64.
> > > > But I will try #1 on qemu_arm64 and  #2 on sandbox and let you know
> > > > the size impact soon.
> > > >
> > > > The smaller footprint implementation for SHA256/512 can reduce the
> > target
> > > size significantly on
> > > those "<hash>_process()" functions. Please see below output from
> > buildman:
> > > ```
> > >    aarch64: (for 2/2 boards) all -1468.0 bss +16.0 data -64.0 rodata
> > +200.0
> > > text -1620.0
> > >             qemu_arm64     : all +4608 bss +80 data -64 rodata +200 text
> > > +4392
> > >                u-boot: add: 29/-17, grow: 12/-16 bytes: 13072/-8304
> > (4768)
> > >             nanopi_a64     : all -7544 bss -48 data -64 rodata +200 text
> > > -7632
> > >                u-boot: add: 21/-8, grow: 4/-8 bytes: 10692/-4364 (6328)
> > >    sandbox: (for 1/1 boards) all +19312.0 data +1440.0 rodata -4128.0
> > text
> > > +22000.0
> > >             sandbox        : all +19312 data +1440 rodata -4128 text
> > +22000
> > >                u-boot: add: 258/-206, grow: 122/-59 bytes: 90286/-76286
> > > (14000)
> > > ```
> > > Since this is a trade-off between size and performance, I will add one
> > more
> > > kconfig to
> > > allow the user to turn it on/off. What are your thoughts?
> > >
> > > On the other hand, the "Armv8/a64 only" options depend on NEON
> > > instructions, so I
> > > will keep them off.
> >
> > Yes, we should have it as a Kconfig option. Can you perhaps do a
> > real-world performance test (so some not-qemu platform) where you sha256
> > a big hunk of memory? That might help inform the defaults.
> 
> Below are a few tests I did on my Ubuntu PC, leveraging the benchmark tool.
> I measured two key factors, throughput and the number of CPU cycles per
> byte.
> Each process was done with a 1KB data block.
> The throughput was measured in an average data process (KiB/s) in 10
> seconds.
> The number of CPU cycles per byte is an average number of processing 4096
> times with the same data block size.
> 
> See below result:
> 
> Original:
> SHA-256                  :      87972 KiB/s,         23 cycles/byte
> SHA-512                  :      107328 KiB/s,       20 cycles/byte
> 
> Smaller implementations:
> SHA-256                  :      65872 KiB/s,         36 cycles/byte
> SHA-512                  :      94059 KiB/s,         23 cycles/byte
> 
> Please note that this is not very precise as it depends on many other
> factors,
> but we still can take this as a reference.
> As we can see here, the "smaller" one is not too bad.
> I would put the kconfig for the "smaller" one as a default config when
> MbedTLS is
> selected for qemu64 and sandbox.
> Other platforms can decide what they want depends on the real performance on
> each platform.
> Is this fine for you? Please feel free to advise.

Thanks. Yes, we should just default to "smaller" for everyone.

> > Also, sorry,
> > I don't understand your comment about NEON instructions. Is it the kind
> > of thing where there's too much core variability in terms of who has
> > what? If so, it too should be an option, but not enabled by default.
> >
> > The Armv8/A64 accelerating config depends on some of the NEON instructions
> for example, vaddq_u32() and vaddq_u64().
> Please see this link for the reference:
> https://developer.arm.com/documentation/dht0002/a/Introducing-NEON/Developing-for-NEON/Intrinsics
> I think we are missing this in U-Boot.
> I would suggest ignoring it at the moment (keep them all off).

Ah OK, follow-up work then.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 659 bytes
Desc: not available
URL: <https://lists.denx.de/pipermail/u-boot/attachments/20240814/a5689f5d/attachment.sig>


More information about the U-Boot mailing list