[PATCH v5 11/16] crypto/fsl: Fix kick_trng

ZHIZHIKIN Andrey andrey.zhizhikin at leica-geosystems.com
Mon Nov 22 20:45:15 CET 2021


Hello Gaurav,

> -----Original Message-----
> From: U-Boot <u-boot-bounces at lists.denx.de> On Behalf Of Gaurav Jain
> Sent: Monday, November 15, 2021 8:00 AM
> To: u-boot at lists.denx.de
> Cc: Stefano Babic <sbabic at denx.de>; Fabio Estevam <festevam at gmail.com>; Peng Fan
> <peng.fan at nxp.com>; Simon Glass <sjg at chromium.org>; Priyanka Jain
> <priyanka.jain at nxp.com>; Ye Li <ye.li at nxp.com>; Horia Geanta
> <horia.geanta at nxp.com>; Ji Luo <ji.luo at nxp.com>; Franck Lenormand
> <franck.lenormand at nxp.com>; Silvano Di Ninno <silvano.dininno at nxp.com>; Sahil
> malhotra <sahil.malhotra at nxp.com>; Pankaj Gupta <pankaj.gupta at nxp.com>; Varun
> Sethi <V.Sethi at nxp.com>; NXP i . MX U-Boot Team <uboot-imx at nxp.com>; Shengzhou
> Liu <Shengzhou.Liu at nxp.com>; Mingkai Hu <mingkai.hu at nxp.com>; Rajesh Bhagat
> <rajesh.bhagat at nxp.com>; Meenakshi Aggarwal <meenakshi.aggarwal at nxp.com>; Wasim
> Khan <wasim.khan at nxp.com>; Alison Wang <alison.wang at nxp.com>; Pramod Kumar
> <pramod.kumar_1 at nxp.com>; Tang Yuantian <andy.tang at nxp.com>; Adrian Alonso
> <adrian.alonso at nxp.com>; Vladimir Oltean <olteanv at gmail.com>
> Subject: [PATCH v5 11/16] crypto/fsl: Fix kick_trng
> 
> 
> From: Ye Li <ye.li at nxp.com>
> 
> fix hwrng performance issue in kernel.

This patch is missing some context information, specifically which performance
issue does exist in the Kernel (with some quantification), and how is it addressed
here.

This function introduced with this patch already exist in the Kernel [1], and the
implementation does differ from Kernel one. Specifically, this patch lowers the
number of test samples that are run to decide whether the entropy generated by
TRNG is sufficiently random: it reduces the monobit count range, poker test limits,
and number or runs for consecutive 0's and 1's.

Considering the fact that after TRNG is initialized - JDKEK, TDKEK and TDSK are
preloaded from the RNG and are locked until the next PoR, Kernel will not
re-initialize the TRNG (in fact, there is a check that is done in the Kernel not to
touch RNG if it is already initialized [2]), and this would leave the Crypto facilities
running in the Kernel to use entropy model that is defined here. In this case, at
least a justification of this change should be made clear - e.g. significant speed
improvement over reduced entropy (with quantifiable numbers).

In addition, with those new parameter set, would the RNG pass FIPS 140-2 test?

> 
> Signed-off-by: Ye Li <ye.li at nxp.com>
> Acked-by: Gaurav Jain <gaurav.jain at nxp.com>>
> ---
>  drivers/crypto/fsl/jr.c | 109 ++++++++++++++++++++++++++++++++++------
>  include/fsl_sec.h       |   1 +
>  2 files changed, 94 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/crypto/fsl/jr.c b/drivers/crypto/fsl/jr.c
> index 9b751aca9b..ef136988b6 100644
> --- a/drivers/crypto/fsl/jr.c
> +++ b/drivers/crypto/fsl/jr.c
> @@ -602,30 +602,107 @@ static u8 get_rng_vid(ccsr_sec_t *sec)
>   */
>  static void kick_trng(int ent_delay, ccsr_sec_t *sec)
>  {
> +       u32 samples  = 512; /* number of bits to generate and test */
> +       u32 mono_min = 195;
> +       u32 mono_max = 317;
> +       u32 mono_range  = mono_max - mono_min;
> +       u32 poker_min = 1031;
> +       u32 poker_max = 1600;
> +       u32 poker_range = poker_max - poker_min + 1;
> +       u32 retries    = 2;
> +       u32 lrun_max   = 32;
> +       s32 run_1_min   = 27;
> +       s32 run_1_max   = 107;
> +       s32 run_1_range = run_1_max - run_1_min;
> +       s32 run_2_min   = 7;
> +       s32 run_2_max   = 62;
> +       s32 run_2_range = run_2_max - run_2_min;
> +       s32 run_3_min   = 0;
> +       s32 run_3_max   = 39;
> +       s32 run_3_range = run_3_max - run_3_min;
> +       s32 run_4_min   = -1;
> +       s32 run_4_max   = 26;
> +       s32 run_4_range = run_4_max - run_4_min;
> +       s32 run_5_min   = -1;
> +       s32 run_5_max   = 18;
> +       s32 run_5_range = run_5_max - run_5_min;
> +       s32 run_6_min   = -1;
> +       s32 run_6_max   = 17;
> +       s32 run_6_range = run_6_max - run_6_min;
> +       u32 val;

Why does those values are lowered with respect to what is provided by
default? A bit more explanation on why those primes are chosen here
would be good to have, together with documenting default values (so
people can compare).

> +
>         struct rng4tst __iomem *rng =
>                         (struct rng4tst __iomem *)&sec->rng;
> -       u32 val;
> 
> -       /* put RNG4 into program mode */
> -       sec_setbits32(&rng->rtmctl, RTMCTL_PRGM);
> -       /* rtsdctl bits 0-15 contain "Entropy Delay, which defines the
> -        * length (in system clocks) of each Entropy sample taken
> -        * */
> +       /* Put RNG in program mode */
> +       /* Setting both RTMCTL:PRGM and RTMCTL:TRNG_ACC causes TRNG to
> +        * properly invalidate the entropy in the entropy register and
> +        * force re-generation.
> +        */
> +       sec_setbits32(&rng->rtmctl, RTMCTL_PRGM | RTMCTL_ACC);
> +
> +       /* Configure the RNG Entropy Delay
> +        * Performance-wise, it does not make sense to
> +        * set the delay to a value that is lower
> +        * than the last one that worked (i.e. the state handles
> +        * were instantiated properly. Thus, instead of wasting
> +        * time trying to set the values controlling the sample
> +        * frequency, the function simply returns.
> +        */
>         val = sec_in32(&rng->rtsdctl);
> -       val = (val & ~RTSDCTL_ENT_DLY_MASK) |
> -             (ent_delay << RTSDCTL_ENT_DLY_SHIFT);
> +       val &= RTSDCTL_ENT_DLY_MASK;
> +       val >>= RTSDCTL_ENT_DLY_SHIFT;
> +       if (ent_delay < val) {
> +               /* Put RNG4 into run mode */
> +               sec_clrbits32(&rng->rtmctl, RTMCTL_PRGM | RTMCTL_ACC);
> +               return;
> +       }
> +
> +       val = (ent_delay << RTSDCTL_ENT_DLY_SHIFT) | samples;
>         sec_out32(&rng->rtsdctl, val);
> -       /* min. freq. count, equal to 1/4 of the entropy sample length */
> -       sec_out32(&rng->rtfreqmin, ent_delay >> 2);
> -       /* disable maximum frequency count */
> -       sec_out32(&rng->rtfreqmax, RTFRQMAX_DISABLE);
> +
>         /*
> -        * select raw sampling in both entropy shifter
> +        * Recommended margins (min,max) for freq. count:
> +        *   freq_mul = RO_freq / TRNG_clk_freq
> +        *   rtfrqmin = (ent_delay x freq_mul) >> 1;
> +        *   rtfrqmax = (ent_delay x freq_mul) << 3;
> +        * Given current deployments of CAAM in i.MX SoCs, and to simplify
> +        * the configuration, we consider [1,16] to be a safe interval
> +        * for the freq_mul and the limits of the interval are used to compute
> +        * rtfrqmin, rtfrqmax
> +        */
> +       sec_out32(&rng->rtfreqmin, ent_delay >> 1);
> +       sec_out32(&rng->rtfreqmax, ent_delay << 7);
> +
> +       sec_out32(&rng->rtscmisc, (retries << 16) | lrun_max);
> +       sec_out32(&rng->rtpkrmax, poker_max);
> +       sec_out32(&rng->rtpkrrng, poker_range);
> +       sec_out32(&rng->rsvd1[0], (mono_range << 16) | mono_max);
> +       sec_out32(&rng->rsvd1[1], (run_1_range << 16) | run_1_max);
> +       sec_out32(&rng->rsvd1[2], (run_2_range << 16) | run_2_max);
> +       sec_out32(&rng->rsvd1[3], (run_3_range << 16) | run_3_max);
> +       sec_out32(&rng->rsvd1[4], (run_4_range << 16) | run_4_max);
> +       sec_out32(&rng->rsvd1[5], (run_5_range << 16) | run_5_max);
> +       sec_out32(&rng->rsvd1[6], (run_6_range << 16) | run_6_max);
> +
> +       val = sec_in32(&rng->rtmctl);
> +       /*
> +        * Select raw sampling in both entropy shifter
>          * and statistical checker
>          */
> -       sec_setbits32(&rng->rtmctl, RTMCTL_SAMP_MODE_RAW_ES_SC);
> -       /* put RNG4 into run mode */
> -       sec_clrbits32(&rng->rtmctl, RTMCTL_PRGM);
> +       val &= ~RTMCTL_SAMP_MODE_INVALID;
> +       val |= RTMCTL_SAMP_MODE_RAW_ES_SC;
> +       /* Put RNG4 into run mode */
> +       val &= ~(RTMCTL_PRGM | RTMCTL_ACC);
> +       /*test with sample mode only */
> +       sec_out32(&rng->rtmctl, val);
> +
> +       /* Clear the ERR bit in RTMCTL if set. The TRNG error can occur when the
> +        * RNG clock is not within 1/2x to 8x the system clock.
> +        * This error is possible if ROM code does not initialize the system PLLs
> +        * immediately after PoR.
> +        */
> +       /* setbits_le32(CAAM_RTMCTL, RTMCTL_ERR); */

Unused code?

>  }
> 
>  static int rng_init(uint8_t sec_idx, ccsr_sec_t *sec)
> diff --git a/include/fsl_sec.h b/include/fsl_sec.h
> index 7b6e3e2c20..2b3239414a 100644
> --- a/include/fsl_sec.h
> +++ b/include/fsl_sec.h
> @@ -34,6 +34,7 @@
>  #if CONFIG_SYS_FSL_SEC_COMPAT >= 4
>  /* RNG4 TRNG test registers */
>  struct rng4tst {
> +#define RTMCTL_ACC  0x20
>  #define RTMCTL_PRGM 0x00010000 /* 1 -> program mode, 0 -> run mode */
>  #define RTMCTL_SAMP_MODE_VON_NEUMANN_ES_SC     0 /* use von Neumann data in
>                                                     both entropy shifter and
> --
> 2.17.1

-- andrey

Link: [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/caam/ctrl.c?#n348
Link: [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/crypto/caam/ctrl.c?#n287


More information about the U-Boot mailing list