[U-Boot] [PATCH 050/126] x86: timer: Reduce timer code size in TPL on Intel CPUs
Bin Meng
bmeng.cn at gmail.com
Mon Oct 14 02:00:46 UTC 2019
Hi Simon,
On Sun, Oct 13, 2019 at 1:55 AM Simon Glass <sjg at chromium.org> wrote:
>
> Hi Bin,
>
> On Fri, 11 Oct 2019 at 23:18, Bin Meng <bmeng.cn at gmail.com> wrote:
> >
> > Hi Simon,
> >
> > On Sat, Oct 12, 2019 at 11:38 AM Simon Glass <sjg at chromium.org> wrote:
> > >
> > > Hi Bin,
> > >
> > > On Fri, 11 Oct 2019 at 07:19, Bin Meng <bmeng.cn at gmail.com> wrote:
> > > >
> > > > Hi Simon,
> > > >
> > > > On Fri, Oct 11, 2019 at 1:06 AM Simon Glass <sjg at chromium.org> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On Sat, 5 Oct 2019 at 08:36, Bin Meng <bmeng.cn at gmail.com> wrote:
> > > > > >
> > > > > > Hi Simon,
> > > > > >
> > > > > > On Wed, Sep 25, 2019 at 10:58 PM Simon Glass <sjg at chromium.org> wrote:
> > > > > > >
> > > > > > > Most of the timer-calibration methods are not needed on recent Intel CPUs
> > > > > > > and just increase code size. Add an option to use the known-good way to
> > > > > > > get the clock frequency in TPL. Size reduction is about 700 bytes.
> > > > > > >
> > > > > > > Signed-off-by: Simon Glass <sjg at chromium.org>
> > > > > > > ---
> > > > > > >
> > > > > > > drivers/timer/Kconfig | 29 +++++++++++++++++++----------
> > > > > > > drivers/timer/tsc_timer.c | 7 +++++--
> > > > > > > 2 files changed, 24 insertions(+), 12 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/timer/Kconfig b/drivers/timer/Kconfig
> > > > > > > index 5f4bc6edb67..90bc8ec7c53 100644
> > > > > > > --- a/drivers/timer/Kconfig
> > > > > > > +++ b/drivers/timer/Kconfig
> > > > > > > @@ -117,16 +117,6 @@ config RENESAS_OSTM_TIMER
> > > > > > > Enables support for the Renesas OSTM Timer driver.
> > > > > > > This timer is present on Renesas RZ/A1 R7S72100 SoCs.
> > > > > > >
> > > > > > > -config X86_TSC_TIMER_EARLY_FREQ
> > > > > > > - int "x86 TSC timer frequency in MHz when used as the early timer"
> > > > > > > - depends on X86_TSC_TIMER
> > > > > > > - default 1000
> > > > > > > - help
> > > > > > > - Sets the estimated CPU frequency in MHz when TSC is used as the
> > > > > > > - early timer and the frequency can neither be calibrated via some
> > > > > > > - hardware ways, nor got from device tree at the time when device
> > > > > > > - tree is not available yet.
> > > > > > > -
> > > > > > > config OMAP_TIMER
> > > > > > > bool "Omap timer support"
> > > > > > > depends on TIMER
> > > > > > > @@ -174,6 +164,25 @@ config X86_TSC_TIMER
> > > > > > > help
> > > > > > > Select this to enable Time-Stamp Counter (TSC) timer for x86.
> > > > > > >
> > > > > > > +config X86_TSC_TIMER_EARLY_FREQ
> > > > > > > + int "x86 TSC timer frequency in MHz when used as the early timer"
> > > > > > > + depends on X86_TSC_TIMER
> > > > > > > + default 1000
> > > > > > > + help
> > > > > > > + Sets the estimated CPU frequency in MHz when TSC is used as the
> > > > > > > + early timer and the frequency can neither be calibrated via some
> > > > > > > + hardware ways, nor got from device tree at the time when device
> > > > > > > + tree is not available yet.
> > > > > > > +
> > > > > > > +config TPL_X86_TSC_TIMER_NATIVE
> > > > > > > + bool "x86 TSC timer uses native calibration"
> > > > > > > + depends on TPL && X86_TSC_TIMER
> > > > > > > + help
> > > > > > > + Selects native timer calibration for TPL and don't include the other
> > > > > > > + methods in the code. This helps to reduce code size in TPL and works
> > > > > > > + on fairly modern Intel chips. Code-size reductions is about 700
> > > > > > > + bytes.
> > > > > > > +
> > > > > > > config MTK_TIMER
> > > > > > > bool "MediaTek timer support"
> > > > > > > depends on TIMER
> > > > > > > diff --git a/drivers/timer/tsc_timer.c b/drivers/timer/tsc_timer.c
> > > > > > > index 919caba8a14..9630036bc7f 100644
> > > > > > > --- a/drivers/timer/tsc_timer.c
> > > > > > > +++ b/drivers/timer/tsc_timer.c
> > > > > > > @@ -49,8 +49,7 @@ static unsigned long native_calibrate_tsc(void)
> > > > > > > return 0;
> > > > > > >
> > > > > > > crystal_freq = tsc_info.ecx / 1000;
> > > > > > > -
> > > > > > > - if (!crystal_freq) {
> > > > > > > + if (!CONFIG_IS_ENABLED(X86_TSC_TIMER_NATIVE) && !crystal_freq) {
> > > > > > > switch (gd->arch.x86_model) {
> > > > > > > case INTEL_FAM6_SKYLAKE_MOBILE:
> > > > > > > case INTEL_FAM6_SKYLAKE_DESKTOP:
> > > > > > > @@ -405,6 +404,10 @@ static void tsc_timer_ensure_setup(bool early)
> > > > > > > if (fast_calibrate)
> > > > > > > goto done;
> > > > > > >
> > > > > > > + /* Reduce code size by dropping other methods */
> > > > > > > + if (CONFIG_IS_ENABLED(X86_TSC_TIMER_NATIVE))
> > > > > > > + panic("no timer");
> > > > > > > +
> > > > > >
> > > > > > I don't get it. How could this reduce the code size? I don't see any
> > > > > > #ifdefs around the other methods we want to drop?
> > > > >
> > > > > The compiler sees that CONFIG_IS_ENABLED(..) is 1, and leaves out the
> > > > > code that follows it.
> > > >
> > > > Why?
> > > >
> > > > if (1)
> > > > panic("no timer");
> > > >
> > > > then compiler does not generate any codes of the following?
> > > >
> > > > fast_calibrate = cpu_mhz_from_cpuid();
> > > >
> > > > I don't understand.
> > > >
> > >
> > > The panic() function is marked as noreturn, so the compiler assume it
> > > doesn't return. You can try this if you like. It reduces the size by
> > > 700 bytes which on a 22KB image is a lot.
> >
> > OK, compiler is smart to generate less codes :)
> >
> > But the way you added the CONFIG_IS_ENABLED(..) logic check here is
> > obscure if one does not dig into that deep ..
> >
> > >
> > > > Besides, I think adding some random Kconfig options to exclude some
> > > > specific parts in one C file is a bad idea. It's unclear to me why we
> > > > should exclude one part versus another part. I'm OK to exclude the
> > > > whole C file for TPL/SPL though, but not part of it for size
> > > > limitation purpose.
> > >
> > > My understanding is the most of the code in this function is a
> > > fallback in case an earlier method doesn't work. But on modern CPUs
> >
> > Yes, correct.
> >
> > > the first method always works, so this is a waste of time?
> > >
> >
> > It's not a wast of time, but a bloat of the code size. As you said,
> > these are fallbacks, and methods are prioritized based on the age of
> > the processors, so that native method is tried first, followed by
> > cpuid, MSR, and finally PIT.
> >
> > You also mentioned that "on modern CPUs the first method always
> > works", so today the first method is native_calibrate_tsc(), but say 3
> > years later, this might not be true, and chances are that we may add
> > another method before native_calibrate_tsc() for whatever mechanism is
> > used on the latest processors, and the insertion of the TPL Kconfig
> > option (TPL_X86_TSC_TIMER_NATIVE) check today is not future proof.
>
> That's right, it is not. Perhaps we need to have separate timer
> drivers for different generations? But if not, I am loath to have 700
> bytes of dead code in TPL, which I why I added the option.
>
I asked the TPL question in another thread. I think I will need
understand why size is a problem for latest x86 processors to have
just a single U-Boot booting from reset vector to the shell.
> If we later need to adjust it, we can do so, but this cuts off the
> worst of the bloat.
Regards,
Bin
More information about the U-Boot
mailing list