[U-Boot] [PATCH 2/2] mmc: tegra: invalidate complete cachelines

Simon Glass sjg at chromium.org
Fri Apr 27 07:50:59 CEST 2012


Hi Mike,

On Fri, Apr 27, 2012 at 5:29 PM, Mike Frysinger <vapier at gentoo.org> wrote:

> On Thursday 26 April 2012 06:34:43 Thierry Reding wrote:
> > For reference, see sd_change_freq() in drivers/mmc/mmc.c.
>
> yes, that shows what we're talking about.
> int sd_change_freq(struct mmc *mmc)
> {
>        ... stack vars ...
>        int err;
>        ALLOC_CACHE_ALIGN_BUFFER(uint, scr, 2);
>        struct mmc_data data;
>        int timeout;
>        ... stack vars ...
> ...
>        data.dest = (char *)scr;
>        data.blocksize = 8;
>        data.blocks = 1;
>        data.flags = MMC_DATA_READ;
>        err = mmc_send_cmd(mmc, &cmd, &data);
> ...
>
> static int mmc_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>             struct mmc_data *data)
> {
> ...
>         if (data->flags & MMC_DATA_READ) {
>                 if ((uintptr_t)data->dest & (ARCH_DMA_MINALIGN - 1))
>                        printf("Warning: unaligned read from %p "
>                                "may fail\n", data->dest);
>                 invalidate_dcache_range((ulong)data->dest,
>                        (ulong)data->dest +
>                         data->blocks * data->blocksize);
>        }
> ...
>
> so what invalidate_dcache_range() will invalidate is from &scr and 8 bytes
> after that.  the higher layers make sure &scr is aligned properly, but not
> that it spans a full cache line.  so if the stack layout is:
>                [random stuff]
> 0x100   [scr[0]]
> 0x104   [scr[1]]
> 0x108   [err]
> 0x10a   [timeout]
>                [more stuff]
>
> this implicitly invalidates [err] and [timeout] and more stuff.  by you
> forcibly rounding up the length, you no longer get a warning, but you still
> (silently) blew away those variables from cache possibly losing changes
> that
> weren't written back to external memory yet.
>
> > > I worry that what you have done will just introduce obscure bugs, since
> > > we will potentially invalidate stack variants (for example) and lose
> > > their values.
> > >
> > > With the case problems, we are trying to fix them at source (i.e. at
> the
> > > higher level).
> >
> > I understand. The question then becomes how to best fix the passed in
> size.
> > Always passing the size of a complete cacheline in the SEND_SCR command
> > doesn't seem like a better option because it may have other implications
> on
> > other hardware.
>
> i proposed this in a previous thread, but i think Simon missed it.
>
> perhaps the warning in the core code could be dropped and all your changes
> in
> fringe code obsoleted (such as these USB patches): when it detects that an
> address is starting on an unaligned boundary, flush that line first, and
> then
> let it be invalidated.  accordingly, when the end length is on an unaligned
> boundary, do the same flush-then-invalidate step.  this should also make
> things
> work without a (significant) loss in performance.  if anything, i suspect
> the
> overhead of doing runtime buffer size calculations and manually aligning
> pointers (which is what ALLOC_CACHE_ALIGN_BUFFER does) is a wash compared
> to
> partially flushing cache lines in the core ...
>
> Simon: what do you think of this last idea ?
>

I think I proved to myself a while back that it can't be done. Even if you
flush you can't know that more activity will not occur on that cache line
when you access the stack a microsecond later. Then much later when the DMA
updates the RAM address with the data, you will not see it, because you
have already pulled in that cache line. We are relying on the invalidate to
'hold' until we are ready to read the data.

How about just using an array of one element? So instead of:

struct fred fred;

do_something_with(&fred);

use:

ALLOC_CACHE_ALIGN_BUFFER(struct fred, scr, 1)

do_something_with(fred);

Regards,
Simon


> -mike
>


More information about the U-Boot mailing list