[U-Boot] [RFC] [PATCH] rewrite doc/README.arm-unaligned-accesses

Sat Feb 22 13:17:50 CET 2014

Hi Tom,

On Fri, 21 Feb 2014 08:38:18 -0500, Tom Rini <trini at ti.com> wrote:

> > > And as an aside, building with -Wcast-align shows some places we _may_
> > > need to investigate.
> > 
> > Thanks for the suggestion. If investigation shows it to be a good sign
> > of some alignment issue, then maybe we could add it systematically if
> > it does not adversely affect build or run performance. 
> 
> The problem is it's going to give a _lot_ of false positives.  One of
> the things we've got right now with W= thanks to Kbuild, and W=2 turns
> it on.  It's not in the default list for the kernel because there's many
> cases of "but we know this is safe after investigation" and there's
> (apparently) not an easy way to annotate things.

Drat.

> > > [snip]
> > > > +ii) In order to make sure the following is self-sufficient, it goes
> > > > +through the basics of alignment and assumes only good, not expert,
> > > > +knowledge of the C language.
> > > > +
> > > > +1. C99 alignment requirements
> > > > +
> > > > +The C99 standard [1] (henceforth: 'C99') defines alignment requirements
> > > > +as a "requirement that objects of a particular type be located on
> > > > +storage boundaries with addresses that are particular multiples of a
> > > > +byte address".
> > > > +
> > > > +In C99, unaligned accesses (those which which do not respect alignment
> > > > +requirements of the object type being accessed) are deemed 'undefined'.
> > > > +This means programs which contain unaligned accesses might build and
> > > > +execute as if there were no alignment constraints, or build and execute
> > > > +but not as expected, or build and crash at execution, or not build
> > > > +at all--or even crash the compiler.
> > > 
> > > OK.
> > > 
> > > > +2. Implementation alignment requirements
> > > > +
> > > > +While C99 does define alignment requirements, it does not lay out any
> > > > +actual alignment requirements, because these depend greatly on the
> > > > +hardware involved; they are thus defined by C99 implementations.
> > > > +
> > > > +For ARM, the C alignment requirements are laid out in the ARM EABI [2].
> > > > +For instance, is is the ARM EABI which sets the alignment constraint of
> > > > +type 'long' to four-byte boundaries.
> > > > +
> > > > +Alignment requirements for a given architecture may differ from
> > > > +hardware capabilities, i.e. they might be stricter than what the
> > > > +hardware can actually do. One example is (32-bit) x86, which can do
> > > > +unaligned accesses at the hardware level at some performance cost, but
> > > > +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed
> > > > +bag: some older ARM hardware cannot perform unaligned access at all;
> > > > +some can but at a cost; some can at virtually no cost.
> > > > +
> > > > +Further, even when a given architecture is capable of emitting such
> > > > +unaligned accesses at the core or CPU level, at a higher (system) level
> > > > +they might be forbidden because the target address falls within a
> > > > +region in which only aligned accesses are possible.
> > > 
> > > The compiler must make reasonable decisions based on what it knows about
> > > a particular architecture and ABI based on what it can know.  The
> > > developer must make correct decisions based on things the compiler
> > > cannot know such as memory region constraints.
> > 
> > OK -- is this an addition suggestion?
> 
> Assuming we don't go for a more direct copy of the kernel doc, yes.

The kernel doc teaches developers how to specify wanted unaligned
accesses at source code level. I want our doc to also teach its
readers  what our policy is regarding wanted *and* *unwanted* unaligned
accesses, how it is implemented, and why; we can refer the kernel doc
for some of this content, but not for all if it.

> > > > +3. Native vs emulated unaligned accesses.
> > > > +
> > > > +There is a different between the alignment of accesses in a C program
> > > > +source code and the alignment of accesses at the core or CPU level. In
> > > > +the following, translated accesses (core or CPU accesses) will be
> > > > +qualified as /native/ accesses to distinguish them from (untranslated) C
> > > > +program accesses.
> > > > +
> > > > +A C99 implementation might translate an unaligned access into a native
> > > > +unaligned access, or it might emulate it by a combination of native
> > > > +aligned accesses. 
> > > 
> > > This isn't relevant, given the contraint on the compiler to do sane
> > > things for a given architecture.
> > 
> > "Sane things", yes; but not one sane thing only: for instance, both
> > options are sane with an architecture where native unaligned accesses
> > are possible at some cost, possibly without one option being clearly
> > better than the other" so there is a choice here.
> > 
> > So I think this is relevant here, in order to inform the reader that
> > there are options in the way unaligned accesses should be translated,
> > and that this option might have to be decided on a case-by-case, or
> > even access-by-access, basis.
> > 
> > I probably should make that explicit in the text.
> 
> Well, the thing is we inform the compiler of the architecture and
> additional constraints and it does the correct thing, that needs to be
> clear.

Yes, but again, "the correct thing" sounds like there is only one
obviously correct thing to do and the others are incorrect, which is
wrong: there are several options and which one is best (as opposed
to correct) requires a choice based on a criterion.

> [sections 4 and 5 to be merged]

> Oh my.  We've got fs/fdos/ that's (aside from automated things)
> untouched since 2003 and not normally built.  fs/fat/ is where everyone
> goes for FAT code today.

Yes, I saw the removal patch. I will use another example then. :)

> > > But here's the non-problem, demonstrated with the 'gpt write' code on an
> > > armv5te target.  Having described a struct to the compiler, and given
> > > the compiler knows architecture aligment requirements it will either go:
> > > (a) The struct is not packed, invisble padding time
> > > (b) The struct is packed, this is unaligned, do it correctly.
> > 
> > I'd say (b) is "do it as instructed" rather than "do it correctly",
> > precisely because options exist to control how unaligned accesses go.
> 
> Keep in mind -mno-unaligned-access is an ARM thing, for all other
> architectures it just does the right thing.

Well, yes, -m[no-]unalined-accesses is ARM-specific (and this patch is
for an ARM-specific document even though parts of it are more general).

But again, "the right thing" does not exist per se: in ARM, like on
PowerPC if I'm not mistaken, "the right thing" could be either to allow
native unaligned accesses or to emulate unaligned accesses through
smaller native ones, depending on what the goal is.

> > Apart from this, I agree about the packed vs non-packed behavior --
> > which of course is not specific to ARMv5te or even ARM. It is pretty
> > much a de facto standard although to be complete, (b) does not
> > conform to C99 per se.
> 
> The compiler must, for a given ABI and C standard make things correct.
> That includes invisible padding of structs for alignment requirements
> and dealing with unaligned access correctly.
> 
> > However, ARMv5TE is not able to do native unaligned accesses, and the
> > following does not seem specific to ARMv5TE so I am not sure why
> > ARMv5TE specifically matters here.
> 
> It matters here because I took this "bad" code and ran it on an ARMv5TE
> board without issue, because the code is not bad, it's correct.  And
> it's going to be correct everywhere on any architecture.

Which code do you mean? The code below, which generates then traps
unaligned accesses? If that's what you mean, then no, this code is not
valid since it is not C99 conformant - as any code which does unaligned
accesses.

> > > > +6. Unintended unaligned accesses in U-Boot
> > > 
> > > We indeed don't want to allow actually bad things to be written and
> > > work.  On processors such as ARM where we may, or may not, do a silent
> > > fixup, we choose not to do a silent fixup and instead fail.  For
> > > example, the following blows right up as we want it to today, and
> > > continues to blow up with the patch I sent where we send
> > > -mno-unaligned-access.  It _only_ works if we cleared the SCTRL.A bit
> > > which _we_do_not_want_.
> > > 
> > > diff --git a/common/cmd_misc.c b/common/cmd_misc.c
> > > index 93f9eab..fa4a146 100644
> > > --- a/common/cmd_misc.c
> > > +++ b/common/cmd_misc.c
> > > @@ -38,6 +38,18 @@ U_BOOT_CMD(
> > >  	"    - delay execution for N seconds (N is _decimal_ !!!)"
> > >  );
> > >  
> > > +static int do_unaligned(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
> > > +{
> > > +	unsigned char buf[8] = { 0 };
> > > +	return *(int*)((unsigned long)buf | 1);
> > > +}
> > > +
> > > +U_BOOT_CMD(
> > > +	unaligned,    1,    1,     do_unaligned,
> > > +	"Make some unaligned accesses happen",
> > > +	"Make some unaligned accesses happen"
> > > +);
> > > +
> > >  #ifdef CONFIG_CMD_TIMER
> > >  static int do_timer(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
> > >  {
> > 
> > Agreed so far: this code is an example of usually unpredictable
> > run-time unaligned access.
> 
> Right.  And part of my point in showing the above is that after adding
> -mno-unaligned-access to armv7 builds we're _still_ crashing like we
> want when people do things that aren't portable.

We're still crashing for that scenario indeed, and yes, unaligned
run-time pointer dereferencement is unportable; but to be fair, so is
accessing unaligned fields in packed structs. It is not in C99. 

> > > And that's where we need to stop, more or less.  The above is the type
> > > of unaligned access problem we care about because it fails, everywhere
> > > _unless_ we fix it up, and we don't want to.  All of the problems we're
> > > encountering now are either:
> > > - Self inflicted (due to the mismatch in what the compiler requires and
> > >   what we've not done)
> > > - Potentially abusing the packed attribute.
> > 
> > Indeed, and "that's where we need to stop" is where we disagree, the
> > point of disagreement being that unpredictable run-time unaligned
> > accesses would be "the type of unaligned access problem we care about
> > because it fails everywhere", if by this you meant "the *only* type".
> > 
> > I don't only care about code which fails everywhere; I care about code
> > which I can tell will fail elsewhere even though it "works for me",
> > whether "me" means ARMv7 or ARM in general, or any other architecture
> > for that matter.
> 
> Right.  But you're catching code that you can tell won't fail elsewhere
> and claiming that it will fail.  The reason you can tell that it won't
> fail is that we aren't doing pointer math or other tricky things.  We're
> doing well annotated plain old C.

I understand what you are saying, and I would agree if I shared the
assumptions it is based upon, which I don't. Here are these assumptions:

- it wrongly assumes that all architectures and systems properly handle
  accesses to unaligned struct fields when these accesses are not
  explicitly specified to be unaligned. Some simply won't. Some even
  will seem to tolerate unaligned accesses but perform them *wrong* --
  I've been personally bitten by this kind of issue. ARM is not the
  only case where Linux suggests the use of get_unaligned() and
  put_unaligned(), and that's because ARM is not the only arch where
  the compiler might not "do the right thing" all the time and we have
  to tell it what to do on a case by case basis.

- it wrongly assumes that the only type of unaligned access we care
  about is caused by bad pointer arithmetic. As I said, we don't simply
  care about what will always fail. We care about what is likely to
  fail.

- it wrongly assumes packing structs is conformant. The 'packed'
  attribute is not even mentioned in C99. C99 never says unaligned
  access should or will work. By using the 'packed' attribute, we may be
  doing plain old C, but we're not doing conformant C. As soon
  as we let a C program perform implicit unaligned accesses (as opposed
  to explicitly using put_unaligned and get_unaligned), we're in dragon
  territory.

>  Lets quote the kernel doc[1]:
> "Another point worth mentioning is the use of __attribute__((packed)) on
> a structure type. This GCC-specific attribute tells the compiler never
> to insert any padding within structures, useful when you want to use a C
> struct to represent some data that comes in a fixed arrangement 'off the
> wire'.

BTW, note: GCC-spectific attribute -- not 'plain old C'. But granted,
many compilers provide a similar option to pack structs, and compilers
always know how to translate access to unaligned fields in packed
structs, but...

> You might be inclined to believe that usage of this attribute can easily
> lead to unaligned accesses when accessing fields that do not satisfy
> architectural alignment requirements. However, again, the compiler is
> aware of the alignment constraints and will generate extra instructions
> to perform the memory access in a way that does not cause unaligned
> access. Of course, the extra instructions obviously cause a loss in
> performance compared to the non-packed case, so the packed attribute
> should only be used when avoiding structure padding is of importance."

... this only describes one way the compiler can translate such
accesses. There are others, and as the compiler knows several ways,
there is a choice. And as there is a choice, ther has to be a criterion
for choosing -- my criterion being to use the option to catch potential
issues rather than not.

> Which is the rule we need to be following, and I'm sure we can find
> cases where we aren't.

I do agree that we must follow a rule which mandates emulating
unaligned accesses. And as far as I can tell, we do, even though we
enable -munaligned-access. 

> In other words, this is code that at first glance my look like it will
> generate unaligned access but in fact it never generate unaligned
> access, it will generate potentially slow access (which is why we should
> avoid them, unless important).  Which is why I say "and that's where we
> need to stop", because the others aren't.  We only see them as unaligned
> access on ARMv7 because the compiler assumes SCTRL.A being clear and not
> needing to take alignment care here, but needing to because we set
> SCTRL.A.

Sorry, I don't really get what "the others aren't" means exactly.

> > (and I am and will be grateful to people using other architectures than
> > ARM if they decide to suffer some small cost in order to try and catch
> > an issue that might bite us even though it would not bite them.)
> > 
> > Having -munaligned-access in conjunction with SRC.A allows us to trap
> > unintended unaligned accesses, at the only expense of having to make
> > intended ones explicit rather than relying on a compiler option. That
> > is a small cost to us, and a benefit overall.
> 
> No.  It's causing us to have correct code fail and in some cases add
> unneeded overhead for all architectures.  We don't need to use
> get/put_unaligned in certain places where, regardless of architecture,
> things will work correctly.

Code which traps with -munaligned-access and SRC.A set is code which
is not correct, since the native unaligned access which was trapped
reflects an unaligned access within the C source code, which is not
allowed as per C99. If that access had been performed through get_ or
put_unaligned, then the code would be conformant and wouldn't trap.

> > > In the case of the GPT code, I'm still not convinced there's a problem
> > > that isn't due to how we're building code that validly describes the
> > > standard and how we tell the compiler to build it.  It should be
> > > __packed to match the on-disk format.  Being __packed means the compiler
> > > does the right thing about access or it's a compiler bug.
> > 
> > Being packed just tells the compiler not to pad, it does not tell the
> > compiler how to access the unaligned fields; how to access unaligned
> > fields is what -m[no-]unaligned-access is about, which means the
> > decision is in the hands of the developer who builds the code, based
> > on what this developer intends the code to do. 
> 
> No, what tells the compiler how to access the unaligned fields is
> -march/-mcpu/-mtune for all architectures as that informs gcc of lots of
> things about the architecture.

Yes, but I think -march/-mcpu/-mtune take us away from the subject, in
that they just set the default for -m[no-]unaligned-access which is the
option controlling how unaligned accesses are translated into native
accesses, and which can be set regardless of -march/-mcpu/-mtune.

BTW, the mere existence of -m[no-]unaligned-access shows i) that there
is not one, but two "right things" the compiler can do when translating
unaligned accesses into native accesses, and ii) that the default one is
not ncessarily the right one.

Again -- I do insist -- I understand your viewpoint, which is that
compilers can always be made to produce correct code from packed structs
with unaligned fields, therefore such unaligned fields are not a
problem and we should just let the compilers deal with it by passing
them adequate options, which in the case of ARM means pasing gcc
"-mno-unaligned-access" at least when it is not the default already.

And I would agree with this viewpoint if I did not want us to catch
unaligned accesses in the source code which we don't know about yet.

My viewpoint is that not all unaligned accesses are wanted, and we
should catch those that we did not and either remove them or make them
explicit. And I think that is the gist of the kernel doc -- if relying
on the compiler was enough, put/get_unaligned would not be suggested.

Now maybe another way to look at it is to compare the pros and cons of
"enabling -m[no-]unaligned-code generally" vs "not enabling it unless we
have no other choice for a give file" which is the current choice.

Pros of -mno-unaligned-access

- no case-by-case choice: all code is -mno-unaligned-access.
- no risk of a build-time unaligned access causing a data abort.
- no need to use get/put_unaligned, the compiler does the emulation.

Cons of -mno-unaligned-access

- performance hit (but unaligned accesses are infrequent).
- new and possibly unwanted unaligned accesses are not detected.

Pros of -munaligned-access

- unwanted unaligned accesses are caught at run time.

Cons of -munaligned-access

- wanted unaligned accesses must use get_unaligned/put_unaligned
  (but that's what the kernel doc asks for anyway).
- some files don't use get/put_unaligned as they should, and must thus
  be compiled on a case-by-case basis with -mno-unaligned-access.

Anyone feel free to correct/add to these cons/pros.

The way I see it, if we follow the kernel doc advice of always making
unaligned accesses explicit in the source code (and we should), then
-munaligned-access only has advantages over -mno-unaligned-access,
because both behave the same way for wanted unaligned accesses, but
-munaligned-access will catch unwanted ones whereas
-mno-unaligned-access won't.

IOW, when does -munaligned-access bother us? When we have code which
performs unaligned accesses that we have not looked at and saud we
wanted tt that's the GPT case for instance. Piotr's patch says that
we want these unaligned accesses, and then -munaligned-access does not
bother us any more.

> [1]: https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

Thanks for you feedback

Amicalement,
-- 
Albert.