[U-Boot] [PATCH] ARM: fixed relocation using proper alignment

Lokesh Vutla lokeshvutla at ti.com
Wed May 17 04:13:57 UTC 2017



On Tuesday 16 May 2017 07:59 PM, Manfred Schlaegl wrote:
> On 2017-05-11 08:53, Lokesh Vutla wrote:
>>
>>
>> On Wednesday 10 May 2017 07:11 PM, Manfred Schlaegl wrote:
>>> Using u-boot-2017.05 on i.MX6UL we ran into following problem:
>>> Initially U-Boot could be started normally.
>>> If we added one random command in configuration, the newly generated
>>> image hung at startup (last output was DRAM:  256 MiB).
>>>
>>> We tracked this down to a data abort within relocation (relocated_code).
>>>
>>> relocated_code in arch/arm/lib/relocate.S copies 8 bytes per loop
>>> iteration until the source pointer is equal to __image_copy_end.
>>> In a good case __image_copy_end was aligned to 8 bytes, so the loop
>>> stopped as suggested, but in an errornous case __image_copy_end was
>>> not aligned to 8 bytes, so the loop ran out of bounds and caused a
>>> data abort exception.
>>
>> Well I agree with this patch but I have a small query. Looking at the
>> relocation code:
>>
>> copy_loop:
>> 	ldmia	r1!, {r10-r11}		/* copy from source address [r1]    */
>> 	stmia	r0!, {r10-r11}		/* copy to   target address [r0]    */
>> 	cmp  	r1, r2			/* until source end address [r2]    */
>> 	blo		copy_loop
>>
>> IIUC, this loops exits as soon as r1 >= r2
>>
>> In your case
>> r1 is __image_copy_start
>> r0 is relocation address
>> r2 is __image_copy_end (which is 4 byte aligned)
>>
>> In the corner case you mentioned there will be extra memcpy of 4 bytes
>> from address in r1 to address in r0. I assume you are running from DDR
>> and relocation offset is calculated such that (aligned to previous 4K
>> page) it should be able to accommodate extra 4 bytes(I assume). I am
>> wondering why should this give a data abort.
>>
>> Thanks and regards,
>> Lokesh
>>
> 
> Thanks a lot for your input!
> The patch solved my problem randomly.
> 
> The loop terminates only one word beyond __image_copy_end.
> This is not the problem in my case because both, the source and destination
> pointers point to valid addresses in dram.
> So the problem must be later in relocate_code.

That's right.

> 
> I spent some time using a debugger and found that the data abort happens here
> 
> 	/*
> 	 * fix .rel.dyn relocations
> 	 */
> 	ldr	r2, =__rel_dyn_start	/* r2 <- SRC &__rel_dyn_start */
> 	ldr	r3, =__rel_dyn_end	/* r3 <- SRC &__rel_dyn_end */
> fixloop:
> 	ldmia	r2!, {r0-r1}		/* (r0,r1) <- (SRC location,fixup) */
> 	and	r1, r1, #0xff
> 	cmp	r1, #R_ARM_RELATIVE
> 	bne	fixnext
> 
> 	/* relative fix: increase location by offset */
> 	add	r0, r0, r4
> 	ldr	r1, [r0]
> 	add	r1, r1, r4
> 	str	r1, [r0]
> fixnext:
> 	cmp	r2, r3
> 	blo	fixloop
> 
> 
> Also I found out, that it's not the alignment of image_copy_end, but of rel_dyn_start
> (which was also implicitly changed by my patch).
> 
> In a good case (rel_dyn_start aligned to 8 byte), u-boot is starting up normally
> rel_dyn_start is 0x8785FC28
> rel_dyn_end is 0x87857BD0
> A dump of this memory area shows no abnormality
> 
> In a bad case (same source, but rel_dyn_start aligned to 4 byte), the data abort happens
> rel_dyn_start is 0x8785FC24
> rel_dyn_end is 0x87857BCC
> So we have the same size of 32856 bytes but a memory dump showed exactly one difference, which is
> very interesting:
> 
> At offset 0x610 (relative to rel_dyn_start) we have following difference
> -00000610  30 3e 80 87 17 00 00 00  34 3e 80 87 00 00 00 00  |0>......4>......|
> +00000610  30 3e 80 87 17 00 00 00  00 00 00 00 17 00 00 00  |0>..............|

Looks like someone is corrupting the data(assuming). Is it all 0's just
at this location or continuously after this?

 If possible can you try the following experiments with the failing code:

- Dump rel_dyn_* data right after spl copies U-boot to ddr.
- Similarly dump rel_dyn_* data before calling relocate_code() and
compare both?

If the comparison fails, add a write breakpoint at the corrupted address
before starting u-boot and see when it hits.

BTW, where is your BSS located?

Thanks and regards,
Lokesh

> 
> Here is a pseudo-trace of fixloop in bad case starting at offset 0x618 (relative to rel_dyn_start)
> 
> fixloop:				// R2 = 0x878581E4, R4 = 0x08786000
> 	ldmia	r2!, {r0-r1}		// R0 = 0x00000000, R1 = 0x00000017, R2 = 0x878581EC
> 	and	r1, r1, #0xff
> 	cmp	r1, #R_ARM_RELATIVE
> 	bne	fixnext			// R1 == 0x17 -> no branch
> 	
> 	/* relative fix: increase location by offset */
> 	add	r0, r0, r4		// R0 = 0x08786000
> 	ldr	r1, [r0]		// ABORT while accessing 0x08786000 -> not a valid address in dram
> 
> 
> It seems, that the entry 00 00 00 00 17 00 00 00 @ offset 0x618 is some how incorrect, or at least
> produces some incorrect behavior. There are no similar entries in the table.
> 
> I'm not really experienced with these tables.
> Has anyone suggestions, starting points, ideas/hints which could help me to track this down?
> 
> Thanks a lot!
> 
> Best regards,
> Manfred
> 
> 
> 
> 
> 


More information about the U-Boot mailing list