Xilinx ZynqMP SPL boot: psu_init_gpl.c code corrupts U-Boot memory

Robert Hancock robert.hancock at calian.com
Sat Jan 30 01:39:57 CET 2021


On Thu, 2021-01-21 at 16:40 +0000, Robert Hancock wrote:
> On 1/20/21 9:25 PM, Robert Hancock wrote:
> > > I've been trying to get the U-Boot SPL to work on a Xilinx ZCU102
> > > development board. I have been testing with U-Boot 2021.01, and using
> > > the generated psu_init_gpl HW initialization code generated by Vivado
> > > 2020.2, rather than the versions in the U-Boot tree, as we are
> > > expecting to make changes to the configuration as we move to our own
> > > board design.
> > > 
> > > The issue I was seeing is that the SPL seemingly locking up or crashing
> > > early in the boot process, just after the SPL banner was printed. I've
> > > CCed "Major A" as he reported something similar on the mailing list in
> > > March ('ZynqMP boot: no messages from SPL other than "Debug uart
> > > enabled"') but I didn't see a resoution posted.
> > > 
> > > After groveling around with the Xilinx JTAG debugger for a day or so, I
> > > figured out that the problem was that the SPL ended up reading from an
> > > invalid pointer address somewhere during the driver model
> > > initialization process, and crashing. After using memory watchpoints to
> > > identify where the bogus value was being written to the pointer storage
> > > location, the memory write was actually coming from the Vivado 2020.2-
> > > generated psu_init_gpl code. The offending code is in a function called
> > > serdes_illcalib_pcie_gen1 and looks like this:
> > > 
> > >           if (gen2_calib != 1) 
> > >           {
> > >             ill1_val[loop] = ((0x04 + meancount[loop]*8) % 0x100);
> > >             ill12_val[loop] = ((0x04 + meancount[loop]*8) >= 0x100) ?
> > > 0x10 : 0x00;
> > >             Xil_Out32(0xFFFE0000+loop*4,iterresult[loop]);
> > >             Xil_Out32(0xFFFE0010+loop*4,iterresult[loop+4]);
> > >             Xil_Out32(0xFFFE0020+loop*4,bistpasscount[loop]);
> > >             Xil_Out32(0xFFFE0030+loop*4,meancount[loop]);
> > >           }
> > >           if (gen2_calib == 1) 
> > >           {
> > >             ill1_val[loop] = ((0x104 + meancount[loop]*8) % 0x100);
> > >             ill12_val[loop] = ((0x104 + meancount[loop]*8) >= 0x200) ?
> > > 0x02 : 0x01;
> > >             Xil_Out32(0xFFFE0040+loop*4,iterresult[loop]);
> > >             Xil_Out32(0xFFFE0050+loop*4,iterresult[loop+4]);
> > >             Xil_Out32(0xFFFE0060+loop*4,bistpasscount[loop]);
> > >             Xil_Out32(0xFFFE0070+loop*4,meancount[loop]);
> > >           }
> > > 
> > > Those writes to 0xFFFExxxx are to on-chip memory addresses, not any
> > > hardware register, so I have no idea what this is trying to achieve.
> > > Possibly this is some debug code to store some calibration results that
> > > was left in by mistake? Those addresses may not be used in the Xilinx
> > > FSBL, but overwriting them in the U-Boot SPL wreaks havoc.
> > > 
> > > For now, I've worked around it by hacking the Xil_Out32 implementation
> > > provided by U-Boot in board/xilinx/zynqmp/xil_io.h to trap and ignore
> > > writes to memory addresses in the OCRAM range (0xFFFC0000 to
> > > 0xFFFFFFFF). With that in place, things seem to be working. Probably
> > > this should be addressed in the Vivado psu_init code generator, but we
> > > may need some workaround like this to avoid those using the affected
> > > Xilinx tool versions from hitting this issue?
> > > 
> > 
> > I can't see any code like this in u-boot code. I can't see context of
> > this code that's why simply don't know what it does and why.
> 
> I can email the psu_init_gpl code I have privately if you need it - the code
> is
> almost 1MB so too large for the list or the usual Pastebin etc. it seems.
> However, you can see similar code in this file from ZCU102 in the Xilinx
> embeddedsw repo:
> 
> https://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FXilinx%2Fembeddedsw%2Fblob%2Fmaster%2Flib%2Fsw_apps%2Fzynqmp_fsbl%2Fmisc%2Fzcu102%2Fpsu_init_gpl.c&data=04%7C01%7Crobert.hancock%40calian.com%7C55d03544d14741e8474808d8be4489a7%7C23b57807562f49ad92c43bb0f07a1fdf%7C0%7C0%7C637468548882351310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nEiVgL1d%2F%2B4r9iqV8NYZ3JUVQPwCOPRFxEtU69FjH3Q%3D&reserved=0
> 
> That has writes to 0xFFFExxxx addresses in the serdes_illcalib_pcie_gen1
> function, like mine has, as well as in its serdes_illcalib which don't seem
> to
> be in the version I have. Looking at the git history, it appears those writes
> were added in a commit that updated the file for the 2020.2 tools release, so
> that may have been when this was introduced.
> 
> I don't see any such writes in the psu_init_gpl files in the U-Boot repo, I
> am
> guessing because those were all generated using older tool versions that
> didn't
> have this behavior.
> 
> Interestingly, looking at the 2020.2 Petalinux BSP for ZCU102 in its provided
> xilinx-zcu102-2020.2/project-spec/hw-description/psu_init_gpl.c file, those
> memory writes are there but have been commented out. Not sure if it somehow
> generated that way or if someone realized that issue and did that manually?
> 

From some more investigations on this, it appears that whether these writes end
up in the code or not may depend on exactly what state the Vivado project is in
when the XSA file is exported. If you just set up the block design and then
export the hardware at that point without bitstream included, these "debug"
writes are included. If the logic has been fully built up to bitstream
generation and then you export XSA with bitstream included, then the generated
code has the "debug" writes commented out.

That seems a bit unintuitive - I haven't tried to narrow down exactly what
stage of the build process causes that change to occur - but it's an
explanation for the difference at least. Hopefully if anyone else runs into a
similar issue they'll see this post and have some idea what has happened..

> 
> > And also not sure what you want me to do with this. Please suggest.
> 
> The main purpose was to let the community know of this issue as others
> working
> with ZynqMP hardware may run into this with the current Xilinx tools.
> 
> I have this patch locally to work around this issue right now. We may want
> something like this in U-Boot as even if the Xilinx tools are eventually
> updated to not do these writes, people using affected versions may still run
> into issues. Let me know if something like this seems reasonable and I can
> submit as a proper patch:
> 
> diff --git a/board/xilinx/zynqmp/xil_io.h b/board/xilinx/zynqmp/xil_io.h
> index e6caa7c850..6c7c8829e9 100644
> --- a/board/xilinx/zynqmp/xil_io.h
> +++ b/board/xilinx/zynqmp/xil_io.h
> @@ -12,6 +12,17 @@
>  
>  static void Xil_Out32(unsigned long addr, unsigned long val)
>  {
> +	/*
> +	 * Vivado 2020.2 (at least) generates psu_init_gpl code that attempts
> +	 * to write to OCM memory addresses. This may be some leftover debug
> +	 * code meant for use with the Xilinx FSBL, but these memory addresses
> +	 * are used by U-Boot SPL and overwriting them will corrupt our memory.
> +	 * Ignore attempts to write to these addresses.
> +	 */
> +	if (addr >= 0xfffc0000 && addr <= 0xffffffff) {
> +		printf("Ignoring psu_init_gpl write to 0x%08lx\n", addr);
> +		return;
> +	}
>  	writel(val, addr);
>  }
>  
> 
> > Thanks,
> > Michal
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com


More information about the U-Boot mailing list