[U-Boot] tegra124 jetson-tk1 ethernet problems.

Joe Hershberger joe.hershberger at gmail.com
Thu Aug 4 23:22:16 CEST 2016


Hi Stephen,

On Tue, Aug 2, 2016 at 11:56 AM, Stephen Warren <swarren at wwwdotorg.org> wrote:
> On 08/01/2016 09:20 PM, Peter Chubb wrote:
>>
>>
>> Hi Folks,
>>    Since patch 96350f729c42 "dm: tegra: net: Convert tegra boards to
>>    driver model for Ethernet" booting via dhcp has been broken on the
>>    Jetson TK1.
>
>
> Best to Cc the net and DM maintainers (Joe Hershberger and Simon Glass) on
> this to make sure they see the issue. I've done so here.
>
>>    I tried applying "net: Probe PCI before looking for ethernet
>>    devices"; this `works' in that the ethernet device is detected and
>>    works,  but I end up with huge numbers of
>>       CACHE: Misaligned operation at range [fffb8c00, fffb8c2e]
>>    messages on the serial console.
>
>>
>>
>>    These come from the flush_cache() calls in net/rtl8169.c.  I
>>    suggest the attached patch (or something like it):
>
>
> Interesting; in my automated testing system, I do see these cache error
> messages, yet ping and TFTP work for me. Admittedly my test setup uses a
> static IP configuration rather than DHCP.
>
>> diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
>
>
>>  static void rtl_flush_rx_desc(struct RxDesc *desc)
>>  {
>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>> 1);
>> +       unsigned long size = ALIGN(sizeof(*desc), ARCH_DMA_MINALIGN);
>> +
>> +       flush_cache(start, size);
>>  #endif
>>  }
>>
>> @@ -493,21 +496,28 @@ static void rtl_inval_tx_desc(struct TxDesc *desc)
>>  static void rtl_flush_tx_desc(struct TxDesc *desc)
>>  {
>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>> 1);
>> +       unsigned long sz = ALIGN(sizeof *desc, ARCH_DMA_MINALIGN);
>> +
>> +       flush_cache(start, sz);
>>  #endif
>>  }
>
>
> Those two are wrong. Hopefully neither of those changes do anything on
> Jetson TK1, since CONFIG_SYS_NONCACHED_MEMORY is enabled there.
>
> The cache line is likely larger than the individual descriptor size, so
> rounding up the flush length to a whole cache line will likely end up
> flushing more descriptors than you want. This will eventually result in SW
> over-writing a HW update to another descriptor, and so at least lose
> packets. For this reason, CONFIG_SYS_NONCACHED_MEMORY must be set if the
> descriptor and cache line sizes don't match, except for systems where IO is
> fully cache-coherent and hence the cache operations are no-ops.

I agree... it this should (and does) require
CONFIG_SYS_NONCACHED_MEMORY in that case.  This is in the top of this
driver:

/*
 * Warn if the cache-line size is larger than the descriptor size. In such
 * cases the driver will likely fail because the CPU needs to flush the cache
 * when requeuing RX buffers, therefore descriptors written by the hardware
 * may be discarded.
 *
 * This can be fixed by defining CONFIG_SYS_NONCACHED_MEMORY which will cause
 * the driver to allocate descriptors from a pool of non-cached memory.
 */
#if RTL8169_DESC_SIZE < ARCH_DMA_MINALIGN
#if !defined(CONFIG_SYS_NONCACHED_MEMORY) && \
        !defined(CONFIG_SYS_DCACHE_OFF) && !defined(CONFIG_X86)
#warning cache-line size is larger than descriptor size
#endif
#endif

>>  static void rtl_inval_buffer(void *buf, size_t size)
>>  {
>> -       unsigned long start = (unsigned long)buf & ~(ARCH_DMA_MINALIGN -
>> 1);
>> -       unsigned long end = ALIGN(start + size, ARCH_DMA_MINALIGN);
>> +       unsigned long end = ALIGN((unsigned long)buf + size,
>> ARCH_DMA_MINALIGN);
>>
>> -       invalidate_dcache_range(start, end);
>> +        /* buf is aligned to RTL8169_ALIGN,
>> +         * which is a multiple of ARCH_DMA_ALIGN
>> +         */
>> +       invalidate_dcache_range((unsigned long)buf, end);
>>  }
>>
>>  static void rtl_flush_buffer(void *buf, size_t size)
>>  {
>> -       flush_cache((unsigned long)buf, size);
>> +       unsigned long sz = ALIGN(size, ARCH_DMA_MINALIGN);
>> +
>> +       flush_cache((unsigned long)buf, sz);
>>  }
>
>
> I believe the correct approach is for the caller (network core code) to
> provide cache-aligned buffers, rather than for each driver to align the
> start/size when performing cache operations. Again, this is to ensure that
> cache operations don't affect any other data adjacent to the buffer. Can you
> see why the network core code isn't using cache-aligned buffers when DM_ETH
> is enabled? Perhaps DM_ETH isn't the trigger, but just changed something
> about the memory layout that exposed some other pre-existing issue.

This should already be the case.  The transmit buffer is setup like this:

         net_tx_packet = &net_pkt_buf[0] + (PKTALIGN - 1);
         net_tx_packet -= (ulong)net_tx_packet % PKTALIGN;

in net/net.c:net_init().

The recv buffers are defined by the drivers and passed back to the
core network code. In this case, the misalignment is caused by the
rtl8169 driver...

rtl8169_init_ring() does this:

        tpc->RxBufferRing[i] = &rxb[i * RX_BUF_SIZE];

...but the size breaks alignment for all but the first entry:

       #define RX_BUF_SIZE     1536    /* Rx Buffer size */

This should be fixed by defining this instead:

       #define RX_BUF_SIZE     ALIGN(1536, RTL8169_ALIGN)    /* Rx
Buffer size */

Also, there is an extra buffer that is memcpy'ed to, and then that is
passed back to the core net code instead of the actual buffer that was
recv'd into; I don't know why:

       static unsigned char rxdata[RX_BUF_LEN];

Cheers,
-Joe


More information about the U-Boot mailing list