[U-Boot] tegra124 jetson-tk1 ethernet problems.

Thu Aug 4 23:34:46 CEST 2016

On Thu, Aug 4, 2016 at 4:22 PM, Joe Hershberger
<joe.hershberger at gmail.com> wrote:
> Hi Stephen,
>
> On Tue, Aug 2, 2016 at 11:56 AM, Stephen Warren <swarren at wwwdotorg.org> wrote:
>> On 08/01/2016 09:20 PM, Peter Chubb wrote:
>>>
>>>
>>> Hi Folks,
>>>    Since patch 96350f729c42 "dm: tegra: net: Convert tegra boards to
>>>    driver model for Ethernet" booting via dhcp has been broken on the
>>>    Jetson TK1.
>>
>>
>> Best to Cc the net and DM maintainers (Joe Hershberger and Simon Glass) on
>> this to make sure they see the issue. I've done so here.
>>
>>>    I tried applying "net: Probe PCI before looking for ethernet
>>>    devices"; this `works' in that the ethernet device is detected and
>>>    works,  but I end up with huge numbers of
>>>       CACHE: Misaligned operation at range [fffb8c00, fffb8c2e]
>>>    messages on the serial console.
>>
>>>
>>>
>>>    These come from the flush_cache() calls in net/rtl8169.c.  I
>>>    suggest the attached patch (or something like it):
>>
>>
>> Interesting; in my automated testing system, I do see these cache error
>> messages, yet ping and TFTP work for me. Admittedly my test setup uses a
>> static IP configuration rather than DHCP.
>>
>>> diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
>>
>>
>>>  static void rtl_flush_rx_desc(struct RxDesc *desc)
>>>  {
>>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>>> 1);
>>> +       unsigned long size = ALIGN(sizeof(*desc), ARCH_DMA_MINALIGN);
>>> +
>>> +       flush_cache(start, size);
>>>  #endif
>>>  }
>>>
>>> @@ -493,21 +496,28 @@ static void rtl_inval_tx_desc(struct TxDesc *desc)
>>>  static void rtl_flush_tx_desc(struct TxDesc *desc)
>>>  {
>>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>>> 1);
>>> +       unsigned long sz = ALIGN(sizeof *desc, ARCH_DMA_MINALIGN);
>>> +
>>> +       flush_cache(start, sz);
>>>  #endif
>>>  }
>>
>>
>> Those two are wrong. Hopefully neither of those changes do anything on
>> Jetson TK1, since CONFIG_SYS_NONCACHED_MEMORY is enabled there.
>>
>> The cache line is likely larger than the individual descriptor size, so
>> rounding up the flush length to a whole cache line will likely end up
>> flushing more descriptors than you want. This will eventually result in SW
>> over-writing a HW update to another descriptor, and so at least lose
>> packets. For this reason, CONFIG_SYS_NONCACHED_MEMORY must be set if the
>> descriptor and cache line sizes don't match, except for systems where IO is
>> fully cache-coherent and hence the cache operations are no-ops.
>
> I agree... it this should (and does) require
> CONFIG_SYS_NONCACHED_MEMORY in that case.  This is in the top of this
> driver:
>
> /*
>  * Warn if the cache-line size is larger than the descriptor size. In such
>  * cases the driver will likely fail because the CPU needs to flush the cache
>  * when requeuing RX buffers, therefore descriptors written by the hardware
>  * may be discarded.
>  *
>  * This can be fixed by defining CONFIG_SYS_NONCACHED_MEMORY which will cause
>  * the driver to allocate descriptors from a pool of non-cached memory.
>  */
> #if RTL8169_DESC_SIZE < ARCH_DMA_MINALIGN
> #if !defined(CONFIG_SYS_NONCACHED_MEMORY) && \
>         !defined(CONFIG_SYS_DCACHE_OFF) && !defined(CONFIG_X86)
> #warning cache-line size is larger than descriptor size
> #endif
> #endif
>
>>>  static void rtl_inval_buffer(void *buf, size_t size)
>>>  {
>>> -       unsigned long start = (unsigned long)buf & ~(ARCH_DMA_MINALIGN -
>>> 1);
>>> -       unsigned long end = ALIGN(start + size, ARCH_DMA_MINALIGN);
>>> +       unsigned long end = ALIGN((unsigned long)buf + size,
>>> ARCH_DMA_MINALIGN);
>>>
>>> -       invalidate_dcache_range(start, end);
>>> +        /* buf is aligned to RTL8169_ALIGN,
>>> +         * which is a multiple of ARCH_DMA_ALIGN
>>> +         */
>>> +       invalidate_dcache_range((unsigned long)buf, end);
>>>  }
>>>
>>>  static void rtl_flush_buffer(void *buf, size_t size)
>>>  {
>>> -       flush_cache((unsigned long)buf, size);
>>> +       unsigned long sz = ALIGN(size, ARCH_DMA_MINALIGN);
>>> +
>>> +       flush_cache((unsigned long)buf, sz);
>>>  }
>>
>>
>> I believe the correct approach is for the caller (network core code) to
>> provide cache-aligned buffers, rather than for each driver to align the
>> start/size when performing cache operations. Again, this is to ensure that
>> cache operations don't affect any other data adjacent to the buffer. Can you
>> see why the network core code isn't using cache-aligned buffers when DM_ETH
>> is enabled? Perhaps DM_ETH isn't the trigger, but just changed something
>> about the memory layout that exposed some other pre-existing issue.
>
> This should already be the case.  The transmit buffer is setup like this:
>
>          net_tx_packet = &net_pkt_buf[0] + (PKTALIGN - 1);
>          net_tx_packet -= (ulong)net_tx_packet % PKTALIGN;
>
> in net/net.c:net_init().
>
> The recv buffers are defined by the drivers and passed back to the
> core network code. In this case, the misalignment is caused by the
> rtl8169 driver...
>
> rtl8169_init_ring() does this:
>
>         tpc->RxBufferRing[i] = &rxb[i * RX_BUF_SIZE];
>
> ...but the size breaks alignment for all but the first entry:
>
>        #define RX_BUF_SIZE     1536    /* Rx Buffer size */
>
> This should be fixed by defining this instead:
>
>        #define RX_BUF_SIZE     ALIGN(1536, RTL8169_ALIGN)    /* Rx
> Buffer size */
>
> Also, there is an extra buffer that is memcpy'ed to, and then that is
> passed back to the core net code instead of the actual buffer that was
> recv'd into; I don't know why:
>
>        static unsigned char rxdata[RX_BUF_LEN];

I also noticed that the code to setup the tx ring buffers are
completely wrong. It's a good thing that NUM_TX_DESC is defined to be
1. From rtl8169_init_ring():

        for (i = 0; i < NUM_TX_DESC; i++) {
                tpc->Tx_skbuff[i] = &txb[i];
        }

That would set the buffer not only to be unaligned after the index 0,
but also to be one byte after the previous, thus they would all stomp
on each other.

It should be:

                tpc->Tx_skbuff[i] = &txb[i * RX_BUF_SIZE];

It's probably also worth making another define for TX_BUF_SIZE even if
it is just defined to RX_BUF_SIZE to keep it from looking like a bug.

Cheers,
-Joe