[U-Boot] tegra124 jetson-tk1 ethernet problems.

Thu Aug 4 23:48:08 CEST 2016

On Thu, Aug 4, 2016 at 4:34 PM, Joe Hershberger
<joe.hershberger at gmail.com> wrote:
> On Thu, Aug 4, 2016 at 4:22 PM, Joe Hershberger
> <joe.hershberger at gmail.com> wrote:
>> Hi Stephen,
>>
>> On Tue, Aug 2, 2016 at 11:56 AM, Stephen Warren <swarren at wwwdotorg.org> wrote:
>>> On 08/01/2016 09:20 PM, Peter Chubb wrote:
>>>>
>>>>
>>>> Hi Folks,
>>>>    Since patch 96350f729c42 "dm: tegra: net: Convert tegra boards to
>>>>    driver model for Ethernet" booting via dhcp has been broken on the
>>>>    Jetson TK1.
>>>
>>>
>>> Best to Cc the net and DM maintainers (Joe Hershberger and Simon Glass) on
>>> this to make sure they see the issue. I've done so here.
>>>
>>>>    I tried applying "net: Probe PCI before looking for ethernet
>>>>    devices"; this `works' in that the ethernet device is detected and
>>>>    works,  but I end up with huge numbers of
>>>>       CACHE: Misaligned operation at range [fffb8c00, fffb8c2e]
>>>>    messages on the serial console.
>>>
>>>>
>>>>
>>>>    These come from the flush_cache() calls in net/rtl8169.c.  I
>>>>    suggest the attached patch (or something like it):
>>>
>>>
>>> Interesting; in my automated testing system, I do see these cache error
>>> messages, yet ping and TFTP work for me. Admittedly my test setup uses a
>>> static IP configuration rather than DHCP.
>>>
>>>> diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c
>>>
>>>
>>>>  static void rtl_flush_rx_desc(struct RxDesc *desc)
>>>>  {
>>>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>>>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>>>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>>>> 1);
>>>> +       unsigned long size = ALIGN(sizeof(*desc), ARCH_DMA_MINALIGN);
>>>> +
>>>> +       flush_cache(start, size);
>>>>  #endif
>>>>  }
>>>>
>>>> @@ -493,21 +496,28 @@ static void rtl_inval_tx_desc(struct TxDesc *desc)
>>>>  static void rtl_flush_tx_desc(struct TxDesc *desc)
>>>>  {
>>>>  #ifndef CONFIG_SYS_NONCACHED_MEMORY
>>>> -       flush_cache((unsigned long)desc, sizeof(*desc));
>>>> +       unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN -
>>>> 1);
>>>> +       unsigned long sz = ALIGN(sizeof *desc, ARCH_DMA_MINALIGN);
>>>> +
>>>> +       flush_cache(start, sz);
>>>>  #endif
>>>>  }
>>>
>>>
>>> Those two are wrong. Hopefully neither of those changes do anything on
>>> Jetson TK1, since CONFIG_SYS_NONCACHED_MEMORY is enabled there.
>>>
>>> The cache line is likely larger than the individual descriptor size, so
>>> rounding up the flush length to a whole cache line will likely end up
>>> flushing more descriptors than you want. This will eventually result in SW
>>> over-writing a HW update to another descriptor, and so at least lose
>>> packets. For this reason, CONFIG_SYS_NONCACHED_MEMORY must be set if the
>>> descriptor and cache line sizes don't match, except for systems where IO is
>>> fully cache-coherent and hence the cache operations are no-ops.
>>
>> I agree... it this should (and does) require
>> CONFIG_SYS_NONCACHED_MEMORY in that case.  This is in the top of this
>> driver:
>>
>> /*
>>  * Warn if the cache-line size is larger than the descriptor size. In such
>>  * cases the driver will likely fail because the CPU needs to flush the cache
>>  * when requeuing RX buffers, therefore descriptors written by the hardware
>>  * may be discarded.
>>  *
>>  * This can be fixed by defining CONFIG_SYS_NONCACHED_MEMORY which will cause
>>  * the driver to allocate descriptors from a pool of non-cached memory.
>>  */
>> #if RTL8169_DESC_SIZE < ARCH_DMA_MINALIGN
>> #if !defined(CONFIG_SYS_NONCACHED_MEMORY) && \
>>         !defined(CONFIG_SYS_DCACHE_OFF) && !defined(CONFIG_X86)
>> #warning cache-line size is larger than descriptor size
>> #endif
>> #endif
>>
>>>>  static void rtl_inval_buffer(void *buf, size_t size)
>>>>  {
>>>> -       unsigned long start = (unsigned long)buf & ~(ARCH_DMA_MINALIGN -
>>>> 1);
>>>> -       unsigned long end = ALIGN(start + size, ARCH_DMA_MINALIGN);
>>>> +       unsigned long end = ALIGN((unsigned long)buf + size,
>>>> ARCH_DMA_MINALIGN);
>>>>
>>>> -       invalidate_dcache_range(start, end);
>>>> +        /* buf is aligned to RTL8169_ALIGN,
>>>> +         * which is a multiple of ARCH_DMA_ALIGN
>>>> +         */
>>>> +       invalidate_dcache_range((unsigned long)buf, end);
>>>>  }
>>>>
>>>>  static void rtl_flush_buffer(void *buf, size_t size)
>>>>  {
>>>> -       flush_cache((unsigned long)buf, size);
>>>> +       unsigned long sz = ALIGN(size, ARCH_DMA_MINALIGN);
>>>> +
>>>> +       flush_cache((unsigned long)buf, sz);
>>>>  }
>>>
>>>
>>> I believe the correct approach is for the caller (network core code) to
>>> provide cache-aligned buffers, rather than for each driver to align the
>>> start/size when performing cache operations. Again, this is to ensure that
>>> cache operations don't affect any other data adjacent to the buffer. Can you
>>> see why the network core code isn't using cache-aligned buffers when DM_ETH
>>> is enabled? Perhaps DM_ETH isn't the trigger, but just changed something
>>> about the memory layout that exposed some other pre-existing issue.
>>
>> This should already be the case.  The transmit buffer is setup like this:
>>
>>          net_tx_packet = &net_pkt_buf[0] + (PKTALIGN - 1);
>>          net_tx_packet -= (ulong)net_tx_packet % PKTALIGN;
>>
>> in net/net.c:net_init().
>>
>> The recv buffers are defined by the drivers and passed back to the
>> core network code. In this case, the misalignment is caused by the
>> rtl8169 driver...
>>
>> rtl8169_init_ring() does this:
>>
>>         tpc->RxBufferRing[i] = &rxb[i * RX_BUF_SIZE];
>>
>> ...but the size breaks alignment for all but the first entry:
>>
>>        #define RX_BUF_SIZE     1536    /* Rx Buffer size */
>>
>> This should be fixed by defining this instead:
>>
>>        #define RX_BUF_SIZE     ALIGN(1536, RTL8169_ALIGN)    /* Rx
>> Buffer size */
>>
>> Also, there is an extra buffer that is memcpy'ed to, and then that is
>> passed back to the core net code instead of the actual buffer that was
>> recv'd into; I don't know why:
>>
>>        static unsigned char rxdata[RX_BUF_LEN];
>
> I also noticed that the code to setup the tx ring buffers are
> completely wrong. It's a good thing that NUM_TX_DESC is defined to be
> 1. From rtl8169_init_ring():
>
>         for (i = 0; i < NUM_TX_DESC; i++) {
>                 tpc->Tx_skbuff[i] = &txb[i];
>         }
>
> That would set the buffer not only to be unaligned after the index 0,
> but also to be one byte after the previous, thus they would all stomp
> on each other.
>
> It should be:
>
>                 tpc->Tx_skbuff[i] = &txb[i * RX_BUF_SIZE];
>
> It's probably also worth making another define for TX_BUF_SIZE even if
> it is just defined to RX_BUF_SIZE to keep it from looking like a bug.

Essentially this patch was incomplete; only ensuring that the first of
the buffers in each ring were aligned:

https://patchwork.ozlabs.org/patch/419406/

Cheers,
-Joe