[U-Boot] [PATCH V2] net: fec_mxc: allow use with cache enabled
Eric Nelson
eric.nelson at boundarydevices.com
Tue Mar 6 22:28:34 CET 2012
On 03/06/2012 02:22 PM, Marek Vasut wrote:
> Dear Eric Nelson,
>
>> On 03/06/2012 12:45 PM, Marek Vasut wrote:
>>> Dear Eric Nelson,
>>>
>>>> On 03/05/2012 01:06 PM, Marek Vasut wrote:
>>>>> Dear Eric Nelson,
>>>>>
>>>>>> ensure that transmit and receive buffers are cache-line aligned
>>>>>>
>>>>>> invalidate cache after each packet received
>>>>>> flush cache before transmitting
>>>>>>
>>>>>> Original patch by Marek:
>>>>>> http://lists.denx.de/pipermail/u-boot/2012-February/117695.html
>>>>>
>>>>> Would be cool to Cc me :-p
>>>>
>>>> Sorry about that.
>>>
>>> It's ok, don't worry about it ;-)
>>>
>>> [...]
>>>
>>>>> I think this "writel()" call is bogus and should be removed in
>>>>> subsequent patch and replaced with simple assignment. It was here
>>>>> probably due to cache issues on PPC?
>>>>
>>>> The RBDs have me puzzled. Do we treat them like registers and use
>>>> readx/writex or like in-RAM data structures...
>>>
>>> I'd go for the in-RAM data structures, but such conversion should happen
>>> in a separate patch only AFTER the cache support is in.
>>>
>>> [...]
>>
>> Sounds good.
>>
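(For the later conversion Marek mentions, what I have in mind is roughly
this -- just a sketch, not part of the cache patch itself:

	/* RBDs live in ordinary cached RAM, so once the cache maintenance
	 * is in place a plain assignment should do; readw()/writew() is
	 * only needed while we keep treating them as device registers.
	 */
	pRbd->status = FEC_RBD_EMPTY;	/* instead of writew(FEC_RBD_EMPTY, &pRbd->status) */
	pRbd->data_length = 0;		/* instead of writew(0, &pRbd->data_length) */

but as agreed, that's a separate patch after the cache support lands.)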
>>>>>> +	if (!fec->rbd_base) {
>>>>>> +		ret = -ENOMEM;
>>>>>> +		goto err2;
>>>>>> +	}
>>>>>> +	memset(fec->rbd_base, 0, size);
>>>>>> +	}
>>>>>
>>>>> We might want to flush the descriptors to memory after they have been
>>>>> inited?
>>>>
>>>> Again, good catch.
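(Concretely, I'm thinking of something like this right after the memset --
untested sketch, and it assumes 'size' is already rounded up to a multiple
of the cache line:

	memset(fec->rbd_base, 0, size);
	/* push the zeroed descriptor ring out to RAM so the FEC never
	 * sees stale cache contents */
	flush_dcache_range((uint32_t)fec->rbd_base,
			(uint32_t)fec->rbd_base + size);
)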
>>>>
>>>> On this topic (initialization of RBD), I had a bit of a concern
>>>> regarding the initialization of things.
>>>>
>>>> In fec_open, the receive buffer descriptors are initialized and the
>>>> last one is set to 'wrap'. If this code were to execute when the
>>>> controller is live, bad things would surely happen.
>>>>
>>>> I traced through all of the paths I can see, and I believe that
>>>> we're safe. It appears that fec_halt() will be called prior to
>>>> any call to fec_init() and fec_open().
>>>
>>> Yes, this will only happen if something went wrong.
>>>
>>>> In fec_open() a number of calls to fec_rbd_clean() are made and
>>>> a single flush_dcache() is made afterwards.
>>>>
>>>> While this works and causes less thrashing than per-RBD flushes,
>>>> I think the code would be simpler if fec_rbd_clean just did the
>>>> flush itself. This would also remove the need for a separate
>>>> flush in fec_recv.
>>>
>>> Not really, an rbd might be (and likely is) smaller than a cache line,
>>> so you won't be able to flush a single rbd unless it's cacheline
>>> aligned, which wastes space.
>>>
>>> [...]
>>
>> Yeah. Please disregard my comments. I wrote that before I fully
>> appreciated what was being done in fec_recv().
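To spell out why a per-rbd flush can't work: with the sizes assumed here
(8-byte struct fec_bd, 32-byte cache lines) four descriptors share a line,
so "flushing one rbd" really means something like

	/* rounding to the enclosing cache line drags the neighbouring
	 * descriptors -- possibly still owned by the FEC -- along with it */
	addr = (uint32_t)rbd & ~(CONFIG_FEC_ALIGN - 1);
	flush_dcache_range(addr, addr + CONFIG_FEC_ALIGN);

which is why the cleaning and flushing has to happen per cacheline group,
as discussed below.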
>>
>>>>>> +	invalidate_dcache_range(addr, addr + size);
>>>>>> +
>>>>>
>>>>> The idea here is the following (demo uses 32byte long cachelines):
>>>>>
>>>>> [DESC1][DESC2][DESC3][DESC4][DESC5][DESC6]
>>>>> `------- cacheline --------'
>>>>>
>>>>> We want to start retrieving contents of DESC3, therefore "addr" points
>>>>> to DESC1, "size" is size of cacheline (I hope there's no hardware with
>>>>> 8 byte cachelines, but this should be ok here).
>>>>>
>>>>> NOTE[5]: Here we can invalidate the whole cacheline, because the
>>>>> descriptors so far were modified only by hardware, not by us. We are
>>>>> not writing anything there, so we won't lose any information.
>>>>>
>>>>> NOTE[6]: This invalidation ensures that we always have a fresh copy of
>>>>> the cacheline containing all the descriptors, therefore we always have
>>>>> a fresh status of the descriptors we are about to pick. Since this is
>>>>> a sequential execution, the cache eviction should not kick in here (or
>>>>> might it?).
>>>>
>>>> Another way to look at this is this:
>>>> After fec_open(), the hardware owns the rbd, and all we should do
>>>> is read it. In order to make sure we don't have a stale copy, we
>>>> need to call invalidate() before looking at the values.
>>>>
>>>> Tracing the code to find out whether this is true, the only write I see
>>>> is within fec_recv() when the last descriptor is full, when the driver
>>>> takes ownership of **all** of the descriptors, calling fec_rbd_clean()
>>>> on each.
>>>>
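In code terms the read side then looks roughly like this -- a sketch using
the names from the patch, not a literal hunk:

	rbd = &fec->rbd_base[fec->rbd_index];
	/* align down to the enclosing cache line and drop our stale copy
	 * before looking at the status the FEC may have updated */
	addr = (uint32_t)rbd & ~(CONFIG_FEC_ALIGN - 1);
	invalidate_dcache_range(addr, addr + CONFIG_FEC_ALIGN);
	bd_status = readw(&rbd->status);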
>>>> The only thing that looks funky is this:
>>>> size = (CONFIG_FEC_ALIGN / sizeof(struct fec_bd)) - 1;
>>>> if ((fec->rbd_index & size) == size) {
>>>>
>>>> Wouldn't a test of rbd_index against FEC_RBD_NUM be more appropriate?
>>>> i.e.
>>>>
>>>> if (fec->rbd_index == FEC_RBD_NUM-1) {
>>>
>>> I believe the FEC doesn't always start from rbd_index == 0, and if you
>>> were to receive more than 64 rbds between open() and close(), this
>>> implementation works, but yours would fail.
>>
>> Yep. Disregard that too.
>>
>> <snip>
>>
>>>>> The solution is the following:
>>>>>
>>>>> 1) Compute how many descriptors are per-cache line
>>>>> 2) Make sure FEC_RBD_NUM * sizeof(struct fec_bd) is at least 2 *
>>>>> CONFIG_FEC_DATA_ALIGNMENT in size, see NOTE[11].
>>>>> 3) Once the last RX buffer in the cacheline is processed, mark them all
>>>>> clean and flush them all, see NOTE[10].
>>>>>
>>>>> NOTE[10]: This is legal, because the hardware won't use RX descriptors
>>>>> that it marked dirty (which means not picked up by software yet). We
>>>>> clean the descriptors in the order the hardware would pick them up again
>>>>> so there's no problem with race condition either. The only possible
>>>>> issue here is if there was hardware with cacheline size smaller than
>>>>> descriptor size (we should add a check for this at the beginning of the
>>>>> file).
>>>>>
>>>>> NOTE[11]: This is because we want the FEC to keep writing descriptors
>>>>> in the other cacheline while we're marking the one containing
>>>>> retrieved descriptors clean.
>>>>
>>>> Ahah! Now I see what the size calculation is doing.
>>>>
>>>> A well-named constant, maybe "RXDESC_PER_CACHELINE" would be useful
>>>> here.
>>>
>>> Yes
>>>
>>>> #define RXDESC_PER_CACHELINE (CONFIG_FEC_ALIGN/sizeof(struct fec_bd))
>>>>
>>>>>> -	fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : 0, rbd);
>>>>>> +	size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct fec_bd)) - 1;
>>>>>>
>>>> size = RXDESC_PER_CACHELINE-1;
>>>>
>>>>>> +	if ((fec->rbd_index & size) == size) {
>>>>
>>>> The line above only works if RXDESC_PER_CACHELINE is a power of 2,
>>>> which it will be in practice because sizeof(struct fec_bd) == 8.
>>>
>>> Adding such a comment (and maybe CPP check) won't hurt.
>>
>> I'm struggling getting the CPP to do the work at the moment...
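The snag is that the preprocessor can't evaluate sizeof(), so a plain #if
on RXDESC_PER_CACHELINE is a dead end. A C-level compile-time check works,
though -- e.g. the old negative-array-size trick (sketch only, the typedef
name is made up):

	#define RXDESC_PER_CACHELINE	(CONFIG_FEC_ALIGN / sizeof(struct fec_bd))

	/* fails to build unless RXDESC_PER_CACHELINE is a power of 2 */
	typedef char rxdesc_per_cacheline_pow2_check
		[(RXDESC_PER_CACHELINE & (RXDESC_PER_CACHELINE - 1)) ? -1 : 1];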
>>
>>>>>> +	i = fec->rbd_index - size;
>>>>>> +	addr = (uint32_t)&fec->rbd_base[i];
>>>>>> +	for (; i <= fec->rbd_index + size; i++) {
>>>>
>>>> This flushes too many descriptors! This should be:
>>>> for (; i <= fec->rbd_index; i++) {
>>>
>>> Agreed
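So, pulling this sub-thread together, the receive-side block I'm aiming at
reads roughly as follows (sketch only -- it uses the proposed
RXDESC_PER_CACHELINE name and the corrected loop bound, and the alignment
macro naming is still in flux):

	if ((fec->rbd_index & (RXDESC_PER_CACHELINE - 1)) ==
					(RXDESC_PER_CACHELINE - 1)) {
		i = fec->rbd_index - (RXDESC_PER_CACHELINE - 1);
		addr = (uint32_t)&fec->rbd_base[i];
		for (; i <= fec->rbd_index; i++)
			fec_rbd_clean(i == (FEC_RBD_NUM - 1),
					&fec->rbd_base[i]);
		flush_dcache_range(addr, addr + CONFIG_FEC_ALIGN);
	}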
>>
>> V3 patch forthcoming.
>>
>>>>> Uh, this was tough.
>>>>
>>>> How bad do we want cache?
>>>
>>> We're almost there, why do you ask? :-)
>>
>> I was just bein' snarky...
>>
>> I'm making a couple of other small changes in V3:
>>
>> - change fec_rbd_clean to only have a single
>> call to writew() and make it clearer that there's
>> only one additional flag iff 'last'
>> - use comparison to supply 'last' parameter in
>> the call to fec_rbd_clean
>>
>> I think each of these makes the intent clearer.
>>
>> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
>> index d5d0d5e..f8691d4 100644
>> --- a/drivers/net/fec_mxc.c
>> +++ b/drivers/net/fec_mxc.c
>> @@ -355,16 +355,10 @@ static void fec_tbd_init(struct fec_priv *fec)
>>   */
>>  static void fec_rbd_clean(int last, struct fec_bd *pRbd)
>>  {
>> -	/*
>> -	 * Reset buffer descriptor as empty
>> -	 */
>> +	unsigned short flags = FEC_RBD_EMPTY;
>>  	if (last)
>> -		writew(FEC_RBD_WRAP | FEC_RBD_EMPTY, &pRbd->status);
>> -	else
>> -		writew(FEC_RBD_EMPTY, &pRbd->status);
>> -	/*
>> -	 * no data in it
>> -	 */
>> +		flags |= FEC_RBD_WRAP;
>> +	writew(flags, &pRbd->status);
>>  	writew(0, &pRbd->data_length);
>>  }
>>
>> @@ -880,10 +874,8 @@ static int fec_recv(struct eth_device *dev)
>>  			i = fec->rbd_index - size;
>>  			addr = (uint32_t)&fec->rbd_base[i];
>>  			for (; i <= fec->rbd_index ; i++) {
>> -				if (i == (FEC_RBD_NUM - 1))
>> -					fec_rbd_clean(1, &fec->rbd_base[i]);
>> -				else
>> -					fec_rbd_clean(0, &fec->rbd_base[i]);
>> +				fec_rbd_clean(i == (FEC_RBD_NUM - 1),
>> +						&fec->rbd_base[i]);
>>  			}
>>  			flush_dcache_range(addr,
>>  					addr + CONFIG_FEC_ALIGN);
>
> Looking forward to V3!
>
Here's an early peek. I have to jump on a conference call, so I haven't had
time to generate a proper one.
I also haven't straightened out the FEC_ALIGN and RXDESC_PER_CACHELINE macros.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fec-mxc-cache-v3.patch
Type: text/x-patch
Size: 13648 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20120306/24cc6025/attachment.bin>