[U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds

Przemyslaw Marczak p.marczak at samsung.com
Fri Sep 18 10:07:14 CEST 2015


Hello,

On 09/18/2015 09:32 AM, Lukasz Majewski wrote:
> Hi Tom,
>
>> On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:
>>
>>> Hi Tom,
>>>
>>>> On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski
>>>> wrote:
>>>>> Hi Alexey,
>>>>>
>>>>>> Hi Marek, Lukasz,
>>>>>>
>>>>>> On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
>>>>>>> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz
>>>>>>> Majewski wrote:
>>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>>> Still we need to fix regression first with virtually
>>>>>>>>>>> infinite timeout :) I would even thing that simple
>>>>>>>>>>> revert of Marek's patch may make sense for now.
>>>>>>>>>>
>>>>>>>>>> +1 - unfortunately there were some other patches
>>>>>>>>>> applied to this particular patch. Simple revert might
>>>>>>>>>> be a bit tricky here.
>>>>>>>>>
>>>>>>>>> -1 - In case the card gets removed during the DMA
>>>>>>>>> transfer and the board doesn't have a watchdog, it will
>>>>>>>>> get stuck indefinitelly.
>>>>>>>>
>>>>>>>> I'm just wondering here - why the indefinite loop was
>>>>>>>> working previously? Was anybody complaining (on the ML)
>>>>>>>> about the problem of removing the SD card when some
>>>>>>>> operation is ongoing?
>>>>>>>
>>>>>>> It worked for me for all the workloads I used. Noone was
>>>>>>> complaining.
>>>>>>
>>>>>> The same story here - previous code with infinite loop was
>>>>>> working for my boards. And now I do see a problem with pretty
>>>>>> simple scenario that we do use in our products.
>>>>>>
>>>>>>>> The problem with potential removal of SD card (after
>>>>>>>> booting the board) is with us for quite long time. Even
>>>>>>>> with indefinite loop (without your patch) we also could
>>>>>>>> "hang" the board if the SD card was removed during a
>>>>>>>> transfer.
>>>>>>>
>>>>>>> Which is why we should weed out the unbounded loops.
>>>>>>>
>>>>>>>>> We
>>>>>>>>> absolutelly don't want this sort of behavior in U-Boot.
>>>>>>>>> I understand that this is the easiest way for everyone
>>>>>>>>> to achieve some sort of "working" solution, but it is
>>>>>>>>> definitelly not the correct one. While I do agree to
>>>>>>>>> increasing the timeout, I do not agree to unbounded
>>>>>>>>> loops, sorry.
>>>>>>>>
>>>>>>>> We have agreed to not agree :-)
>>>>>>>
>>>>>>> Yes :-)
>>>>>>
>>>>>> The first thing I care is working U-Boot v2015.10 out of the
>>>>>> box on my boards. And so I may agree on any temporary
>>>>>> solution. I see it as timeout value either being infinite or
>>>>>> obviously very high like 60 seconds.
>>>>>>
>>>>>> 60 seconds might sound stupid but my thought behind this is to
>>>>>> make sure even long transfers succeed. Imagine 100 Mb rootfs
>>>>>> or update file downloaded from slow SD-card.
>>>>>
>>>>> Transfer of rootfs to SD-card (downloaded to memory via tftp) is
>>>>> definitely valid scenario.
>>>>>
>>>>>>>>>>>  From both points of view for keeping history
>>>>>>>>>>> clean (compared to stacked fixes/workarounds) and
>>>>>>>>>>> from removal of regression root cause.
>>>>>>>>>>
>>>>>>>>>> As I said before - +1 from me.
>>>>>>>>>
>>>>>>>>> As I said before, -1 from me. Btw. did anything regress
>>>>>>>>> in here? To me, this seems like a newly discovered
>>>>>>>>> bug ...
>>>>>>>>
>>>>>>>> Yes, this is a bug. We had similar problem with Samsung's
>>>>>>>> SDHCI, before we switched to dw_mmc. This issue is new at
>>>>>>>> dw_mmc.
>>>>>>>>
>>>>>>>>>>> It's not that I like to have infinite loops but
>>>>>>>>>>> given previous implementation worked fine for
>>>>>>>>>>> people in the previous U-Boot release.
>>>>>>>>>>
>>>>>>>>>> Good justification
>>>>>>>>>
>>>>>>>>> It is never a justified to return to a potentially
>>>>>>>>> problematic version
>>>>>>>>
>>>>>>>> IMHO revering the change (before the release) is from the
>>>>>>>> software development point of view better solution than
>>>>>>>> adding some heuristic delta to timeout.
>>>>>>>>
>>>>>>>>> for the sake of getting some sort of crappy hardware
>>>>>>>>> operational.
>>>>>>>>
>>>>>>>> Unfortunately this "crappy hardware" is pervasive and we
>>>>>>>> cannot do anything about it.
>>>>>>>>
>>>>>>>> To sum up (my point of view):
>>>>>>>> 1. The best would be to revert the patch - but if simple
>>>>>>>> "git revert" is not working then,
>>>>>>
>>>>>> Well even if clean revert won't work we may do manual tweaks
>>>>>> so that functionally it is "revert". If of any interest I may
>>>>>> come up with that sort of patch.
>>>>>>
>>>>>>>> 2. We should increase the timeout (with my patch) for
>>>>>>>> v2015.10 release
>>>>>>
>>>>>> If everybody is OK with that let's go do it. Because release
>>>>>> is around the corner and I don't want to explain each and
>>>>>> every user how to fix their new problem.
>>>>>
>>>>> v2015.10 correctness is crucial here.
>>
>> Yes.
>>
>>>>>>> Let's do this for the sake of crappy cards.
>>>>>>>
>>>>>>>> 3. After release we can devise some kind of solution
>>>>>>>> 4. It is a good topic for U-boot's minisummit discussion
>>>>>>>> at Dublin
>>>>>>>>
>>>>>>>> Marek, Alexey, Tom, Pantelis what do you think?
>>>>>>>
>>>>>>> I think yes.
>>>>>>
>>>>>> What's important we need to make sure Tom is aware of this
>>>>>> situation and he won't cut a release until our fix is in place
>>>>>> and all involved parties reported their happiness.
>>>>>
>>>
>>>
>>>>> I also think that Tom should speak up on this issue.
>>>
>>> Tom, could you give your opinion on this issue?
>>
>> Well, is there a concensus patch now?
>>
>
> There isn't a consensus patch.
>
> There are two options:
> 1. Try to revert changes (which remove endless loop)
>
> 2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before
> v2015.10 and provide correct solution (based on internal SD card
> information) after the release.
>
>

I recommend the second option, however please note that SD card doesn't 
provide any information about the delay required to finish the command,
beside some informations about block erase operation time.

The block erase time info, probably depends on vendor implementation, so 
it would be better to use as long time for delay as possible, and just 
rely on the controller error status info instead of breaking the 
operation after too short timeout.

In such case I think, that it would be nice if user could be notified 
with some time interval, that the card is busy.

In other words, when he's waiting the second minute for the transfer 
end, then how does he know if the board doesn't hang?

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com


More information about the U-Boot mailing list