[RFC PATCH 4/6] drivers: misc: k3_bist: Add K3 BIST driver

Tue Sep 10 11:11:10 CEST 2024

Hi Udit,

On 05/09/24 17:20, Kumar, Udit wrote:
> 
> On 9/5/2024 3:11 PM, Neha Malcom Francis wrote:
>> On 04/09/24 10:35, Kumar, Udit wrote:
>>>
>>> On 9/3/2024 5:14 PM, Neha Malcom Francis wrote:
>>>> Add a driver for the BIST module which currently includes support for
>>>> BIST IPs that trigger PBIST (Memory BIST).
>>>>
>>>> Signed-off-by: Neha Malcom Francis <n-francis at ti.com>
>>>> ---
>>>>   drivers/misc/Kconfig               |   8 +
>>>>   drivers/misc/Makefile              |   1 +
>>>>   drivers/misc/k3_bist.c             | 507 ++++++++++++++++++++++++++
>>>>   drivers/misc/k3_bist_static_data.h | 551 +++++++++++++++++++++++++++++
>>>>   4 files changed, 1067 insertions(+)
>>>>   create mode 100644 drivers/misc/k3_bist.c
>>>>   create mode 100644 drivers/misc/k3_bist_static_data.h
>>>>
>>>> [...]
>>>> +
>>>
>>>
>>> In general, few macro's name are too long
>>>
>>> many places hard-coded values are used, please consider to move to macro
>>>
>>> driver looks to be j784s4 specific including header files  ,
>>>
>>> please see, if we can make this generic driver.
>>
>> I've put SoC specific (J784S4 right now) data protected with SoC specific 
>> configs in k3_bist_static_data.h; the hardcoded values are a sequence for 
>> triggering a specific test, whatever is generic and known I have put as a 
>> macro, however I'll try to better understand the sequence if I can put them as 
>> macros.
>>
>>>
>>>
>>>> +#include "k3_bist_static_data.h"
>>>> +
>>>> [...]
>>>
>>> Please consider to rename function name as per description above something like
>>>
>>> check_pbist_results_of_mcu_domain,, if you agree
>>>
>>> Also give more context, I believe, ROM runs BIST on MCU domain, Please 
>>> consider to mention, if you want
>>
>> Yes will do!
>>
>>>
>>>> +{
>>>> +    bool is_done, timed_out;
>>>> +    u32 mask;
>>>> +    u32 post_reg_val, shift;
>>>> +
>>>> +    /* Read HW POST status register */
>>>> +    post_reg_val = readl(WKUP_CTRL_MMR0_BASE + 
>>>> WKUP_CTRL_MMR_CFG0_WKUP_POST_STAT);
>>>> +
>>>> +    /* Check if HW POST PBIST was performed */
>>>> +    shift = WKUP_CTRL_MMR_CFG0_WKUP_POST_STAT_POST_MCU_PBIST_DONE_SHIFT;
>>>> +    is_done = (((post_reg_val >> shift) & 0x1u) == 0x1u) ? (bool)true : 
>>>> (bool)false;
>>>> +
>>>> +    if (!is_done) {
>>>> +        /* HW POST: PBIST not completed, check if it timed out */
>>>> +        shift = 
>>>> WKUP_CTRL_MMR_CFG0_WKUP_POST_STAT_POST_MCU_PBIST_TIMEOUT_SHIFT;
>>> Too long macro name
>>>> +        timed_out = (((post_reg_val >> shift) & 0x1u) == 0x1u) ? (bool)true 
>>>> : (bool)false;
>>>> +
>>>> +        if (!timed_out) {
>>>> +            debug("%s: PBIST was not performed at all on this device for 
>>>> this core\n",
>>>> +                  __func__);
>>>> +            return -EINVAL;
>>>
>>> This is error no ? , move to dev_err instead of debug
>>>
>>
>> The return in k3_bist_probe throws a dev_err saying HW POST failed to run 
>> successfully. So these were added as debugs in case the end user wants to know 
>> exact cause of failure, I can move it as a dev_err as well.
> 
> ok, thanks
> 
> 
>>
>>>
>>>> +        } else {
>>>> +            debug("%s: PBIST was attempted but timed out for this 
>>>> section\n", __func__);
>>>> +            return -ETIMEDOUT;
>>>
>>> This is error no ? , move to dev_err instead of debug .
>>>
>>> What next, reboot SOC or just continue booting in case of error
>>
>> This is also something I wanted this RFC to address, I prefer rebooting SoC if 
>> HW POST fails. HW POST is performed by ROM based on certain switch settings, 
>> which implies that an end-user wants this check done if selected. And if it 
>> fails on the MCU domain itself, I do not think we should continue.
> 
> Ok, if you are saying BIST in MCU domain in done based upon some switch 
> settings, then please put that switch logic here .
> 
> Reading code above looks, BIST run always
> 
> if (!is_done) {
> +        /* HW POST: PBIST not completed, check if it timed out */
> +    } else {
> +        /* HW POST: PBIST was completed on this device, check the result */
> 
> So now have three conditions,
> 1) BIST not attempted
> 2) BIST ran and passed
> 3) BIST ran and failed
> 
> For 1) condition, please see if you can read some register or switch settings or 
> so.
> 
> For 3) , as default we should hang.. In case users wants to continue on failure 
> they can modify u-boot source to do so :)
> 

Okay that makes sense, I will make the change in v1.
> 
>>>
>>>
>>>> [..]
>>>> isn't some tisci call are ok to turn off the CPUs.
>>>>
>>
>> DM (Device Manager) firmware, responsible for power management is not up at 
>> this point in the boot flow (R5 SPL). Thus TISCI calls that turn on/turn off 
>> clocks and power domains are not available and we rely on the primitive 
>> clk-k3.c and ti-power-domain.c drivers to do this for us. As seen below, using 
>> the uclass functions which internally calls these primitive drivers would be 
>> the way to go.
>>
> But DM is kind implemented by dev-data and dev-clk logic ?
> 

Yes, it is. I am using the uclass API for both power and clock which internally 
uses the drivers with this logic.

> For few calls DM just talk to TIFS, which are available at this point.

Yes, but from my POV this driver does not need to know what the clock and power 
APIs internally do, it should just call the uclass APIs like this. And in case 
of power off, whatever DM would have done is already covered in ti-power-domain.c
> 
> I request to check once, if possible
> 
> 
>> I could have used the remoteproc framework to do this but currently the rproc 
>> driver uses TISCI firmware calls from DM and as mentioned above that's not 
>> possible.
>>
>> We should probably target modifying our remoteproc driver to use these generic 
>> uclass APIs instead of direct TISCI calls.
> 
> Ok,
> 
>>
>>>
>>>> [...]
>>>> +    udelay(1000);
>>>> +
>>>> +    u32 timeout_count = 0;
>>> Please move timeout_count at start of function
>>>> +
>>>> +    while ((!(readl(VIM_RAW(intr_num)) & VIM_RAW_MASK(intr_num))) &&
>>>> +           (timeout_count++ < PBIST_MAX_TIMEOUT_VALUE))
>>>> +        ;
>>>
>>> Do you want to add some delay instead of just reading in a loop
>>>
>>
>> That is possible... is there any benefit of delay over polling? Eithercase the 
>> possible time would be at max be PBIST_MAX_TIMEOUT_VALUE.
> 
> 
> You could be definitive in waiting for this register,
> 
> Say if hardware specs says, VIM register will be set in max time of 100 ms. then
> 
> keep PBIST_MAX_TIMEOUT_VALUE as 100 and read wait for 1 ms and then read.
> 
> 

I'm not sure I understand, the timeout condition if it happens first it will 
break from the loop, so the polling will not continue indefinitely.

>>
>>>
>>>> +
>>>> +    if (!(readl(VIM_RAW(intr_num)) & VIM_RAW_MASK(intr_num))) {
>>>> +        test_result = false;
>>>> +        debug("%s(dev=%p): test failed\n", __func__, dev);
>>>
>>> Fail is error , no ?
>>>
>>>
>>>> +    } else {
>>>> +        debug("%s(dev=%p): test passed\n", __func__, dev);
>>>> +    }
>>>> +
>>>> +    writel(0xffffffff, VIM_STS(intr_num));
>>>> +
>>>> +    return 0;
>>>
>>>
>>> Caller always will see success
>>>
>>
>> Right, I wasn't sure about what action to take based on the result.
> 
> 
> Failure is failure, as default please print and call hang
> 

Got it!

>>
>>>
>>>> +}
>>>> +
>>>> +/**
>>>> [..]

-- 
Thanking You
Neha Malcom Francis