[PATCH 0/6] Miscellaneous FWU fixes

Fri Sep 6 17:03:31 CEST 2024

On 9/6/24 11:47, Sughosh Ganu wrote:
> On Fri, 6 Sept 2024 at 12:05, Michal Simek <michal.simek at amd.com> wrote:
>>
>>
>>
>> On 9/5/24 11:17, Sughosh Ganu wrote:
>>> On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.simek at amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/5/24 10:12, Sughosh Ganu wrote:
>>>>> On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.simek at amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/5/24 09:43, Sughosh Ganu wrote:
>>>>>>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.simek at amd.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/5/24 09:38, Sughosh Ganu wrote:
>>>>>>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.simek at amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/5/24 08:24, Sughosh Ganu wrote:
>>>>>>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.simek at amd.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The following set of patches are miscellaneous fixes and some
>>>>>>>>>>>>> hardening of the FWU update logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sughosh Ganu (6):
>>>>>>>>>>>>>          fwu: v2: perform some checks before reading metadata
>>>>>>>>>>>>>          fwu: v2: try reading both copies of metadata
>>>>>>>>>>>>>          fwu: v1: do a version check for the metadata
>>>>>>>>>>>>>          fwu: check all images for transitioning out of Trial State
>>>>>>>>>>>>>          fwu: add dependency checks for selecting FWU metadata version
>>>>>>>>>>>>>          fwu: do not allow capsule processing on exceeding Trial Counter
>>>>>>>>>>>>>            threshold
>>>>>>>>>>>>>
>>>>>>>>>>>>>         include/fwu.h            | 11 ++++++
>>>>>>>>>>>>>         lib/fwu_updates/Kconfig  |  1 +
>>>>>>>>>>>>>         lib/fwu_updates/fwu.c    | 31 +++++++++++++++-
>>>>>>>>>>>>>         lib/fwu_updates/fwu_v1.c | 18 +++++++--
>>>>>>>>>>>>>         lib/fwu_updates/fwu_v2.c | 80 ++++++++++++++++++++++------------------
>>>>>>>>>>>>>         5 files changed, 99 insertions(+), 42 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I found one more thing.
>>>>>>>>>>>>
>>>>>>>>>>>> I did this change
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c
>>>>>>>>>>>> index fbc2067bc12d..dab9530e499c 100644
>>>>>>>>>>>> --- a/tools/mkfwumdata.c
>>>>>>>>>>>> +++ b/tools/mkfwumdata.c
>>>>>>>>>>>> @@ -293,7 +293,7 @@ static void fwu_fill_version_specific_mdata(struct
>>>>>>>>>>>> fwu_mdata_object *mobj)
>>>>>>>>>>>>                struct fwu_mdata *mdata = mobj->mdata;
>>>>>>>>>>>>
>>>>>>>>>>>>                mdata->metadata_size = mobj->size;
>>>>>>>>>>>> -       mdata->desc_offset = sizeof(struct fwu_mdata);
>>>>>>>>>>>> +       mdata->desc_offset = 0x21; sizeof(struct fwu_mdata);
>>>>>>>>>>>>
>>>>>>>>>>>>                for (i = 0; i < MAX_BANKS_V2; i++)
>>>>>>>>>>>>                        mdata->bank_state[i] = i < mobj->banks ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> to break desc_offset location. I generated mdata structure and write it to
>>>>>>>>>>>> primary mdata partition.
>>>>>>>>>>>>
>>>>>>>>>>>> This has been spotted by (my debug) which is the reason why I did it.
>>>>>>>>>>>>
>>>>>>>>>>>>               if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) {
>>>>>>>>>>>>                       printf("mdata offset is not 0x20\n");
>>>>>>>>>>>>                       return false;
>>>>>>>>>>>>               }
>>>>>>>>>>>>
>>>>>>>>>>>> But I got one more message below which
>>>>>>>>>>>>
>>>>>>>>>>>> mdata offset is not 0x20
>>>>>>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the secondary
>>>>>>>>>>>> partition from the primary
>>>>>>>>>>>
>>>>>>>>>>> The reason why this logic is in place is to handle the scenario
>>>>>>>>>>> whereby, during the metadata update, the primary copy gets updated
>>>>>>>>>>> correctly, but due to some reason (like a power cut), the secondary
>>>>>>>>>>> copy does not -- this is because the primary copy gets updated first.
>>>>>>>>>>> In such a scenario, the secondary copy of metadata can be then
>>>>>>>>>>> restored from the primary copy. In the scenario where the primary copy
>>>>>>>>>>> is corrupted, it will show up in the crc check, and will get restored
>>>>>>>>>>> from the secondary copy.
>>>>>>>>>>
>>>>>>>>>> But in this case - CRC of primary is correct but data inside is wrong because
>>>>>>>>>> offset is at 0x21 not at 0x20.
>>>>>>>>>
>>>>>>>>> How will this event play out ? Am I missing something ? When the
>>>>>>>>> metadata is getting updated, the offset will be written as 0x20, and
>>>>>>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would
>>>>>>>>> fail and the primary partition would get restored from the secondary
>>>>>>>>> (assuming that the secondary copy is correct).
>>>>>>>>
>>>>>>>> It is not after update. It is initial state with incorrect mdata structure.
>>>>>>>
>>>>>>> Well, if the metadata partition is being written with incorrect data,
>>>>>>> not sure how we combat that (or even if we should). After all, the
>>>>>>> spec clearly states that the metadata cannot be protected against
>>>>>>> malicious writes. The logic that you pointed out earlier is to handle
>>>>>>> the scenario where the primary partition got updated, but the
>>>>>>> secondary did not.
>>>>>>
>>>>>> I don't disagree with you.
>>>>>>
>>>>>> Scenario here is that primary partition is updated with incorrect data content
>>>>>> (but still correct CRC). This is correctly detected that data is not right.
>>>>>> It means primary partition is wrong and shouldn't be used and it is not used.
>>>>>>
>>>>>> But because it has correct CRC syncup code logic is doing syncup.
>>>>>>
>>>>>> That's why I think we have two codes which have pretty much independent logic
>>>>>> what to do. If primary is wrong sync should be from secondary to primary but
>>>>>> that's not what code is doing.
>>>>>
>>>>> The scenario of "primary is wrong" will only play out in case of
>>>>> corruption in the metadata. The scenario that you highlight, that of
>>>>> the metadata being wrong, but the CRC check succeeds will only happen
>>>>> when the metadata has been maliciously written --
>>>>
>>>> yes. And I did it on purpose to check your code and all error condition you are
>>>> checking to make sure that code traps them.
>>>
>>> The code should detect and fix any genuine error conditions that might
>>> come up in the field. Even the check for the case where the two
>>> metadata copies are valid but different is a scenario that can
>>> actually happen when updating the metadata copies, and there is a
>>> reason why we are assuming that the primary is the correct copy and
>>> should be used to restore the secondary. If there are any valid
>>> scenarios that are not being handled, they should be identified and
>>> fixed.
>>>
>>>>
>>>>> if not, then the
>>>>> secondary partition should also be written with the wrong metadata,
>>>>> and the checks would then fail. In the case of the primary metadata
>>>>> getting corrupted, it does get restored from the secondary partition,
>>>>> assuming that the secondary partition is intact.
>>>>
>>>> When primary is cleared/has incorrect CRC there is no problem and recovery
>>>> happens but if CRC is right but data inside wrong code doesn't check it.
>>>
>>> This is where I think that this is not a genuine "error" case. So, if
>>> someone has written a metadata copy with wrong data but right CRC, and
>>> only written this bad copy to the primary partition, how do we trust
>>> that the secondary copy is actually correct ? One can generate a
>>> secondary metadata copy where the primary checks pass, but the image
>>> information in the metadata is incorrect.
>>
>> If U-Boot knows that data is not correct, based on it's check (in this case one
>> incorrect field), you should never use it. If second also won't pass this check
>> U-Boot shouldn't use it too.
> 
> The fact that we are trying to counter incorrect provisioning of
> metadata, which can well be a malicious act of writing incorrect
> data/correct CRC copy only to one partition seems like an anomaly
> which should not be handled. I would still be okay if this were a
> fool-proof way of fixing the board, but it isn't. Like I mentioned
> earlier, we can still have a secondary copy of the metadata with wrong
> data/correct CRC, and which does not get detected in the initial
> checks, as the top level structure fields are correct, but the image
> information is not.

Partially agree. You are checking in fwu_mdata_sanity_checks and even data is 
matching u-boot configuration. You are checking now num_banks, num_images. guid 
should be also known to u-boot and I can't see the reason why this functions 
can't check it too.

And more and more I am looking at the code more things that current 
fwu_mdata_sanity_checks should be called before you mark parts_ok[] = true.

>>
>> If secondary copy pass all your tests you trust that it is fine. But if not,
>> then you don't trust it.
> 
> But we cannot detect that the secondary copy is passing all the tests,
> as the issue could well be with the image description, and that would
> be detected only when an update is attempted, or the user identifies
> it (possibly) using the fwu_mdata_read command. And before that, the
> primary copy would be restored with the incorrect secondary copy, as
> part of the metadata read function.
> 
> What I think can be done is, as a middle ground, we can instead have
> this as a platform policy. So we add a function pointer for this, and
> when such a scenario is detected, the platform can have a callback
> which then restores the primary copy from the secondary. What are your
> thoughts on this ?

I have to think about this more (not on Friday afternoon).

Cheers,
Michal