[PATCH 1/2] rockchip: Add initial RK3582 support

Fri Dec 13 17:46:01 CET 2024

Hi Jonas,

On 12/13/24 4:20 PM, Jonas Karlman wrote:
> Hi Quentin,
> 
> On 2024-12-13 14:43, Quentin Schulz wrote:
>> Hi Jonas,
>>
>> On 12/10/24 11:23 PM, Jonas Karlman wrote:
>>> The RK3582 SoC is a variant of the RK3588S with some IP blocks disabled.
>>> What blocks are disabled/non-working is indicated by ip-state in OTP.
>>>
>>> This add initial support for RK3582 by using ft_system_setup() to mark
>>> any cpu and/or gpu node with status=fail as indicated by ip-state.
>>>
>>> This apply same policy as vendor U-Boot for RK3582, i.e. two big cpu
>>> cores and the gpu is always failed/disabled.
>>>
>>> Enable Kconfig option OF_SYSTEM_SETUP in board defconfig to make use of
>>> the required DT fixups for RK3582 board variants.
>>>
>>> Signed-off-by: Jonas Karlman <jonas at kwiboo.se>
>>> ---
>>>    arch/arm/mach-rockchip/rk3588/rk3588.c | 128 +++++++++++++++++++++++++
>>>    1 file changed, 128 insertions(+)
>>>
>>> diff --git a/arch/arm/mach-rockchip/rk3588/rk3588.c b/arch/arm/mach-rockchip/rk3588/rk3588.c
>>> index c1dce3ee3703..06e6318312b2 100644
>>> --- a/arch/arm/mach-rockchip/rk3588/rk3588.c
>>> +++ b/arch/arm/mach-rockchip/rk3588/rk3588.c
>>> @@ -7,6 +7,7 @@
>>>    #define LOG_CATEGORY LOGC_ARCH
>>>    
>>>    #include <dm.h>
>>> +#include <fdt_support.h>
>>>    #include <misc.h>
>>>    #include <spl.h>
>>>    #include <asm/armv8/mmu.h>
>>> @@ -185,6 +186,12 @@ int arch_cpu_init(void)
>>>    
>>>    #define RK3588_OTP_CPU_CODE_OFFSET		0x02
>>>    #define RK3588_OTP_SPECIFICATION_OFFSET		0x06
>>> +#define RK3588_OTP_IP_STATE_OFFSET		0x1d
>>> +
>>> +#define BAD_CPU_CLUSTER0		GENMASK(3, 0)
>>> +#define BAD_CPU_CLUSTER1		GENMASK(5, 4)
>>> +#define BAD_CPU_CLUSTER2		GENMASK(7, 6)
>>> +#define BAD_GPU				GENMASK(4, 1)
>>>    
>>>    int checkboard(void)
>>>    {
>>> @@ -230,3 +237,124 @@ int checkboard(void)
>>>    
>>>    	return 0;
>>>    }
>>> +
>>> +#ifdef CONFIG_OF_SYSTEM_SETUP
>>> +static int fdt_path_del_node(void *fdt, const char *path)
>>> +{
>>> +	int nodeoffset;
>>> +
>>> +	nodeoffset = fdt_path_offset(fdt, path);
>>> +	if (nodeoffset < 0)
>>> +		return nodeoffset;
>>> +
>>> +	return fdt_del_node(fdt, nodeoffset);
>>> +}
>>> +
>>> +static int fdt_path_set_name(void *fdt, const char *path, const char *name)
>>> +{
>>> +	int nodeoffset;
>>> +
>>> +	nodeoffset = fdt_path_offset(fdt, path);
>>> +	if (nodeoffset < 0)
>>> +		return nodeoffset;
>>> +
>>> +	return fdt_set_name(fdt, nodeoffset, name);
>>> +}
>>> +
>>> +int ft_system_setup(void *blob, struct bd_info *bd)
>>> +{
>>> +	static const char * const cpu_node_names[] = {
>>> +		"cpu at 0", "cpu at 100", "cpu at 200", "cpu at 300",
>>> +		"cpu at 400", "cpu at 500", "cpu at 600", "cpu at 700",
>>> +	};
>>> +	u8 cpu_code[2], ip_state[3];
>>> +	int parent, node, i, ret;
>>> +	struct udevice *dev;
>>> +
>>> +	if (!IS_ENABLED(CONFIG_ROCKCHIP_OTP) || !CONFIG_IS_ENABLED(MISC))
>>> +		return -ENOSYS;
>>> +
>>> +	ret = uclass_get_device_by_driver(UCLASS_MISC,
>>> +					  DM_DRIVER_GET(rockchip_otp), &dev);
>>> +	if (ret) {
>>> +		log_debug("Could not find otp device, ret=%d\n", ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* cpu-code: SoC model, e.g. 0x35 0x82 or 0x35 0x88 */
>>> +	ret = misc_read(dev, RK3588_OTP_CPU_CODE_OFFSET, cpu_code, 2);
>>> +	if (ret < 0) {
>>> +		log_debug("Could not read cpu-code, ret=%d\n", ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	log_debug("cpu-code: %02x %02x\n", cpu_code[0], cpu_code[1]);
>>> +
>>> +	/* skip on rk3588 */
>>> +	if (cpu_code[0] == 0x35 && cpu_code[1] == 0x88)
>>> +		return 0;
>>> +
>>> +	ret = misc_read(dev, RK3588_OTP_IP_STATE_OFFSET, &ip_state, 3);
>>
>> Do you have information about what;s in the third byte? It's not used in
>> this patch series for example, so wondering if it really makes sense to
>> read it and dump it?
> 
> To my knowledge information about rkvenc is stored in the third byte.
> 
> Vendor U-Boot [1] lists following:
> - ip_state[0]: bit0~7 - cpu_mask
> - ip_state[2]: bit0,2 - rkvenc_mask
> - ip_state[1]: bit6,7 - rkvdec_mask
> - ip_state[1]: bit1~4 - gpu_mask
> 
> However, vendor U-Boot also #if out the entire gpu_mask so not fully
> sure if that is correct or even filled in.
> 
> [1] https://github.com/rockchip-linux/u-boot/blob/next-dev/arch/arm/mach-rockchip/rk3588/rk3588.c#L1393-L1404
> 

Yeah, seems to be entirely ignored and the GPU always removed for RK3582 
but kept on RK3583.

>>
>>> +	if (ret < 0) {
>>> +		log_debug("Could not read ip-state, ret=%d\n", ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	log_debug("ip-state: %02x %02x %02x\n",
>>> +		  ip_state[0], ip_state[1], ip_state[2]);
>>> +
>>> +	if (cpu_code[0] == 0x35 && cpu_code[1] == 0x82) {
>>> +		/* policy: always disable gpu */
>>> +		ip_state[1] |= BAD_GPU;
>>> +
>>> +		/* policy: always disable one big core cluster */
>>> +		if (!(ip_state[0] & (BAD_CPU_CLUSTER1 | BAD_CPU_CLUSTER2)))
>>> +			ip_state[0] |= BAD_CPU_CLUSTER2;
>>
>> That's a very interesting choice, wondering what's prompted this
>> decision (which I assume comes from vendor tree :) ).
> 
> Not sure myself, my rk3582 boards only have a single rkvdec, kkvenc or
> cpu core marked as bad, have not seen any board with multiple cores
> marked as bad.
> 
> I did a simple tests with only disable the exact cores that was marked
> as bad on the board with a bad cpu core and left all 8 cores enabled on
> the other boards and that seem to run fine.
> 
> Could be related to marketing or thermal, the policy was copied from
> vendor as-is.
> 

I misread the code and thought we only disabled the cluster but this 
actually disables all cores in the second big cluster (and the cluster 
itself). So my comments about removing the cluster node and not 
disabling the cores inside the cluster (which doesn't exist anymore) 
were incorrect.

>>
>>> +	}
>>> +
>>> +	if (ip_state[0] & BAD_CPU_CLUSTER1) {
>>> +		/* fail entire cluster when one or more core is bad */
>>> +		ip_state[0] |= BAD_CPU_CLUSTER1;
>>> +		fdt_path_del_node(blob, "/cpus/cpu-map/cluster1");
>>> +	}
>>> +
>>> +	if (ip_state[0] & BAD_CPU_CLUSTER2) {
>>> +		/* fail entire cluster when one or more core is bad */
>>> +		ip_state[0] |= BAD_CPU_CLUSTER2;
>>> +		fdt_path_del_node(blob, "/cpus/cpu-map/cluster2");
>>> +	} else if (ip_state[0] & BAD_CPU_CLUSTER1) {
>>> +		/* cluster nodes must be named in a continuous series */
>>> +		fdt_path_set_name(blob, "/cpus/cpu-map/cluster2", "cluster1");
>>> +	}
>>> +
>>
>> Nitpick on readability (to me at least, matter of a taste :) )
>>
>> Could suggest:
>>
>> if (ip_state[0] & BAD_CPU_CLUSTER1) {
>>       /* cluster nodes must be named in a continuous series */
>>       fdt_path_set_name(blob, "/cpus/cpu-map/cluster2", "cluster1");
>>
>>       if (!(ip_state[0] & BAD_CPU_CLUSTER2))
>>           /* cluster nodes must be named in a continuous series */
>>           fdt_path_set_name(blob, "/cpus/cpu-map/cluster2", "cluster1");
>> }
>>
>> if (ip_state[0] & BAD_CPU_CLUSTER2) {
>>       /* fail entire cluster when one or more core is bad */
>>       ip_state[0] |= BAD_CPU_CLUSTER2;
>>       fdt_path_del_node(blob, "/cpus/cpu-map/cluster2");
>> }
> 
> Main reason for separation with above rk3582 policy handling was because
> I added the rk3582 policy last second, and it also made it possible to
> easily remove the entire cpu_code == 3582 block if one wants to use all
> cores not marked as bad.
> 
> Also in the vendor U-Boot there is a mention and different policy for a
> RK3583 variant, so current separation may be preferred but will try to
> add some comments in a v2.
> 

Up to you.

I would do:

1. For RK3582, detect if at least one core in a big cluster is broken, 
then mark all cores in the cluster as broken. Do that for both clusters.

2. If all cores in a cluster are broken, remove the cluster and if 
cluster2 still exists, rename to cluster1.

3. Disable the cores according to the bitmask.

The logic hopefully should be similar for all other binning versions of 
RK3582 and if someone wants to ifdef/remove the RK3582 logic of 
disabling the whole cluster, it would be simple without changing the 
rest of the code.

>>
>> For CPU clusters, it seems the bitmask matches how many cores there are
>> in a cluster. Right now we are removing the whole cluster even if only
>> one core in the cluster is broken. But I see later that we only disable
>> the cores marked as bad. The dt-binding says clusters are for one or
>> more cores, so maybe we are too eager here to remove a cluster even if
>> not all cores in the cluster are broken? Do you know what are the
>> possible side effects of removing a cluster but still having one of its
>> cores enabled in DT?
> 
> The code do ip_state[0] |= BAD_CPU_CLUSTER2 when one of the cores in
> cluster 2 is marked as bad, so both cores will be marked as failed.
> 

Yes, misread, sorry for the noise.

> Howerver, the code should probably just remove the cluster if both cores
> are bad, should make it easier if one want to test without any of the
> policy. Will try to re-work this a little bit to make it more readable
> and easy to just comment out any of the policy lines.
> 

Agreed. Cleaner separation of the logic for the cluster removal and the 
core removal.

>>
>>> +	/* gpu: ip_state[1]: bit1~4 */
>>> +	if (ip_state[1] & BAD_GPU) {
>>> +		log_debug("fail gpu\n");
>>> +		fdt_status_fail_by_pathf(blob, "/gpu at fb000000");
>>> +	}
>>> +
>>
>> Here I'm a bit perplexed. We don't define multiple cores in the GPU node
>> on Linux kernel IIRC. I have not a single clue why the bitmask for the
>> GPU is 3 bits, do you have any idea or information?
> 
> The gpu_mask in vendor tree is not used and the gpu is instead always
> disabled for RK3582, and left enabled for RK3583.
> 
> Was not sure if we should try to use the ip-state values or just, so I
> went with a similar approach as for cpu cores, fill all bits in policy
> part and check for any bit in the fail part.
> 

If we still want to read it, we could print its value but zero out the 
bitmask anyway by default. Then force-set it for RK3582.

>>
>>> +	parent = fdt_path_offset(blob, "/cpus");
>>> +	if (parent < 0) {
>>> +		log_debug("Could not find /cpus, parent=%d\n", parent);
>>
>> Any reason for not making this an error message? Are we expecting this
>> code path to be hit in "normal" behavior?
> 
> I do not think it should be possible, maybe only if the code where to
> run on the xPL control FDT.
> 

It depends on OTP and MISC symbols being defined, so probably fine. 
Also, we may actually want to have this working in xPL so that U-Boot 
proper/TF-A/OP-TEE gets the appropriate DTB.

> The return 0 is probably just something left from the early code I
> tested where it just printed what should happen. Will check what happen
> if we return an error here and adjust for v2.
> 

Hehe, didn't even notice that one :)

Otherwise, looks ok to me.

Cheers,
Quentin