[PATCH] cmd: mtd: fix speed measurement in the speed benchmark

Tue Aug 26 16:59:59 CEST 2025

On 26/08/2025 at 17:33:40 +03, Mikhail Kshevetskiy <mikhail.kshevetskiy at iopsys.eu> wrote:

> On 26.08.2025 17:23, Miquel Raynal wrote:
>> Hello Mikhail,
>>
>> On 26/08/2025 at 02:48:29 +03, Mikhail Kshevetskiy <mikhail.kshevetskiy at iopsys.eu> wrote:
>>
>>> The reported speed is inversely proportional to the size of the data.
>>> See the output:
>>>
>>>   spi-nand: spi_nand nand at 0: Micron SPI NAND was found.
>>>   spi-nand: spi_nand nand at 0: 256 MiB, block size: 128 KiB, page size: 2048, OOB size: 128
>>>   ...
>>>   => mtd read.benchmark spi-nand0 $loadaddr 0 0x40000
>>>   Reading 262144 byte(s) (128 page(s)) at offset 0x00000000
>>>   Read speed: 63kiB/s
>>>   => mtd read.benchmark spi-nand0 $loadaddr 0 0x20000
>>>   Reading 131072 byte(s) (64 page(s)) at offset 0x00000000
>>>   Read speed: 127kiB/s
>>>   => mtd read.benchmark spi-nand0 $loadaddr 0 0x10000
>>>   Reading 65536 byte(s) (32 page(s)) at offset 0x00000000
>>>   Read speed: 254kiB/s
>>>
>>> In the spi-nand case 'io_op.len' is not the same as 'len',
>>> thus we divide the size of a single block by the total time.
>>> This is wrong; we should divide by the time for a single
>>> block.
>>>
>>> Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevetskiy at iopsys.eu>
>> Happy to see this is useful :-) And you're totally right, it didn't use
>> the correct length. I would maybe rephrase the last two sentences a bit
>> to make the commit message clearer:
>>
>> "In the spi-nand case 'io_op.len' is not always the same as 'len', thus
>> we are using the wrong amount of data to derive the speed."
>>
>> However, regarding the diff,
>>
>>> @@ -594,9 +594,10 @@ static int do_mtd_io(struct cmd_tbl *cmdtp, int flag, int argc,
>>>  
>>>  	if (benchmark && bench_start) {
>>>  		bench_end = timer_get_us();
>>> +		block_time = (bench_end - bench_start) / (len / io_op.len);
>>>  		printf("%s speed: %lukiB/s\n",
>>>  		       read ? "Read" : "Write",
>>> -		       ((io_op.len * 1000000) / (bench_end - bench_start)) / 1024);
>>> +		       ((io_op.len * 1000000) / block_time) / 1024);
>> Why not just divide the length by the benchmark time instead of
>> reducing and rounding the denominator first, which I believe makes
>> the final result less precise?
>
> Do we use 64-bit math? If not, we may easily get an overflow.
> Actually, for 32-bit math it's better to use a less precise formula:
> (io_op.len * (1000000/1024)) / block_time; thus we will have about 22
> bits for the length.

I considered overflow off topic (see the v1 of the benchmark
introduction) as we do not run bootloaders for hours. Yes, it is
definitely reachable, but for a development/benchmarking tool, I didn't
consider this a problem.
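
To make the trade-off concrete, here is a minimal standalone sketch of the
three ways the speed could be derived from the total length. It is not the
code from cmd/mtd.c: the function names are hypothetical, and it assumes
fixed-width types so that the 32-bit wrap-around is visible regardless of
what 'unsigned long' is on a given target.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from cmd/mtd.c.
 * All of them return a speed in kiB/s for 'len' bytes moved in 'elapsed_us'.
 */

/* Precise formula evaluated in 32 bits: len * 1000000 wraps past 4294 bytes. */
static uint32_t speed_kib_naive32(uint32_t len, uint32_t elapsed_us)
{
	uint32_t scaled = (uint32_t)(len * 1000000u); /* wraps modulo 2^32 */

	return (scaled / elapsed_us) / 1024u;
}

/* Same formula, widened to 64 bits before multiplying: no wrap, full precision. */
static uint32_t speed_kib_wide64(uint32_t len, uint32_t elapsed_us)
{
	uint64_t scaled = (uint64_t)len * 1000000u;

	return (uint32_t)((scaled / elapsed_us) / 1024u);
}

/*
 * Less precise 32-bit variant: 1000000 / 1024 truncates to 976, which is
 * below 2^10, so any length below 2^22 bytes (4 MiB) cannot overflow.
 */
static uint32_t speed_kib_scaled32(uint32_t len, uint32_t elapsed_us)
{
	return (len * (1000000u / 1024u)) / elapsed_us;
}
```

With a made-up measurement of 262144 bytes in 32000 us, the 64-bit variant
gives 8000 kiB/s, the scaled 32-bit variant gives 7995 kiB/s, and the naive
32-bit variant wraps and reports 4 kiB/s.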

Thanks,
Miquèl