Re: Progress reporting for pg_verify_checksums - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Progress reporting for pg_verify_checksums
Date
Msg-id alpine.DEB.2.21.1809290813200.10583@lancre
Whole thread Raw
In response to Re: Progress reporting for pg_verify_checksums  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers
>>>> The time() granularity to the second makes the result awkward on small
>>>> tests:
>>>>
>>>> 8/1554552 kB (0%, 8 kB/s)
>>>> 1040856/1554552 kB (66%, 1040856 kB/s)
>>>> 1554552/1554552 kB (100%, 1554552 kB/s)
>>>>
>>>> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
>>>> much better precision.
>>
>> I still think it would make sense to use that instead of low-precision
>> time().
>
> As the use-case of this is not for small tests,

Even for a long run, the induce error by the 1 second imprecision on both 
points is significant at the beginning of the scan.

> I don't think it is useful to make the code more complicated for this.

I do not think that it would be much more complicated to use the 
portability macros to get a precise time.

>>> The display is exactly the same as in pg_basebackup (except the
>>> bandwith is shown as well), so right now I think it is more useful to be
>>> consistent here.
>>
>> Hmmm... units are units, and the display is wrong. The fact that other
>> commands are wrong as well does not look like a good argument to persist in
>> an error.
>
> I've had a look around, and "kB" seems to be a common unit for 1024
> bytes, e.g. in pg_test_fsync etc., unless I am mistaken?

I can only repeat my above comment: the fact that postgres is wrong 
elsewhere is not a good reason to persist in reproducing an error.


See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte

  - SI (decimal, 1000): kB, MB, GB, ...
  - IEC (binary 1024): KiB, MiB, GiB, ...
  - JEDEC (binary 1024, used for memory): KB, MB, GB.

Summary:
  - 1 kB = 1000 bytes
  - 1 KB = 1 KiB = 1024 bytes

Decimal is used for storage (HDD, SSD), binary for memory. That is life, 
and IMHO Postgres code is not the place to invent new units.

Moreover, I still think that the progress should use MB defined as 1000000 
bytes for showing the progress.

>> I would be okay with a progress printing function shared between some
>> commands. It just needs some place. pg_rewind also has a --rewind option.
>
> I guess you mean pg_rewind also has a --progress option.

Indeed.

> I do agree it makes sense to refactor that, but I don't think this
> should be part of this patch.

That's a point. I'd suggest to put the new progress function directly in 
the common part, and other patches could take advantage of it for other 
commands when someone feels like it.

Other comments:

function toggle_progress_report has empty lines after { and before },
but this style is not used elsewhere, I think that they should be removed.

The report_progress API should be ready to be used by another client 
application, which suggest that global variables should be avoided.

Maybe:

   void report_progress(current, total, force)

and the scan start and last times could be a static variable inside the 
function initialized on the first call, which would suggest to call the 
function at the beginning of the scan, probably with current == 0.

Or maybe:

   time_type report_progress(current, total, start, last, force)

Which would return the last time, and the caller has responsability for 
initializing a start time.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Optional message to user when terminating/cancelling backend
Next
From: Matteo Beccati
Date:
Subject: Re: [HACKERS] kqueue