Re: Progress reporting for pg_verify_checksums - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Progress reporting for pg_verify_checksums
Date
Msg-id 1553702481.4884.37.camel@credativ.de
Whole thread Raw
In response to Re: Progress reporting for pg_verify_checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: Progress reporting for pg_verify_checksums
List pgsql-hackers
Hi,

thanks again for your review!

Am Mittwoch, den 27.03.2019, 15:34 +0100 schrieb Fabien COELHO:
> Hallo Michael,
> 
> About patch v12:
> 
> Patch applies cleanly, compiles. make check ok, but feature is not tested. 
> Doc build ok.
> 
> Although I'm in favor of it, I'm not sure that the signal think will make 
> it for 12. Maybe it is worth compromising, doing a simple version for now, 
> and resubmitting the signal feature later?

Let's wait one more round. I think that feature is quite useful and
isolated enough that it could stay, but I'll remove it for the next
patch version if needed.

> ISTM that someone suggested 4 Hz was the good eye speed, but you wait for 
> 400 ms, which is 2.5 Hz. What about it?

Uh right, I messed that up, fixed.

> I seems that current_size and total_size are not in the same unit:
> 
>   +   if (current_size > total_size)
>   +      total_size = current_size / MEGABYTES;
> 
> But they should both really be bytes, always.

Yeah, they are both bytes and the "/ MEGABYTES" is wrong, removed.

> I think that the computations should be inverted:
>   - first adjust total_size to current_size if needed
>   - then compute percent
>   - remove the percent adjustement as it is <= 100
>     since current_size <= total_size

Ok, done so.

> I still think that the speed should compute a double to avoid integer 
> rounding errors within the computation. ISTM that rounding should only be 
> done for display in the end.

Ok, also done this way. I decided to print only full MB and not any
further digits as those don't seem very relevant.

> I'm okay with calling the report on each file even if this means every few 
> GB...

For my I/O throttling branch I've backed off to only call the report
every 100 blocks (and at the end of the file). I've added that logic to
this patch as well.

> Someone suggested ETA, and it seems rather simple to do. What about 
> adding it?

I don't know, just computing the ETA is easy of course. But you'd have
to fiddle with seconds/minutes/hours conversions cause the runtime could
be hours. Then, you don't want to have it bumping around weirdly when
your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sounds
simpler than the signal trigger we might get rid of.

I have attached a POC which just prints the remaining seconds. If
somebody wants to contribute a full implementation as a co-author, I'm
happy to include it. Otherwise, I might take a stab at it over the next
days, but it is not on the top of my TODO list.

New patch attached. I've also changed the reporting line so that it
prints a couple of blanks at the end cause I noticed that if the
previously reported line was longer than the current line, then the end
of it is still visible.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pgbench - doCustom cleanup
Next
From: Chapman Flack
Date:
Subject: Re: SET LOCAL ROLE NO RESET -- sandbox transactions