Re: Progress reporting for pg_verify_checksums - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Progress reporting for pg_verify_checksums
Date
Msg-id alpine.DEB.2.21.1809191549340.8683@lancre
Whole thread Raw
In response to Re: Progress reporting for pg_verify_checksums  (Michael Banck <michael.banck@credativ.de>)
Responses Re: Progress reporting for pg_verify_checksums
List pgsql-hackers
Hallo Michael,

>> This optionally prints the progress of pg_verify_checksums via read
>> kilobytes to the terminal with the new command line argument -P.
>>
>> If -P was forgotten and pg_verify_checksums operates on a large cluster,
>> the caller can send SIGUSR1 to pg_verify_checksums to turn progress
>> status reporting on during runtime.
>
> Version 2 of the patch is attached. This is rebased to master as of
> 422952ee and changes the signal handling to be a toggle as suggested by
> Alvaro.

Patch applies cleanly and compiles.

About tests: "make check" is okay, but ITSM that the command is not 
started once, ever, in any test:-( It is unclear whether any test triggers 
checksums anyway...

The time() granularity to the second makes the result awkward on small 
tests:

  8/1554552 kB (0%, 8 kB/s)
  1040856/1554552 kB (66%, 1040856 kB/s)
  1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a 
much better precision.

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying 
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are 
about storage which is usually measured in powers of 1,000, so I'd suggest 
to use thousands.

Another reserve I have is that on any hardware it is likely that there 
will be a lot of kilos, so maybe megas (MB) should be used instead.

I'm wondering whether the bandwidth display should be local (from the last 
display) or global (from the start of the command), but for the last 
forced one. This is an open question.

Maybe it would be nice to show elapsed time and expected completion time 
based on the current speed.

I would be in favor or having this turned on by default, and silenced with 
some option. I'm not sure other people would agree, though, so it is an 
open question as well.

About the code:

  +               if (show_progress)
  +                       show_progress = false;
  +               else
  +                       show_progress = true;

Why not a simple:

     /* invert current state */
     show_progress = !show_progress;

I do not see much advantage in using intermediate string buffers for 
values:

  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
  +          total_size / 1024);
  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
  +          current_size / 1024);
  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
  +          currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

   fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
           formulas for each slot...);

ISTM that this line overwriting trick deserves a comment:

  + if (isatty(fileno(stderr)))
  +   fprintf(stderr, "\r");
  + else
  +   fprintf(stderr, "\n");

And it runs a little amok with "-v".

  + memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

   +   /* Make sure we just report at least once a second */
   +   if ((now == last_progress_update) && !force) return;

The comments seems contradictory, I understand the code makes sure that it 
is "at most" once a second. Pgbench uses "-P XXX" to strigger a progress 
report every XXX second. I'm unsure whether it would be useful to allow 
the user to change the 1 second display interval.

I'm not sure why you removed one unrelated line:

   #include "storage/checksum.h"
   #include "storage/checksum_impl.h"

  -
   static int64 files = 0;
   static int64 blocks = 0;
   static int64 badblocks = 0;


There is a problem in the scan_file code: The added sizeonly parameter is 
not used. It should be removed.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: doc - add missing documentation for "acldefault"
Next
From: Joe Conway
Date:
Subject: Re: doc - add missing documentation for "acldefault"