Thread: Some code cleanup for pgbench and pg_verifybackup

Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

26 July 2021, 07:01:08

Hi all,

While looking at the code areas of $subject, I got surprised about a
couple of things:
- pgbench has its own parsing routines for int64 and double, with
an option to skip errors.  That's not surprising in itself, but, for
strtodouble(), errorOK is always true, meaning that the error strings
are dead.  For strtoint64(), errorOK is false only when parsing a
Variable, where a second error string is generated.  I don't really
think that we need to be that pedantic about the type of errors
generated in those code paths when failing to parse a variable, so I'd
like to propose a simplification of the code where we reuse the same
error message as for double, cutting a bit the number of translatable
strings.
- pgbench and pg_verifybackup make use of pg_log_fatal() to report
some failures in code paths dedicated to command-line options, which
is inconsistent with all the other tools that use pg_log_error().

Please find attached a patch to clean up all those inconsistencies.

Thoughts?
--
Michael

Attachment

Re: Some code cleanup for pgbench and pg_verifybackup

From

Fabien COELHO

Date:

26 July 2021, 08:13:27

Bonjour Michaël,

My 0.02€:

> - pgbench has its own parsing routines for int64 and double, with
> an option to skip errors.  That's not surprising in itself, but, for
> strtodouble(), errorOK is always true, meaning that the error strings
> are dead.

Indeed. However, there are "atof" calls for option parsing which should 
rather use strtodouble, and that may or may not call it with errorOk as 
true or false, it may depend.

> For strtoint64(), errorOK is false only when parsing a Variable, where a 
> second error string is generated.

ISTM that it just returns false, there is no message about the parsing 
error, hence the message is generated in the function.

> I don't really think that we need to be that pedantic about the type of 
> errors generated in those code paths when failing to parse a variable, 
> so I'd like to propose a simplification of the code where we reuse the 
> same error message as for double, cutting a bit the number of 
> translatable strings.

ISTM that point is that errors from the parser are handled differently (by 
calling some "yyerror" function which do different things), so they need a 
special call for that.

For other cases we would not to have to replicate generating an error 
messages for each caller, so it is best done directly in the function. Ok, 
currently there is only one call, but there can be more, eg I have a 
not-yet submitted patch to add a new option which will need to parse an 
int64.

> - pgbench and pg_verifybackup make use of pg_log_fatal() to report
> some failures in code paths dedicated to command-line options, which
> is inconsistent with all the other tools that use pg_log_error().

The semantics for fatal vs error is that an error is somehow handled while 
a fatal is not. If the log message is followed by an cold exit, ISTM that 
fatal is the right call, and I cannot help if other commands do not do 
that. ISTM more logical to align other commands to fatal when appropriate.

> Thoughts?

I'd be in favor of letting it as it is.

-- 
Fabien.

Re: Some code cleanup for pgbench and pg_verifybackup

From

Alvaro Herrera

Date:

26 July 2021, 19:35:29

On 2021-Jul-26, Fabien COELHO wrote:

> > - pgbench and pg_verifybackup make use of pg_log_fatal() to report
> > some failures in code paths dedicated to command-line options, which
> > is inconsistent with all the other tools that use pg_log_error().
> 
> The semantics for fatal vs error is that an error is somehow handled while a
> fatal is not. If the log message is followed by an cold exit, ISTM that
> fatal is the right call, and I cannot help if other commands do not do that.
> ISTM more logical to align other commands to fatal when appropriate.

I was surprised to discover a few weeks ago that pg_log_fatal() did not
terminate the program, which was my expectation.  If every single call
to pg_log_fatal() is followed by exit(1), why not make pg_log_fatal()
itself exit?  Apparently this coding pattern confuses many people -- for
example pg_verifybackup.c lines 291ff fail to exit(1) after "throwing" a
fatal error, while the block at lines 275 does the right thing.

(In reality we cannot literally just exit(1) in pg_log_fatal(), because
there are quite a few places that do some other thing after the log
call and before exit(1), or terminate the program in some other way than
exit().)

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)

Re: Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

26 July 2021, 23:53:52

On Mon, Jul 26, 2021 at 03:35:29PM -0400, Alvaro Herrera wrote:
> On 2021-Jul-26, Fabien COELHO wrote:
>> The semantics for fatal vs error is that an error is somehow handled while a
>> fatal is not. If the log message is followed by an cold exit, ISTM that
>> fatal is the right call, and I cannot help if other commands do not do that.
>> ISTM more logical to align other commands to fatal when appropriate.

I disagree.  pgbench is an outlier here.  There are 71 calls to
pg_log_fatal() in src/bin/, and pgbench counts for 54 of them.  It
would be more consistent to align pgbench with the others.

> I was surprised to discover a few weeks ago that pg_log_fatal() did not
> terminate the program, which was my expectation.  If every single call
> to pg_log_fatal() is followed by exit(1), why not make pg_log_fatal()
> itself exit?  Apparently this coding pattern confuses many people -- for
> example pg_verifybackup.c lines 291ff fail to exit(1) after "throwing" a
> fatal error, while the block at lines 275 does the right thing.

I remember having the same reaction when those logging APIs got
introduced (I may be wrong here), and I also recall that this point
has been discussed, where the conclusion was that the logging should
never exit() directly.

> (In reality we cannot literally just exit(1) in pg_log_fatal(), because
> there are quite a few places that do some other thing after the log
> call and before exit(1), or terminate the program in some other way than
> exit().)

Yes, that's obviously wrong.  I am wondering if there is more of
that.
--
Michael

Attachment

signature.asc

Re: Some code cleanup for pgbench and pg_verifybackup

From

Fabien COELHO

Date:

27 July 2021, 04:36:15

Bonjour Michaël-san,

>>> The semantics for fatal vs error is that an error is somehow handled while a
>>> fatal is not. If the log message is followed by an cold exit, ISTM that
>>> fatal is the right call, and I cannot help if other commands do not do that.
>>> ISTM more logical to align other commands to fatal when appropriate.
>
> I disagree.  pgbench is an outlier here.  There are 71 calls to
> pg_log_fatal() in src/bin/, and pgbench counts for 54 of them.  It
> would be more consistent to align pgbench with the others.

I do not understand your disagreement. Do you disagree about the expected 
semantics of fatal? Then why provide fatal if it should not be used?
What is the expected usage of fatal?

I do not dispute that pgbench is a statistical outlier. However, Pgbench 
is somehow special because it does not handle a lot of errors, hence a lot 
of "fatal & exit" pattern is used, and the user has to restart. There are 
76 calls to "exit" from pgbench, but only 23 from psql which is much 
larger. ISTM that most "normal" pg programs won't do that because they are 
nice and try to handle errors.

So for me "fatal" is the right choice before exiting with a non zero 
status, but if "error" is called instead I do not think it matters much, 
you do as you please.

>> I was surprised to discover a few weeks ago that pg_log_fatal() did not
>> terminate the program, which was my expectation.

Mine too.

-- 
Fabien.

Re: Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

27 July 2021, 06:56:01

On Tue, Jul 27, 2021 at 06:36:15AM +0200, Fabien COELHO wrote:
> I do not understand your disagreement. Do you disagree about the expected
> semantics of fatal? Then why provide fatal if it should not be used?
> What is the expected usage of fatal?

I disagree about the fact that pgbench uses pg_log_fatal() in ways
that other binaries don't do.  For example, other things use
pg_log_error() followed by an exit(), but not this code.  I am not
going to fight hard on that, though.

That's a set of inconsistences I bumped into while plugging in
option_parse_int() within pgbench.
--
Michael

Attachment

signature.asc

Re: Some code cleanup for pgbench and pg_verifybackup

From

Fabien COELHO

Date:

27 July 2021, 09:45:07

Hello,

>> I do not understand your disagreement. Do you disagree about the 
>> expected>> semantics of fatal? Then why provide fatal if it should not 
>> be used? What is the expected usage of fatal?
>
> I disagree about the fact that pgbench uses pg_log_fatal() in ways
> that other binaries don't do.

Sure. Then what should be the expected usage of fatal? Doc says:

   * Severe errors that cause program termination.  (One-shot programs may
   * chose to label even fatal errors as merely "errors".  The distinction
   * is up to the program.)

pgbench is consistent with the doc. I prefer fatal for this purpose to 
distinguish these clearly from recoverable errors, i.e. the programs goes 
on despite the error, or at least for some time. I think it is good to 
have such a distinction, and bgpench has many errors and many fatals, 
although maybe some error should be fatal and some fatal should be error…

> For example, other things use pg_log_error() followed by an exit(), but 
> not this code.

Sure.

> I am not going to fight hard on that, though.

Me neither.

> That's a set of inconsistences I bumped into while plugging in 
> option_parse_int()

Which is a very good thing! I have already been bitten by atoi.

-- 
Fabien.

Re: Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

27 July 2021, 10:58:39

On Tue, Jul 27, 2021 at 11:45:07AM +0200, Fabien COELHO wrote:
> Sure. Then what should be the expected usage of fatal? Doc says:
>
>   * Severe errors that cause program termination.  (One-shot programs may
>   * chose to label even fatal errors as merely "errors".  The distinction
>   * is up to the program.)
>
> pgbench is consistent with the doc. I prefer fatal for this purpose to
> distinguish these clearly from recoverable errors, i.e. the programs goes on
> despite the error, or at least for some time. I think it is good to have
> such a distinction, and bgpench has many errors and many fatals, although
> maybe some error should be fatal and some fatal should be error.

Hm?  Incorrect option values are recoverable errors, no?  The root
cause of the problem is the user.  One can note that pg_log_fatal() vs
pg_log_error() results in a score of 54 vs 50 in src/bin/pgbench/, so
I am not quite sure your last statement is true.

>> That's a set of inconsistences I bumped into while plugging in
>> option_parse_int()
>
> Which is a very good thing! I have already been bitten by atoi.

By the way, if you can write a patch that makes use of strtodouble()
for the float option parsing in pgbench with the way you see things,
I'd welcome that.  This is a local change as only pgbench needs to
care about that.
--
Michael

Attachment

signature.asc

Re: Some code cleanup for pgbench and pg_verifybackup

From

Daniel Gustafsson

Date:

27 July 2021, 11:02:34

> On 27 Jul 2021, at 01:53, Michael Paquier <michael@paquier.xyz> wrote:

> ..and I also recall that this point has been discussed, where the conclusion
> was that the logging should never exit() directly.

That's a characteristic of the API which still holds IMO.  If we want
something, it's better to have an explicit exit function which takes a log
string than a log function that exits.

--
Daniel Gustafsson        https://vmware.com/

Re: Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

28 July 2021, 03:18:12

On Tue, Jul 27, 2021 at 08:53:52AM +0900, Michael Paquier wrote:
> On Mon, Jul 26, 2021 at 03:35:29PM -0400, Alvaro Herrera wrote:
>> (In reality we cannot literally just exit(1) in pg_log_fatal(), because
>> there are quite a few places that do some other thing after the log
>> call and before exit(1), or terminate the program in some other way than
>> exit().)
>
> Yes, that's obviously wrong.  I am wondering if there is more of
> that.

I have been looking at that.  Here are some findings:
- In pg_archivecleanup, CleanupPriorWALFiles() does not bother
exit()'ing with an error code.  Shouldn't we worry about reporting
that correctly?  It seems strange to not inform users about paths that
would be broken, as that could bloat the archives without one knowing
about it.
- In pg_basebackup.c, ReceiveTarAndUnpackCopyChunk() would not
exit() when failing to change permissions for non-WIN32.
- pg_recvlogical is missing a failure handling for fstat(), as of
5c0de38.
- pg_verifybackup is in the wrong, as mentioned upthread.

Thoughts?  All that does not seem to enter into the category of things
worth back-patching, except for pg_verifybackup.
--
Michael

Attachment

Re: Some code cleanup for pgbench and pg_verifybackup

From

Fabien COELHO

Date:

28 July 2021, 06:49:12

> [...] Thoughts?

For pgbench it is definitely ok to add the exit. For others the added 
exits look reasonable, but I do not know them intimately enough to be sure 
that it is the right thing to do in all cases.

> All that does not seem to enter into the category of things worth 
> back-patching, except for pg_verifybackup.

Yes.

-- 
Fabien.

Re: Some code cleanup for pgbench and pg_verifybackup

From

Robert Haas

Date:

28 July 2021, 14:28:13

On Tue, Jul 27, 2021 at 11:18 PM Michael Paquier <michael@paquier.xyz> wrote:
> I have been looking at that.  Here are some findings:
> - In pg_archivecleanup, CleanupPriorWALFiles() does not bother
> exit()'ing with an error code.  Shouldn't we worry about reporting
> that correctly?  It seems strange to not inform users about paths that
> would be broken, as that could bloat the archives without one knowing
> about it.
> - In pg_basebackup.c, ReceiveTarAndUnpackCopyChunk() would not
> exit() when failing to change permissions for non-WIN32.
> - pg_recvlogical is missing a failure handling for fstat(), as of
> 5c0de38.
> - pg_verifybackup is in the wrong, as mentioned upthread.
>
> Thoughts?  All that does not seem to enter into the category of things
> worth back-patching, except for pg_verifybackup.

I think all of these are reasonable fixes. In the case of
pg_basebackup, a chmod() failure doesn't necessarily oblige us to give
up and exit; we could presumably still write the data. But it may be
best to give up and exit. The other cases appear to be clear bugs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Some code cleanup for pgbench and pg_verifybackup

From

Michael Paquier

Date:

29 July 2021, 07:16:59

On Wed, Jul 28, 2021 at 10:28:13AM -0400, Robert Haas wrote:
> I think all of these are reasonable fixes. In the case of
> pg_basebackup, a chmod() failure doesn't necessarily oblige us to give
> up and exit; we could presumably still write the data. But it may be
> best to give up and exit. The other cases appear to be clear bugs.

Yeah, there are advantages in both positions, still it is more natural
to me to not ignore this kind of failures.  Note the inconsistency
with initdb for example.  So, done.
--
Michael

Attachment

signature.asc