Thread: Some code cleanup for pgbench and pg_verifybackup
Hi all, While looking at the code areas of $subject, I got surprised about a couple of things: - pgbench has its own parsing routines for int64 and double, with an option to skip errors. That's not surprising in itself, but, for strtodouble(), errorOK is always true, meaning that the error strings are dead. For strtoint64(), errorOK is false only when parsing a Variable, where a second error string is generated. I don't really think that we need to be that pedantic about the type of errors generated in those code paths when failing to parse a variable, so I'd like to propose a simplification of the code where we reuse the same error message as for double, cutting a bit the number of translatable strings. - pgbench and pg_verifybackup make use of pg_log_fatal() to report some failures in code paths dedicated to command-line options, which is inconsistent with all the other tools that use pg_log_error(). Please find attached a patch to clean up all those inconsistencies. Thoughts? -- Michael
Attachment
Bonjour Michaël, My 0.02€: > - pgbench has its own parsing routines for int64 and double, with > an option to skip errors. That's not surprising in itself, but, for > strtodouble(), errorOK is always true, meaning that the error strings > are dead. Indeed. However, there are "atof" calls for option parsing which should rather use strtodouble, and that may or may not call it with errorOk as true or false, it may depend. > For strtoint64(), errorOK is false only when parsing a Variable, where a > second error string is generated. ISTM that it just returns false, there is no message about the parsing error, hence the message is generated in the function. > I don't really think that we need to be that pedantic about the type of > errors generated in those code paths when failing to parse a variable, > so I'd like to propose a simplification of the code where we reuse the > same error message as for double, cutting a bit the number of > translatable strings. ISTM that point is that errors from the parser are handled differently (by calling some "yyerror" function which do different things), so they need a special call for that. For other cases we would not to have to replicate generating an error messages for each caller, so it is best done directly in the function. Ok, currently there is only one call, but there can be more, eg I have a not-yet submitted patch to add a new option which will need to parse an int64. > - pgbench and pg_verifybackup make use of pg_log_fatal() to report > some failures in code paths dedicated to command-line options, which > is inconsistent with all the other tools that use pg_log_error(). The semantics for fatal vs error is that an error is somehow handled while a fatal is not. If the log message is followed by an cold exit, ISTM that fatal is the right call, and I cannot help if other commands do not do that. ISTM more logical to align other commands to fatal when appropriate. > Thoughts? I'd be in favor of letting it as it is. -- Fabien.
On 2021-Jul-26, Fabien COELHO wrote: > > - pgbench and pg_verifybackup make use of pg_log_fatal() to report > > some failures in code paths dedicated to command-line options, which > > is inconsistent with all the other tools that use pg_log_error(). > > The semantics for fatal vs error is that an error is somehow handled while a > fatal is not. If the log message is followed by an cold exit, ISTM that > fatal is the right call, and I cannot help if other commands do not do that. > ISTM more logical to align other commands to fatal when appropriate. I was surprised to discover a few weeks ago that pg_log_fatal() did not terminate the program, which was my expectation. If every single call to pg_log_fatal() is followed by exit(1), why not make pg_log_fatal() itself exit? Apparently this coding pattern confuses many people -- for example pg_verifybackup.c lines 291ff fail to exit(1) after "throwing" a fatal error, while the block at lines 275 does the right thing. (In reality we cannot literally just exit(1) in pg_log_fatal(), because there are quite a few places that do some other thing after the log call and before exit(1), or terminate the program in some other way than exit().) -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
On Mon, Jul 26, 2021 at 03:35:29PM -0400, Alvaro Herrera wrote: > On 2021-Jul-26, Fabien COELHO wrote: >> The semantics for fatal vs error is that an error is somehow handled while a >> fatal is not. If the log message is followed by an cold exit, ISTM that >> fatal is the right call, and I cannot help if other commands do not do that. >> ISTM more logical to align other commands to fatal when appropriate. I disagree. pgbench is an outlier here. There are 71 calls to pg_log_fatal() in src/bin/, and pgbench counts for 54 of them. It would be more consistent to align pgbench with the others. > I was surprised to discover a few weeks ago that pg_log_fatal() did not > terminate the program, which was my expectation. If every single call > to pg_log_fatal() is followed by exit(1), why not make pg_log_fatal() > itself exit? Apparently this coding pattern confuses many people -- for > example pg_verifybackup.c lines 291ff fail to exit(1) after "throwing" a > fatal error, while the block at lines 275 does the right thing. I remember having the same reaction when those logging APIs got introduced (I may be wrong here), and I also recall that this point has been discussed, where the conclusion was that the logging should never exit() directly. > (In reality we cannot literally just exit(1) in pg_log_fatal(), because > there are quite a few places that do some other thing after the log > call and before exit(1), or terminate the program in some other way than > exit().) Yes, that's obviously wrong. I am wondering if there is more of that. -- Michael
Attachment
Bonjour Michaël-san, >>> The semantics for fatal vs error is that an error is somehow handled while a >>> fatal is not. If the log message is followed by an cold exit, ISTM that >>> fatal is the right call, and I cannot help if other commands do not do that. >>> ISTM more logical to align other commands to fatal when appropriate. > > I disagree. pgbench is an outlier here. There are 71 calls to > pg_log_fatal() in src/bin/, and pgbench counts for 54 of them. It > would be more consistent to align pgbench with the others. I do not understand your disagreement. Do you disagree about the expected semantics of fatal? Then why provide fatal if it should not be used? What is the expected usage of fatal? I do not dispute that pgbench is a statistical outlier. However, Pgbench is somehow special because it does not handle a lot of errors, hence a lot of "fatal & exit" pattern is used, and the user has to restart. There are 76 calls to "exit" from pgbench, but only 23 from psql which is much larger. ISTM that most "normal" pg programs won't do that because they are nice and try to handle errors. So for me "fatal" is the right choice before exiting with a non zero status, but if "error" is called instead I do not think it matters much, you do as you please. >> I was surprised to discover a few weeks ago that pg_log_fatal() did not >> terminate the program, which was my expectation. Mine too. -- Fabien.
On Tue, Jul 27, 2021 at 06:36:15AM +0200, Fabien COELHO wrote: > I do not understand your disagreement. Do you disagree about the expected > semantics of fatal? Then why provide fatal if it should not be used? > What is the expected usage of fatal? I disagree about the fact that pgbench uses pg_log_fatal() in ways that other binaries don't do. For example, other things use pg_log_error() followed by an exit(), but not this code. I am not going to fight hard on that, though. That's a set of inconsistences I bumped into while plugging in option_parse_int() within pgbench. -- Michael
Attachment
Hello, >> I do not understand your disagreement. Do you disagree about the >> expected>> semantics of fatal? Then why provide fatal if it should not >> be used? What is the expected usage of fatal? > > I disagree about the fact that pgbench uses pg_log_fatal() in ways > that other binaries don't do. Sure. Then what should be the expected usage of fatal? Doc says: * Severe errors that cause program termination. (One-shot programs may * chose to label even fatal errors as merely "errors". The distinction * is up to the program.) pgbench is consistent with the doc. I prefer fatal for this purpose to distinguish these clearly from recoverable errors, i.e. the programs goes on despite the error, or at least for some time. I think it is good to have such a distinction, and bgpench has many errors and many fatals, although maybe some error should be fatal and some fatal should be error… > For example, other things use pg_log_error() followed by an exit(), but > not this code. Sure. > I am not going to fight hard on that, though. Me neither. > That's a set of inconsistences I bumped into while plugging in > option_parse_int() Which is a very good thing! I have already been bitten by atoi. -- Fabien.
On Tue, Jul 27, 2021 at 11:45:07AM +0200, Fabien COELHO wrote: > Sure. Then what should be the expected usage of fatal? Doc says: > > * Severe errors that cause program termination. (One-shot programs may > * chose to label even fatal errors as merely "errors". The distinction > * is up to the program.) > > pgbench is consistent with the doc. I prefer fatal for this purpose to > distinguish these clearly from recoverable errors, i.e. the programs goes on > despite the error, or at least for some time. I think it is good to have > such a distinction, and bgpench has many errors and many fatals, although > maybe some error should be fatal and some fatal should be error. Hm? Incorrect option values are recoverable errors, no? The root cause of the problem is the user. One can note that pg_log_fatal() vs pg_log_error() results in a score of 54 vs 50 in src/bin/pgbench/, so I am not quite sure your last statement is true. >> That's a set of inconsistences I bumped into while plugging in >> option_parse_int() > > Which is a very good thing! I have already been bitten by atoi. By the way, if you can write a patch that makes use of strtodouble() for the float option parsing in pgbench with the way you see things, I'd welcome that. This is a local change as only pgbench needs to care about that. -- Michael
Attachment
> On 27 Jul 2021, at 01:53, Michael Paquier <michael@paquier.xyz> wrote: > ..and I also recall that this point has been discussed, where the conclusion > was that the logging should never exit() directly. That's a characteristic of the API which still holds IMO. If we want something, it's better to have an explicit exit function which takes a log string than a log function that exits. -- Daniel Gustafsson https://vmware.com/
On Tue, Jul 27, 2021 at 08:53:52AM +0900, Michael Paquier wrote: > On Mon, Jul 26, 2021 at 03:35:29PM -0400, Alvaro Herrera wrote: >> (In reality we cannot literally just exit(1) in pg_log_fatal(), because >> there are quite a few places that do some other thing after the log >> call and before exit(1), or terminate the program in some other way than >> exit().) > > Yes, that's obviously wrong. I am wondering if there is more of > that. I have been looking at that. Here are some findings: - In pg_archivecleanup, CleanupPriorWALFiles() does not bother exit()'ing with an error code. Shouldn't we worry about reporting that correctly? It seems strange to not inform users about paths that would be broken, as that could bloat the archives without one knowing about it. - In pg_basebackup.c, ReceiveTarAndUnpackCopyChunk() would not exit() when failing to change permissions for non-WIN32. - pg_recvlogical is missing a failure handling for fstat(), as of 5c0de38. - pg_verifybackup is in the wrong, as mentioned upthread. Thoughts? All that does not seem to enter into the category of things worth back-patching, except for pg_verifybackup. -- Michael
Attachment
> [...] Thoughts? For pgbench it is definitely ok to add the exit. For others the added exits look reasonable, but I do not know them intimately enough to be sure that it is the right thing to do in all cases. > All that does not seem to enter into the category of things worth > back-patching, except for pg_verifybackup. Yes. -- Fabien.
On Tue, Jul 27, 2021 at 11:18 PM Michael Paquier <michael@paquier.xyz> wrote: > I have been looking at that. Here are some findings: > - In pg_archivecleanup, CleanupPriorWALFiles() does not bother > exit()'ing with an error code. Shouldn't we worry about reporting > that correctly? It seems strange to not inform users about paths that > would be broken, as that could bloat the archives without one knowing > about it. > - In pg_basebackup.c, ReceiveTarAndUnpackCopyChunk() would not > exit() when failing to change permissions for non-WIN32. > - pg_recvlogical is missing a failure handling for fstat(), as of > 5c0de38. > - pg_verifybackup is in the wrong, as mentioned upthread. > > Thoughts? All that does not seem to enter into the category of things > worth back-patching, except for pg_verifybackup. I think all of these are reasonable fixes. In the case of pg_basebackup, a chmod() failure doesn't necessarily oblige us to give up and exit; we could presumably still write the data. But it may be best to give up and exit. The other cases appear to be clear bugs. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jul 28, 2021 at 10:28:13AM -0400, Robert Haas wrote: > I think all of these are reasonable fixes. In the case of > pg_basebackup, a chmod() failure doesn't necessarily oblige us to give > up and exit; we could presumably still write the data. But it may be > best to give up and exit. The other cases appear to be clear bugs. Yeah, there are advantages in both positions, still it is more natural to me to not ignore this kind of failures. Note the inconsistency with initdb for example. So, done. -- Michael