Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Msg-id alpine.DEB.2.21.1808170917510.20841@lancre
In response to Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors  (Marina Polyakova <m.polyakova@postgrespro.ru>)
List pgsql-hackers
Hello Marina,

>> Detailed -r report. I understand from the doc that the retry number on 
>> the detailed per-statement report is meant to identify at what point 
>> errors occur? Probably this is more or less always at the same point on 
>> a given script, so that the most interesting feature is to report the 
>> number of retries at the script level.
>
> This may depend on various factors.. for example:
> [...]
>        21.239           5          36  UPDATE xy3 SET y = y + :delta WHERE x 
> = :x3;
>        21.360           5          39  UPDATE xy2 SET y = y + :delta WHERE x 
> = :x2;

Ok, so it is not always at the same point, and you confirm that it 
identifies where the error that leads to a retry is raised.

> And you can always get the number of retries at the script level from the 
> main report (if only one script is used) or from the report for each script 
> (if multiple scripts are used).

Ok.

>> ISTM that "skipped" transactions are NOT "successful", so there is a 
>> problem with the comments. I believe that your formulas are probably 
>> right; it has more to do with what counts as "success". For the cnt 
>> decomposition, ISTM that "other transactions" are really "directly 
>> successful transactions".
>
> I agree with you, but I also think that skipped transactions should not be 
> considered errors.

I'm ok with having a special category for them in the explanations, which 
is neither success nor error.

> So we can write something like this:

> All the transactions are divided into several types depending on their 
> execution. Firstly, they can be divided into transactions that we started 
> to execute and transactions which were skipped (it was too late to execute 
> them). Secondly, started transactions fall into 2 main types depending on 
> whether any command got a failure during the last execution of the 
> transaction script or not. Thus

Here is an attempt at having a more precise and shorter version, not sure 
it is much better than yours, though:

"""
Transactions are counted depending on their execution and outcome. First, 
a transaction may or may not have started: skipped transactions occur under 
--rate and --latency-limit when the client is too late to execute them. 
Second, a started transaction may ultimately succeed or fail on some 
error, possibly after some retries when --max-tries is not one. Thus
"""

> the number of all transactions =
>  skipped (it was too late to execute them) +
>  cnt (the number of successful transactions) +
>  ecnt (the number of failed transactions).
>
> A successful transaction can have several unsuccessful tries before a
> successful run. Thus
>
> cnt (the number of successful transactions) =
>  retried (they got one or more serialization or deadlock failures, but
>           were successfully retried from the very beginning) +
>  directly successful transactions (they were successfully completed on
>                                    the first try).

The above description is clearer to me.
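As a side note, the accounting can be illustrated with made-up numbers (a
small Python sketch just for illustration, not pgbench's actual C code):

```python
# Illustrative accounting of the transaction counts discussed above,
# with made-up numbers (not from any actual pgbench run):
#   total = skipped + cnt + ecnt, and cnt = retried + directly successful.

skipped = 7     # too late to start under --rate/--latency-limit
retried = 12    # succeeded after one or more serialization/deadlock retries
direct = 981    # succeeded on the first try
ecnt = 4        # failed even after retries

cnt = retried + direct          # all successful transactions
total = skipped + cnt + ecnt    # every scheduled transaction is in one bucket

print(cnt, total)  # 993 1004
```

Every transaction ends up in exactly one bucket, so the sums are exact, not
approximations.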

>> I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise 
>> "another" does not make sense yet.
>
> Maybe first put a general group, and then the special cases?...

I understand it more as a catch all default "none of the above" case.

>> In TState, field "uint32 retries": maybe it would be simpler to count 
>> "tries", which can be compared directly to max tries set in the option?
>
> If you mean retries in CState: on the one hand, yes; on the other hand, 
> the statistics always use the number of retries...

Ok.
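For what it's worth, the two bookkeeping styles are equivalent, since tries
= retries + 1 after each attempt. A sketch (variable and function names are
mine, not pgbench's actual fields):

```python
# Two equivalent ways to decide whether a failed transaction may be
# retried, given --max-tries. Names are illustrative only.

max_tries = 3

def may_retry_counting_retries(retries):
    # "retries" counts retries already performed (attempts - 1)
    return retries + 1 < max_tries

def may_retry_counting_tries(tries):
    # "tries" counts attempts made so far; compares directly to max_tries
    return tries < max_tries

# The two formulations agree for any number of attempts.
for retries in range(10):
    tries = retries + 1
    assert may_retry_counting_retries(retries) == may_retry_counting_tries(tries)
```

Counting "tries" saves the "+ 1" at the comparison site, which is the
simplification suggested above; the statistics would then report tries - 1.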


>> The automaton skips to FAILURE on every possible error. I'm wondering 
>> whether it could do so only on SQL errors, because other failures will 
>> lead to ABORTED anyway? If there is no good reason to skip to FAILURE 
>> from some errors, I'd suggest to keep the previous behavior. Maybe the 
>> good reason is to do some counting, but this means that on e.g. meta 
>> command errors the script would now loop over instead of aborting, 
>> which does not look like a desirable change of behavior.
>
> Even in the case of meta command errors we must prepare for CSTATE_END_TX 
> and the execution of the next script: if necessary, clear the conditional 
> stack and roll back the current transaction block.

Seems ok.
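The cleanup path you describe might be sketched like this (a toy model in
Python, not pgbench's actual automaton; state names mirror the discussion):

```python
# Toy sketch of the failure path discussed above: even a meta command error
# goes through a FAILURE-like state so that the conditional stack is
# cleared and any open transaction block is rolled back before moving on
# to CSTATE_END_TX. The logic is illustrative only.

def handle_failure(conditional_stack, in_tx_block):
    commands = []
    conditional_stack.clear()        # discard any open \if nesting
    if in_tx_block:
        commands.append("ROLLBACK")  # leave no transaction block open
    commands.append("END_TX")        # proceed to the next script
    return commands

print(handle_failure(["\\if"], True))  # ['ROLLBACK', 'END_TX']
```

The point is that the cleanup is the same regardless of which kind of error
led here, which is the argument for routing all errors through FAILURE.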

>> commandFailed: I'm not thrilled by the added boolean, which is partially
>> redundant with the second argument.
>
> Do you mean that it is partially redundant with the argument "cmd" and 
> that, for example, meta command errors never cause the client to abort?

Yes. And also I'm not sure we should want this boolean at all.

> [...]
> If in such cases one command is placed on several lines, ISTM that the code 
> is more understandable if curly brackets are used...

Hmmm. Such basic style changes are avoided because they break 
backpatching, so we only make them when there is a strong added value, 
which does not seem to be the case here.

-- 
Fabien.

