Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date
Msg-id alpine.DEB.2.21.1809111703590.30244@lancre
In response to Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors  (Marina Polyakova <m.polyakova@postgrespro.ru>)
Responses Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
List pgsql-hackers
Hello Marina,

> Hmm, but we can say the same for serialization or deadlock errors that were 
> not retried (the client test code itself could not run correctly or the SQL 
> sent was somehow wrong, which is also the client's fault), can't we?

I think not.

If a client asks for something "legal", but some other client in parallel 
happens to make an incompatible change which results in a serialization or 
deadlock error, the clients are not responsible for the raised errors; they 
just happen to ask for something incompatible at the same time. So there is 
no user error per se, the server is only reporting its (temporary) 
inability to process what was asked for. For these errors, retrying is 
fine. If the client were alone, there would be no such errors: you cannot 
deadlock with yourself. This is really an isolation issue linked to 
parallel execution.

> Why not handle client errors that can occur (but they may also not 
> occur) the same way? (For example, always abort the client, or 
> conversely do not make aborts in these cases.) Here's an example of such 
> error:

> client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by zero

This is an interesting case. For me, we must stop the script because the 
client is asking for something "stupid", and retrying won't change the 
outcome: the division will still be by zero. It is the client's 
responsibility not to ask for something stupid; the bench script is buggy, 
it should not submit illegal SQL queries. This is quite different from 
submitting something legal which happens to fail.
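The distinction between the two cases could be sketched roughly as follows. This is an illustrative Python model, not pgbench's actual C implementation; the SQLSTATE codes are the standard PostgreSQL ones (40001 serialization_failure, 40P01 deadlock_detected), and the function names are hypothetical:

```python
# Retry only serialization and deadlock failures, which come from concurrent
# clients and may well succeed on a later attempt. Any other SQL error (e.g.
# 22012, division by zero) is the script's fault and is re-raised at once.
RETRIABLE_SQLSTATES = {"40001", "40P01"}

class SQLError(Exception):
    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate

def run_with_retries(transaction, max_tries):
    """Run `transaction` up to max_tries times, retrying only retriable errors."""
    for attempt in range(1, max_tries + 1):
        try:
            return transaction()
        except SQLError as e:
            if e.sqlstate not in RETRIABLE_SQLSTATES or attempt == max_tries:
                raise  # buggy script, or retries exhausted: abort the client
```

With this policy a transaction that hits a serialization failure is simply tried again, while a division by zero propagates immediately, matching the behavior argued for above.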

> Maybe we can limit the number of failures in one statement, and abort the 
> client if this limit is exceeded?...

I think this is quite debatable, and that the best option is to leave 
this point out of the current patch, so that we could have retry on 
serialization/deadlock errors.

Then you can submit another patch for a feature about other errors if you 
feel that there is a use case for continuing in some cases. I think that 
the previous behavior made sense, and that changing it should only be 
considered as an option. As this needs discussion and is not obvious, 
later is better.

> To get a clue about the actual issue you can use the options 
> --failures-detailed (to find out out whether this is a serialization failure 
> / deadlock failure / other SQL failure / meta command failure) and/or 
> --print-errors (to get the complete error message).

Yep, but for me it should have stopped immediately, as it did before.

> If you use the option --latency-limit, the time of tries will be limited 
> regardless of the use of the option -t. Therefore ISTM that an unlimited 
> number of tries can be used only if the time of tries is limited by the 
> options -T and/or -L.

Indeed, I'm ok with forbidding unlimited retries when under -t.
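The agreed restriction could be sketched as a small validation step; the function and parameter names here are hypothetical, merely mirroring pgbench's -T/-L/-t flags:

```python
# Unlimited retries (max_tries == 0) are only sensible when something else
# bounds the run: a duration (-T) or a per-transaction latency limit (-L).
# Under -t alone, a single transaction could retry forever.
def validate_retry_options(max_tries, duration, latency_limit):
    """Return an error message for an invalid combination, else None."""
    if max_tries == 0 and duration is None and latency_limit is None:
        return ("an unlimited number of transaction tries can only be used "
                "with a duration (-T) or a latency limit (-L)")
    return None
```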

>> I'm not sure that having "--debug" implying this option
>> is useful: As there are two distinct options, the user may be allowed
>> to trigger one or the other as they wish?
>
> I'm not sure that the main debugging output will give a good clue of what's 
> happened without full messages about errors, retries and failures...

I'm rather arguing that the user should be allowed to decide what they want.

> These lines are quite long - do you suggest to wrap them this way?

Sure, if it is too long, then wrap.

>> Function getTransactionStatus name does not seem to correspond fully to 
>> what the function does. There is a passthru case which should be either 
>> avoided or clearly commented.
>
> I don't quite understand you - do you mean that in fact this function finds 
> out whether we are in a (failed) transaction block or not? Or do you mean 
> that the case of PQTRANS_INTRANS is also ok?...

The former: although the function is named "getTransactionStatus", it does 
not really return the "status" of the transaction (aka PQstatus()?).

> I tried not to break the limit of 80 characters, but if you think that this 
> is better, I'll change it.

Hmmm. 80 columns, indeed...

>> I'd insist in a comment that "cnt" does not include "skipped" transactions
>> (anymore).
>
> If you mean CState.cnt I'm not sure if this is practically useful because the 
> code uses only the sum of all client transactions including skipped and 
> failed... Maybe we can rename this field to nxacts or total_cnt?

I'm fine with renaming the field if it makes things clearer. They are all 
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe 
"succeeded" or "success" to show what is really counted?

-- 
Fabien.

