Home > mailing lists
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

From	Marina Polyakova
Subject	Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date	July 13, 2017 17:34:14
Msg-id	6ce8613f061001105654673506348c13@postgrespro.ru Whole thread Raw
In response to	Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses	Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
List	pgsql-hackers
Tree view
> Here are a round of comments on the current version of the patch:

Thank you very much again!

> There is a latent issue about what is a transaction. For pgbench a
> transaction is a full script execution.
> For postgresql, it is a statement or a BEGIN/END block, several of
> which may appear in a script. From a retry
> perspective, you may retry from a SAVEPOINT within a BEGIN/END
> block... I'm not sure how to make general sense
> of all this, so this is just a comment without attached action for now.

Yes it is. That's why I wrote several notes about it in documentation 
where there may be a misunderstanding:

+        Transactions with serialization or deadlock failures (or with 
both
+        of them if used script contains several transactions; see
+        <xref linkend="transactions-and-scripts"
+        endterm="transactions-and-scripts-title"> for more information) 
are
+        marked separately and their time is not reported as for skipped
+        transactions.

+ <refsect2 id="transactions-and-scripts">
+  <title id="transactions-and-scripts-title">What is the 
<quote>Transaction</> Actually Performed in 
<application>pgbench</application>?</title>

+    If a transaction has serialization and/or deadlock failures, its
+   <replaceable>time</> will be reported as <literal>serialization 
failure</>,
+   <literal>deadlock failure</>, or
+   <literal>serialization and deadlock failures</>, respectively.   </para>
+  <note>
+   <para>
+     Transactions can have both serialization and deadlock failures if 
the
+     used script contained several transactions.  See
+     <xref linkend="transactions-and-scripts"
+     endterm="transactions-and-scripts-title"> for more information.
+    </para>
+  </note>

+  <note>
+   <para>
+    The number of transactions attempts within the interval can be 
greater than
+    the number of transactions within this interval multiplied by the 
maximum
+    attempts number.  See <xref linkend="transactions-and-scripts"
+    endterm="transactions-and-scripts-title"> for more information.
+   </para>
+  </note>

+       <note>
+         <para>The total sum of per-command failures of each type can 
be greater
+         than the number of transactions with reported failures.
+         See <xref linkend="transactions-and-scripts"
+         endterm="transactions-and-scripts-title"> for more 
information.
+         </para>
+       </note>

And I didn't make rollbacks to savepoints after the failure because they 
cannot help for serialization failures at all: after rollback to 
savepoint a new attempt will be always unsuccessful.

> I would consider using "try/tries" instead of "attempt/attempts" as it
> is shorter. An English native speaker
> opinion would be welcome on that point.

Thank you, I'll change it.

> I'm fine with renaming "is_latencies" to "report_per_command", which
> is more logical & generic.

Glad to hear it!

> "max_attempt_number": I'm against typing fields again in their name,
> aka "hungarian naming". I'd suggest
> "max_tries" or "max_attempts".

Ok!

> "SimpleStats attempts": I disagree with using this floating poiunt
> oriented structures to count integers.
> I would suggest "int64 tries" instead, which should be enough for the 
> purpose.

I'm not sure that it is enough. Firstly it may be several transactions 
in script so to count the average attempts number you should know the 
total number of runned transactions. Secondly I think that stddev for 
attempts number can be quite interesting and often it is not close to 
zero.

> LastBeginState -> RetryState? I'm not sure why this state is a pointer
> in CState. Putting the struct would avoid malloc/free cycles. Index
> "-1" may be used to tell it is not set if necessary.

Thanks, I agree that it's better to do in this way.

> "CSTATE_RETRY_FAILED_TRANSACTION" -> "CSTATE_RETRY" is simpler and 
> clear enough.

Ok!

> In CState and some code, a failure is a failure, maybe one boolean
> would be enough. It need only be differentiated when counting, and you
> have (deadlock_failure || serialization_failure) everywhere.

I agree with you. I'll change it.

> Some variables, such as "int attempt_number", should be in the client
> structure, not in the client? Generally, try to use block variables if
> possible to keep the state clearly disjoints. If there could be NO new
> variable at the doCustom level that would be great, because that would
> ensure that there is no machine state mixup hidden in these variables.

Do you mean the code cleanup for doCustom function? Because if I do so 
there will be two code styles for state blocks and their variables in 
this function..

> I wondering whether the RETRY & FAILURE states could/should be merged:
> 
>   on RETRY:
>     -> count retry
>     -> actually retry if < max_tries (reset client state, jump to 
> command)
>     -> else count failure and skip to end of script
> 
> The start and end of transaction detection seem expensive (malloc,
> ...) and assume a one statement per command (what about "BEGIN \; ...
> \; COMMIT;", which is not necessarily the case, this limitation should
> be documented. ISTM that the space normalization should be avoided,
> and something simpler/lighter should be devised? Possibly it should
> consider handling SAVEPOINT.

I divided these states because if there's a failed transaction block you 
should end it before retrying. It means to go to states 
CSTATE_START_COMMAND -> CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with 
the appropriate command. How do you propose not to go to these states?

About malloc - I agree with you that it should be done without 
malloc/free.

About savepoints - as I wrote you earlier I didn't make rollbacks to 
savepoints after the failure. Because they cannot help for serialization 
failures at all: after rollback to savepoint a new attempt will be 
always unsuccessful.

> I disagree about exit in ParseScript if the transaction block is not
> completed, especially as it misses out on combined statements/queries
> (BEGIN \; stuff... \; COMMIT") and would break an existing feature.

Thanks, I'll fix it for usual transaction blocks that don't end in the 
scripts.

> There are strange characters things in comments, eg "??ontinuous".

Oh, I'm sorry. I'll fix it too.

> Option "max-attempt-number" -> "max-tries"

> I would put the client random state initialization with the state
> intialization, not with the connection.

> * About tracing
> 
> Progress is expected to be short, not detailed. Only add the number of
> failures and retries if max retry is not 1.

Ok!

> * About reporting
> 
> I think that too much is reported. I advised to do that, but
> nevertheless it is a little bit steep.
> 
> At least, it should not report the number of tries/attempts when the
> max number is one.

Ok!

> Simple counting should be reported for failures,
> not floats...
> 
> I would suggest a more compact one-line report about failures:
> 
>   "number of failures: 12 (0.001%, deadlock: 7, serialization: 5)"

I think, there may be a misunderstanding. Because script can contain 
several transactions and get both failures.

> * About the TAP tests
> 
> They are too expensive, with 3 initdb. I think that they should be
> integrated in the existing tests, as a patch has been submitted to
> rework the whole pgbench tap test infrastructure.
> 
> For now, at most one initdb and several small tests inside.

Ok!

> * About the documentation
> 
> I'm not sure that the feature needs pre-emminence in the
> documentation, because most of the time there is no retry as none is
> needed, there is no failure, so this rather a special (although
> useful) case for people playing with serializable and other advanced
> features.
> 
> Smaller updates, without dedicated examples, should be enough.

Maybe there should be some examples to prepare people what they can see 
in the output of the program? Of course now failures are special cases 
because they disconnect its clients to the end of the program and ruin
all the results. I hope that if this patch is committed there will be 
much more cases with retried failures.

> If a transaction is skipped, there was no tries, so the corresponding
> number of attempts is 0, not one.

Oh, I'm sorry, it is a typo in the documentation.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
pgsql-hackers by date:
From: Heikki Linnakangas
Date: 13 July 2017, 15:52:44
Subject: Re: [HACKERS] Race condition in GetOldestActiveTransactionId()
From: Marina Polyakova
Date: 13 July 2017, 17:34:30
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

Previous

Next