Re: [HACKERS] Speedup twophase transactions - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Speedup twophase transactions
Date
Msg-id CAB7nPqSjrtRQMYnFKXCnjYcO=ni2VH1C1hDcuOo+_O2qNkmtNw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Speedup twophase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
Responses Re: [HACKERS] Speedup twophase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
List pgsql-hackers
On Sat, Mar 11, 2017 at 7:26 PM, Nikhil Sontakke
<nikhils@2ndquadrant.com> wrote:
> Hi David and Michael,
>> It would be great to get this thread closed out after 14 months and many
>> commits.
>
> PFA, latest patch which addresses Michael's comments.

Thanks for the new version. Let's head toward a final patch.

>> +           ereport(WARNING,
>> +                   (errmsg("removing future two-phase state data from
>> memory \"%u\"",
>> +                           xid)));
>> +           PrepareRedoRemove(xid);
>> +           continue
>> Those are not normal (partially because unlink is atomic, but not
>> durable)... But they match the correct coding pattern regarding
>> incorrect 2PC entries... I'd really like to see those switched to a
>> FATAL with unlink() made durable for those calls.
>
> Hmm, not sure what exactly we need to do here. If you look at the prior
> checks, there we already skip on-disk entries. So, typically, the entries
> that we encounter here will be in shmem only.

As long as we don't have an alternative to offer a durable unlink,
let's do nothing then. This is as well consistent with the other code
paths handling corrupted or incorrect 2PC entries.

>> +       /* Deconstruct header */
>> +       hdr = (TwoPhaseFileHeader *) buf;
>> +       Assert(TransactionIdEquals(hdr->xid, xid));
>> +
>> +       if (TransactionIdPrecedes(xid, result))
>> +           result = xid;
>> This portion is repeated three times and could be easily refactored.
>> You could just have a routine that returns the oldes transaction ID
>> used, and ignore the result for StandbyRecoverPreparedTransactions()
>> by casting a (void) to let any kind of static analyzer understand that
>> we don't care about the result for example. Handling for subxids is
>> necessary as well depending on the code path. Spliting things into a
>> could of sub-routines may be more readable as well. There are really a
>> couple of small parts that can be gathered and strengthened.
>
> Have added a new function to do this now. It reads either from disk or
> shared memory and produces error/log messages accordingly as well now.

And that's ProcessTwoPhaseBufferAndReturn in the patch.
ProcessTwoPhaseBuffer may be a better name.

>> +       /*
>> +        * Recreate its GXACT and dummy PGPROC
>> +        */
>> +       gxactnew = MarkAsPreparing(xid, gid,
>> +                               hdr->prepared_at,
>> +                               hdr->owner, hdr->database,
>> +                               gxact->prepare_start_lsn,
>> +                               gxact->prepare_end_lsn);
>> MarkAsPreparing() does not need to be extended with two new arguments.
>> RecoverPreparedTransactions() is used only at the end of recovery,
>> where it is not necessary to look at the 2PC state files from the
>> records. In this code path inredo is also set to false :)
>
> That's not true. We will have entries with inredo set at the end of recovery
> as well. Infact the MarkAsPreparing() call from
> RecoverPreparedTransactions() is the one which will remove these inredo
> entries and convert them into regular entries. We have optimized the
> recovery code path as well.
>
>> +           /* Delete TwoPhaseState gxact entry and/or 2PC file. */
>> +           PrepareRedoRemove(parsed.twophase_xid);
>> Both things should not be present, no? If the file is pushed to disk
>> it means that the checkpoint horizon has already moved.
>>
> PREPARE in redo, followed by a checkpoint, followed by a COMMIT/ROLLBACK. We
> can have both the bits set in this case.

Oh, I see where our thoughts don't overlap. I actually thought that
the shared memory entry and the on-disk file cannot co-exist (or if
you want a file flushed at checkpoint should have its shmem entry
removed). But you are right and I am wrong. In order to have the error
handling done properly if the maximum amount of 2PC transactions is
reached. Still....

>> -           ereport(ERROR,
>> +           /* It's ok to find an entry in the redo/recovery case */
>> +           if (!gxact->inredo)
>> +               ereport(ERROR,
>>                     (errcode(ERRCODE_DUPLICATE_OBJECT),
>>                      errmsg("transaction identifier \"%s\" is already in
>> use",
>>                             gid)));
>> +           else
>> +           {
>> +               found = true;
>> +               break;
>> +           }
>> I would not have thought so.
>
> Since we are using the TwoPhaseState structure to track redo entries, at end
> of recovery, we will find existing entries. Please see my comments above for
> RecoverPreparedTransactions()

This is absolutely not good, because it is a direct result of the
interactions of the first loop of RecoverPreparedTransaction() with
its second loop, and because MarkAsPreparing() can finished by being
called *twice* from the same transaction. I really think that this
portion should be removed and that RecoverPreparedTransactions()
should be more careful when scanning the entries in pg_twophase by
looking up at what exists as well in shared memory, instead of doing
that in MarkAsPreparing().

Here are some more comments:

+       /*
+        * Recreate its GXACT and dummy PGPROC
+        */
+       gxactnew = MarkAsPreparing(xid, gid,
+                               hdr->prepared_at,
+                               hdr->owner, hdr->database,
+                               gxact->prepare_start_lsn,
+                               gxact->prepare_end_lsn);
+
+       Assert(gxactnew == gxact);
Here it would be better to set ondisk to false. This makes the code
more consistent with the previous loop, and the intention clear.

The first loop of RecoverPreparedTransactions() has a lot in common
with its second loop. You may want to refactor a little bit more here.

+/*
+ * PrepareRedoRemove
+ *
+ * Remove the corresponding gxact entry from TwoPhaseState. Also
+ * remove the 2PC file.
+ */
This could be a bit more expanded. The removal of the 2PC does not
happen after removing the in-memory data, it would happen if the
in-memory data is not found.

+MarkAsPreparingInRedo(TransactionId xid, const char *gid,
+               TimestampTz prepared_at, Oid owner, Oid databaseid,
+               XLogRecPtr prepare_start_lsn, XLogRecPtr prepare_end_lsn)
+{
+   GlobalTransaction gxact;
+
+   LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
MarkAsPreparingInRedo is internal to twophase.c. There is no need to
expose it externally and it is just used in PrepareRedoAdd so you
could just group both.
   bool        valid;          /* TRUE if PGPROC entry is in proc array */   bool        ondisk;         /* TRUE if
preparestate file is on disk */
 
+   bool        inredo;         /* TRUE if entry was added via xlog_redo */
We could have a set of flags here, that's the 3rd boolean of the
structure used for a status.
-- 
Michael



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Changing references of password encryption to hashing