Re: [HACKERS] Speedup twophase transactions - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Speedup twophase transactions
Msg-id CAB7nPqTUx8quw_J+T_jfxhruTEEZCOd__vMyAUkjd6eMT4oC9g@mail.gmail.com
In response to Re: [HACKERS] Speedup twophase transactions  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Speedup twophase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
List pgsql-hackers
On Thu, Mar 16, 2017 at 9:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 7:18 PM, Nikhil Sontakke
> <nikhils@2ndquadrant.com> wrote:
>>> + *      * RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
>>> + *        and PrescanPreparedTransactions() have been modified to go
>>> + *        through gxact->inredo entries that have not made it to disk yet.
>>>
>>> It seems to me that there should be an initial scan of pg_twophase at
>>> the beginning of recovery, discarding along the way, with a WARNING,
>>> entries that are older than the checkpoint redo horizon. This should
>>> fill in shmem entries using something close to PrepareRedoAdd(), and
>>> mark those entries as inredo. Then, at the end of recovery,
>>> PrescanPreparedTransactions() does not need to look at the entries in
>>> pg_twophase, and the same goes for RecoverPreparedTransactions(). I
>>> think that you could get the patch much simplified this way, as any
>>> 2PC data can be fetched directly from WAL segments and there is no
>>> need to rely on scans of pg_twophase; those are replaced by scans of
>>> entries in TwoPhaseState.
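A minimal sketch of the initial scan proposed in the quoted paragraph, in simplified C. Everything here is hypothetical: TxnId, TwoPhaseEntry and initial_twophase_scan() are illustrative stand-ins, not PostgreSQL's TwoPhaseState or PrepareRedoAdd(), and the redo horizon is assumed, for illustration only, to reduce to a single XID cutoff (horizon).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TxnId;              /* stand-in for TransactionId */

/* Hypothetical, heavily simplified shared-memory 2PC entry. */
typedef struct
{
    TxnId xid;
    bool  ondisk;                    /* state file exists in pg_twophase */
    bool  inredo;                    /* entry was created during recovery */
} TwoPhaseEntry;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
xid_precedes(TxnId a, TxnId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical initial scan of pg_twophase at the start of redo.
 * file_xids lists the XIDs of the state files found on disk; entries must
 * have room for nfiles elements.  Files older than horizon are discarded
 * with a WARNING; every other file becomes an entry marked ondisk and
 * inredo.  Returns the number of entries filled.
 */
static int
initial_twophase_scan(const TxnId *file_xids, int nfiles, TxnId horizon,
                      TwoPhaseEntry *entries)
{
    int n = 0;

    assert(nfiles >= 0);
    for (int i = 0; i < nfiles; i++)
    {
        if (xid_precedes(file_xids[i], horizon))
        {
            fprintf(stderr, "WARNING: discarding 2PC state file for xid %u\n",
                    (unsigned) file_xids[i]);
            continue;
        }
        entries[n].xid = file_xids[i];
        entries[n].ondisk = true;
        entries[n].inredo = true;
        n++;
    }
    return n;
}
```

With the entries filled this way, end-of-recovery code in the spirit of PrescanPreparedTransactions() would only need to walk the array instead of re-reading the directory.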
>>>
>>
>> I don't think this will work. We cannot replace pg_twophase with shmem
>> entries + WAL pointers, because we cannot expect the WAL to still be
>> around for long-running prepared transactions that survive across
>> checkpoints.
>
> But at the beginning of recovery, we can mark such entries as ondisk
> and inredo, in which case the WAL pointers stored in the shmem entries
> do not matter because the data is already on disk.
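To illustrate why the WAL pointers stop mattering, a hypothetical sketch: an entry created by the initial scan is flagged ondisk (and inredo), so whatever later needs its 2PC state reads the file and never consults prepare_start_lsn. The struct, enum and function names are simplified stand-ins, not PostgreSQL's GlobalTransactionData.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TxnId;      /* stand-in for TransactionId */
typedef uint64_t WalPtr;     /* stand-in for XLogRecPtr; 0 means invalid */

typedef enum { SOURCE_DISK, SOURCE_WAL } StateSource;

/* Hypothetical, simplified entry: the WAL pointer only matters if !ondisk. */
typedef struct
{
    TxnId  xid;
    bool   ondisk;
    bool   inredo;
    WalPtr prepare_start_lsn;
} TwoPhaseEntry;

/* Entry created by the initial pg_twophase scan: data is already on disk. */
static void
mark_from_initial_scan(TwoPhaseEntry *e, TxnId xid)
{
    e->xid = xid;
    e->ondisk = true;
    e->inredo = true;
    e->prepare_start_lsn = 0;    /* never consulted for on-disk entries */
}

/* Decide where the 2PC state of an entry has to be read from. */
static StateSource
choose_source(const TwoPhaseEntry *e)
{
    /* an entry that is not on disk must carry a usable WAL pointer */
    assert(e->ondisk || e->prepare_start_lsn != 0);
    return e->ondisk ? SOURCE_DISK : SOURCE_WAL;
}
```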

Nikhil, do you mind if I try something like that? Since we already know
the first XID at the beginning of redo via ShmemVariableCache->nextXid,
it is possible to discard 2PC files that should not be there. What
worries me is controlling the maximum number of entries in shared
memory. If legitimate 2PC files were flushed to disk at a checkpoint,
you could potentially end up with more 2PC transactions than should be
possible (even if updates of max_prepared_xacts are WAL-logged).
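A hypothetical sketch of where both checks could sit, assuming a fixed-size table whose max_prepared field stands in for max_prepared_xacts and a next_xid parameter standing in for ShmemVariableCache->nextXid. Files whose XID does not precede next_xid are discarded, and a legitimate file is refused once every slot is taken; whether refusing, erroring out, or something else is the right reaction remains open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TxnId;          /* stand-in for TransactionId */

#define MAX_SLOTS 64             /* hypothetical compile-time upper bound */

typedef enum { ADD_OK, ADD_FUTURE, ADD_FULL } AddResult;

/* Hypothetical fixed-size table standing in for TwoPhaseState. */
typedef struct
{
    int   max_prepared;          /* stand-in for max_prepared_xacts */
    int   count;
    TxnId xids[MAX_SLOTS];
} PreparedTable;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
xid_precedes(TxnId a, TxnId b)
{
    return (int32_t) (a - b) < 0;
}

static void
table_init(PreparedTable *t, int max_prepared)
{
    assert(max_prepared >= 0 && max_prepared <= MAX_SLOTS);
    t->max_prepared = max_prepared;
    t->count = 0;
}

/*
 * Load one pg_twophase file found at the start of redo.  Files whose XID
 * is not older than next_xid cannot be legitimate and are discarded; a
 * legitimate file is refused once all max_prepared slots are used.
 */
static AddResult
table_add_from_disk(PreparedTable *t, TxnId xid, TxnId next_xid)
{
    if (!xid_precedes(xid, next_xid))
    {
        fprintf(stderr, "WARNING: discarding future 2PC file for xid %u\n",
                (unsigned) xid);
        return ADD_FUTURE;
    }
    if (t->count >= t->max_prepared)
        return ADD_FULL;
    t->xids[t->count++] = xid;
    return ADD_OK;
}
```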
-- 
Michael


