Re: [HACKERS] PANIC in pg_commit_ts slru after crashes - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] PANIC in pg_commit_ts slru after crashes
Date
Msg-id CAB7nPqQfU=ULOo7u=tnPyVFJsdDgGkdSP76TQYFcw=TYrhBgQQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] PANIC in pg_commit_ts slru after crashes  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] PANIC in pg_commit_ts slru after crashes  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Sun, Apr 16, 2017 at 6:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 April 2017 at 21:30, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>>
>>>
>>> Since all those offsets fall on a page boundary, my guess is that we're
>>> somehow failing to handle a new page correctly.
>>>
>>> Looking at the patch itself, my feeling is that the following code in
>>> src/backend/access/transam/twophase.c might be causing the problem.
>>>
>>> 1841
>>> 1842     /* update nextXid if needed */
>>> 1843     if (TransactionIdFollowsOrEquals(maxsubxid,
>>> ShmemVariableCache->nextXid))
>>> 1844     {
>>> 1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>>> 1846         ShmemVariableCache->nextXid = maxsubxid;
>>> 1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
>>> 1848         LWLockRelease(XidGenLock);
>>> 1849     }
>>>
>>> The function PrescanPreparedTransactions() gets called at the start of the
>>> redo recovery and this specific block will get exercised irrespective of
>>> whether there are any prepared transactions or not. What I find particularly
>>> wrong here is that we are initialising maxsubxid to current value of
>>> ShmemVariableCache->nextXid when the function enters, but this block would
>>> then again increment ShmemVariableCache->nextXid, when there are no prepared
>>> transactions in the system.
>>>
>>> I wonder if we should do as in attached patch.
>>
>>
>> That solves it for me.
>
> Thanks for patching and testing. I'll review and probably commit tomorrow.

Actually I think that the fix proposed is not correct as we should
just never take the risk to call TransactionIdAdvance() if there are
no 2PC files to scan, and it adds more difficulty in understanding
something that the 2PC restore code has introduced and already made
harder to catch (2nd thoughts and regrets here). If you look closely
at the code, ProcessTwoPhaseBuffer() uses *result and *maxsubid only
for PrescanPreparedTransactions() so there is a link between both.
Attached is a patch to rework ProcessTwoPhaseBuffer() by removing
those two arguments and replace them by a boolean to decide if the
next cached transaction ID should be updated or not. The cached next
TXID gets updated only if caller of ProcessTwoPhaseBuffer() wants to
do so and if the subxid scanned follows the XID of the scanned 2PC
file. This looks safer to me in the long run.

Jeff, does this patch make the situation better? The fix is rather
simple as it just makes sure that the next XID never gets updated if
there are no 2PC files.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Next
From: Noah Misch
Date:
Subject: Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?