Re: [HACKERS] PANIC in pg_commit_ts slru after crashes - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [HACKERS] PANIC in pg_commit_ts slru after crashes
Msg-id CAMkU=1xqfE3=O8v7AexGk+L17+A9dwRyhW8QJ=k-cuC7Gi=vWg@mail.gmail.com
In response to Re: [HACKERS] PANIC in pg_commit_ts slru after crashes  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: [HACKERS] PANIC in pg_commit_ts slru after crashes  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


Since all those offsets fall on a page boundary, my guess is that we're somehow failing to handle a new page correctly.

Looking at the patch itself, my feeling is that the following code in src/backend/access/transam/twophase.c might be causing the problem. 

    /* update nextXid if needed */
    if (TransactionIdFollowsOrEquals(maxsubxid, ShmemVariableCache->nextXid))
    {
        LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
        ShmemVariableCache->nextXid = maxsubxid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
        LWLockRelease(XidGenLock);
    }

The function PrescanPreparedTransactions() is called at the start of redo recovery, and this specific block is exercised whether or not there are any prepared transactions. What I find particularly wrong here is that we initialise maxsubxid to the current value of ShmemVariableCache->nextXid when the function is entered, so the TransactionIdFollowsOrEquals() test passes trivially and the block then increments ShmemVariableCache->nextXid again, even when there are no prepared transactions in the system.
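
To make the off-by-one concrete, here is a small standalone toy model of the logic described above. It is only a sketch, not PostgreSQL code: the names xid_follows_or_equals(), prescan_next_xid() and the only_if_seen flag are hypothetical, locking and the skipping of special XIDs on wraparound are left out, and the guarded variant is just one conceivable way to avoid the spurious advance, not necessarily what the attached patch does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Simplified modulo-2^32 comparison; special XIDs are ignored (assumption). */
static bool
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Toy model of the prescan: returns the value nextXid would have afterwards.
 * subxids/n stand in for subtransaction XIDs found in prepared transactions.
 * With only_if_seen = false the final check mirrors the quoted block; with
 * only_if_seen = true (hypothetical guard) nextXid moves only if some
 * subxid at or beyond it was actually found.
 */
TransactionId
prescan_next_xid(TransactionId next_xid, const TransactionId *subxids,
                 int n, bool only_if_seen)
{
    TransactionId maxsubxid = next_xid;   /* initialised to nextXid, as described */
    bool seen = false;

    for (int i = 0; i < n; i++)
    {
        if (xid_follows_or_equals(subxids[i], maxsubxid))
        {
            maxsubxid = subxids[i];
            seen = true;
        }
    }

    /* With n == 0, maxsubxid == next_xid, so the comparison is always true. */
    if (xid_follows_or_equals(maxsubxid, next_xid) && (seen || !only_if_seen))
        next_xid = maxsubxid + 1;         /* wraparound handling omitted */

    return next_xid;
}

/* Prints both variants for the "no prepared transactions" case. */
void
report_no_prepared_case(TransactionId start)
{
    printf("unguarded: %u -> %u\n", (unsigned) start,
           (unsigned) prescan_next_xid(start, NULL, 0, false));
    printf("guarded:   %u -> %u\n", (unsigned) start,
           (unsigned) prescan_next_xid(start, NULL, 0, true));
}
```

Calling report_no_prepared_case(1000) from a main() shows the unguarded variant moving nextXid to 1001 with nothing prepared, while the guarded one leaves it at 1000. Both variants agree whenever a subxid at or beyond nextXid really exists.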

I wonder if we should do as in the attached patch.

That solves it for me.

Thanks,

Jeff
