Thread: [HACKERS] PANIC in pg_commit_ts slru after crashes

[HACKERS] PANIC in pg_commit_ts slru after crashes

From
Jeff Janes
Date:

In the first statement executed after crash recovery, I sometimes get this error:

PANIC:  XX000: could not access status of transaction 207580505
DETAIL:  Could not read from file "pg_commit_ts/1EF0" at offset 131072: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

Other examples:

PANIC:  XX000: could not access status of transaction 3483853232
DETAIL:  Could not read from file "pg_commit_ts/20742" at offset 237568: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

PANIC:  XX000: could not access status of transaction 802552883
DETAIL:  Could not read from file "pg_commit_ts/779E" at offset 114688: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

Based on the errno, I'm assuming the read was successful but returned the wrong number of bytes (which was zero in the case I saw after changing the code to log short reads).

It then goes through recovery again and the problem does not immediately re-occur if you attempt to connect again.  I don't know why the file size would have changed between attempts.


The problem bisects to the commit:

commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Tue Apr 4 15:56:56 2017 -0400

    Speedup 2PC recovery by skipping two phase state files in normal path


It is not obvious to me how that is relevant.  My test doesn't use prepared transactions (and leaves the guc at zero), and this commit doesn't touch the slru.c.

I'm attaching the test harness.  There is a patch which injects the crash-faults and also allows xid fast-forward, a perl script that runs until crash and assesses the consistency of the post-crash results, and a shell script which sets up the database and then calls the perl script in a loop.  On 8 CPU machine, it takes about an hour for the PANIC to occur.  

The attached script bails out once it sees the PANIC (count.pl line 158) if it didn't do that then it will proceed to connect again and retry, and works fine.  No apparent permanent data corruption.

Any clues on how to investigate this further?

Cheers,

Jeff
Attachment

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Pavan Deolasee
Date:


On Sat, Apr 15, 2017 at 12:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

In the first statement executed after crash recovery, I sometimes get this error:

PANIC:  XX000: could not access status of transaction 207580505
DETAIL:  Could not read from file "pg_commit_ts/1EF0" at offset 131072: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

Other examples:

PANIC:  XX000: could not access status of transaction 3483853232
DETAIL:  Could not read from file "pg_commit_ts/20742" at offset 237568: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

PANIC:  XX000: could not access status of transaction 802552883
DETAIL:  Could not read from file "pg_commit_ts/779E" at offset 114688: Success.
LOCATION:  SlruReportIOError, slru.c:918
STATEMENT:  create temporary table aldjf (x serial)

Based on the errno, I'm assuming the read was successful but returned the wrong number of bytes (which was zero in the case I saw after changing the code to log short reads).

It then goes through recovery again and the problem does not immediately re-occur if you attempt to connect again.  I don't know why the file size would have changed between attempts.


The problem bisects to the commit:

commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Tue Apr 4 15:56:56 2017 -0400

    Speedup 2PC recovery by skipping two phase state files in normal path


It is not obvious to me how that is relevant.  My test doesn't use prepared transactions (and leaves the guc at zero), and this commit doesn't touch the slru.c.

I'm attaching the test harness.  There is a patch which injects the crash-faults and also allows xid fast-forward, a perl script that runs until crash and assesses the consistency of the post-crash results, and a shell script which sets up the database and then calls the perl script in a loop.  On 8 CPU machine, it takes about an hour for the PANIC to occur.  

The attached script bails out once it sees the PANIC (count.pl line 158) if it didn't do that then it will proceed to connect again and retry, and works fine.  No apparent permanent data corruption.

Any clues on how to investigate this further?

Since all those offsets fall on a page boundary, my guess is that we're somehow failing to handle a new page correctly.

Looking at the patch itself, my feeling is that the following code in src/backend/access/transam/twophase.c might be causing the problem. 

1841 
1842     /* update nextXid if needed */
1843     if (TransactionIdFollowsOrEquals(maxsubxid, ShmemVariableCache->nextXid))
1844     {
1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
1846         ShmemVariableCache->nextXid = maxsubxid;
1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
1848         LWLockRelease(XidGenLock);
1849     }

The function PrescanPreparedTransactions() gets called at the start of the redo recovery and this specific block will get exercised irrespective of whether there are any prepared transactions or not. What I find particularly wrong here is that we are initialising maxsubxid to current value of ShmemVariableCache->nextXid when the function enters, but this block would then again increment ShmemVariableCache->nextXid, when there are no prepared transactions in the system.

I wonder if we should do as in attached patch.

It's not entirely clear to me why only CommitTS fails and not other slrus. One possible reason could be that CommitTS is started before this function gets called whereas CLOG and SUBTRANS get started after the function returns.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Jeff Janes
Date:
On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


Since all those offsets fall on a page boundary, my guess is that we're somehow failing to handle a new page correctly.

Looking at the patch itself, my feeling is that the following code in src/backend/access/transam/twophase.c might be causing the problem. 

1841 
1842     /* update nextXid if needed */
1843     if (TransactionIdFollowsOrEquals(maxsubxid, ShmemVariableCache->nextXid))
1844     {
1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
1846         ShmemVariableCache->nextXid = maxsubxid;
1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
1848         LWLockRelease(XidGenLock);
1849     }

The function PrescanPreparedTransactions() gets called at the start of the redo recovery and this specific block will get exercised irrespective of whether there are any prepared transactions or not. What I find particularly wrong here is that we are initialising maxsubxid to current value of ShmemVariableCache->nextXid when the function enters, but this block would then again increment ShmemVariableCache->nextXid, when there are no prepared transactions in the system.

I wonder if we should do as in attached patch.

That solves it for me.

Thanks,

Jeff

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Simon Riggs
Date:
On 15 April 2017 at 21:30, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
> wrote:
>
>>
>> Since all those offsets fall on a page boundary, my guess is that we're
>> somehow failing to handle a new page correctly.
>>
>> Looking at the patch itself, my feeling is that the following code in
>> src/backend/access/transam/twophase.c might be causing the problem.
>>
>> 1841
>> 1842     /* update nextXid if needed */
>> 1843     if (TransactionIdFollowsOrEquals(maxsubxid,
>> ShmemVariableCache->nextXid))
>> 1844     {
>> 1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>> 1846         ShmemVariableCache->nextXid = maxsubxid;
>> 1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
>> 1848         LWLockRelease(XidGenLock);
>> 1849     }
>>
>> The function PrescanPreparedTransactions() gets called at the start of the
>> redo recovery and this specific block will get exercised irrespective of
>> whether there are any prepared transactions or not. What I find particularly
>> wrong here is that we are initialising maxsubxid to current value of
>> ShmemVariableCache->nextXid when the function enters, but this block would
>> then again increment ShmemVariableCache->nextXid, when there are no prepared
>> transactions in the system.
>>
>> I wonder if we should do as in attached patch.
>
>
> That solves it for me.

Thanks for patching and testing. I'll review and probably commit tomorrow.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Michael Paquier
Date:
On Sun, Apr 16, 2017 at 6:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 April 2017 at 21:30, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>>
>>>
>>> Since all those offsets fall on a page boundary, my guess is that we're
>>> somehow failing to handle a new page correctly.
>>>
>>> Looking at the patch itself, my feeling is that the following code in
>>> src/backend/access/transam/twophase.c might be causing the problem.
>>>
>>> 1841
>>> 1842     /* update nextXid if needed */
>>> 1843     if (TransactionIdFollowsOrEquals(maxsubxid,
>>> ShmemVariableCache->nextXid))
>>> 1844     {
>>> 1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>>> 1846         ShmemVariableCache->nextXid = maxsubxid;
>>> 1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
>>> 1848         LWLockRelease(XidGenLock);
>>> 1849     }
>>>
>>> The function PrescanPreparedTransactions() gets called at the start of the
>>> redo recovery and this specific block will get exercised irrespective of
>>> whether there are any prepared transactions or not. What I find particularly
>>> wrong here is that we are initialising maxsubxid to current value of
>>> ShmemVariableCache->nextXid when the function enters, but this block would
>>> then again increment ShmemVariableCache->nextXid, when there are no prepared
>>> transactions in the system.
>>>
>>> I wonder if we should do as in attached patch.
>>
>>
>> That solves it for me.
>
> Thanks for patching and testing. I'll review and probably commit tomorrow.

Actually I think that the fix proposed is not correct as we should
just never take the risk to call TransactionIdAdvance() if there are
no 2PC files to scan, and it adds more difficulty in understanding
something that the 2PC restore code has introduced and already made
harder to catch (2nd thoughts and regrets here). If you look closely
at the code, ProcessTwoPhaseBuffer() uses *result and *maxsubid only
for PrescanPreparedTransactions() so there is a link between both.
Attached is a patch to rework ProcessTwoPhaseBuffer() by removing
those two arguments and replace them by a boolean to decide if the
next cached transaction ID should be updated or not. The cached next
TXID gets updated only if caller of ProcessTwoPhaseBuffer() wants to
do so and if the subxid scanned follows the XID of the scanned 2PC
file. This looks safer to me in the long run.

Jeff, does this patch make the situation better? The fix is rather
simple as it just makes sure that the next XID never gets updated if
there are no 2PC files.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Jeff Janes
Date:
On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier <michael.paquier@gmail.com> wrote:


Jeff, does this patch make the situation better? The fix is rather
simple as it just makes sure that the next XID never gets updated if
there are no 2PC files.

Yes, that fixes the reported case when 2PC are not being used.

Thanks,

Jeff
 

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Michael Paquier
Date:
On Tue, Apr 18, 2017 at 12:33 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>>
>>
>> Jeff, does this patch make the situation better? The fix is rather
>> simple as it just makes sure that the next XID never gets updated if
>> there are no 2PC files.
>
>
> Yes, that fixes the reported case when 2PC are not being used.

Thanks for the confirmation. I am able to run your test suite by the
way after a couple of tweaks, and I can see the failures. Still
investigating the 2nd report...
-- 
Michael
VMware vCenter Server
www.vmware.com



Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Simon Riggs
Date:
On 17 April 2017 at 16:33, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>>
>>
>> Jeff, does this patch make the situation better? The fix is rather
>> simple as it just makes sure that the next XID never gets updated if
>> there are no 2PC files.
>
>
> Yes, that fixes the reported case when 2PC are not being used.

Thanks Jeff.

I certainly prefer the simplicity of Michael's approach. I'll move to commit.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Simon Riggs
Date:
On 18 April 2017 at 09:51, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 17 April 2017 at 16:33, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>>
>>>
>>>
>>> Jeff, does this patch make the situation better? The fix is rather
>>> simple as it just makes sure that the next XID never gets updated if
>>> there are no 2PC files.
>>
>>
>> Yes, that fixes the reported case when 2PC are not being used.
>
> Thanks Jeff.
>
> I certainly prefer the simplicity of Michael's approach. I'll move to commit.

Minor change to patch.

I've added a recheck in ProcessTwoPhaseBuffer() after we acquire the lock.

If its worth acquiring the lock its worth checking we don't have a race.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

From
Michael Paquier
Date:
On Tue, Apr 18, 2017 at 7:05 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I've added a recheck in ProcessTwoPhaseBuffer() after we acquire the lock.
>
> If its worth acquiring the lock its worth checking we don't have a race.

I see. No objections to that.
-- 
Michael