Home > mailing lists

Thread: [HACKERS] Failed recovery with new faster 2PC code

[HACKERS] Failed recovery with new faster 2PC code

From

Jeff Janes

Date:

16 April 2017, 01:37:15

After this commit, I get crash recovery failures when using prepared transactions.

commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71

Author: Simon Riggs <simon@2ndQuadrant.com>

Date: Tue Apr 4 15:56:56 2017 -0400

Speedup 2PC recovery by skipping two phase state files in normal path

After the induced crash, I get this failure in recovery:

FATAL: could not access status of transaction 334419347

DETAIL: Could not open file "pg_xact/013E": No such file or directory.

LOG: startup process (PID 60106) exited with exit code 1

LOG: aborting startup due to startup process failure

LOG: database system is shut down

The earliest file which exists in pg_xact is 0176

Other examples:

FATAL: could not access status of transaction 121729737

DETAIL: Could not open file "pg_xact/0074": No such file or directory.

LOG: startup process (PID 23720) exited with exit code 1

FATAL: could not access status of transaction 181325554

DETAIL: Could not open file "pg_xact/00AC": No such file or directory.

LOG: startup process (PID 8375) exited with exit code 1

I experience this in about 1 out of 15 crash-recovery cycles on 8 CPUs.

The patch Pavan posted here did not make any difference:

https://www.postgresql.org/message-id/CABOikdMdhS4nYX7xHaF+m=P=q_zAJBCYsZ++VN26AZzDRf_xFA@mail.gmail.com

I've attached the test harness, which I think will look familiar to y'all. It is the usual injection of torn-page-write crashes with consistency checks after recovery (which makes no difference, as the issue is that recovery does not happen), modified to include a very crude transaction manager to make use of 2PC.

Cheers,

Jeff

Attachment

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Simon Riggs

Date:

17 April 2017, 12:14:41

On 15 April 2017 at 23:37, Jeff Janes <jeff.janes@gmail.com> wrote:
> After this commit, I get crash recovery failures when using prepared
> transactions.
>
> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date:   Tue Apr 4 15:56:56 2017 -0400
>
>     Speedup 2PC recovery by skipping two phase state files in normal path

Thanks Jeff for your tests.

So that's now two crash bugs in as many days and lack of clarity about
how to fix it.

Stas, I thought this patch was very important to you, yet two releases
in a row we are too-late-and-buggy.

If anybody has a reason why I shouldn't revert this, please say so now
fairly soon.
Any further attempts to fix must run Jeff's tests.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Stas Kelvich

Date:

17 April 2017, 12:25:42

> On 17 Apr 2017, at 12:14, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 15 April 2017 at 23:37, Jeff Janes <jeff.janes@gmail.com> wrote:
>> After this commit, I get crash recovery failures when using prepared
>> transactions.
>>
>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date:   Tue Apr 4 15:56:56 2017 -0400
>>
>>    Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>
> Stas, I thought this patch was very important to you, yet two releases
> in a row we are too-late-and-buggy.

I’m looking at pgstat issue in nearby thread right now and will switch to this
shortly.

If that’s possible, I’m asking to delay revert for several days.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Nikhil Sontakke

Date:

17 April 2017, 12:32:01

>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date: Tue Apr 4 15:56:56 2017 -0400
>>
>> Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>

I am trying to reproduce and debug it from my end as well.

Regards,

Nikhils

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Nikhil Sontakke

Date:

18 April 2017, 09:39:45

On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date: Tue Apr 4 15:56:56 2017 -0400
>>
>> Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>

The issue seems to be that a prepared transaction is yet to be committed. But autovacuum comes in and causes the clog to be truncated beyond this prepared transaction ID in one of the runs.

We only add the corresponding pgproc entry for a surviving 2PC transaction on completion of recovery. So could be a race condition here. Digging in further.

Regards,

Nikhils

Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Nikhil Sontakke

Date:

18 April 2017, 11:17:28

Hi,

There was a bug in the redo 2PC remove code path. Because of which, autovac would think that the 2PC is gone and cause removal of the corresponding clog entry earlier than needed.

Please find attached, the bug fix: 2pc_redo_remove_bug.patch.

I have been testing this on top of Michael's 2pc-restore-fix.patch and things seem to be ok for the past one+ hour. Will keep it running for long.

Jeff, thanks for these very useful scripts. I am going to make a habit to run these scripts on my side from now on. Do you have any other script that I could try against these patches? Please let me know.

Regards,

Nikhils

On 18 April 2017 at 12:09, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date: Tue Apr 4 15:56:56 2017 -0400
>>
>> Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>

The issue seems to be that a prepared transaction is yet to be committed. But autovacuum comes in and causes the clog to be truncated beyond this prepared transaction ID in one of the runs.

We only add the corresponding pgproc entry for a surviving 2PC transaction on completion of recovery. So could be a race condition here. Digging in further.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Nikhil Sontakke

Date:

18 April 2017, 11:57:10

Please find attached a second version of my bug fix which is stylistically better and clearer than the first one.

Regards,

Nikhils

On 18 April 2017 at 13:47, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

Hi,

There was a bug in the redo 2PC remove code path. Because of which, autovac would think that the 2PC is gone and cause removal of the corresponding clog entry earlier than needed.

Please find attached, the bug fix: 2pc_redo_remove_bug.patch.

I have been testing this on top of Michael's 2pc-restore-fix.patch and things seem to be ok for the past one+ hour. Will keep it running for long.

Jeff, thanks for these very useful scripts. I am going to make a habit to run these scripts on my side from now on. Do you have any other script that I could try against these patches? Please let me know.

Regards,
Nikhils

On 18 April 2017 at 12:09, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date: Tue Apr 4 15:56:56 2017 -0400
>>
>> Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>

The issue seems to be that a prepared transaction is yet to be committed. But autovacuum comes in and causes the clog to be truncated beyond this prepared transaction ID in one of the runs.

We only add the corresponding pgproc entry for a surviving 2PC transaction on completion of recovery. So could be a race condition here. Digging in further.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment

2pc_redo_remove_bug_v2.0.patch

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Simon Riggs

Date:

18 April 2017, 13:54:30

On 18 April 2017 at 09:57, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

> Please find attached a second version of my bug fix which is stylistically
> better and clearer than the first one.

Yeh, this is better. Pushed.

The bug was that the loop set gxact to be the last entry in the array,
causing the exit condition to fail and us then to remove the last
gxact from memory even when it didn't match the xid, removing a valid
entry too early. That then allowed xmin to move forwards, which causes
autovac to remove pg_xact entries earlier than needed.

Well done for finding that one, thanks for the patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Michael Paquier

Date:

18 April 2017, 15:12:43

On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yeh, this is better. Pushed.

I have been outraced on this one, the error is obvious once you see it ;)

Thanks for the investigation and the fix! I have spent a couple of
hours reviewing the interactions between the shmem entries of 2PC
state data created at the beginning of recovery and all the lookups in
procarray.c and varsup.c, noticing nothing by the way.

> The bug was that the loop set gxact to be the last entry in the array,
> causing the exit condition to fail and us then to remove the last
> gxact from memory even when it didn't match the xid, removing a valid
> entry too early. That then allowed xmin to move forwards, which causes
> autovac to remove pg_xact entries earlier than needed.
>
> Well done for finding that one, thanks for the patch.

Running Jeff's test suite, I can confirm that there are no problems now.
-- 
Michael

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Simon Riggs

Date:

18 April 2017, 15:38:50

On 18 April 2017 at 13:12, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Yeh, this is better. Pushed.
>
> I have been outraced on this one, the error is obvious once you see it ;)

Didn't realise you were working on it, nothing competitive about it.

It's clear this needed fixing, whether or not it fixes Jeff's report.

I do think it explains the report, so I'm hopeful.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Jeff Janes

Date:

19 April 2017, 04:48:45

On Tue, Apr 18, 2017 at 5:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 18 April 2017 at 13:12, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Yeh, this is better. Pushed.
>
> I have been outraced on this one, the error is obvious once you see it ;)

Didn't realise you were working on it, nothing competitive about it.

It's clear this needed fixing, whether or not it fixes Jeff's report.

I do think it explains the report, so I'm hopeful.

The git HEAD code (c727f12) has been surviving so far, with both test cases.

Thanks,

Jeff

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Jeff Janes

Date:

19 April 2017, 05:09:00

On Tue, Apr 18, 2017 at 1:17 AM, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:

Hi,

There was a bug in the redo 2PC remove code path. Because of which, autovac would think that the 2PC is gone and cause removal of the corresponding clog entry earlier than needed.

Please find attached, the bug fix: 2pc_redo_remove_bug.patch.

I have been testing this on top of Michael's 2pc-restore-fix.patch and things seem to be ok for the past one+ hour. Will keep it running for long.

Jeff, thanks for these very useful scripts. I am going to make a habit to run these scripts on my side from now on. Do you have any other script that I could try against these patches? Please let me know.

This script is the only one I have that specifically targets 2PC. I wrote it last year when the previous round of speed-up code (which avoided writing the files upon "PREPARE" by delaying them until the next checkpoint) was developed. I just decided to dust that test off to try again here. I don't know how to change it to make it more targeted towards this set of patches. Would this bug have been seen in a replica server in the absence of crashes, or was it only vulnerable during crash recovery rather than streaming replication?

Cheers,

Jeff

Re: [HACKERS] Failed recovery with new faster 2PC code

From

Michael Paquier

Date:

19 April 2017, 05:29:08

On Wed, Apr 19, 2017 at 11:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Would this bug have been seen in a replica server in the absence of crashes,
> or was it only vulnerable during crash recovery rather than streaming
> replication?

This issue could have been seen on a streaming standby as well,
letting around a TwoPhaseState impacts as well their linked PGPROC so
CLOG truncation would have been messed up as well. That's also the
case of the first issue involving as well incorrect XID updates,
though the chances to see it are I think lower as only the beginning
of recovery was impacted.
-- 
Michael