Re: [HACKERS] Failed recovery with new faster 2PC code - Mailing list pgsql-hackers

From Nikhil Sontakke
Subject Re: [HACKERS] Failed recovery with new faster 2PC code
Date
Msg-id CAMGcDxeykkrKCk0FY9Pzt5JusLWw4woKXs8NoqjbOZfQQZ-i2Q@mail.gmail.com
In response to Re: [HACKERS] Failed recovery with new faster 2PC code  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
Responses Re: [HACKERS] Failed recovery with new faster 2PC code  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Please find attached a second version of my bug fix, which is stylistically better and clearer than the first one.

Regards,
Nikhils

On 18 April 2017 at 13:47, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
Hi,

There was a bug in the redo 2PC remove code path, because of which autovacuum would think that the 2PC transaction was gone and cause removal of the corresponding clog entry earlier than needed.

Please find attached the bug fix: 2pc_redo_remove_bug.patch.

I have been testing this on top of Michael's 2pc-restore-fix.patch and things have been OK for over an hour now. I will keep it running for longer.

Jeff, thanks for these very useful scripts. I am going to make a habit of running them on my side from now on. Do you have any other scripts that I could try against these patches? Please let me know.

Regards,
Nikhils 

On 18 April 2017 at 12:09, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:


On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
 
>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <simon@2ndQuadrant.com>
>> Date:   Tue Apr 4 15:56:56 2017 -0400
>>
>>    Speedup 2PC recovery by skipping two phase state files in normal path
>
> Thanks Jeff for your tests.
>
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
>

The issue seems to be that a prepared transaction is yet to be committed, but in one of the runs autovacuum comes in and causes the clog to be truncated beyond this prepared transaction's ID.

We only add the corresponding pgproc entry for a surviving 2PC transaction on completion of recovery, so there could be a race condition here. I am digging in further.
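
To make the suspected race concrete, here is a minimal toy model in Python. It is only a sketch, not PostgreSQL code: the function names and the dictionary standing in for the clog are hypothetical, and XID wraparound is ignored. It assumes the truncation cutoff is the oldest transaction ID visible as running or prepared, and shows that a prepared transaction missing from that view loses its clog entry.

```python
# Toy model only: these names are hypothetical and are not PostgreSQL APIs.
# XID wraparound is deliberately ignored to keep the comparison simple.

def oldest_xid_to_keep(next_xid, running_xids, prepared_xids):
    """Return the oldest transaction ID whose status must stay in the clog."""
    candidates = list(running_xids) + list(prepared_xids)
    # With nothing running or prepared, everything before next_xid is removable.
    return min(candidates, default=next_xid)


def truncate_clog(clog, cutoff_xid):
    """Drop status entries for every transaction ID older than cutoff_xid."""
    return {xid: status for xid, status in clog.items() if xid >= cutoff_xid}


# XID 101 is a prepared transaction that has not been committed yet.
clog = {100: "committed", 101: "in progress", 102: "committed"}
next_xid = 103

# Prepared transaction visible to the cutoff calculation: its entry survives.
safe = truncate_clog(clog, oldest_xid_to_keep(next_xid, [], [101]))

# Prepared transaction not visible yet: the cutoff moves past it.
unsafe = truncate_clog(clog, oldest_xid_to_keep(next_xid, [], []))

print(101 in safe, 101 in unsafe)
```

In this model, a later commit of transaction 101 would find no clog entry in the second case, which is the kind of failure the test scripts appear to hit.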

Regards,
Nikhils
--
 Nikhil Sontakke                   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services



