Re: [HACKERS] Failed recovery with new faster 2PC code - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Failed recovery with new faster 2PC code
Date
Msg-id CAB7nPqQ_RnV8QTYxtm7=hudY56jjtG1tbZgrFOuYF8AuDdFiZA@mail.gmail.com
In response to Re: [HACKERS] Failed recovery with new faster 2PC code  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] Failed recovery with new faster 2PC code  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yeh, this is better. Pushed.

I was beaten to this one; the error is obvious once you see it ;)

Thanks for the investigation and the fix! I spent a couple of hours
reviewing the interactions between the shmem entries of 2PC state
data created at the beginning of recovery and all the lookups in
procarray.c and varsup.c, without noticing anything wrong.

> The bug was that the loop set gxact to be the last entry in the array,
> causing the exit condition to fail and us then to remove the last
> gxact from memory even when it didn't match the xid, removing a valid
> entry too early. That then allowed xmin to move forwards, which causes
> autovac to remove pg_xact entries earlier than needed.
>
> Well done for finding that one, thanks for the patch.
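
For readers following along, here is a minimal sketch of the failure mode
described above. It is not the actual PostgreSQL code: the Gxact struct and
the find_buggy/find_fixed names are hypothetical stand-ins, assumed only for
illustration. It shows how a lookup loop that assigns the cursor on every
iteration ends up pointing at the last entry when nothing matches, so a
caller that then removes "the found entry" drops a valid, unrelated one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a 2PC shared-memory entry (assumption, not the
 * real PostgreSQL structure). */
typedef struct
{
    unsigned int xid;   /* transaction id owning this entry */
    bool         valid; /* entry is in use */
} Gxact;

/* Buggy pattern: gxact is overwritten on every iteration, so after a
 * full scan with no match it still points at the last array entry. */
static Gxact *
find_buggy(Gxact *arr, int n, unsigned int xid)
{
    Gxact *gxact = NULL;

    for (int i = 0; i < n; i++)
    {
        gxact = &arr[i];
        if (gxact->xid == xid)
            break;
    }
    return gxact;       /* non-NULL even when no entry matched */
}

/* Fixed pattern: only return an entry whose xid actually matches. */
static Gxact *
find_fixed(Gxact *arr, int n, unsigned int xid)
{
    for (int i = 0; i < n; i++)
    {
        if (arr[i].xid == xid)
            return &arr[i];
    }
    return NULL;        /* caller must handle "not found" */
}
```

With the buggy variant, a caller that checks only for a non-NULL result
would go on to remove the last entry for any unknown xid, which matches the
"removing a valid entry too early" symptom in the quoted explanation.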

Running Jeff's test suite, I can confirm that there are no problems now.
-- 
Michael