Re: [HACKERS] Failed recovery with new faster 2PC code - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: [HACKERS] Failed recovery with new faster 2PC code
Date	April 18, 2017 18:12:43
Msg-id	CAB7nPqQ_RnV8QTYxtm7=hudY56jjtG1tbZgrFOuYF8AuDdFiZA@mail.gmail.com
In response to	Re: [HACKERS] Failed recovery with new faster 2PC code (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: [HACKERS] Failed recovery with new faster 2PC code (Simon Riggs <simon@2ndquadrant.com>)
List	pgsql-hackers

On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yeh, this is better. Pushed.

I have been outraced on this one; the error is obvious once you see it ;)

Thanks for the investigation and the fix! I spent a couple of hours
reviewing the interactions between the shmem entries of 2PC state data
created at the beginning of recovery and all the lookups in
procarray.c and varsup.c, without noticing anything, by the way.

> The bug was that the loop set gxact to be the last entry in the array,
> causing the exit condition to fail and us then to remove the last
> gxact from memory even when it didn't match the xid, removing a valid
> entry too early. That then allowed xmin to move forwards, which causes
> autovac to remove pg_xact entries earlier than needed.
>
> Well done for finding that one, thanks for the patch.
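
For anyone reading this later, the failure mode described above can be
sketched in a few lines of C. This is only a simplified illustration under
my own assumptions, not the actual PostgreSQL code: Entry, find_buggy and
find_fixed are hypothetical names, and the real entries carry far more
state than a bare xid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a prepared-transaction entry (assumption). */
typedef struct Entry
{
    unsigned int xid;
} Entry;

/*
 * Buggy pattern: the cursor is assigned on every iteration, so after a
 * full pass with no match it still points at the last array element.
 * A caller that tests only "result != NULL" then acts on the wrong entry.
 */
static Entry *
find_buggy(Entry **entries, int n, unsigned int xid)
{
    Entry *e = NULL;

    for (int i = 0; i < n; i++)
    {
        e = entries[i];
        if (e->xid == xid)
            break;
    }
    return e;               /* non-NULL whenever n > 0, match or not */
}

/*
 * One possible fix: track the match explicitly and report "not found"
 * instead of leaking the last element examined.
 */
static Entry *
find_fixed(Entry **entries, int n, unsigned int xid)
{
    bool    found = false;
    Entry  *e = NULL;

    for (int i = 0; i < n; i++)
    {
        e = entries[i];
        if (e->xid == xid)
        {
            found = true;
            break;
        }
    }
    return found ? e : NULL;
}
```

With two entries holding xids 100 and 200, looking up xid 300 through
find_buggy hands back the entry for 200, which a caller would then remove
even though it is still valid; find_fixed returns NULL for the same lookup.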

Running Jeff's test suite, I can confirm that there are no problems now.
-- 
Michael
