[HACKERS] Failed recovery with new faster 2PC code - Mailing list pgsql-hackers

From Jeff Janes
Subject [HACKERS] Failed recovery with new faster 2PC code
Msg-id CAMkU=1xBP8cqdS5eK8APHL=X6RHMMM2vG5g+QamduuTsyCwv9g@mail.gmail.com
Responses Re: [HACKERS] Failed recovery with new faster 2PC code  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Since the commit below, I get crash-recovery failures when using prepared transactions.

commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   Tue Apr 4 15:56:56 2017 -0400

    Speedup 2PC recovery by skipping two phase state files in normal path


After the induced crash, I get this failure in recovery:


FATAL:  could not access status of transaction 334419347
DETAIL:  Could not open file "pg_xact/013E": No such file or directory.
LOG:  startup process (PID 60106) exited with exit code 1
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down

The earliest file that exists in pg_xact is 0176.

Other examples:

FATAL:  could not access status of transaction 121729737
DETAIL:  Could not open file "pg_xact/0074": No such file or directory.
LOG:  startup process (PID 23720) exited with exit code 1

FATAL:  could not access status of transaction 181325554
DETAIL:  Could not open file "pg_xact/00AC": No such file or directory.
LOG:  startup process (PID 8375) exited with exit code 1
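
For reference, the file names in the errors above follow directly from the transaction IDs. The sketch below is only an illustration, not part of the test harness. It assumes a default build: 8192-byte blocks, 2 status bits per transaction (4 transactions per byte), and 32 pages per SLRU segment. The function name xact_segment_name is made up for this example.

```python
# Assumed default PostgreSQL build constants (not taken from the report).
BLCKSZ = 8192                 # bytes per page
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32   # pages per pg_xact segment file

# Number of transaction IDs covered by one pg_xact segment file.
XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT


def xact_segment_name(xid):
    """Return the pg_xact file name that holds the status of xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


if __name__ == "__main__":
    # The three transaction IDs from the FATAL messages above.
    for xid in (334419347, 121729737, 181325554):
        print(xid, "pg_xact/" + xact_segment_name(xid))
```

Under those assumptions the three IDs map to 013E, 0074 and 00AC, matching the DETAIL lines. Segment 0176 starts at xid 392167424, so in the first example recovery is asking for the status of a transaction older than anything still kept in pg_xact.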


I experience this in about 1 out of 15 crash-recovery cycles on 8 CPUs.

The patch Pavan posted here did not make any difference:


I've attached the test harness, which I think will look familiar to y'all.  It is the usual injection of torn-page-write crashes with consistency checks after recovery, modified to include a very crude transaction manager to make use of 2PC.  The consistency checks make no difference here, as the issue is that recovery itself does not complete.
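
To make "very crude transaction manager" concrete, here is a minimal sketch of the statement sequence such a manager would send to one participant. It is not the attached harness. The function name two_phase_script, the gid value and the example UPDATE are all hypothetical. The server must have max_prepared_transactions set above zero for PREPARE TRANSACTION to work.

```python
def two_phase_script(gid, statements, commit=True):
    """Build the SQL sent to one participant in a two-phase commit.

    gid is the global transaction identifier given to PREPARE TRANSACTION.
    This sketch only builds strings; it does not talk to a server.
    """
    if "'" in gid:
        # Keep the sketch simple: refuse gids that would need quoting.
        raise ValueError("gid must not contain a single quote")

    script = ["BEGIN"]
    script.extend(statements)
    # Phase 1: make the transaction durable but not yet visible.
    script.append("PREPARE TRANSACTION '%s'" % gid)
    # Phase 2: resolve it once every participant has prepared.
    if commit:
        script.append("COMMIT PREPARED '%s'" % gid)
    else:
        script.append("ROLLBACK PREPARED '%s'" % gid)
    return script


if __name__ == "__main__":
    # Hypothetical workload statement, for illustration only.
    work = ["UPDATE foo SET count = count + 1 WHERE id = 1"]
    for line in two_phase_script("txn_1", work):
        print(line + ";")
```

A crash that lands between the PREPARE TRANSACTION and the COMMIT PREPARED leaves a prepared transaction that recovery has to restore, which is the path the commit above changed.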

Cheers,

Jeff
