race condition when writing pg_control - Mailing list pgsql-hackers

From Bossart, Nathan
Subject race condition when writing pg_control
Msg-id 70BF24D6-DC51-443F-B55A-95735803842A@amazon.com
List pgsql-hackers
Hi hackers,

I believe I've discovered a race condition between the startup and
checkpointer processes that can cause a CRC mismatch in the pg_control
file.  If a cluster crashes at the right time, the following error
appears when you attempt to restart it:

        FATAL:  incorrect checksum in control file

This appears to be caused by some code paths in xlog_redo() that
update ControlFile without taking the ControlFileLock.  The attached
patch seems to be sufficient to prevent the CRC mismatch in the
control file, but perhaps this is a symptom of a bigger problem with
concurrent modifications of ControlFile->checkPointCopy.nextFullXid.
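To make the suspected interleaving concrete, here is a minimal, single-threaded sketch. It assumes the control-file writer first stamps a CRC into the shared struct and then writes that same struct out. Everything in it is hypothetical and illustrative, not PostgreSQL code: `ControlSketch`, `crc32_sketch`, `control_crc`, `replay`, the simulated lock, and the plain CRC-32. The only names taken from the message are the roles they stand in for: ControlFile, ControlFileLock, and checkPointCopy.nextFullXid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the real control file struct. */
typedef struct {
    uint64_t next_full_xid;  /* plays the role of checkPointCopy.nextFullXid */
    uint64_t checkpoint_lsn; /* some other field covered by the CRC */
    uint32_t crc;            /* CRC over every field before this one */
} ControlSketch;

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320); illustration only. */
static uint32_t crc32_sketch(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC over all fields that precede the crc member. */
static uint32_t control_crc(const ControlSketch *cf)
{
    return crc32_sketch((const unsigned char *) cf, offsetof(ControlSketch, crc));
}

/*
 * Replays one fixed interleaving: the writer stamps the CRC, the redo side
 * tries to bump next_full_xid, then the writer copies the struct to "disk".
 * Returns true if the on-disk copy passes its CRC check.
 */
static bool replay(bool redo_takes_lock)
{
    ControlSketch shared = { .next_full_xid = 100, .checkpoint_lsn = 0x1000, .crc = 0 };
    ControlSketch disk;
    bool lock_held;
    bool deferred = false;

    /* Writer: acquire the (simulated) lock and stamp the CRC. */
    lock_held = true;
    shared.crc = control_crc(&shared);

    /* Redo: try to update the shared struct while the write is in progress. */
    if (redo_takes_lock && lock_held)
        deferred = true;            /* would block until the writer releases the lock */
    else
        shared.next_full_xid = 200; /* unlocked update lands between CRC and write */

    /* Writer: "write" the struct (a memcpy here) and release the lock. */
    memcpy(&disk, &shared, sizeof disk);
    lock_held = false;

    /* Redo: a blocked update proceeds once the lock is free. */
    if (deferred)
        shared.next_full_xid = 200;

    /* Either way, shared memory ends up with the new value. */
    assert(shared.next_full_xid == 200);

    return control_crc(&disk) == disk.crc;
}
```

With `redo_takes_lock` false, `replay` returns false. The "on-disk" copy carries the new XID but the CRC computed for the old one, which is the shape of the "incorrect checksum in control file" failure. With it true, the redo update waits for the writer and the stored CRC matches. This models the locking idea the message describes: the redo path updates ControlFile only while holding ControlFileLock. It does not show what the attached patch actually changes.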

Nathan


Attachment
