Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date
Msg-id 20131119142001.GA10498@alap2.anarazel.de
In response to Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andres Freund <andres@2ndquadrant.com>)
Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Andrew Dunstan <andrew@dunslane.net>)
Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Hi,

On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
> Afaics it's likely a combination/interaction of bugs and fixes between:
> * the initial HS code
> * 5a031a5556ff83b8a9646892715d7fef415b83c3
> * f44eedc3f0f347a856eea8590730769125964597

Yes, the combination of those is guilty.

Man, this is (to a good part my) bad.

> But that'd mean nobody noticed it during 9.3's beta...

It's fairly hard to reproduce artificially since a) there have to be
enough transactions starting and committing between the start of the
checkpoint the standby is starting from and the point it does
LogStandbySnapshot() to cross a 32768 xid boundary, and b) hint bits
often save the game by not accessing clog at all anymore and thus not
noticing the corruption.

I've reproduced the issue by having an INSERT ONLY table that's never
read from. It's helpful to disable autovacuum.
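
To make the 32768 figure concrete: with the default 8 kB block size and
two status bits per transaction, one clog page covers 32768 xids. Below is
a minimal sketch of that arithmetic. The sketch_* helpers are made up for
illustration and are not functions from the tree, and xid wraparound is
ignored.

```c
#include <stdint.h>

/* Values of a default build: 8 kB blocks, 2 status bits per transaction. */
#define BLCKSZ               8192
#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  (8 / CLOG_BITS_PER_XACT)           /* 4 */
#define CLOG_XACTS_PER_PAGE  (BLCKSZ * CLOG_XACTS_PER_BYTE)     /* 32768 */

typedef uint32_t TransactionId;

/* Hypothetical helper: number of the clog page holding xid's status bits. */
static uint32_t
sketch_xid_to_page(TransactionId xid)
{
    return xid / CLOG_XACTS_PER_PAGE;
}

/*
 * Hypothetical helper: returns 1 if the xids first..last (first <= last,
 * no wraparound) are spread over more than one clog page, i.e. a page
 * boundary was crossed somewhere inside the window.
 */
static int
sketch_window_crosses_page(TransactionId first, TransactionId last)
{
    return sketch_xid_to_page(first) != sketch_xid_to_page(last);
}
```

So a reproduction needs the xid counter to advance past a multiple of
32768 inside the window described above; a window that stays within one
page cannot trigger the problem.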

Imo something like the attached patch should be done. The description I came
up with is:
   Fix Hot-Standby initialization of clog and subtrans.

   These bugs can cause data loss on standbys started with hot_standby=on
   at the moment they start to accept read only queries by marking
   committed transactions as uncommitted. The likelihood of such
   corruptions is small unless the primary has a high transaction rate.

   5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup
   logic by maintaining less state until at least
   STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both
   clog and subtrans are written to before that. This only failed to fail
   in common cases because the usage of ExtendCLOG in procarray.c was
   superfluous since clog extensions are actually WAL logged.

   f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the
   missing extensions of pg_subtrans due to the former commit's changes -
   which are not WAL logged - by performing the extensions when switching
   to a state > STANDBY_INITIALIZED and not performing xid assignments
   before that - again missing the fact that ExtendCLOG is unnecessary -
   but screwed up twice: Once because latestObservedXid wasn't updated
   anymore in that state due to the earlier commit and once by having an
   off-by-one error in the loop performing extensions.
   This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default
   settings) boundary was crossed between the start of the checkpoint
   recovery started from and the first xl_running_xact record old
   transactions' commit bits in pg_clog could be overwritten if they
   started and committed in that window.

   Fix this mess by not performing ExtendCLOG() in HS at all anymore
   since it's unneeded and evidently dangerous and by performing subtrans
   extensions even before reaching STANDBY_SNAPSHOT_PENDING.
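
For readers who don't have the clog layout in their head, here is a toy
model of why a late or repeated page "extension" is dangerous. It is a
sketch only, not the code in clog.c or procarray.c: all toy_* names are
invented, and it assumes nothing more than 2 status bits per xid, with 0
meaning in progress and 1 meaning committed, and that extending means
zeroing the whole page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_XACTS_PER_PAGE  32768
#define TOY_PAGE_BYTES      (TOY_XACTS_PER_PAGE / 4)   /* 4 xids per byte */
#define TOY_NPAGES          4

#define TOY_IN_PROGRESS     0x00
#define TOY_COMMITTED       0x01

/* Toy stand-in for pg_clog: a handful of zero-initialized pages. */
static uint8_t toy_clog[TOY_NPAGES][TOY_PAGE_BYTES];

/* Store the 2-bit status of xid. */
static void
toy_set_status(uint32_t xid, uint8_t status)
{
    uint32_t page  = xid / TOY_XACTS_PER_PAGE;
    uint32_t idx   = xid % TOY_XACTS_PER_PAGE;
    uint32_t shift = (idx % 4) * 2;
    uint8_t *byte;

    assert(page < TOY_NPAGES);
    byte = &toy_clog[page][idx / 4];
    *byte = (uint8_t) ((*byte & ~(3u << shift)) | ((uint32_t) status << shift));
}

/* Fetch the 2-bit status of xid. */
static uint8_t
toy_get_status(uint32_t xid)
{
    uint32_t page  = xid / TOY_XACTS_PER_PAGE;
    uint32_t idx   = xid % TOY_XACTS_PER_PAGE;
    uint32_t shift = (idx % 4) * 2;

    assert(page < TOY_NPAGES);
    return (uint8_t) ((toy_clog[page][idx / 4] >> shift) & 3u);
}

/* Model of an "extension": zero the entire page that holds xid. */
static void
toy_zero_page_of(uint32_t xid)
{
    uint32_t page = xid / TOY_XACTS_PER_PAGE;

    assert(page < TOY_NPAGES);
    memset(toy_clog[page], 0, TOY_PAGE_BYTES);
}
```

If xid 32773 is marked committed via toy_set_status() and the page
starting at xid 32768 is zeroed afterwards via toy_zero_page_of(), then
toy_get_status(32773) reports in progress again while xids on page 0 are
untouched. That is the shape of the overwrite described in the commit
message: the commit bits were correct after WAL replay and a redundant
zeroing of the page threw them away.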
 

Imo this warrants an expedited point release :(

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


