Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Msg-id: 20131120234141.GI18801@awork2.anarazel.de
In response to: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers

On 2013-11-20 10:48:41 -0800, Josh Berkus wrote:
> > Presumably a replica created while all traffic was halted on the master
> > would be clean, correct?  This bug can only be triggered if there's
> > heavy write load on the master, right?

Kinda. Unfortunately, it's necessary to understand how Hot Standby (HS)
works to some degree:
Every time a server is (re-)started with a recovery.conf present and
hot_standby=on (be it streaming replication, archive-based replication,
or PITR), the Hot Standby code is used.
(Crash|Replication)-Recovery starts by reading the last checkpoint (from
pg_control or, if present, backup_label) and then replays WAL from the
'redo' point included in the checkpoint. The bug then occurs the first
(or, in some cases, the second) time it replays an 'xl_running_xacts'
record. That record is used to reconstruct the information needed to
allow queries.

Every time the server in HS mode allows connections ("consistent recovery
state reached at ..." and "database system is ready to accept read only
connections" in the log), the bug can be triggered. If there weren't too
many transactions at that point, the problem won't occur until the
standby is restarted.
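
As a rough illustration of how one might locate those points in a server log, here is a minimal Python sketch. The two message strings are the ones quoted above; the function name and the sample log prefix ("LOG:  ...") are assumptions made for the example, and a hit only marks a point where the bug could have been triggered, not evidence of corruption.

```python
# Message fragments quoted in the email; everything else here is illustrative.
CONSISTENT = "consistent recovery state reached at"
READY = "database system is ready to accept read only connections"


def find_hs_connection_points(log_lines):
    """Return (line_number, kind) for each Hot Standby milestone found.

    kind is "consistent" or "ready". Line numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        if CONSISTENT in line:
            hits.append((lineno, "consistent"))
        elif READY in line:
            hits.append((lineno, "ready"))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt; the positions after "at" are made up.
    sample = [
        "LOG:  redo starts at 0/2000020",
        "LOG:  consistent recovery state reached at 0/20000E0",
        "LOG:  database system is ready to accept read only connections",
    ]
    for lineno, kind in find_hs_connection_points(sample):
        print(lineno, kind)
```

Each such pair in a standby's log corresponds to one (re-)start in HS mode, so counting them gives a rough idea of how many times the standby passed through the window described above.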

> If someone is doing PITR based on a snapshot taken with pg_basebackup,
> that will only trip this corruption bug if the user has hot_standby=on
> in their config *while restoring*?  Or is it critical if they have
> hot_standby=on while backing up?

hot_standby=on only has an effect while starting up with a recovery.conf
present. So, if you have an old base backup around and all WAL files,
you can start from that.
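
The condition above (recovery.conf present at startup and hot_standby=on) can be sketched as a small Python check. This is only an illustration under stated assumptions: the function names are hypothetical, the postgresql.conf parsing is deliberately simplified (no include files, only the common boolean spellings), and the default of "off" is assumed for the releases discussed in this thread.

```python
import re

# Simplified set of spellings treated as "on"; an assumption for this sketch.
_TRUE_VALUES = {"on", "true", "yes", "1"}

# Matches "hot_standby = on", "hot_standby on", "hot_standby = 'on'",
# but not other parameters such as "hot_standby_feedback".
_HOT_STANDBY_RE = re.compile(
    r"^hot_standby\s*(?:=|\s)\s*'?(\w+)'?\s*$", re.IGNORECASE
)


def hot_standby_enabled(conf_text):
    """Return True if the last hot_standby setting in conf_text is on."""
    value = "off"  # assumed default when the parameter is not set
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        match = _HOT_STANDBY_RE.match(line)
        if match:
            value = match.group(1).lower()  # later settings win
    return value in _TRUE_VALUES


def hot_standby_code_used(recovery_conf_present, conf_text):
    """Both conditions from the email must hold at server start."""
    return recovery_conf_present and hot_standby_enabled(conf_text)


if __name__ == "__main__":
    print(hot_standby_code_used(True, "hot_standby = on\n"))
    print(hot_standby_code_used(False, "hot_standby = on\n"))
```

Note that the check concerns the configuration in effect while restoring or starting the standby, not the configuration of the server the base backup was taken from.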

Does that answer your questions?

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


