Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date
Msg-id 20131120145320.GF25406@awork2.anarazel.de
Whole thread Raw
In response to Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
On 2013-11-20 06:21:13 -0800, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-11-20 05:59:58 -0800, Kevin Grittner wrote:
> 
> >> I don't understand where that would make sense; especially since
> >> I thought that a database FREEZE followed by a checkpoint
> >> releases old clog space anyway.
> >
> > It only releases them up to the (cluster wide) xmin horizon. So
> > if there are older snapshots or prepared xacts around...
> 
> So as long as there are no open transactions or prepared
> transactions on the master which started before the release with
> the fix is applied, VACUUM FREEZE would be guaranteed to work? 
> Since I don't see how a non-prepared transaction would be running
> from before a minor release upgrade, that just means we have to
> make sure there are no prepared transactions from before the
> upgrade?

That's not a bad point. So the way to fix it would be:

1) Restart the standby to the new minor release, wait for catchup
2) Restart the primary (fast or smart) to the new minor release
3) Acquire enough new xids to make sure we cross a clog page (?)
4) Jot down a new xid:  SELECT txid_current()::bigint % (1::bigint<<33-1)
5) vacuumdb -z -a
6) Ensure that there are no prepared xacts older than 3) around
SELECT *
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC LIMIT 1;
7) Ensure the xmin horizon is above the one from: 3:
SELECT datname, datfrozenxid
FROM pg_database
WHERE datname != 'template0'
ORDER BY age(datfrozenxid) DESC LIMIT 1;
8) Get the current lsn: SELECT pg_current_xlog_location();
9) verify on each standby that SELECT pg_last_xlog_receive_location() is  past 7)
10) be happy

I am not sure how we can easily compute that 6) and 7) are past 3) in
the presence of xid wraparounds.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Easily reading debug_print_plan
Next
From: Tom Lane
Date:
Subject: Re: Autoconf 2.69 update