Re: BUG #11264: Auto vacuum wraparound job blocking everything - trying again - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #11264: Auto vacuum wraparound job blocking everything - trying again
Date
Msg-id 20141001140945.71681b0f@jekyl.lan
Whole thread Raw
In response to Re: BUG #11264: Auto vacuum wraparound job blocking everything  (David Gould <daveg@sonic.net>)
List pgsql-bugs
I sent the following a couple days ago but it seems to have been
overlooked. I think this issue may impact a lot of 9.3 sites that have
upgraded from earlier versions. Please take a look at this.

-dg

On Tue, 30 Sep 2014 00:44:11 +0000 (UTC) David Gould <daveg@sonic.net>
wrote:

> Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:
>
> > Well, yes, 9.3.4 had a bug fixed by this commit:
> >
> > Author: Bruce Momjian <bruce <at> momjian.us>
> > Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> > Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> > Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38
> -0400
> >
> >     pg_upgrade:  preserve database and relation minmxid values
> >
> >     Also set these values for pre-9.3 old clusters that don't have values
> to
> >     preserve.
> >
> >     Analysis by Alvaro
> >
> >     Backpatch through 9.3
> >
> > > How do we fix the current issue with this one server? Is there an easy
> fix?
> > > Thanks.
> >
> > As far as I am aware, you should
> >   UPDATE pg_database SET datminmxid=20783
> >
> > and that should fix it.  The oldestMulti value in pg_control would get
> > updated by itself some time later.  If you experience stalls before
> > oldestMulti fixes itself, you could stop the server (cleanly!) and then
> > pg_resetxlog -m x,y where x is the correct nextMulti value from
> > pg_controldata and y is 20783.
>
> We ran into this on two instances last night and I'm starting to worry that
> many others may hit it as well. As far as I can tell any postgres instance
> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
> will have:
>
>   Latest checkpoint's oldestMultiXid:   1
>
> The workaround above is good if you still have the old cluster around from
> before the upgrade. We did not, that was all cleaned up months ago. Which
> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
> 1 without access to the pre-upgrade instance. That is, where do you get the
> correct value of oldestMultiXid to set into pg_database.datxminmxid?
>
> I took a guess that the oldest pg_class.relminmxid for all the tables in all
> the databases would be ok and updated pg_database.datminmxid witt that. That
> is, in each database I ran:
>
>   select relname, relminmxid, age(relminmxid)
>     from pg_class
>     where relkind = 'r'
>       and age(relminmxid) < 2147483647
>     order by 3 desc limit 1 ;
>
> And then used the oldest one to update pg_database.datminmxid. After a
> checkpoint and shutdown/restart I see that it was written to pg_controldata
> too. Afterwards I was able to run a vacuum freeze on the problem table.
>
> Questions:
>
> 1) Is the procedure above safe and effective for this, or did I just hose my
> databases?
>
> 2) If the procedure above is not safe, what do we do instead?
>
> 3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we
> fix it before too much damage happens in the wider world?
\


--
David Gould                              daveg <at> sonic.net
If simplicity worked, the world would be overrun with insects.

pgsql-bugs by date:

Previous
From: willmis@gmail.com
Date:
Subject: BUG #11550: Error messages contain not encodable characters (Latin9)
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything