Re: BUG #11264: Auto vacuum wraparound job blocking everything - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date
Msg-id loom.20140930T022539-927@post.gmane.org
Whole thread Raw
In response to BUG #11264: Auto vacuum wraparound job blocking everything  (dbhandary@switchfly.com)
Responses Re: BUG #11264: Auto vacuum wraparound job blocking everything - trying again
Re: BUG #11264: Auto vacuum wraparound job blocking everything
List pgsql-bugs
Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:

> Well, yes, 9.3.4 had a bug fixed by this commit:
>
> Author: Bruce Momjian <bruce <at> momjian.us>
> Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38
-0400
>
>     pg_upgrade:  preserve database and relation minmxid values
>
>     Also set these values for pre-9.3 old clusters that don't have values
to
>     preserve.
>
>     Analysis by Alvaro
>
>     Backpatch through 9.3
>
> > How do we fix the current issue with this one server? Is there an easy
fix?
> > Thanks.
>
> As far as I am aware, you should
>   UPDATE pg_database SET datminmxid=20783
>
> and that should fix it.  The oldestMulti value in pg_control would get
> updated by itself some time later.  If you experience stalls before
> oldestMulti fixes itself, you could stop the server (cleanly!) and then
> pg_resetxlog -m x,y where x is the correct nextMulti value from
> pg_controldata and y is 20783.

We ran into this on two instances last night and I'm starting to worry that
many others may hit it as well. As far as I can tell any postgres instance
that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
will have:

  Latest checkpoint's oldestMultiXid:   1

The workaround above is good if you still have the old cluster around from
before the upgrade. We did not, that was all cleaned up months ago. Which
raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
1 without access to the pre-upgrade instance. That is, where do you get the
correct value of oldestMultiXid to set into pg_database.datxminmxid?

I took a guess that the oldest pg_class.relminmxid for all the tables in all
the databases would be ok and updated pg_database.datminmxid witt that. That
is, in each database I ran:

  select relname, relminmxid, age(relminmxid)
    from pg_class
    where relkind = 'r'
      and age(relminmxid) < 2147483647
    order by 3 desc limit 1 ;

And then used the oldest one to update pg_database.datminmxid. After a
checkpoint and shutdown/restart I see that it was written to pg_controldata
too. Afterwards I was able to run a vacuum freeze on the problem table.

Questions:

1) Is the procedure above safe and effective for this, or did I just hose my
databases?

2) If the procedure above is not safe, what do we do instead?

3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we
fix it before too much damage happens in the wider world?

-dg

--
David Gould                               daveg <at> sonic.net
If simplicity worked, the world would be overrun with insects.

pgsql-bugs by date:

Previous
From: thangalin@gmail.com
Date:
Subject: BUG #11520: Cannot uncheck to set false
Next
From: Michael Paquier
Date:
Subject: Re: BUG #11518: full_page_writes is off after executing pg_start_backup()