Re: BUG #11264: Auto vacuum wraparound job blocking everything - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date
Msg-id 20141001213211.GW5311@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: BUG #11264: Auto vacuum wraparound job blocking everything  (David Gould <daveg@sonic.net>)
Responses Re: BUG #11264: Auto vacuum wraparound job blocking everything  (Bernhard Schrader <bernhard.schrader@innogames.de>)
List pgsql-bugs
David Gould wrote:

> We ran into this on two instances last night and I'm starting to worry that
> many others may hit it as well. As far as I can tell any postgres instance
> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
> will have:
>
>   Latest checkpoint's oldestMultiXid:   1
>
> The workaround above is good if you still have the old cluster around from
> before the upgrade. We did not, that was all cleaned up months ago. Which
> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
> 1 without access to the pre-upgrade instance. That is, where do you get the
> correct value of oldestMultiXid to set into pg_database.datxminmxid?
>
> I took a guess that the oldest pg_class.relminmxid for all the tables in all
> the databases would be ok and updated pg_database.datminmxid witt that. That
> is, in each database I ran:
>
>   select relname, relminmxid, age(relminmxid)
>     from pg_class
>     where relkind = 'r'
>       and age(relminmxid) < 2147483647
>     order by 3 desc limit 1 ;

Unfortunately, the age() function is only valid for transaction IDs, not
multixacts.  In certain cases the values returned might be sensible, but
they don't have to be, and if the counters are not relatively close to
one another, most likely they won't be sensible.  A new function was
added to 9.4, but that doesn't help those in 9.3.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ad1a87a368d837d5c2260b2fd2ff83e976425a
(I guess you could have an extension containing that function.)
My bet is that is the reason you had to add the condition that the value
is less than 2^31 - 1.

Offhand, other than the way to obtain the multixact age, the procedure
seems sensible.

A better way not involving mxid_age() would be to use pg_controldata to
extract the current value of the mxid counter, then subtract the current
relminmxid from that value.

> 3) Is this likely to affect all 9.3.x<5 upgraded databases?

Yes.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

Previous
From: David Gould
Date:
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything - trying again
Next
From: "Burgess, Freddie"
Date:
Subject: Unreliable results returned with PostgreSQL Spatial query?