Re: Interesting glitch in autovacuum - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Interesting glitch in autovacuum
Msg-id 20080910171726.GH4399@alvh.no-ip.org
In response to Interesting glitch in autovacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Interesting glitch in autovacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> I observed a curious bug in autovac just now.  Since plain vacuum avoids
> calling GetTransactionSnapshot, an autovac worker that happens not to
> analyze any tables will never call GetTransactionSnapshot at all.
> This means it will arrive at vac_update_datfrozenxid with
> RecentGlobalXmin never having been changed from its boot value of
> FirstNormalTransactionId, which means that it will fail to update the
> database's datfrozenxid ... or, if the current value of datfrozenxid
> is past 2 billion, that it will improperly advance datfrozenxid to
> sometime in the future.

Ouch :-(
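
To make the "past 2 billion" case concrete, here is a minimal sketch of
modulo-2^32 XID comparison. It is not copied from the PostgreSQL source;
xid_precedes is a hypothetical stand-in modelled on the usual
TransactionIdPrecedes logic, and the sample values are made up.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical stand-in for a circular XID comparison.
 * Special XIDs (below FirstNormalTransactionId) compare as plain
 * unsigned integers; normal XIDs compare modulo 2^32, so id1 precedes
 * id2 when the signed 32-bit difference is negative.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 < id2;

    diff = (int32_t) (id1 - id2);
    return diff < 0;
}
```

With RecentGlobalXmin stuck at FirstNormalTransactionId, a datfrozenxid
of, say, 1000 does not precede it, so no update happens.  A datfrozenxid
of 3000000000 does precede it under this arithmetic, so the stale value
looks "newer" and would be accepted as an advance.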


> I've only directly tested this in HEAD, but I suspect the problem goes
> back a ways.

Well, this logic was introduced in 8.2; I'm not sure if there's a
problem in 8.1, but I don't think so.

> On reflection I'm not even sure that this is strictly an autovacuum
> bug.  It can be cast more generically as "RecentGlobalXmin getting
> used without ever having been set", and it sure looks to me like the
> HOT patch may have introduced a few risks of that sort.

Agreed.

Maybe we should initialize RecentGlobalXmin to InvalidTransactionId, and
check at each point of use that it has actually been set.
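
Roughly what I have in mind, as a sketch only.  The names
recent_global_xmin, set_recent_global_xmin and get_recent_global_xmin
are hypothetical, not existing backend functions, and I am using
InvalidTransactionId (the XID-typed invalid value) as the sentinel.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical stand-in for the global; starts out as "never set". */
static TransactionId recent_global_xmin = InvalidTransactionId;

/* Called wherever a snapshot computation would refresh the value. */
static void
set_recent_global_xmin(TransactionId xmin)
{
    recent_global_xmin = xmin;
}

/*
 * Returns false if the value was never set, so the caller has to
 * compute a fresh horizon instead of trusting a boot-time default.
 */
static bool
get_recent_global_xmin(TransactionId *result)
{
    if (recent_global_xmin == InvalidTransactionId)
        return false;

    *result = recent_global_xmin;
    return true;
}
```

A caller that gets false back would have to obtain a horizon some other
way; in the real code an Assert at the point of use might be enough.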

> I'm thinking that maybe an appropriate fix is to insert a
> GetTransactionSnapshot call at the beginning of InitPostgres'
> transaction, thus ensuring that every backend has some vaguely sane
> value for RecentGlobalXmin before it tries to do any database access.

AFAIR there's an "initial transaction" in InitPostgres or something like
that.  Since it goes away quickly, it'd be a good place to ensure the
snapshot does not last much longer.

> Another thought is that even with that, an autovac worker is likely
> to reach vac_update_datfrozenxid with a RecentGlobalXmin value that
> was computed at the start of its run, and is thus rather old.
> I wonder why vac_update_datfrozenxid is using the variable at all
> rather than doing GetOldestXmin?  It's not like that function is
> so performance-critical that it needs to avoid calling GetOldestXmin.

The function is called only once per autovacuum iteration, and once in
manually-invoked vacuum, so certainly it's not performance-critical.

> Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
> is it actually necessary for lazy vacuum to avoid setting a snapshot?
> It seems like it might be a good idea for it to do so in order to
> keep its RecentGlobalXmin reasonably current.

Hmm, I think I'd rather be inclined to get a snapshot just when it's
going to finish.  That way, RecentGlobalXmin will be up to date even if
the 

> I've only looked at this in HEAD, but I am thinking that we have
> a real problem here in both HEAD and 8.3.  I'm less sure how bad
> things are in the older branches.

8.2 does contain the vac_update_datfrozenxid problem at the very least.
Older versions do not have that logic, so they are probably safe.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

