Re: VACUUM FULL versus system catalog cache invalidation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: VACUUM FULL versus system catalog cache invalidation
Date
Msg-id 2603.1313176447@sss.pgh.pa.us
Whole thread Raw
In response to Re: VACUUM FULL versus system catalog cache invalidation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: VACUUM FULL versus system catalog cache invalidation
Re: VACUUM FULL versus system catalog cache invalidation
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Aug 12, 2011 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Forget about targeting catcache invals by TID, and instead just use the
>> key hash value to determine which cache entries to drop.

>> Right at the moment I'm leaning to approach #2. �I wonder if anyone
>> sees it differently, or has an idea for a third approach?

> To be honest, I am not real keen on back-patching either of the
> options you list above, either.  I think we should probably do
> something about the problem Dave Gould reported (to wit: sinval resets
> need to be smarter about the order they do things in), but I am not
> sure it's a good idea to back-patch what seems like a fairly radical
> overhaul of the sinval mechanism to fix a problem nobody's actually
> complained about hitting in production, especially given there's no
> certainty that this is the last bug.

What radical overhaul?  I'm talking about removing one if-test in
CatalogCacheIdInvalidate, viz
        nextelt = DLGetSucc(elt);
        if (hashValue != ct->hash_value)            continue;        /* ignore non-matching hash values */

-            if (ct->negative ||
-                ItemPointerEquals(pointer, &ct->tuple.t_self))        {            if (ct->refcount > 0 ||
  (ct->c_list && ct->c_list->refcount > 0))            {
 

In any case, it is now clear to me that this bug is capable of eating
peoples' databases, as in "what just happened to our most critical
table?  Uh, it's not there anymore, boss, but we seem to have duplicate
pg_class entries for this other table".  This may not have happened
yet in the field, but that's probably only because 9.0 hasn't been in
production that long.  Doing nothing about it in the released branches
is unacceptable IMO.  People that that happens to will not be satisfied
with a response like "you shouldn't have been using VACUUM FULL on
system catalogs, it's buggy".

For that matter, I'm not convinced that Gould hasn't seen some form of
this --- he's mentioned seeing bizarre errors above and beyond the
"could not find pg_class tuple for index 2662" one.

As for it not being the last bug, no, it's probably not.  That's a
pretty sucky excuse for not fixing it too.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: Further news on Clang - spurious warnings
Next
From: Marko Kreen
Date:
Subject: Re: sha1, sha2 functions into core?