Re: [HACKERS] Another nasty cache problem - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Another nasty cache problem
Date
Msg-id 9850.949332263@sss.pgh.pa.us
In response to Re: [HACKERS] Another nasty cache problem  (Peter Eisentraut <e99re41@DoCS.UU.SE>)
Responses Re: [HACKERS] Another nasty cache problem
List pgsql-hackers
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
> This sort of thing should be documented,

... or changed ...

> Anyway, I just counted 254 uses of SearchSysCacheTuple in the backend tree
> and a majority of these are probably obviously innocent. Since I don't
> have any more developing planned, I would volunteer to take a look at all
> of those and look for violations of second cache look up, heap_open, and
> CommandCounterIncrement, fixing them where possible, or at least pointing
> them out to more experienced people. That might save you from going out of
> your way and instituting some reference count or whatever, and it would be
> an opportunity for me to read some code.

I appreciate the offer, but I don't really want to fix it that way.
If that's how things have to work, then the code will be *extremely*
fragile --- any routine that opens a relation or looks up a cache tuple
will potentially break its callers as well as itself.  And since the
probability of failure is so low, we'll never find it; we'll just keep
getting the occasional irreproducible failure report from the field.
I think we need a designed-in solution rather than a restrictive coding
rule.
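
To make the fragility concrete, here is a toy sketch in C. It is not PostgreSQL code: ToyTuple, toy_search, toy_helper and both callers are hypothetical names, and the one-slot cache is a deliberate exaggeration so that every second lookup replaces the entry. It only shows how a routine that performs its own lookup can silently change what its caller's pointer refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached catalog tuple. */
typedef struct ToyTuple {
    int key;
    int payload;
} ToyTuple;

/* A one-slot cache: any lookup of a different key overwrites the slot. */
static ToyTuple toy_slot;
static int toy_slot_used = 0;

/* Returns a pointer INTO the cache, as a cache lookup typically would. */
static const ToyTuple *toy_search(int key)
{
    if (!toy_slot_used || toy_slot.key != key) {
        toy_slot.key = key;
        toy_slot.payload = key * 10;   /* made-up "catalog data" */
        toy_slot_used = 1;
    }
    return &toy_slot;
}

/* Looks harmless to its caller, but performs a cache lookup of its own. */
static int toy_helper(void)
{
    return toy_search(2)->payload;
}

/* Fragile: holds a cache pointer across a call that does another lookup. */
static int toy_fragile_caller(void)
{
    const ToyTuple *t = toy_search(1);
    assert(t != NULL);
    (void) toy_helper();
    return t->payload;                 /* now reads the data for key 2 */
}

/* Call-site discipline: copy what is needed before calling anything else. */
static int toy_copying_caller(void)
{
    ToyTuple copy = *toy_search(1);
    (void) toy_helper();
    return copy.payload;               /* still the data for key 1 */
}
```

In this toy, toy_fragile_caller() returns key 2's payload even though it asked for key 1. The copying variant avoids that, but only by imposing a rule on every call site, which is the kind of restrictive coding rule argued against above.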

Also, I am not sure that the existing uses are readily fixable.  For
example, I saw a number of crashes in the parser last night, most of
which traced to uses of Operator or Type pointers --- which are really
SearchSysCacheTuple results, but the parser passes them around with wild
abandon.  I don't see any easy way of restructuring that code to avoid
this.

I am starting to think that Bruce's idea might be the way to go: lock
down any cache entry that's been referenced since the last transaction
start or CommandCounterIncrement, and elog() if it's changed by
invalidation.  Then the only coding rule needed is "cached tuples don't
stay valid across CommandCounterIncrement", which is relatively
simple to check for.
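
As a rough illustration of that scheme (a sketch under assumptions, not an actual or proposed implementation), the toy C code below marks an entry as referenced when it is handed out, refuses to invalidate a referenced entry, and clears the marks at a command boundary. All names (ToyCacheEntry, toy_lookup, toy_invalidate, toy_command_boundary, toy_error_count) are hypothetical, and the error counter merely stands in for an elog() call, which in a real backend would presumably abort the transaction rather than return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8

/* Hypothetical cache entry; not PostgreSQL's real cache structure. */
typedef struct ToyCacheEntry {
    int  key;
    int  value;
    bool valid;        /* slot currently holds a live entry */
    bool referenced;   /* handed out since the last command boundary */
} ToyCacheEntry;

static ToyCacheEntry toy_cache[TOY_CACHE_SIZE];  /* zero-initialized */
static int toy_error_count = 0;                  /* stand-in for elog() */

/* Find the entry for key, loading it if absent, and lock it down. */
static ToyCacheEntry *toy_lookup(int key, int value_if_missing)
{
    ToyCacheEntry *free_slot = NULL;

    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        ToyCacheEntry *e = &toy_cache[i];
        if (e->valid && e->key == key) {
            e->referenced = true;
            return e;
        }
        if (!e->valid && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return NULL;                   /* toy cache is full; give up */

    free_slot->key = key;
    free_slot->value = value_if_missing;
    free_slot->valid = true;
    free_slot->referenced = true;
    return free_slot;
}

/*
 * Invalidate the entry for key. A referenced entry is kept and an error
 * is recorded instead; returns true only if an entry was dropped.
 */
static bool toy_invalidate(int key)
{
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        ToyCacheEntry *e = &toy_cache[i];
        if (e->valid && e->key == key) {
            if (e->referenced) {
                toy_error_count++;     /* where elog() would be raised */
                return false;
            }
            e->valid = false;
            return true;
        }
    }
    return false;                      /* nothing cached under this key */
}

/* Transaction start or command-counter increment: release all locks. */
static void toy_command_boundary(void)
{
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
        toy_cache[i].referenced = false;
}
```

With this shape, a pointer returned by toy_lookup() stays usable until the next toy_command_boundary() call, which mirrors the single remaining rule that cached tuples don't stay valid across CommandCounterIncrement.
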
        regards, tom lane

