Re: [HACKERS] Another nasty cache problem - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Another nasty cache problem
Date
Msg-id 200001301757.MAA12393@candle.pha.pa.us
Whole thread Raw
In response to Another nasty cache problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Another nasty cache problem
List pgsql-hackers
> I'm down to the point where the parallel tests mostly work with a small
> SI buffer --- but they do still sometimes fail.  I've realized that
> there is a whole class of bugs along the following lines:
> 
> There are plenty of routines that do two or more SearchSysCacheTuple
> calls to get the information they need.  As the code stands, it is
> unsafe to continue accessing the tuple returned by SearchSysCacheTuple
> after making a second such call, because the second call could possibly
> cause an SI cache reset message to be processed, thereby flushing the
> contents of the caches.

Yes, I have always been aware of this problem.  The issue is that since
cache entries are removed on a oldest-removed-first basis, I never
thought that several cache lookups would be a problem.  If you do many
cache lookups and expect very old ones to still exist, that could be a
problem.

However, a full reset of the cache could cause major problems.  Is there
a way to re-load the cache after the reset with the most recently cached
entries?  Seems that would be easier.  However, your issue is probably
that the new cache entries would have different locations from the old
entries.  Is it possible to delay the cache reset of the five most
recent cache entries, and do them later?  I don't see many places where
more than 2-3 cache entries are kept.  Maybe we need to keep them around
somehow during cache reset.

> I am not sure what to do about it.  One solution path is to make
> all the potential trouble spots do SearchSysCacheTupleCopy and then
> pfree the copied tuple when done.  However, that adds a nontrivial
> amount of overhead, and it'd be awfully easy to miss some trouble
> spots or to introduce new ones in the future.

Sounds like a lot of overhead to do the copy.

> 
> Another possibility is to introduce some sort of notion of a reference
> count, and to make the standard usage pattern be
>     tuple = SearchSysCacheTuple(...);
>     ... use tuple ...
>     ReleaseSysCacheTuple(tuple);
> The idea here is that a tuple with positive refcount would not be
> deleted during a cache reset, but would simply be removed from its
> cache, and then finally deleted when released (or during elog
> recovery).

If you can do that, can't you just keep the few most recent ones by
default.  Seems that would be very clean.

> This might allow us to get rid of SearchSysCacheTupleCopy, too,
> since the refcount should be just as good as palloc'ing one's own
> copy for most purposes.

Yes, that would be nice.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Another nasty cache problem
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Another nasty cache problem