Thread: A note about hash-based catcache invalidations

A note about hash-based catcache invalidations

From: Tom Lane
I'm looking into the idea I mentioned earlier:

> All is not entirely lost, however: there's still some possible
> performance benefit to be gained here, if we go to the scheme of
> identifying victim catcache entries by hashvalue only.  Currently,
> each heap_update in a cached catalog has to issue two sinval messages
> (per cache!): one against the old TID and one against the new TID.
> We'd be able to reduce that to one message in the common case where the
> hashvalue remains the same because the cache key columns didn't change.

Removing the tuple ID from sinval messages turns out to have slightly
wider impact than I thought at first, because the current coding passes
those to callback functions registered with
CacheRegisterSyscacheCallback, and there are a lot of 'em.  However,
only one of them seems to be doing anything with the tuplePtr argument,
namely PlanCacheFuncCallback.  We can make it work with the hash value
instead, which will be about as good as what we're doing now.
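
For concreteness, the visible change is just the third argument of the
callback function type; registration via CacheRegisterSyscacheCallback
stays the same.  Roughly (a sketch, not the final patch):

    /* 9.0/9.1: callbacks get the TID of the affected catalog tuple */
    typedef void (*SyscacheCallbackFunction) (Datum arg, int cacheid,
                                              ItemPointer tuplePtr);

    /* proposed: callbacks get only the catcache hash value */
    typedef void (*SyscacheCallbackFunction) (Datum arg, int cacheid,
                                              uint32 hashvalue);

Existing callbacks that ignore their third argument only need their
signature updated; PlanCacheFuncCallback would compare hash values where
it currently compares TIDs.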

Any objections to that plan?
        regards, tom lane


Re: A note about hash-based catcache invalidations

From: Heikki Linnakangas
On 17.08.2011 00:17, Tom Lane wrote:
> I'm looking into the idea I mentioned earlier:
>
>> All is not entirely lost, however: there's still some possible
>> performance benefit to be gained here, if we go to the scheme of
>> identifying victim catcache entries by hashvalue only.  Currently,
>> each heap_update in a cached catalog has to issue two sinval messages
>> (per cache!): one against the old TID and one against the new TID.
>> We'd be able to reduce that to one message in the common case where the
>> hashvalue remains the same because the cache key columns didn't change.
>
> Removing the tuple ID from sinval messages turns out to have slightly
> wider impact than I thought at first, because the current coding passes
> those to callback functions registered with
> CacheRegisterSyscacheCallback, and there are a lot of 'em.  However,
> only one of them seems to be doing anything with the tuplePtr argument,
> namely PlanCacheFuncCallback.  We can make it work with the hash value
> instead, which will be about as good as what we're doing now.
>
> Any objections to that plan?

A callback might be using the tuple ID in a way that fails if VACUUM 
FULL moves the tuple, so I think we *have* to change it. (as you did 
already)

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: A note about hash-based catcache invalidations

From: Simon Riggs
On Tue, Aug 16, 2011 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Any objections to that plan?

None at all, but some questions.

This overhaul of the cache mechanism has been extensive, so you're now
very well placed to answer related questions.

As you know, I've been trying to reduce the lock strength of some DDL
operations. When that was last discussed there were two "options". The
first was to rewrite SnapshotNow, which in my opinion is necessary
but solves only part of the problem. The second, which I proposed, was
explicit locking around catalog access, which would affect the cache
path/code. I don't like that approach, but I don't see another way.

From where you are now, do you have any insight about how to tackle the
locking problem? Thanks.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: A note about hash-based catcache invalidations

From: Tom Lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> A callback might be using the tuple ID in a way that fails if VACUUM 
> FULL moves the tuple, so I think we *have* to change it. (as you did 
> already)

Yeah, I thought about that too.  As things stand in 9.0 and 9.1, there's
at least a theoretical possibility of this:

1. Process A prepares a plan that includes an inline'd copy of a SQL
function.  It labels the plan with the function's pg_proc TID.

2. Process B executes "VACUUM FULL pg_proc", moving the SQL function's
tuple to a different TID.

3. Process C modifies the SQL function via CREATE OR REPLACE FUNCTION,
and sends out an inval against the new TID.

4. Process A doesn't invalidate its cached plan because it thinks the
TID is for some other function; so it continues to use the obsolete
version of the function.
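
To see why step 4 happens: in 9.0/9.1 the plan cache matches invals by
straight TID comparison, along these lines (a simplified sketch; field
names are approximate):

    /* in PlanCacheFuncCallback: "item" is one recorded dependency
     * of a cached plan on a pg_proc tuple */
    if (item->cacheId == cacheid &&
        (tuplePtr == NULL ||
         ItemPointerEquals(tuplePtr, &item->tupleId)))
    {
        /* mark the cached plan dead so it will be rebuilt */
    }

Once VACUUM FULL has moved the function's tuple away from the TID the
plan recorded, the inval sent in step 3 carries a TID the plan has never
seen, so the test never fires and the stale plan survives.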

The only way I can see to fix that is to back-patch the last set of
changes I committed yesterday.  I think that's entirely unworkable for
9.0, because of the risk of breaking third-party code that registers
syscache callbacks.  Even in 9.1 it seems a bit late to be changing that
API, so I'm thinking we should leave it alone.  The odds of anyone
actually getting burnt in the field by the above scenario seem lower
than the odds of causing problems with a late API change.
        regards, tom lane


Re: A note about hash-based catcache invalidations

From: Tom Lane
BTW, while we're thinking about this ...

The plpython patch Jan just submitted reminds me that several of the PLs
detect whether they have obsolete cached data by noting whether the
tuple's xmin *and* TID are the same as previously seen.
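
The test in question looks about like this (plpgsql shown, details from
memory; the other PLs do the equivalent):

    /* is the cached compilation still based on the current pg_proc tuple? */
    if (function->fn_xmin == HeapTupleHeaderGetXmin(procTup->t_data) &&
        ItemPointerEquals(&function->fn_tid, &procTup->t_self))
    {
        /* still valid: keep using the cached function */
    }
    else
    {
        /* recompile from the catalog */
    }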

Unlike depending on TID alone, I think this is probably safe.  It can
obviously give a false positive (thinks tuple changed when it didn't)
after a catalog VACUUM FULL; but an error in that direction is safe.
What would be problematic is a false negative (failure to notice a
real change), and I think the inclusion of the xmin in the test protects
us against that.  An example scenario is:

1. We cache the data, saving xmin X1 and TID T1.

2. VACUUM FULL moves the tuple to TID T2.

3. Somebody else updates the tuple, by chance moving it right back to
T1.  But they will assign a new xmin X2, so we will know it changed.

Can anyone think of a situation this does not cover?
        regards, tom lane


Re: A note about hash-based catcache invalidations

From: Robert Haas
On Wed, Aug 17, 2011 at 1:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, while we're thinking about this ...
>
> The plpython patch Jan just submitted reminds me that several of the PLs
> detect whether they have obsolete cached data by noting whether the
> tuple's xmin *and* TID are the same as previously seen.
>
> Unlike depending on TID alone, I think this is probably safe.  It can
> obviously give a false positive (thinks tuple changed when it didn't)
> after a catalog VACUUM FULL; but an error in that direction is safe.
> What would be problematic is a false negative (failure to notice a
> real change), and I think the inclusion of the xmin in the test protects
> us against that.  An example scenario is:
>
> 1. We cache the data, saving xmin X1 and TID T1.
>
> 2. VACUUM FULL moves the tuple to TID T2.
>
> 3. Somebody else updates the tuple, by chance moving it right back to
> T1.  But they will assign a new xmin X2, so we will know it changed.
>
> Can anyone think of a situation this does not cover?

What about this:

1. We cache the data, saving xmin X1 and TID T1.

2. VACUUM FULL moves the tuple to TID T2 but stores some other tuple in TID T1.

3. If the tuple that is now at TID T1 happens to have xmin = X1, we're
in trouble.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: A note about hash-based catcache invalidations

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 17, 2011 at 1:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The plpython patch Jan just submitted reminds me that several of the PLs
>> detect whether they have obsolete cached data by noting whether the
>> tuple's xmin *and* TID are the same as previously seen.
>> Can anyone think of a situation this does not cover?

> What about this:

> 1. We cache the data, saving xmin X1 and TID T1.

> 2. VACUUM FULL moves the tuple to TID T2 but stores some other tuple in TID T1.

> 3. If the tuple that is now at TID T1 happens to have xmin = X1, we're
> in trouble.

No, because remember that we're also effectively demanding a match on
OID (because we fetch the tuple by OID to begin with) and that the tuple
be live (else we won't fetch it at all).  There should not be another
live tuple with the same OID that vacuum could move to T1 --- if there
is, we've got worse problems than a broken caching check.
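
To spell that out, the PLs reach the tuple by OID in the first place,
along these lines (plpgsql, from memory):

    procTup = SearchSysCache1(PROCOID, ObjectIdGetDatum(fn_oid));
    if (!HeapTupleIsValid(procTup))
        elog(ERROR, "cache lookup failed for function %u", fn_oid);
    /* ... compare the tuple's xmin and t_self against the cached values ... */
    ReleaseSysCache(procTup);

SearchSysCache1 won't hand back anything but a tuple that's live under
SnapshotNow, so a dead version of the function sitting at T1 can't fool
the check.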
        regards, tom lane