Re: Latent cache flush hazard in RelationInitIndexAccessInfo - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Latent cache flush hazard in RelationInitIndexAccessInfo |
Date | |
Msg-id | CA+TgmoZYiEwwAcNmw5mB_a_HRCyi8HRTP4bci2sPdopq4OSKpg@mail.gmail.com Whole thread Raw |
In response to | Latent cache flush hazard in RelationInitIndexAccessInfo (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Latent cache flush hazard in RelationInitIndexAccessInfo
(Michael Paquier <michael.paquier@gmail.com>)
|
List | pgsql-hackers |
On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Some stuff Salesforce has been doing turned up the fact that > RelationInitIndexAccessInfo is not safe against a relcache flush on > its target index. Specifically, the failure goes like this: > > * RelationInitIndexAccessInfo loads up relation->rd_indextuple. > > * Within one of the subsequent catalog fetches, eg the pg_am read, a > relcache flush occurs (this can be forced with CLOBBER_CACHE_ALWAYS). > > * RelationClearRelation keeps the relcache entry, since it has > positive refcount, but doesn't see a reason to keep the index-related > fields. > > * RelationInitIndexAccessInfo dumps core when it tries to fetch > fields out of relation->rd_indextuple, which has been reset to NULL. > > Now, it turns out this scenario cannot happen in the standard Postgres > code as it stands today (if it could, we'd have seen it in the buildfarm's > CLOBBER_CACHE_ALWAYS testing). The reason for that is that > RelationInitIndexAccessInfo is actually only called on newly-minted > Relation structs that are not yet installed into the relcache hash table; > hence RelationCacheInvalidate can't find them to zap them. It can happen > in Salesforce's modified code though, and it's not hard to imagine that > future changes in the community version might expose the same hazard. > (For one reason, the current technique implies that an error halfway > through relcache load results in leaking the Relation struct and > subsidiary data; we might eventually decide that's not acceptable.) > > We could just ignore the problem until/unless it becomes real for us, > but now that I've figured it out I'm inclined to do something before > the knowledge disappears again. > > The specific reason there's a problem is that there's a disconnect between > RelationClearRelation's test for whether a relcache entry has valid index > info (it checks whether rd_indexcxt != NULL) and the order of operations > in RelationInitIndexAccessInfo (creating the index context is not the > first thing it does). Given that if RelationClearRelation thinks the > info is valid, what it does is call RelationReloadIndexInfo which needs > rd_index to be valid, I'm thinking the best fix is to leave the order of > operations in RelationInitIndexAccessInfo alone and instead make the > "relcache entry has index info" check be "rd_index != NULL". There might > need to be some additional work to keep RelationReloadIndexInfo from > setting rd_isvalid = true too soon. > > Thoughts? Doesn't seem like a bad thing to clean up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: