Re: Latent cache flush hazard in RelationInitIndexAccessInfo - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Latent cache flush hazard in RelationInitIndexAccessInfo |
Date | |
Msg-id | CAB7nPqRFpsormMv42SfhAwcmj9ysWuBRBfVDUfXjE7XfxtHi2A@mail.gmail.com Whole thread Raw |
In response to | Re: Latent cache flush hazard in RelationInitIndexAccessInfo (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Latent cache flush hazard in RelationInitIndexAccessInfo
|
List | pgsql-hackers |
On Fri, Sep 11, 2015 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Some stuff Salesforce has been doing turned up the fact that >> RelationInitIndexAccessInfo is not safe against a relcache flush on >> its target index. Specifically, the failure goes like this: >> >> * RelationInitIndexAccessInfo loads up relation->rd_indextuple. >> >> * Within one of the subsequent catalog fetches, eg the pg_am read, a >> relcache flush occurs (this can be forced with CLOBBER_CACHE_ALWAYS). >> >> * RelationClearRelation keeps the relcache entry, since it has >> positive refcount, but doesn't see a reason to keep the index-related >> fields. >> >> * RelationInitIndexAccessInfo dumps core when it tries to fetch >> fields out of relation->rd_indextuple, which has been reset to NULL. >> >> Now, it turns out this scenario cannot happen in the standard Postgres >> code as it stands today (if it could, we'd have seen it in the buildfarm's >> CLOBBER_CACHE_ALWAYS testing). The reason for that is that >> RelationInitIndexAccessInfo is actually only called on newly-minted >> Relation structs that are not yet installed into the relcache hash table; >> hence RelationCacheInvalidate can't find them to zap them. It can happen >> in Salesforce's modified code though, and it's not hard to imagine that >> future changes in the community version might expose the same hazard. >> (For one reason, the current technique implies that an error halfway >> through relcache load results in leaking the Relation struct and >> subsidiary data; we might eventually decide that's not acceptable.) >> >> We could just ignore the problem until/unless it becomes real for us, >> but now that I've figured it out I'm inclined to do something before >> the knowledge disappears again. >> >> The specific reason there's a problem is that there's a disconnect between >> RelationClearRelation's test for whether a relcache entry has valid index >> info (it checks whether rd_indexcxt != NULL) and the order of operations >> in RelationInitIndexAccessInfo (creating the index context is not the >> first thing it does). Given that if RelationClearRelation thinks the >> info is valid, what it does is call RelationReloadIndexInfo which needs >> rd_index to be valid, I'm thinking the best fix is to leave the order of >> operations in RelationInitIndexAccessInfo alone and instead make the >> "relcache entry has index info" check be "rd_index != NULL". There might >> need to be some additional work to keep RelationReloadIndexInfo from >> setting rd_isvalid = true too soon. >> >> Thoughts? > > Doesn't seem like a bad thing to clean up. Doesn't sound bad to me either to reinforce things on HEAD... It's been months since this has been posted, are you working on a patch? -- Michael
pgsql-hackers by date: