Home > mailing lists

Re: Latent cache flush hazard in RelationInitIndexAccessInfo - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Latent cache flush hazard in RelationInitIndexAccessInfo
Date	September 11, 2015 18:29:26
Msg-id	CA+TgmoZYiEwwAcNmw5mB_a_HRCyi8HRTP4bci2sPdopq4OSKpg@mail.gmail.com Whole thread Raw
In response to	Latent cache flush hazard in RelationInitIndexAccessInfo (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Latent cache flush hazard in RelationInitIndexAccessInfo
List	pgsql-hackers

Tree view

On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Some stuff Salesforce has been doing turned up the fact that
> RelationInitIndexAccessInfo is not safe against a relcache flush on
> its target index.  Specifically, the failure goes like this:
>
> * RelationInitIndexAccessInfo loads up relation->rd_indextuple.
>
> * Within one of the subsequent catalog fetches, eg the pg_am read, a
> relcache flush occurs (this can be forced with CLOBBER_CACHE_ALWAYS).
>
> * RelationClearRelation keeps the relcache entry, since it has
> positive refcount, but doesn't see a reason to keep the index-related
> fields.
>
> * RelationInitIndexAccessInfo dumps core when it tries to fetch
> fields out of relation->rd_indextuple, which has been reset to NULL.
>
> Now, it turns out this scenario cannot happen in the standard Postgres
> code as it stands today (if it could, we'd have seen it in the buildfarm's
> CLOBBER_CACHE_ALWAYS testing).  The reason for that is that
> RelationInitIndexAccessInfo is actually only called on newly-minted
> Relation structs that are not yet installed into the relcache hash table;
> hence RelationCacheInvalidate can't find them to zap them.  It can happen
> in Salesforce's modified code though, and it's not hard to imagine that
> future changes in the community version might expose the same hazard.
> (For one reason, the current technique implies that an error halfway
> through relcache load results in leaking the Relation struct and
> subsidiary data; we might eventually decide that's not acceptable.)
>
> We could just ignore the problem until/unless it becomes real for us,
> but now that I've figured it out I'm inclined to do something before
> the knowledge disappears again.
>
> The specific reason there's a problem is that there's a disconnect between
> RelationClearRelation's test for whether a relcache entry has valid index
> info (it checks whether rd_indexcxt != NULL) and the order of operations
> in RelationInitIndexAccessInfo (creating the index context is not the
> first thing it does).  Given that if RelationClearRelation thinks the
> info is valid, what it does is call RelationReloadIndexInfo which needs
> rd_index to be valid, I'm thinking the best fix is to leave the order of
> operations in RelationInitIndexAccessInfo alone and instead make the
> "relcache entry has index info" check be "rd_index != NULL".  There might
> need to be some additional work to keep RelationReloadIndexInfo from
> setting rd_isvalid = true too soon.
>
> Thoughts?

Doesn't seem like a bad thing to clean up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: "Syed, Rahila"
Date: 11 September 2015, 18:24:57
Subject: Re: [PROPOSAL] VACUUM Progress Checker.

From: Robert Haas
Date: 11 September 2015, 18:30:52
Subject: Re: Do Layered Views/Relations Preserve Sort Order ?

Re: Latent cache flush hazard in RelationInitIndexAccessInfo - Mailing list pgsql-hackers

Previous

Next