Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS
Date
Msg-id 1437972.1600229462@sss.pgh.pa.us
Whole thread Raw
In response to Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS
List pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
> So, can we assume that the current code can only cause the problem in
> CCA builds bot not in any practical scenario because after having a
> lock on relation probably there shouldn't be any invalidation which
> leads to this problem?

No.  The reason we expend so much time and effort on CCA testing is
that cache flushes are unpredictable, and they can happen even when
you have a lock on the object(s) in question.  In particular, the
easiest real-world case that could cause the described problem is an
sinval queue overrun that we detect during the GetSubscriptionRelState
call at the bottom of logicalrep_rel_open.  In that case we'll
come back to the caller with the LogicalRepRelMapEntry marked as
already needing revalidation, because the cache inval callback
will have marked *all* of them that way.  That's actually a harmless
condition, because we have lock on the rel so nothing really
changed ... but if we blew away localreloid and the caller needs
to use that, kaboom.

We could imagine marking the entry valid at the very bottom of
logicalrep_rel_open, but that just moves the problem somewhere else.
Any caller that does *any* catalog access while holding open a
LogicalRepRelMapEntry would not be able to rely on its localreloid
staying valid.  That's a recipe for irreproducible bugs, and it's
unnecessary.  In practice the entry is good as long as
we continue to hold a lock on the local relation.  So we should
mark LogicalRepRelMapEntries as potentially-needing-revalidation
in a way that doesn't interfere with active users of the entry.

In short: the value of CCA testing is to model sinval overruns
happening at any point where they could happen.  The real-world
odds of one happening at any given instant are low, but they're
never zero.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Transactions involving multiple postgres foreign servers, take 2