Re: Bogus logic in RelationBuildPartitionDesc - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Bogus logic in RelationBuildPartitionDesc
Date
Msg-id CA+TgmoZSig0uAmbWafO=fgQTb+W6hbQbX+bxkkPD7m8jnOy7UQ@mail.gmail.com
Whole thread Raw
In response to Bogus logic in RelationBuildPartitionDesc  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Dec 21, 2019 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I just had to retrieve my jaw from the floor after reading this
> bit in RelationBuildPartitionDesc:
>
>          * The system cache may be out of date; if so, we may find no pg_class
>          * tuple or an old one where relpartbound is NULL.  In that case, try
>          * the table directly.  We can't just AcceptInvalidationMessages() and
>          * retry the system cache lookup because it's possible that a
>          * concurrent ATTACH PARTITION operation has removed itself to the
>          * ProcArray but yet added invalidation messages to the shared queue;
>          * InvalidateSystemCaches() would work, but seems excessive.
>
> As far as I can see, this argument is wrong on every count, and if it
> were right, the code it is defending would still be wrong.
>
> In the first place, it's claiming, based on no evidence, that our whole
> approach to syscaches is wrong.  If this code needs to deal with obsolete
> syscache entries then so do probably thousands of other places.

No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock,
unlike most other DDL.

> In the second place, the argument that AcceptInvalidationMessages wouldn't
> work is BS, because the way that that *actually* works is that transaction
> commit updates clog and sends inval messages before it releases locks.
> So if you've acquired enough of a lock to be sure that the data you want
> to read is stable, then you do not need to worry about whether you've
> received any relevant inval messages.  You have --- and you don't even
> need to call AcceptInvalidationMessages for yourself, because lock
> acquisition already did, see e.g. LockRelationOid.

No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock, which
does not conflict with the AccessShareLock required to build a cache
entry.

> In the third place, if this imaginary risk that the syscache was out of
> date were real, the code would be completely failing to deal with it,
> because all it is testing is whether it found a null relpartbound value.
> That wouldn't handle the case where a non-null relpartbound is obsolete,
> which is what you'd expect after ATTACH PARTITION.

No, that's what you'd expect after DETACH PARTITION, but that takes
AccessExclusiveLock, so the problem doesn't occur.

> Furthermore, if all of the above can be rebutted, then what's the argument
> that reading pg_class directly will produce a better answer?  The only way
> that any of this could be useful is if you're trying to read data that is
> changing under you because you didn't take an adequate lock.  In that case
> there's no guarantee that what you will read from pg_class is up-to-date
> either.

The argument is that the only possible change is the concurrent
addition of a partition, and therefore the only thing that happen is
to go from NULL to non-NULL. It can't go from non-NULL to NULL, nor
from one non-NULL value to another.

> Unsurprisingly, the code coverage report shows that this code path is
> never taken.  I think we could dike out partdesc.c lines 113-151
> altogether, and make the code just above there look more like every
> other syscache access in the backend.

It turns out that I tested this, and that if you do that, it's
possible to produce failures. It's very hard to do so in the context
of a regression test because they are low-probability, but they do
happen. I believe some of the testing details are in the original
thread that's probably linked from the commit message that added those
lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Robert Haas
Date:
Subject: Re: backup manifests