Re: Bogus logic in RelationBuildPartitionDesc - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Bogus logic in RelationBuildPartitionDesc |
Date | |
Msg-id | CA+TgmoZSig0uAmbWafO=fgQTb+W6hbQbX+bxkkPD7m8jnOy7UQ@mail.gmail.com Whole thread Raw |
In response to | Bogus logic in RelationBuildPartitionDesc (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Sat, Dec 21, 2019 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I just had to retrieve my jaw from the floor after reading this > bit in RelationBuildPartitionDesc: > > * The system cache may be out of date; if so, we may find no pg_class > * tuple or an old one where relpartbound is NULL. In that case, try > * the table directly. We can't just AcceptInvalidationMessages() and > * retry the system cache lookup because it's possible that a > * concurrent ATTACH PARTITION operation has removed itself to the > * ProcArray but yet added invalidation messages to the shared queue; > * InvalidateSystemCaches() would work, but seems excessive. > > As far as I can see, this argument is wrong on every count, and if it > were right, the code it is defending would still be wrong. > > In the first place, it's claiming, based on no evidence, that our whole > approach to syscaches is wrong. If this code needs to deal with obsolete > syscache entries then so do probably thousands of other places. No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock, unlike most other DDL. > In the second place, the argument that AcceptInvalidationMessages wouldn't > work is BS, because the way that that *actually* works is that transaction > commit updates clog and sends inval messages before it releases locks. > So if you've acquired enough of a lock to be sure that the data you want > to read is stable, then you do not need to worry about whether you've > received any relevant inval messages. You have --- and you don't even > need to call AcceptInvalidationMessages for yourself, because lock > acquisition already did, see e.g. LockRelationOid. No, because ATTACH PARTITION uses only ShareUpdateExclusiveLock, which does not conflict with the AccessShareLock required to build a cache entry. > In the third place, if this imaginary risk that the syscache was out of > date were real, the code would be completely failing to deal with it, > because all it is testing is whether it found a null relpartbound value. > That wouldn't handle the case where a non-null relpartbound is obsolete, > which is what you'd expect after ATTACH PARTITION. No, that's what you'd expect after DETACH PARTITION, but that takes AccessExclusiveLock, so the problem doesn't occur. > Furthermore, if all of the above can be rebutted, then what's the argument > that reading pg_class directly will produce a better answer? The only way > that any of this could be useful is if you're trying to read data that is > changing under you because you didn't take an adequate lock. In that case > there's no guarantee that what you will read from pg_class is up-to-date > either. The argument is that the only possible change is the concurrent addition of a partition, and therefore the only thing that happen is to go from NULL to non-NULL. It can't go from non-NULL to NULL, nor from one non-NULL value to another. > Unsurprisingly, the code coverage report shows that this code path is > never taken. I think we could dike out partdesc.c lines 113-151 > altogether, and make the code just above there look more like every > other syscache access in the backend. It turns out that I tested this, and that if you do that, it's possible to produce failures. It's very hard to do so in the context of a regression test because they are low-probability, but they do happen. I believe some of the testing details are in the original thread that's probably linked from the commit message that added those lines. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: