Bogus logic in RelationBuildPartitionDesc - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Bogus logic in RelationBuildPartitionDesc |
Date | |
Msg-id | 32149.1576952916@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Bogus logic in RelationBuildPartitionDesc
|
List | pgsql-hackers |
I just had to retrieve my jaw from the floor after reading this bit in RelationBuildPartitionDesc: * The system cache may be out of date; if so, we may find no pg_class * tuple or an old one where relpartbound is NULL. In that case, try * the table directly. We can't just AcceptInvalidationMessages() and * retry the system cache lookup because it's possible that a * concurrent ATTACH PARTITION operation has removed itself to the * ProcArray but yet added invalidation messages to the shared queue; * InvalidateSystemCaches() would work, but seems excessive. As far as I can see, this argument is wrong on every count, and if it were right, the code it is defending would still be wrong. In the first place, it's claiming, based on no evidence, that our whole approach to syscaches is wrong. If this code needs to deal with obsolete syscache entries then so do probably thousands of other places. In the second place, the argument that AcceptInvalidationMessages wouldn't work is BS, because the way that that *actually* works is that transaction commit updates clog and sends inval messages before it releases locks. So if you've acquired enough of a lock to be sure that the data you want to read is stable, then you do not need to worry about whether you've received any relevant inval messages. You have --- and you don't even need to call AcceptInvalidationMessages for yourself, because lock acquisition already did, see e.g. LockRelationOid. In the third place, if this imaginary risk that the syscache was out of date were real, the code would be completely failing to deal with it, because all it is testing is whether it found a null relpartbound value. That wouldn't handle the case where a non-null relpartbound is obsolete, which is what you'd expect after ATTACH PARTITION. Furthermore, if all of the above can be rebutted, then what's the argument that reading pg_class directly will produce a better answer? The only way that any of this could be useful is if you're trying to read data that is changing under you because you didn't take an adequate lock. In that case there's no guarantee that what you will read from pg_class is up-to-date either. In reality, what this code is doing is examining relations that it found by reading pg_inherit, using an MVCC snapshot, so I do not see what is the argument for supposing that the pg_class cache is more out-of-date than the pg_inherit data. Unsurprisingly, the code coverage report shows that this code path is never taken. I think we could dike out partdesc.c lines 113-151 altogether, and make the code just above there look more like every other syscache access in the backend. If somebody's got some actual evidence that this is necessary, and not a flight of feverish imagination, let's hear it. (And maybe let's develop an isolation test that exercises the code path, because there's sure little reason to believe it works right now.) regards, tom lane
pgsql-hackers by date: