Bogus logic in RelationBuildPartitionDesc - Mailing list pgsql-hackers

From Tom Lane
Subject Bogus logic in RelationBuildPartitionDesc
Date
Msg-id 32149.1576952916@sss.pgh.pa.us
Whole thread Raw
Responses Re: Bogus logic in RelationBuildPartitionDesc
List pgsql-hackers
I just had to retrieve my jaw from the floor after reading this
bit in RelationBuildPartitionDesc:

         * The system cache may be out of date; if so, we may find no pg_class
         * tuple or an old one where relpartbound is NULL.  In that case, try
         * the table directly.  We can't just AcceptInvalidationMessages() and
         * retry the system cache lookup because it's possible that a
         * concurrent ATTACH PARTITION operation has removed itself to the
         * ProcArray but yet added invalidation messages to the shared queue;
         * InvalidateSystemCaches() would work, but seems excessive.

As far as I can see, this argument is wrong on every count, and if it
were right, the code it is defending would still be wrong.

In the first place, it's claiming, based on no evidence, that our whole
approach to syscaches is wrong.  If this code needs to deal with obsolete
syscache entries then so do probably thousands of other places.

In the second place, the argument that AcceptInvalidationMessages wouldn't
work is BS, because the way that that *actually* works is that transaction
commit updates clog and sends inval messages before it releases locks.
So if you've acquired enough of a lock to be sure that the data you want
to read is stable, then you do not need to worry about whether you've
received any relevant inval messages.  You have --- and you don't even
need to call AcceptInvalidationMessages for yourself, because lock
acquisition already did, see e.g. LockRelationOid.

In the third place, if this imaginary risk that the syscache was out of
date were real, the code would be completely failing to deal with it,
because all it is testing is whether it found a null relpartbound value.
That wouldn't handle the case where a non-null relpartbound is obsolete,
which is what you'd expect after ATTACH PARTITION.

Furthermore, if all of the above can be rebutted, then what's the argument
that reading pg_class directly will produce a better answer?  The only way
that any of this could be useful is if you're trying to read data that is
changing under you because you didn't take an adequate lock.  In that case
there's no guarantee that what you will read from pg_class is up-to-date
either.

In reality, what this code is doing is examining relations that it found
by reading pg_inherit, using an MVCC snapshot, so I do not see what is the
argument for supposing that the pg_class cache is more out-of-date than
the pg_inherit data.

Unsurprisingly, the code coverage report shows that this code path is
never taken.  I think we could dike out partdesc.c lines 113-151
altogether, and make the code just above there look more like every
other syscache access in the backend.

If somebody's got some actual evidence that this is necessary, and not
a flight of feverish imagination, let's hear it.  (And maybe let's
develop an isolation test that exercises the code path, because there's
sure little reason to believe it works right now.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Misleading comment in pg_upgrade.c
Next
From: Bruce Momjian
Date:
Subject: Re: psql small improvement patch