Re: Reducing some DDL Locks to ShareLock - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Reducing some DDL Locks to ShareLock
Date
Msg-id 7321.1226268766@sss.pgh.pa.us
Whole thread Raw
In response to Re: Reducing some DDL Locks to ShareLock  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Reducing some DDL Locks to ShareLock
List pgsql-hackers
Reviewing away ...

There's a fairly serious problem with this patch, which is that it
overlooks one of the reasons that index_update_stats can work the
way it does:
    * 3. Because we execute CREATE INDEX with just share lock on the parent    * rel (to allow concurrent index
creations),an ordinary update could    * suffer a tuple-concurrently-updated failure against another CREATE    * INDEX
committingat about the same time.  We can avoid that by having    * them both do nontransactional updates (we assume
theywill both be                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^    * trying to change the
pg_classrow to the same thing, so it doesn't      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    * matter which
goesfirst).
 

While the above assumption would still be true for two CreateTrigger
operations happening in parallel, it is surely *not* true for, say,
CreateTrigger and CreateIndex in parallel.

The window for a race condition is actually fairly wide, because the
various functions that call heap_inplace_update mostly start with a
tuple they got from syscache, which means it probably reflects the most
recent commit, and could be missing changes from recent nontransactional
updates.  So the second guy to apply his change could wipe out the first
guy's change.

I looked at the other existing uses of heap_inplace_update, and I think
there could possibly be a similar issue for vac_update_datfrozenxid(),
though the chance of creating a real problem there seems vanishingly
small.  In any case the proposed patch has got lots of chances for
trouble since it introduces several different not-mutually-monotonic
ways of changing a pg_class entry nontransactionally.

I do not think this is a fatal problem, but what it means is that we
have to get some kind of short-term lock on the target tuple and be
sure we read the actual old tuple contents *after* acquiring that lock.
It would seem to be a good idea to fix all the uses of
heap_inplace_update to work that way, even if they seem safe at the
moment.

Any thoughts about the best way to do it?  My immediate inclination is
to use heap_lock_tuple but it's a bit expensive.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Re: Hot standby v5 patch - Databases created post backup remain inaccessible + replica SIGSEGV when coming out of standby
Next
From: "David Rowley"
Date:
Subject: Re: Windowing Function Patch Review -> Performance Comparison.