Re: RFC: Making TRUNCATE more "MVCC-safe" - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RFC: Making TRUNCATE more "MVCC-safe"
Date
Msg-id CA+Tgmobtjc4edN5MY6Z6jzQLp_xsB3gJYAJCB4sqKsg+azztJQ@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Making TRUNCATE more "MVCC-safe"  (Noah Misch <noah@leadboat.com>)
Responses Re: RFC: Making TRUNCATE more "MVCC-safe"  (Simon Riggs <simon@2ndQuadrant.com>)
Re: RFC: Making TRUNCATE more "MVCC-safe"  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Mon, Mar 5, 2012 at 2:22 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Feb 13, 2012 at 09:29:56AM -0500, Robert Haas wrote:
>> On Fri, Feb 10, 2012 at 11:46 PM, Noah Misch <noah@leadboat.com> wrote:
>> > I've yet to see an MVCC anomaly that one can reproduce at REPEATABLE READ and
>> > not at READ COMMITTED. ?They tend to be narrow race conditions at READ
>> > COMMITTED, yet easy to demonstrate at REPEATABLE READ. ?Related:
>> > http://archives.postgresql.org/pgsql-performance/2011-02/msg00451.php
>>
>> Yeah.  Well, that's actually an interesting example, because it
>> illustrates how general this problem is.  We could potentially get
>> ourselves into a situation where just about every system catalog table
>> needs an xmin field to store the point at which the object came into
>> existence - or for that matter, was updated.
>
> I can see this strategy applying to many relation-pertinent system catalogs.
> Do you foresee applications to non-relation catalogs?

Well, in theory, we have similar issues if, say, a query uses a
function that didn't exist at the time the snapshot as taken; the
actual results the user sees may not be consistent with any serial
execution schedule.  And the same could be true for any other SQL
object.  It's unclear that those cases are as compelling as this one,
but then again it's unclear that no one will ever want to fix them,
either.  For example, suppose we have a view v over a table t that
calls a function f.  Somebody alters f to give different results and,
in the same transaction, modifies the contents of t (but no DDL).
This doesn't strike me as a terribly unlikely scenario; the change to
t could well be envisioned as a compensating transaction.  But now if
somebody uses the new definition of f against the old contents of t,
the user may fail to get what they were hoping for out of bundling
those changes together in one transaction.

Now, maybe we're never going to fix those kinds of anomalies anyway,
but if we go with this architecture, then I think the chances of it
ever being palatable to try are pretty low.

> In any event, I think a pg_class.relvalidxmin is the right starting point.
> One might imagine a family of relvalidxmin, convalidxmin, indcheckxmin
> (already exists), inhvalidxmin, and attvalidxmin.  relvalidxmin is like the
> AccessExclusiveLock of that family; it necessarily blocks everything that
> might impugn the others.  The value in extending this to more catalogs is the
> ability to narrow the impact of failing the check.  A failed indcheckxmin
> comparison merely excludes plans involving the index.  A failed inhvalidxmin
> check might just skip recursion to the table in question.  Those are further
> refinements, much like using weaker heavyweight lock types.

Yes, good parallel.

>> But it's not quite the
>> same as the xmin of the row itself, because some updates might be
>> judged not to matter.  There could also be intermediate cases where
>> updates are invalidating for some purposes but not others.  I think
>> we'd better get our hands around more of the problem space before we
>> start trying to engineer solutions.
>
> I'm not seeing that problem.  Any operation that would update some xmin
> horizon should set it to the greater of its current value and the value the
> operation needs for its own correctness.  If you have something in mind that
> needs more, could you elaborate?

Well, consider something like CLUSTER.  It's perfectly OK for CLUSTER
to operate on a table that has been truncated since CLUSTER's snapshot
was taken, and no serialization anomaly is created that would not have
already existed as a result of the non-MVCC-safe TRUNCATE.  On the
other hand, if CLUSTER operates on a table that was created since
CLUSTER's snapshot was taken, then you have a bona fide serialization
anomaly.  Maybe not a very important one, but does that prove that
there's no significant problem of this type in general, or just
nobody's thought through all the cases yet?  After all, the issues
with CREATE TABLE/TRUNCATE vs. a concurrent SELECT have been around
for a very long time, and we're only just getting around to looking at
them, so I don't have much confidence that there aren't other cases
floating around out there.

I guess another way to put this is that you could need "locks" of a
great number of different strengths to really handle all the cases.
It's going to be unappealing to, say, set the relation xmin when
setting the constraint xmin would do, or to fail for a concurrent
TRUNCATE as well as a concurrent CREATE TABLE when only the latter
logically requires a failure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Command Triggers, patch v11
Next
From: Simon Riggs
Date:
Subject: Re: RFC: Making TRUNCATE more "MVCC-safe"