Re: RFC: Making TRUNCATE more "MVCC-safe" - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RFC: Making TRUNCATE more "MVCC-safe"
Date
Msg-id CA+TgmobSDtyABRFvN=++m8zPUjoYSmm73Y99K1KoGyKOaq_rew@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Making TRUNCATE more "MVCC-safe"  (Noah Misch <noah@leadboat.com>)
Responses Re: RFC: Making TRUNCATE more "MVCC-safe"
Re: RFC: Making TRUNCATE more "MVCC-safe"
List pgsql-hackers
On Fri, Feb 10, 2012 at 11:46 PM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Feb 10, 2012 at 01:59:18PM -0500, Robert Haas wrote:
>> On Fri, Feb 10, 2012 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote:
>> > I like the design you have chosen. ?It would find applications beyond
>> > TRUNCATE, so your use of non-specific naming is sound. ?For example, older
>> > snapshots will see an empty table "t" after "CREATE TABLE t AS SELECT 1"
>> > commits; that's a comparable MVCC anomaly. ?Some of our non-MVCC-safe commands
>> > should perhaps just become MVCC-safe, but there will always be use cases for
>> > operations that shortcut MVCC. ?When one truly does want that, your proposal
>> > for keeping behavior consistent makes plenty of sense.
>>
>> I guess I'm not particularly excited by the idea of trying to make
>> TRUNCATE MVCC-safe.  I notice that the example involves the REPEATABLE
>> READ isolation level, which is already known to be busted in a variety
>> of ways; that's why we now have SERIALIZABLE, and why most people use
>> READ COMMITTED.  Are there examples of this behavior at other
>> isolation levels?
>
> I've yet to see an MVCC anomaly that one can reproduce at REPEATABLE READ and
> not at READ COMMITTED.  They tend to be narrow race conditions at READ
> COMMITTED, yet easy to demonstrate at REPEATABLE READ.  Related:
> http://archives.postgresql.org/pgsql-performance/2011-02/msg00451.php

Yeah.  Well, that's actually an interesting example, because it
illustrates how general this problem is.  We could potentially get
ourselves into a situation where just about every system catalog table
needs an xmin field to store the point at which the object came into
existence - or for that matter, was updated.  But it's not quite the
same as the xmin of the row itself, because some updates might be
judged not to matter.  There could also be intermediate cases where
updates are invalidating for some purposes but not others.  I think
we'd better get our hands around more of the problem space before we
start trying to engineer solutions.

> Incidentally, people use READ COMMITTED because they don't question the
> default, not because they know hazards of REPEATABLE READ.  I don't know the
> bustedness you speak of; could we improve the documentation to inform folks?

The example that I remember was related to SELECT FOR UPDATE/SELECT
FOR SHARE.  The idea of those statements is that you want to prevent
the row from being updated or deleted until some other concurrent
action is complete; for example, in the case of a foreign key, we'd
like to prevent the referenced row from being deleted or updated in
the relevant columns until the inserting transaction is committed.
But it doesn't work, because when the updating or deleting process
gets done with the lock wait, they are still using the same snapshot
as before, and merrily do exactly the the thing that the lock-wait was
supposed to prevent.  If an actual UPDATE is used, it's safe (I
think): anyone who was going to UPDATE or DELETE the row will fail
with some kind of serialization error.  But a SELECT FOR UPDATE that
commits is treated more like an UPDATE that rolls back: it's as if the
lock never existed.  Someone (Florian?) proposed a patch to change
this, but it seemed problematic for reasons I no longer exactly
remember.

When using an actual foreign key, we work around this by taking a new
snapshot to cross-check that things haven't changed under us, but
user-level code can't do that.  At READ COMMITTED, depending on the
situation, either the fact that we take new snapshots pretty
frequently or the EPQ machinery sometimes make things work sensibly
anyway, and at SERIALIZABLE, SSI prevents these kinds of anomalies.
But REPEATABLE READ has no protection.  I wish I could find the thread
where we discussed this before.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: index-only quals vs. security_barrier views
Next
From: Andres Freund
Date:
Subject: Re: Removing special case OID generation