Re: SSI predicate locking on heap -- tuple or row? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: SSI predicate locking on heap -- tuple or row?
Date
Msg-id BANLkTin1uSxB+NYgq4Kbt5isB0=C9=qdRA@mail.gmail.com
Whole thread Raw
In response to Re: SSI predicate locking on heap -- tuple or row?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Sat, May 21, 2011 at 8:50 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>> How is an UPDATE different from a DELETE and an INSERT?
>
> That's the question Jeff asked which caused us to revisit this.
>
> There are two answers -- conceptual and implementation.
>
> Conceptually, an updated row is the same row, and a row inserted after a
> delete is a new row.  Note that READ COMMITTED doesn't treat them the
> same on a write conflict.  To give a practical example, police
> departments in Wisconsin typically reassign a badge number to a new
> officer after an existing officer leaves.  Updating an officer record
> keyed by badge number (say, with a new address or a name change) would
> be qualitatively different from deleting an old officer record and
> inserting a new one for a different person now getting the badge number.
>  (OK, so this is somewhat artificial, because they track who had the
> number in what temporal ranges, but you get the idea.)
>
> In the implementation the only difference between an UPDATE in a table
> and a DELETE and INSERT in the same table in the same transaction
> (besides concurrency handling) is the ctid linkage, at least as far as I
> can see.

I think the implementation is what matters here.  I understand that
there are certain situations in which the user might choose to UPDATE
a row and other situations in which they might choose to DELETE and
then INSERT: but the user's intent is basically irrelevant.  If the
system treats those operations in basically the same way, then it
shouldn't be necessary to follow the CTID chain in one case given that
there is no CTID chain in the other case.  Or to put that another way,
if it is necessary to follow the CTID chain, then we should be able to
articulate a reason for that necessity -- something that is materially
different in the UPDATE case.  Otherwise, we're just following the
chain "because it's there".

It seems to me that we can actually state with some degree of
precision what that "material difference" would need to be.  The goal
of SSI is to prevent serialization anomalies that would not be
prevented by snapshot isolation.  Let's suppose that it successfully
does that in the DELETE/INSERT case.  Suppose further that we change
SSI so that it handles the UPDATE case in the same way that it handles
the DELETE/INSERT case.  This change will be incorrect only if there
is a serialization anomaly that snapshot isolation *would have
prevented* in the DELETE/INSERT case that *it does not prevent* in the
update case.  In other words, if SSI needs to be more rigorous in the
UPDATE case, it can only be because snapshot isolation is less
rigorous in that case, and the additional rigor that SSI must apply
there must be exactly equal to whatever snapshot isolation isn't
picking up (as compared with the DELETE/INSERT case).

Does that make any sense?  It seems logical to me, but IJWH.

> So I think that you can't just treat the two things the same in SSI just
> because the PostgreSQL implementation details make them similar; but I
> think that you can treat the two things the same for the reasons I
> worked out at the start of the thread.

Your argument seems reasonable to me; but it would be nice if we could
find a simpler one, because simpler arguments are less likely to be
incorrect.  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: eviscerating the parser
Next
From: Pavel Stehule
Date:
Subject: Re: about EDITOR_LINENUMBER_SWITCH