Re: SERIALIZABLE and INSERTs with multiple VALUES - Mailing list pgsql-general

From Kevin Grittner
Subject Re: SERIALIZABLE and INSERTs with multiple VALUES
Date
Msg-id CACjxUsOZB6yy4aO_v1f9a+NOW3cyVD4ujxB1vyMLNvC40==AEg@mail.gmail.com
Whole thread Raw
In response to Re: SERIALIZABLE and INSERTs with multiple VALUES  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: SERIALIZABLE and INSERTs with multiple VALUES
List pgsql-general
On Wed, Oct 12, 2016 at 8:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Oct 13, 2016 at 10:06 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Wed, Oct 12, 2016 at 3:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>>
>>> I agree that the multi-value case is a bug.
>>
>>> I think that it should be pretty obvious to you why the check exists
>>> at all, Kevin. It exists because it would be improper to decide to
>>> take the DO NOTHING path on the basis of some other committed tuple
>>> existing that is not visible to the original MVCC snapshot, at higher
>>> isolation levels.
>>
>> That's only true if it causes a cycle in the apparent order of
>> execution.  If we rip out the check what we have is behavior
>> completely consistent with the other transaction executing first;
>> in other words, it creates a read-write dependency (a/k/a
>> rw-conflict) from the current transaction to the concurrent
>> transaction which succeeds in its insert.  That may or may not
>> cause a cycle, depending on what else is happening.
>
> The "higher isolation levels" probably shouldn't be treated the same way.
>
> I think Peter's right about REPEATABLE READ.  We should definitely
> raise the error immediately as we do in that level, because our RR
> (SI) doesn't care about write skew and all that stuff, it just
> promises that you can only see data in your snapshot.

But the whole point of the special code for both RI and INSERT/ON
CONFLICT is to get "underneath" that and provide a "primitive" that
can see things an application statement can't, for better
performance and error handling.  What SERIALIZABLE promises is that
it runs exactly the same as REPEATABLE READ, but with some
additional monitoring for serialization failure errors in some
places that REPEATABLE READ does not generate them -- this would be
the first and only place that SERIALIZABLE would break that model.
The idea seems completely wrong and arbitrary.

Where do you see a problem if REPEATABLE READ handles INSERT/ON
CONFLICT without error?  In many cases it would actually be
providing a result consistent with a serial execution of the
transactions; and where it doesn't, it would be the same anomalies
that are possible with anything else under REPEATABLE READ.

> We can't allow you to take a different course of action based on
> data that your snapshot can't see,

But that is exactly what INSERT/ON CONFLICT always does!  That is
the only way to avoid the so-called "unprincipled deadlocks" the
feature aims to avoid.

> so the only reasonable thing to do is abandon ship.

I disagree.

> But yeah, the existing code raises false positive serialization
> failures under SERIALIZABLE, and that's visible in the isolation test
> I posted: there is actually a serial order of those transactions with
> the same result.

Exactly.  The error based on the write conflict with ON CONFLICT DO
NOTHING in your patch is really a false positive.  That doesn't
break correctness, but it hurts performance, so it should be
avoided if possible.

> When working on commit fcff8a57 I became suspicious of the way ON
> CONFLICT interacts with SSI, as I mentioned in passing back then[1],
> thinking mainly of false negatives.  I failed to find a
> non-serializable schedule involving ON CONFLICT that was allowed to
> run, though I didn't spend much time on it.   One thing that worries
> me is the final permutation of read-write-unique-4.spec, which
> produces an arguably spurious UCV, that is, a transaction that doesn't
> commit but raises a UCV instead of the serialization failure you might
> expect.  The ON CONFLICT equivalent might be a transaction that takes
> the ON CONFLICT path and then commits, even though it should be
> considered non-serializable.  I would really like to understand that
> case better, and until then I wouldn't bet my boots that it isn't
> possible to commit anomalies using ON CONFLICT under SERIALIZABLE
> without Peter's check (or even with it), despite the fact that it
> reaches predicate locking code via heap_fetch etc.

Hm.  With the duplicate key error I fail to see how any anomaly
could make it to a committed state in the database, although I
agree it is unfortunate that there is that one case where it really
should be considered a serialization failure that we haven't yet
coerced to yield that instead of the duplicate key error.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: SERIALIZABLE and INSERTs with multiple VALUES
Next
From: Kevin Grittner
Date:
Subject: Re: SERIALIZABLE and INSERTs with multiple VALUES