Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date
Msg-id CAM3SWZSvSrTzPhjNPjahtJ0rFfS-gJFhU86Vpewf+eO8GwZXNQ@mail.gmail.com
Whole thread Raw
In response to Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
List pgsql-hackers
On Wed, Oct 9, 2013 at 11:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> This patch is still marked as "Needs Review" in the CommitFest
> application.  There's no reviewer, but in fact Andres and I both spent
> quite a lot of time providing design feedback (probably more than I
> spent on any other CommitFest patch).

Right, thank you both.

> I think there are still some design considerations to work out here,
> but honestly I'm not totally sure what the remaining points of
> disagreement are.  It would be nice to here the opinions of a few more
> people on the concurrency issues, but beyond that I think that a lot
> of this is going to boil down to whether the details of the value
> locking can be made to seem palatable enough and sufficiently
> low-overhead in the common case.  I don't believe we can comment on
> that in the abstract.

I agree that we cannot comment on it in the abstract. I am optimistic
that we can make the value locking work better without regressing the
common cases (especially if we're only concerned about not regressing
users that never use the feature, as opposed to having some
expectation for regular inserters inserting values into the same
ranges as an upserter). That's not my immediate concern, though - my
immediate concern is getting the concurrency and visibility issues
scrutinized.

What would it take to get the patch into a committable state if the
value locking had essentially the same properties (they were held
instantaneously), but were perfect? There is no point in giving the
value locking implementation too much further consideration unless
that question can be answered. In the past I've said that row locking
and value locking cannot be considered separately, but that was when
it was generally assumed that value locks had to persist for a long
time in a way that I don't think is feasible (and I think Robert would
now agree that it's at the very least very hard). Persisting value
locks basically make not regressing the general case hard, when you
think about the implementation. As Robert remarked, regular btree
index insertion blockings on an xid, not a value, and cannot be easily
made to appreciate that the "value lock" that a would-be duplicate
index tuple represents may just be held for a short time, and not the
entire duration of their inserter's transaction.

> There's still some question in my mind as to what the semantics ought
> to be.  I do understand Peter's point that having to specify a
> particular index would be grotty, but I'm not sure it invalidates my
> point that having to work across multiple indexes could lead to
> surprising results in some scenarios. I'm not going to stand here and
> hold my breath, though: if that's the only thing that makes me nervous
> about the final patch, I'll not object to it on that basis.

I should be so lucky!   :-)

Unfortunately, I have a very busy schedule in the month ahead,
including travelling to Ireland and Japan, so I don't think I'm going
to get the opportunity to work on this too much. I'll try and produce
a V4 that formally proposes some variant of my ideas around visibility
of locked tuples.

Here are some things you might not like about this patch, if we're
still assuming that the value locks are prototype and it's useful to
defer discussion around their implementation:

* The lock starvation hazards around going from value locking to row
locking, and retrying if it doesn't work out (i.e. if the row and its
descendant rows cannot be locked without what would ordinarily
necessitate using EvalPlanQual()). I don't see what we could do about
those, other than checking for changes in the rows unique index
values, which would be complex. I understand the temptation to do
that, but the fact is that that isn't going to work all the time -
some unique index value may well change every time. By doing that
you've already accepted whatever hazard may exist, and it becomes a
question of degree. Which is fine, but I don't see that the current
degree is actually much of problem in the real world.

* Reordering of value locks generally. I still need to ensure this
will behave reasonably at higher isolation levels (i.e. they'll get a
serialization failure). I think that Robert accepts that this isn't
inconsistent with read committed's documented behavior, and that it is
useful, and maybe even essential.

* The basic question of whether or not it's possible to lock values
and rows at the same time, and if that matters (because it turns out
what looks like that isn't, because deleters will effectively lock
values without even touching an index). I think Robert saw the
difficulty of doing this, but it would be nice to get a definitive
answer. I think that any MERGE implementation worth its salt will not
deadlock without the potential for multiple rows to be locked in an
inconsistent order, so this shouldn't either, and as I believe I
demonstrated, value locks and row locks should not be held at the same
time for at least that reason. Right?

* The syntax. I like the composability, and the way it's likely to
become idiomatic to combine it with wCTEs. Others may not.

* The visibility hacks that V4 is likely to have. The fact that
preserving the composable syntax may imply changes to
HeapTupleSatisfiesMVCC() so that rows locked but with no currently
visible version (under conventional rules) are visible to our snapshot
by virtue of having been locked all the same (this only matters at
read committed).

So I think that what this patch really could benefit from is lots of
scrutiny around the concurrency issues. It would be unfair to ask for
that before at least producing a V4, so I'll clean up what I already
have and post it, probably on Sunday.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem
Next
From: Peter Geoghegan
Date:
Subject: Re: Auto-tuning work_mem and maintenance_work_mem