Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From Andres Freund
Subject Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date
Msg-id 20130914221524.GF4071@awork2.anarazel.de
Whole thread Raw
In response to Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE  (Peter Geoghegan <pg@heroku.com>)
Responses Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
List pgsql-hackers
On 2013-09-13 14:41:46 -0700, Peter Geoghegan wrote:
> On Fri, Sep 13, 2013 at 12:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The reason I wasn't saying "this will never get accepted" are twofold:
> > a) I don't want to stiffle alternative ideas to the "promises" idea,
> > just because I think it's the way to go. That might stop a better idea
> > from being articulated. b) I am not actually in the position to say it's
> > not going to be accepted.
> 
> Well, the reality is that the promises idea hasn't been described in
> remotely enough detail to compare it to what I have here. I've pointed
> out plenty of problems with it.

Even if you disagree, I still think that doesn't matter in the very
least. You say:

> I think that the details of how this approach compare to others are
> totally pertinent. For me, that's the whole point - getting towards
> something that will balance all of these concerns and be acceptable.

Well, the two other people involved in the discussion so far have gone
on the record saying that the presented approach is not acceptable to
them. And you haven't started reacting to that.

> Yes, it's entirely possible that that could look quite different to
> what I have here. I do not want to reduce all this to a question of
> "is this one design acceptable or not?".

But the way you're discussing it so far is exactly reducing it that way.

If you want the discussion to be about *how* can we implement it that
the various concerns are addressed: fsck*ing great. I am with you there.

In the end, even though I have my usual strong opinions which is the
best way, I don't care which algorithm gets pursued further. At least,
if, and only if, it has a fighting chance of getting committed. Which
this doesn't.

> After all, it was the first thing that
> I considered, and I'm on the record talking about it in the 2012 dev
> meeting. I didn't take that approach for many good reasons.

Well, I wasn't there when you said that ;)

> The reason I ended up here is not because I didn't get the memo about
> holding buffer locks across complex operations being a bad thing. At
> least grant me that. I'm here because in all these years no one has
> come up with a suggestion that doesn't have some very major downsides.
> Like, even worse than this.

I think you're massively, massively, massively overstating the dangers
of bloat here. It's a known problem that's *NOT* getting worse by any of
the other proposals if you compare it with the loop/lock/catch
implementation of upsert that we have today as the only option. And we
*DO* have infrastructure to deal with bloat, even if could use some
improvement. We *don't* have infrastructure with deadlocks on
lwlocks. And we're not goint to get that infrastructure, because it
would even further remove the "lw" part of lwlocks.

> >> As to the rules you refer to, you must mean "These locks are intended
> >> to be short-term: they should not be held for long". I don't think
> >> that they will ever be held for long. At least, when I've managed the
> >> amount of work that a heap_insert() can do better. I expect to produce
> >> a revision where toasting doesn't happen with the locks held soon.
> >> Actually, I've already written the code, I just need to do some
> >> testing.
> >
> > I personally think - and have stated so before - that doing a
> > heap_insert() while holding the btree lock is unacceptable.
> 
> Presumably your reason is essentially that we exclusive lock a heap
> buffer (exactly one heap buffer) while holding shared locks on btree
> index buffers.

It's that it interleaves an already complex but local locking scheme
that required several years to become correct with another that is just
the same. That's an utterly horrid idea.

> Is that really so different to holding an exclusive
> lock on a btree buffer while holding a shared lock on a heap buffer?
> Because that's what _bt_check_unique() does today.

Yes, it it is different. But, in my opinion, bt_check_unique() doing so
is a bug that needs fixing. Not something that we want to extend.

(Note that _bt_check_unique() already needs to deal with the fact that
it reads an unlocked page, because it moves right in some cases)

And, as you say:

> Now, I'll grant you that there is one appreciable difference, which is
> that multiple unique indexes may be involved. But limiting ourselves
> to the primary key or something like that remains an option. And I'm
> not sure that it's really any worse anyway.

I don't think that's an acceptable limitation. If it were something we
could lift in a release or two, maybe, but that's not what you're
talking about.

> > At this point I am a bit confused why you are asking for review.
> 
> I am asking for us, collectively, through consensus, to resolve the
> basic approach to doing this. That was something I stated right up
> front, pointing out details of where the discussion had gone in the
> past. That was my explicit goal. There has been plenty of discussing
> on this down through the years, but nothing ever came from it.

At the moment ISTM you're not conceding on *ANY* points. That's not very
often the way to find concensus.

> Why is this an intractable problem for over a decade for us alone? Why
> isn't this a problem for other database systems? I'm not implying that
> it's because they do this. It's something that I am earnestly
> interested in, though. A number of people have asked me that, and I
> don't have a good answer for them.

Afaik all those go the route of bloat, don't they? Also, at least in the
past, mysql had a long list of caveats around it...

> >> I mean, if we do the promise tuple thing, and there are multiple
> >> unique indexes, what happens when an inserter needs to block pending
> >> the outcome of another transaction?  They had better go clean up the
> >> promise tuples from the other unique indexes that they're trying to
> >> insert into, because they cannot afford to hold value locks for a long
> >> time, no matter how they're implemented.
> >
> > Why? We're using normal transaction visibility rules here. We don't stop
> > *other* values on the same index getting updated or similar.

> Because you're locking a value in some other, earlier unique index,
> all the while waiting *indefinitely* on some other value in a second
> or subsequent one. That isn't acceptable. A bunch of backends would
> back up just because one backend had this contention on the second
> unique index value that the others didn't actually have themselves. My
> design allows those other backends to immediately go through and
> finish.

That argument doesn't make sense to me. You're inserting a unique
value. It completely makes sense that you can only insert one of
them. If it's unclear whether you can insert, you're going to have to
wait. Thats why they are UNIQUE after all. You're describing a complete
nonadvantange here. It's also how unique indexes already work.
Also note, that wait's on xids are properly supervised by deadlock detection.

Even if it had an advantage, not blocking *for the single unique key alone*
opens you to issues of livelocks where several backends retry because of
each other indefinitely.

> Value locks have these kinds of hazards no matter how you implement
> them. Deadlocks, and unreasonable stalling as described here is always
> unacceptable - whether or not the problems are detected at runtime is
> ultimately of marginal interest. Either way, it's a bug.

Whether postgres locks down in a way that can only resolved by kill -9
or whether it aborts a transaction are, like, a couple of magnitude of a
difference.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Assertions in PL/PgSQL
Next
From: Andres Freund
Date:
Subject: Re: GUC for data checksums