Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: CA+TgmoY7ZZrG5wBPxCvOEvRun0OpeKOYnDx7F1Qfk3E6+vk+Gw@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Sat, Sep 14, 2013 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Note that today there is no guarantee that the original waiter for a
> duplicate-inserting xact to complete will be the first one to get a
> second chance, so I think it's hard to question this on correctness
> grounds. Even if they are released in FIFO order, there is no reason
> to assume that the first waiter will win the race with a second. Most
> obviously, the second waiter may never even get the chance to block
> on the same xid (so it's not really a waiter at all) and can still
> insert, if the blocking xact aborts after the second "waiter" starts
> its descent but before it checks uniqueness. All this, even though
> the second "waiter" arrived maybe minutes after the first.

ProcLockWakeup() only wakes as many waiters from the head of the queue
as can all be granted the lock without any conflicts.  So I don't
think there is a race condition in that path.
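In rough pseudocode, that policy looks like this (all names here are
invented for illustration; the real ProcLockWakeup() in
src/backend/storage/lmgr/proc.c handles more cases):

/*
 * Sketch only, not PostgreSQL code: scan from the head of the wait
 * queue, wake each waiter whose requested mode conflicts neither with
 * the modes already granted nor with the requests of waiters left
 * sleeping ahead of it, so a wakeup can never jump past an earlier
 * conflicting waiter.
 */
typedef struct Waiter
{
    struct Waiter *next;
    unsigned       reqMode;   /* requested lock mode, as a bitmask */
    int            pid;       /* backend to wake (illustrative) */
} Waiter;

/* stand-ins for the real conflict check and wakeup call */
extern int  conflicts(unsigned reqMode, unsigned heldMask);
extern void wake_backend(int pid);

static void
sketch_wakeup(Waiter **head, unsigned *grantedMask)
{
    unsigned  aheadMask = 0;      /* modes of waiters we skipped */
    Waiter  **link = head;

    while (*link != NULL)
    {
        Waiter *w = *link;

        if (!conflicts(w->reqMode, *grantedMask | aheadMask))
        {
            *grantedMask |= w->reqMode;   /* grant to this waiter */
            *link = w->next;              /* unlink from the queue */
            wake_backend(w->pid);
        }
        else
        {
            aheadMask |= w->reqMode;      /* preserves queue order */
            link = &w->next;
        }
    }
}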

> So far it's
> been a slippery slope type argument that can be equally well used to
> argue against some facet of almost any substantial patch ever
> proposed.

I don't completely agree with that characterization, but you do have a
point.  Obviously, if the differences in the areas of interruptibility,
starvation, deadlock risk, etc. can be made small enough relative to
the status quo, then those aren't reasons to reject the approach.

But I'm skeptical that you're going to be able to accomplish that,
especially without adversely affecting maintainability.  I think the
way that you're proposing to use lwlocks here is sufficiently
different from what the rest of the system does that it's going to be
hard to avoid system-wide effects that can't easily be caught during
code review; and like Andres, I don't share your skepticism about
alternative approaches.

>> For that matter, if the table has more than MAX_SIMUL_LWLOCKS indexes,
>> you'll error out.  In fact, if you get the number of indexes exactly
>> right, you'll exceed MAX_SIMUL_LWLOCKS in visibilitymap_clear() and
>> panic the whole system.
>
> Oh, come on. We can obviously engineer a solution to that problem. I
> don't think I've ever seen a table with close to 100 *unique* indexes.
> 4 or 5 is a very high number. If we just raised an error if someone
> tried to do this with more than 10 unique indexes, I would guess
> that we'd get exactly zero complaints about it.

That's not a solution; that's a hack.
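For reference, the limit in question comes from the per-backend
bookkeeping in lwlock.c.  A self-contained sketch of it (the type and
the critical-section flag are stand-ins, not the real code, though the
constant and the error message match that file):

#include <stdio.h>
#include <stdlib.h>

typedef int LWLockId;               /* stand-in for the real type */

#define MAX_SIMUL_LWLOCKS  100

static int      num_held_lwlocks = 0;
static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
static int      CritSectionCount = 0;   /* > 0 in a critical section */

static void
remember_lwlock(LWLockId lockid)
{
    if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
    {
        /* The real code does elog(ERROR, "too many LWLocks taken");
         * inside a critical section ERROR is promoted to PANIC,
         * which is the visibilitymap_clear() failure mode above. */
        fprintf(stderr, "%s: too many LWLocks taken\n",
                CritSectionCount > 0 ? "PANIC" : "ERROR");
        abort();
    }
    held_lwlocks[num_held_lwlocks++] = lockid;
}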

> Undetected deadlock is really not much worse than detected deadlock
> here. Either way, it's a bug. And it's something that any kind of
> implementation will need to account for. It's not okay to
> *unpredictably* deadlock, in a way that the user has no control over.
> Today, someone can do an analysis of their application and eliminate
> deadlocks if they need to. That might not be terribly practical much
> of the time, but it can be done. It certainly is practical to do it in
> a localized way. I wouldn't like to compromise that.

I agree that unpredictable deadlocks are bad.  I think the fundamental
problem with UPSERT, MERGE, and this proposal is what happens when the
conflicting tuple is present but not visible to your scan, either
because it hasn't committed yet or because it has committed but is not
visible to your snapshot.  I'm not clear on how you handle that in
your approach.
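To make that concrete, here is the decision point as I understand it,
in invented pseudocode (none of these names are real; DirtyFetchConflict
stands in for a uniqueness probe done with a dirty snapshot, which sees
tuples regardless of MVCC visibility):

typedef enum
{
    CONFLICT_NONE,          /* no conflicting tuple: insert */
    CONFLICT_IN_PROGRESS,   /* inserter's xact has not committed */
    CONFLICT_COMMITTED      /* committed, maybe not in our snapshot */
} ConflictState;

extern ConflictState DirtyFetchConflict(void *index, void *key, int *xid);
extern void wait_for_xact(int xid);

static void
upsert_decision_sketch(void *index, void *key)
{
    int xid;

    switch (DirtyFetchConflict(index, key, &xid))
    {
        case CONFLICT_NONE:
            /* Safe to insert. */
            break;
        case CONFLICT_IN_PROGRESS:
            /* Wait for the conflicting xact, then retry from the
             * top.  Whatever we hold across this wait is the
             * deadlock and starvation exposure debated here. */
            wait_for_xact(xid);
            break;
        case CONFLICT_COMMITTED:
            /* The tuple exists but may be invisible to our MVCC
             * snapshot.  Does the UPDATE path then operate on a row
             * our scan cannot see, or do we error out?  That's the
             * question I'm raising. */
            break;
    }
}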

> If you look at the code, you'll see that I've made very modest
> modifications to LWLockRelease only. I would be extremely surprised if
> the overhead was not only in the noise, but was completely impossible
> to detect through any conventional benchmark. These are the same kind
> of very modest changes made for LWLockAcquireOrWait(), and you said
> nothing about that at the time. Despite the fact that you now appear
> to think that that whole effort was largely a waste of time.

Well, I did have some concerns about the performance impact of that patch:

http://www.postgresql.org/message-id/CA+TgmoaPyQKEaoFz8HkDGvRDbOmRpkGo69zjODB5=7Jh3hbPQA@mail.gmail.com

I also discovered, after it was committed, that it didn't help in the
way I expected:

http://www.postgresql.org/message-id/CA+TgmoY8P3sD=oUViG+xZjmZk5-phuNV39rtfyzUQxU8hJtZxw@mail.gmail.com

It's true that I didn't raise those concerns contemporaneously with
the commit, but I didn't understand the situation well enough at that
time to realize how narrow the benefit was.
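For anyone following along: LWLockAcquireOrWait() either acquires the
lock (returning true) or sleeps until the holder releases it without
acquiring it (returning false).  A sketch of the XLogFlush() pattern it
was added for, with invented helper names, assuming PostgreSQL's
lwlock.h for the lock calls:

extern int  flushed_past(unsigned long reqPtr);     /* invented */
extern void perform_flush(unsigned long reqPtr);    /* invented */

static void
flush_sketch(unsigned long WriteRqstPtr)
{
    for (;;)
    {
        if (flushed_past(WriteRqstPtr))
            break;          /* another backend flushed far enough */

        if (LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
        {
            /* Got the lock: do the flush ourselves. */
            perform_flush(WriteRqstPtr);
            LWLockRelease(WALWriteLock);
            break;
        }
        /* Lock was busy; we slept until it was released.  Loop and
         * recheck rather than flushing redundantly. */
    }
}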

I've wished, on a number of occasions, to be able to add more lwlock
primitives.  The problem with that is that if everybody does it, we'll
pretty soon end up with a mess.  I attempted to address that with this
proposal:

http://www.postgresql.org/message-id/CA+Tgmob4YE_k5dpO0T07PNf1SOKPybo+wj4m4FryOS7Z4_yOzg@mail.gmail.com

...but nobody (including me) was very sure that was the right way
forward, and it never went anywhere.  However, I think the basic issue
remains.  I was sad to discover last week that Heikki handled this
problem for the WAL scalability patch by basically copy-and-pasting
much of the lwlock code and then hacking it up.  I think we're well on
our way to an unmaintainable mess already, and I don't want it to get
worse.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


