From: Heikki Linnakangas
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: 528A27B1.1050204@vmware.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 14.10.2013 07:12, Peter Geoghegan wrote:
> On Wed, Oct 9, 2013 at 1:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Unfortunately, I have a very busy schedule in the month ahead,
>> including travelling to Ireland and Japan, so I don't think I'm going
>> to get the opportunity to work on this too much. I'll try and produce
>> a V4 that formally proposes some variant of my ideas around visibility
>> of locked tuples.
>
> V4 is attached.
>
> Most notably, this adds the modifications to HeapTupleSatisfiesMVCC(),
> though they're neater than in the snippet I sent earlier.
>
> There is also some clean-up around row-level locking. That code has
> been simplified. I also try and handle serialization failures in a
> better way, though that really needs the attention of a subject matter
> expert.
>
> There are a few additional XXX comments highlighting areas of concern,
> particularly around serializable behavior. I've deferred making higher
> isolation levels care about wrongfully relying on the special
> HeapTupleSatisfiesMVCC() exception (e.g. they won't throw a
> serialization failure, mostly because I couldn't decide on where to do
> the test on time prior to travelling tomorrow).
>
> I've added code to do heap_prepare_insert before value locks are held.
> Whatever our eventual value locking implementation, that's going to be
> a useful optimization. Though unfortunately I ran out of time to give
> this the scrutiny it really deserves, I suppose that it's something
> that we can return to later.
>
> I ask that reviewers continue to focus on concurrency issues and broad
> design issues, and continue to defer discussion about an eventual
> value locking implementation. I continue to think that that's the most
> useful way of proceeding for the time being. My earlier points about
> probable areas of concern [1] remain a good place for reviewers to
> start.

I think it's important to recap the design goals of this. I don't think 
these have been listed before, so let me try:

* It should be usable and perform well for both large batch updates and 
small transactions.

* It should perform well both when there are no duplicates and when 
there are lots of duplicates.

And from that follows some finer requirements:

* Performance when there are no duplicates should be close to raw INSERT 
performance.

* Performance when all rows are duplicates should be close to raw UPDATE 
performance.

* We should not leave behind large numbers of dead tuples in either case.

Anything else I'm missing?


What about exclusion constraints? I'd like to see this work for them as 
well. Currently, exclusion constraints are checked after the tuple is 
inserted, and you abort if the constraint was violated. We could still 
insert the heap and index tuples first, but instead of aborting on 
violation, we would kill the heap tuple we already inserted and retry. 
There are some complications there, like how to wake up any other 
backends that are waiting to grab a lock on the tuple we just killed, 
but it seems doable.
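
To sketch that in pseudo-C: all of the helper names below 
(optimistic_insert, exclusion_conflict_exists, kill_inserted_tuple, 
wake_tuple_lock_waiters) are invented for illustration, and the loop 
glosses over waiting for the conflicting inserter to finish:

static void
exclusion_insert(Relation rel, HeapTuple tuple)
{
    for (;;)
    {
        /* Insert the heap tuple and index entries without checking. */
        ItemPointer tid = optimistic_insert(rel, tuple);

        /* Re-check, now that our own entry is visible in the index. */
        if (!exclusion_conflict_exists(rel, tuple, tid))
            return;             /* no conflict, we're done */

        /*
         * Conflict: mark the heap tuple we just inserted as dead,
         * wake up any backends waiting on its tuple lock, and retry.
         */
        kill_inserted_tuple(rel, tid);
        wake_tuple_lock_waiters(rel, tid);
    }
}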

That would, however, perform badly and leave garbage behind if there are 
duplicates. A refinement of that would be to first check for constraint 
violations, then insert the tuple, and then check again. That would 
avoid the garbage in most cases, but would perform much more poorly when 
there are no duplicates, because it needs two index scans for every 
insertion. A further refinement would be to keep track of how many 
duplicates there have been recently, and switch between the two 
strategies based on that.
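
The switch might look something like this; the statistics bookkeeping 
(DupStats, get_dup_stats, record_conflict, record_attempt, 
handle_conflict) and the 0.5 threshold are entirely made up:

static void
adaptive_exclusion_insert(Relation rel, HeapTuple tuple)
{
    DupStats   *stats = get_dup_stats(rel);    /* hypothetical */

    /*
     * If duplicates have been common lately, pay for the pre-check
     * to avoid leaving a dead tuple behind on conflict.
     */
    if (stats->recent_dup_ratio > 0.5 &&
        exclusion_conflict_exists(rel, tuple, NULL))
    {
        record_conflict(stats);
        handle_conflict(rel, tuple);
        return;
    }

    /* Otherwise insert first and re-check, as in the sketch above. */
    exclusion_insert(rel, tuple);
    record_attempt(stats);
}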

The cost of doing two scans could be alleviated by using 
markpos/restrpos for the second scan. That is presumably cheaper than 
starting a whole new scan with the same key. (markpos/restrpos don't 
currently work for non-MVCC snapshots, though, so that would need to 
be fixed.)
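
index_markpos() and index_restrpos() are existing indexam entry 
points; roughly, and glossing over what exactly the second pass must 
re-check:

IndexScanDesc scan;
bool          conflict;

scan = index_beginscan(heapRel, indexRel, snapshot, nkeys, 0);
index_rescan(scan, scankeys, nkeys, NULL, 0);

/* First pass: look for a conflicting entry. */
conflict = (index_getnext(scan, ForwardScanDirection) != NULL);
index_markpos(scan);            /* remember where we stopped */

if (!conflict)
{
    /* ... insert the heap tuple and the index entry here ... */

    /* Second pass: re-find the position instead of rescanning. */
    index_restrpos(scan);
    conflict = (index_getnext(scan, ForwardScanDirection) != NULL);
}
index_endscan(scan);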

And that detour with exclusion constraints takes me back to the current 
patch :-). What if you implemented the unique check in a similar fashion 
too (when doing INSERT ON DUPLICATE KEY ...)? First, scan for a 
conflicting key, and mark the position. Then insert at that position. 
If the insertion fails because of a duplicate key (which was 
inserted after we did the first scan), mark the heap tuple as dead, and 
start over. The indexam changes would be quite similar to the changes 
you made in your patch, but instead of keeping the page locked, you'd 
only hold a pin on the target page (if even that). The first indexam 
call would check that the key doesn't exist, and remember the insert 
position. The second call would re-find the previous position, and 
insert the tuple, checking again that there really wasn't a duplicate 
key violation. The locking aspects would be less scary than in your 
current patch.
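
In pseudo-C, the two-call protocol could look like this; 
index_check_unique_mark(), index_insert_marked(), handle_duplicate(), 
kill_heap_tuple() and the IndexMark type are invented for the sketch, 
while heap_insert() and GetCurrentCommandId() are the real calls:

static void
insert_on_duplicate_key(Relation heapRel, Relation indexRel,
                        HeapTuple tuple, IndexTuple itup)
{
    IndexMark       mark;       /* hypothetical remembered position */
    ItemPointerData conflict;

    for (;;)
    {
        /*
         * Call 1: scan for a conflicting key. If none, remember the
         * insert position; afterwards only a pin (if even that) is
         * held on the target page.
         */
        if (index_check_unique_mark(indexRel, itup, &mark, &conflict))
        {
            handle_duplicate(&conflict);    /* ON DUPLICATE KEY path */
            return;
        }

        /* Insert the heap tuple first. */
        heap_insert(heapRel, tuple, GetCurrentCommandId(true), 0, NULL);

        /*
         * Call 2: re-find the remembered position and insert the
         * index tuple, re-checking that no duplicate key appeared
         * since call 1. On failure, kill our heap tuple and retry.
         */
        if (index_insert_marked(indexRel, itup, &tuple->t_self, &mark))
            return;

        kill_heap_tuple(heapRel, &tuple->t_self);
    }
}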

I'm not sure if that would perform as well as your current patch. I 
must admit your current approach is close to optimal 
performance-wise. But I'd like to see this alternative tried, and it 
would be a solution for exclusion constraints in any case.

One fairly significant limitation of your current approach is that 
the number of lwlocks you can hold simultaneously is limited 
(MAX_SIMUL_LWLOCKS == 100), and you hold one per unique index. 
Another limitation is that the minimum value of shared_buffers is 
only 16, and you keep one buffer pinned and locked per unique index. 
Neither of those is a serious problem in real applications - no-one 
runs with shared_buffers=16, and no sane schema has a hundred unique 
indexes - but it's still something to consider.
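
Since one buffer lock is held per unique index, the hard ceiling 
amounts to something like the following hypothetical guard 
(num_unique_indexes is made up, and MAX_SIMUL_LWLOCKS is in fact 
private to lwlock.c):

/* Illustration only, not code from the patch. */
if (num_unique_indexes >= MAX_SIMUL_LWLOCKS)
    elog(ERROR, "too many unique indexes to value-lock simultaneously");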

- Heikki


