Thread: Promise index tuples for UPSERT

Promise index tuples for UPSERT

From
Simon Riggs
Date:
Summary of algorithm to use "promise tuples" for concurrency control
during UPSERT

1. Perform btree search to location of key, if it exists.
a) If an unkilled index tuple exists, we decide this is an UPDATE and
drop straight through to step 2
b) If it does not exist, insert a "promise" tuple into the unique index(es)
- marked with the xid of the inserting transaction, but using the key.
This happens while the page is locked, so it is not possible to insert
a second promise tuple concurrently.
Record the btree blockid on the index scan and move to step 3
When later insert scans see the promise tuple they perform
XactLockTableWait() and when they get control they look again for the
key. If they find a promise tuple with an aborted xid they replace
that value with their own xid and continue as a). Otherwise b).

2. Find existing heap tuple
Follow the index tuple to the heap tuple.
Check it is actually valid. If not, go back to (1), kill the prior
tuple and follow the 1b) path.
If it is valid, perform heap_update as normal.

3. Insert new heap tuple
Perform heap_insert
Re-find index tuple using the btree blockid recorded at step 1; this
may require moving right until we find the actual value we are looking
for, so block splits don't negatively affect this approach.
Once re-found we change the index tuple from a promise tuple to a
normal index tuple, by setting tid and removing promise flag. Tuple
remains same length because the value was known when promise tuple
inserted, so this is an inplace update.
Insert other index values normally.
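
To make the interleaving concrete, here is a schematic walk-through
for two concurrent sessions on a single unique index (hypothetical
UPSERT syntax; table and column names are illustrative):

CREATE TABLE t (k integer PRIMARY KEY, v text);

-- S1: UPSERT k=1 -> step 1b: no index tuple for k=1, so insert a
--     promise tuple marked with S1's xid, record the btree blockid,
--     and move to step 3
-- S2: UPSERT k=1 -> sees S1's promise tuple, so it performs
--     XactLockTableWait() on S1's xid
-- S1: step 3: heap_insert, re-find the promise tuple via the recorded
--     blockid (moving right if the page split), set its tid and clear
--     the promise flag in place, then insert other index entries
-- S1: COMMIT
-- S2: wakes, searches again for k=1, finds a live index tuple ->
--     step 1a -> step 2: heap_update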

If a transaction that inserted a promise tuple dies, how is that cleaned up?
Any backend that sees a dead promise tuple with a value it wants will
clean it up. Dead promise tuples can be removed as needed, just as
killed tuples currently are.
VACUUM can remove promise tuples from dead transactions as it scans
the index, no problems.

Index bloat is less of a problem than with normal inserts since there
are additional ways of removing promise tuples. Only one index tuple
at a time can have a specific value, so we would actually reduce heap
bloat caused by concurrent inserts.

It's possible we find existing rows that we can't see according to our
snapshot. That is handled in exactly the same way as we treat
UPDATEs.

There is a potential loop here in that we might find an index tuple,
follow it, find out the tuple is actually dead then return to insert a
promise tuple, only to find that someone else just did that and we
have to wait/re-follow the link to update the new tuple. That is an
extremely unlikely race condition, though possible; there is no heavy
locking to prevent that in this approach. In principle deadlocks are
possible from that, but that is not any different from the current
case where people who insert the same key at the same time might cause
deadlocks. It's not a common application pattern and not something we
should be protecting against.

All of this is only needed for unique indexes.

It can be handled by a new API call for acquire_value_lock() and
release_value_lock(), or similar.

Advantages
* We don't do anything weird in the heap - if this breaks, data is not corrupt
* There is no heap bloat or index bloat above existing levels
* The approach leverages existing mechanism for transaction waiting
* Optimistic approach to value locking will improve performance over
heavy weight locking

Disadvantages
* Not written yet - <1 month to code, still possible for 9.5, doesn't look hard

Other stated possible disadvantages
* Violates existing invariant that every index tuple has a
corresponding heap tuple, which goes back to the Berkeley days.
Currently, we always create heap tuples first, and physically delete
them last. [Why is that a problem? Could be, but why?]
("Deleting them last" implies something is being touched in that
regard by this suggestion. I'm not sure what you are referring to)

* Relatedly, concern about breaking VACUUM. VACUUMing of btrees occurs
with a list of TIDs to kill in index, built from the heap. Current
bulk delete callback cannot be used for this. Super-exclusive lock is
needed to delete tuples in btree page (i.e. VACUUM). Normally skipping
of LockBufferForCleanup() (as added by bbb6e559) works okay in heap
because it tends to imply that list of tids won't be bulk deleted in
index, where we currently cannot skip for failure to quickly acquire
super exclusive lock. So this could make the problem addressed by
bbb6e559 return to some extent.
[Don't see any problems; just test the xid as we scan a promise tuple
and remove it if needed]

"Index-only" bloat becomes a possibility. Implications are not well understood.
[FUD, no reason to believe there is a problem.]

We have to re-find any items using an ordinary index scan, not a tid.
Must match our xid to that.
[Explained above, easy and efficient.]

Doesn't have a strategy for dealing with unprincipled deadlocks, at
least at the moment. Presumably some aspects of #2 could be adopted
here.
[FUD. Unprincipled deadlock still not properly explained as yet. No
way of telling whether it will happen with this approach, or not].

Comments please.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/01/2014 02:34 PM, Simon Riggs wrote:
> Summary of algorithm to use "promise tuples" for concurrency control
> during UPSERT
>
> 1. Perform btree search to location of key, if it exists.
> a) If an unkilled index tuple exists, we decide this is an UPDATE and
> drop straight through to step 2
> b) If it does not exist, insert a "promise" tuple into the unique index(es)
> - marked with the xid of the inserting transaction, but using the key.
> This happens while the page is locked, so it is not possible to insert
> a second promise tuple concurrently.
> Record the btree blockid on the index scan and move to step 3
> When later insert scans see the promise tuple they perform
> XactLockTableWait() and when they get control they look again for the
> key. If they find a promise tuple with an aborted xid they replace
> that value with their own xid and continue as a). Otherwise b).

XactLockTableWait() waits until the end of transaction, that's not what you 
want here. If the backend that inserted the promise tuple decides to not 
proceed with the insertion, and removes the promise tuple, the backend 
waiting on it needs to be woken up more or less immediately, not when 
the transaction completes.

I had this exact same issue in my earlier patch versions, fixed it with 
a new kind of heavy-weight lock, and a new field in PGPROC 
(http://www.postgresql.org/message-id/52D00D2D.6030307@vmware.com). That 
wasn't very pretty, but it got the job done. Some other design might be 
better.

- Heikki




Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 1, 2014 at 4:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Summary of algorithm to use "promise tuples" for concurrency control
> during UPSERT

> Index bloat is less of a problem than with normal inserts since there
> are additional ways of removing promise tuples. Only one index tuple
> at a time can have a specific value, so we would actually reduce heap
> bloat caused by concurrent inserts.

I am not all that concerned about bloat. I didn't think it was a major
advantage of #1 relative to #2 when I investigated the differences,
and looked at the numbers. And I'm speaking as the advocate of the
design with zero bloat.

> It's possible we find existing rows that we can't see according to our
> snapshot. That is handled in exactly the same way as we treat
> UPDATEs.

This isn't a special case in the patch. It's more like REPEATABLE READ
and SERIALIZABLE have a special responsibility to make sure they're
not updating an invisible-to-MVCC-snapshot tuple (not "instantaneously
invisible", by which I mean invisible in the
HeapTupleSatisfiesUpdate() sense). Otherwise, we don't care if it's
MVCC-visible or not.

I imagine that by "That is handled in exactly the same way as we
treat UPDATEs", you mean that we do the EvalPlanQual() stuff. We only
do that in the event of a concurrent UPDATE/DELETE, though. Otherwise,
we trust the underlying relation scan to accurately represent that
tuples pulled up are visible. So I don't understand the comparison
(but tell me if I've misunderstood).

> There is a potential loop here in that we might find an index tuple,
> follow it, find out the tuple is actually dead then return to insert a
> promise tuple, only to find that someone else just did that and we
> have to wait/re-follow the link to update the new tuple. That is an
> extremely unlikely race condition, though possible; there is no heavy
> locking to prevent that in this approach. In principle deadlocks are
> possible from that, but that is not any different from the current
> case where people who insert the same key at the same time might cause
> deadlocks. It's not a common application pattern and not something we
> should be protecting against.

People are going to use this feature in cases where they could almost
use an UPDATE. It will happen if you let it happen. But even if you
don't, a guarantee is infinitely more useful than a non-guarantee to
app developers. I'll be up-front about this: you have very little
chance of getting me to budge on this point. I *really* hate the idea
of allowing this kind of thing. I don't think I'm alone here.

> All of this is only needed for unique indexes.
>
> It can be handled by a new API call for acquire_value_lock() and
> release_value_lock(), or similar.
>
> Advantages
> * We don't do anything weird in the heap - if this breaks, data is not corrupt

Index corruption leads to wrong answers. I don't think this is a very
good argument, fwiw.

> * There is no heap bloat or index bloat above existing levels

Fair enough.

> * The approach leverages existing mechanism for transaction waiting

That's not really an advantage. That more or less applies to all designs.

> * Optimistic approach to value locking will improve performance over
> heavy weight locking

There is no evidence that promise index tuples will perform better. #2
didn't perform better than #1.

> Disadvantages
> * Not written yet - <1 month to code, still possible for 9.5, doesn't look hard

Maybe, but that is beside the point, which is: If there were any
fundamental problems with either #1 or #2, I think I'd have figured
them out by now. They are less risky today - it might be that #3 turns
out to have the same properties, but I cannot be sure. That counts for
something. I feel perfectly entitled to hold that kind of uncertainty
against any design, tbh.

> Other stated possible disadvantages
> * Violates existing invariant that every index tuple has a
> corresponding heap tuple, which goes back to the Berkeley days.
> Currently, we always create heap tuples first, and physically delete
> them last. [Why is that a problem? Could be, but why?]

Unknown unknowns. Huge amounts of code must be audited to figure out
that it isn't an issue. So I don't know, but then I'm not the one
making the proposal.

> ("Deleting them last" implies something is being touched in that
> regard by this suggestion. I'm not sure what you are referring to)

The uncertain scope of the problem is a big part of the problem.

> * Relatedly, concern about breaking VACUUM. VACUUMing of btrees occurs
> with a list of TIDs to kill in index, built from the heap. Current
> bulk delete callback cannot be used for this. Super-exclusive lock is
> needed to delete tuples in btree page (i.e. VACUUM). Normally skipping
> of LockBufferForCleanup() (as added by bbb6e559) works okay in heap
> because it tends to imply that list of tids won't be bulk deleted in
> index, where we currently cannot skip for failure to quickly acquire
> super exclusive lock. So this could make the problem addressed by
> bbb6e559 return to some extent.
> [Don't see any problems; just test the xid as we scan a promise tuple
> and remove it if needed]
>
> "Index-only" bloat becomes a possibility. Implications are not well understood.
> [FUD, no reason to believe there is a problem.]

"no reason to believe there is a problem" isn't that good a defense. I
know this from experience.

> Doesn't have a strategy for dealing with unprincipled deadlocks, at
> least at the moment. Presumably some aspects of #2 could be adopted
> here.
> [FUD. Unprincipled deadlock still not properly explained as yet. No
> way of telling whether it will happen with this approach, or not].

What's unclear about unprincipled deadlocks? Heikki went to
considerable effort to have his design meet my standard of not doing
that. I think that there is almost no controversy about whether or not
we need to avoid that at this stage.

If you want a more practical definition, I think it's very unlikely
that I'll accuse any implementation of exhibiting this problem once it
doesn't deadlock with the test-case I presented Heikki with last year
[1].

[1] http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com
-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 1, 2014 at 12:54 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> XactLockTableWait() waits until the end of transaction, that's not what you want
> here. If the backend that inserted the promise tuple decides to not proceed
> with the insertion, and removes the promise tuple, the backend waiting on it
> needs to be woken up more or less immediately, not when the transaction
> completes.

Simon has not been inconsistent here: he has said that deadlocks may
be possible. I happen to think that allowing them would be a major
mistake on our part, but that's another story.


-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 1, 2014 at 12:59 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 1, 2014 at 12:54 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> XactLockTableWait() waits until the end of transaction, that's not what you want
>> here. If the backend that inserted the promise tuple decides to not proceed
>> with the insertion, and removes the promise tuple, the backend waiting on it
>> needs to be woken up more or less immediately, not when the transaction
>> completes.
>
> Simon has not been inconsistent here: he has said that deadlocks may
> be possible. I happen to think that allowing them would be a major
> mistake on our part, but that's another story.

Don't forget that not waiting on XactLockTableWait(), but rather
waiting on a "speculative insertion token" wasn't just the thing that
made your prototype not deadlock - it was also the thing that made its
performance more or less comparable to that of my original value
locking design. Your prototype performed only a fraction as well as my
design before that last revision.

So that's two *excellent* reasons to not use XactLockTableWait() here.

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 1 October 2014 20:54, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/01/2014 02:34 PM, Simon Riggs wrote:
>>
...
>> When later insert scans see the promise tuple they perform
>> XactLockTableWait() and when they get control they look again for the
>> key. If they find a promise tuple with an aborted xid they replace
>> that value with their own xid and continue as a). Otherwise b).
>
>
> XactLockTableWait() waits until the end of transaction, that's not what you want
> here. If the backend that inserted the promise tuple decides to not proceed
> with the insertion, and removes the promise tuple, the backend waiting on it
> needs to be woken up more or less immediately, not when the transaction
> completes.

There is no "remove promise tuple" part of the above design.

> I had this exact same issue in my earlier patch versions, fixed it with a
> new kind of heavy-weight lock, and a new field in PGPROC
> (http://www.postgresql.org/message-id/52D00D2D.6030307@vmware.com). That
> wasn't very pretty, but it got the job done. Some other design might be
> better.

Currently, if some fool developer decides to insert rows that already
duplicate values in the table, he inserts a heap row, then sees a
potential conflict in the index and waits for the conflicting xact to
end. His fool insert, his wait. That's how btree does this now.
I don't see any reason why we need to do better for UPSERT.

If the row already exists we need to be able to quickly change into an
update, but we still only support one write xact at a time.

The value in the index needs to be protected by a block level lock, so
we can check it quickly, but the eventual heap work is serialized by
transactional semantics.

I think a little perspective is due here and we should stick to the
main use case, not cater for bizarre edge cases. What we are looking
for is something that can decide whether it is an insert or an update
and progress quickly with that. It needs to be correct, but there is
no requirement to be faster than currently possible, in the case of
concurrency.

Any form of tuple locking that uses the general lock manager will not
be usable. I can't see it is worth the overhead of doing that to
protect against deadlocks that would only be experienced by people
doing foolish things.

Please let's not overthink and overengineer this. No flying cars, please.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/03/2014 11:07 AM, Simon Riggs wrote:
> On 1 October 2014 20:54, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 10/01/2014 02:34 PM, Simon Riggs wrote:
>>>
> ...
>>> When later insert scans see the promise tuple they perform
>>> XactLockTableWait() and when they get control they look again for the
>>> key. If they find a promise tuple with an aborted xid they replace
>>> that value with their own xid and continue as a). Otherwise b).
>>
>>
>> XactLockTableWait() waits until the end of transaction, that's not what you want
>> here. If the backend that inserted the promise tuple decides to not proceed
>> with the insertion, and removes the promise tuple, the backend waiting on it
>> needs to be woken up more or less immediately, not when the transaction
>> completes.
>
> There is no "remove promise tuple" part of the above design.
>
>> I had this exact same issue in my earlier patch versions, fixed it with a
>> new kind of heavy-weight lock, and a new field in PGPROC
>> (http://www.postgresql.org/message-id/52D00D2D.6030307@vmware.com). That
>> wasn't very pretty, but it got the job done. Some other design might be
>> better.
>
> Currently, if some fool developer decides to insert rows that already
> duplicate values in the table, he inserts a heap row, then sees a
> potential conflict in the index and waits for the conflicting xact to
> end. His fool insert, his wait. That's how btree does this now.
> I don't see any reason why we need to do better for UPSERT.
>
> If the row already exists we need to be able to quickly change into an
> update, but we still only support one write xact at a time.

That lowers the bar from what I thought everyone agreed on. Namely, if 
two backends run a similar UPSERT command concurrently on a table that 
has more than one unique constraint, they might deadlock, causing one of 
them to throw an error instead of INSERTing or UPDATEing anything.

I'm sure that's useful enough in many applications, but I'd like to have 
a more robust implementation. The shorter we can keep the list of 
caveats, the better.

> The value in the index needs to be protected by a block level lock, so
> we can check it quickly, but the eventual heap work is serialized by
> transactional semantics.
>
> I think a little perspective is due here and we should stick to the
> main use case, not cater for bizarre edge cases.

I'm trying to bisect your thoughts on exactly what use cases you think 
we must support, and which ones you consider bizarre edge cases, and 
what exactly is acceptable behavior in those edge cases.

> What we are looking for is something that can decide whether it is an
> insert or an update and progress quickly with that. It needs to be
> correct, but there is no requirement to be faster than currently
> possible, in the case of concurrency.

I believe we all agree on that.

> Any form of tuple locking that uses the general lock manager will not
> be usable. I can't see it is worth the overhead of doing that to
> protect against deadlocks that would only be experienced by people
> doing foolish things.

Maybe, maybe not, but let's define the acceptable behavior first, and 
think about the implementation second. I'm pretty sure all of the 
approaches discussed so far can be made fast enough, and the bloat 
issues can be made small enough, that it doesn't matter much which one 
we choose from a performance point of view. The differences are in what 
use cases they can support, and the maintainability of the code.

- Heikki




Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> That lowers the bar from what I thought everyone agreed on. Namely, if two
> backends run a similar UPSERT command concurrently on a table that has more
> than one unique constraint, they might deadlock, causing one of them to
> throw an error instead of INSERTing or UPDATEing anything.

It lowers the bar to a level that I am not willing to limbo dance
under. You don't even need two unique constraints. Nothing as
"complicated" as that is required.

When this happens with MySQL, they have the good sense to call it a
bug [1], and even fix it. I find the comparison with conventional
insertion entirely unconvincing.

> I'm sure that's useful enough in many applications, but I'd like to have a
> more robust implementation. The shorter we can keep the list of caveats, the
> better.

INSERT and UPDATE are supposed to be fairly well balanced here.
Conflicts are the norm.

>> The value in the index needs to be protected by a block level lock, so
>> we can check it quickly, but the eventual heap work is serialized by
>> transactional semantics.
>>
>> I think a little perspective is due here and we should stick to the
>> main use case, not cater for bizarre edge cases.
>
>
> I'm trying to bisect your thoughts on exactly what use cases you think we
> must support, and which ones you consider bizarre edge cases, and what
> exactly is acceptable behavior in those edge cases.

"Lots of concurrency" is not an edge-case.

>> Any form of tuple locking that uses the general lock manager will not
>> be usable. I can't see it is worth the overhead of doing that to
>> protect against deadlocks that would only be experienced by people
>> doing foolish things.
>
> Maybe, maybe not, but let's define the acceptable behavior first, and think
> about the implementation second.

+1. Updating a lot with UPSERT is not foolish. That's all it took to
make earlier prototypes deadlock.

> I'm pretty sure all of the approaches
> discussed so far can be made fast enough, and the bloat issues can be made
> small enough, that it doesn't matter much which one we choose from a
> performance point of view. The differences are in what use cases they can
> support, and the maintainability of the code.

+1

What do we get for giving up on not having unprincipled deadlocks
here? What's the advantage? Assuming that this is a bizarre edge-case
(note: it isn't), what do we get in return for giving up on fixing it?

[1] MySQL bug #52020
-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 3 October 2014 10:03, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> That lowers the bar from what I thought everyone agreed on. Namely, if two
> backends run a similar UPSERT command concurrently on a table that has more
> than one unique constraint, they might deadlock, causing one of them to
> throw an error instead of INSERTing or UPDATEing anything.

Now we get to a productive discussion, this is good.

When we first make requirements, obviously everyone agrees a long list
of things, since there is initially not much reason to say no to any
of it. As we go towards implementation we begin to understand the true
price of meeting each requirement. It was good that this detail was
raised, and sensible to attempt to avoid unprincipled deadlocks. If
the price of avoiding them is high, it is worth reconsidering how
important that is.

My view is that I can't see the above use case happening in real
situations, except by infrequent mistake. In most cases, unique
indexes represent some form of object identity and those don't change
frequently in the real world. So I'd be surprised to see two unique
fields changing at the same time without it representing some form of
business process error that people would like to see fail anyway.
If someone has an example of that in a real, common case then I would
like to see it and I would revise my view accordingly.

We are frequently hampered by trying to design something that can sing
and dance at the same time. That thought is exactly why we are looking
at UPSERT now, not MERGE. So trimming our objectives to what makes
sense is an accepted part of this project already.

>> Any form of tuple locking that uses the general lock manager will not
>> be usable. I can't see it is worth the overhead of doing that to
>> protect against deadlocks that would only be experienced by people
>> doing foolish things.
>
>
> Maybe, maybe not, but let's define the acceptable behavior first, and think
> about the implementation second.

Hand in hand, I think, given the other constraints of time, review,
maintainability etc..

> I'm pretty sure all of the approaches
> discussed so far can be made fast enough, and the bloat issues can be made
> small enough, that it doesn't matter much which one we choose from a
> performance point of view. The differences are in what use cases they can
> support, and the maintainability of the code.

The discussion of approaches has up to now focused only on what
impossibilities exist, with a "we must do this because feature A can't
do aspect X". I haven't yet seen much discussion of maintainability of
code, but I would guess simpler is better, overall.

Realistically, I won't be coding any separate approaches, so this is
down to Peter and maybe yourself, Heikki. I hope only to avoid
foreclosing viable and simple approaches for the wrong reasons. There
are many other considerations that make up the final view.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> My view is that I can't see the above use case happening in real
> situations, except by infrequent mistake. In most cases, unique
> indexes represent some form of object identity and those don't change
> frequently in the real world. So I'd be surprised to see two unique
> fields changing at the same time without it representing some form of
> business process error that people would like to see fail anyway.

Are we talking about two different things here?

Unprincipled deadlocks can be seen without updating any constrained
column in the UPSERT. The test-case that originally highlighted the
issue only had one unique index, and it was *not* in the update's
targetlist. See:

https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 3 October 2014 10:32, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> That lowers the bar from what I thought everyone agreed on. Namely, if two
>> backends run a similar UPSERT command concurrently on a table that has more
>> than one unique constraint, they might deadlock, causing one of them to
>> throw an error instead of INSERTing or UPDATEing anything.
>
> It lowers the bar to a level that I am not willing to limbo dance
> under. You don't even need two unique constraints. Nothing as
> "complicated" as that is required.
>
> When this happens with MySQL, they have the good sense to call it a
> bug [1], and even fix it. I find the comparison with conventional
> insertion entirely unconvincing.

Is there a test case that demonstrates the problem?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Is there a test case that demonstrates the problem?

Yes. See my e-mail to Heikki here:

http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com

Testcase is attached.


-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 3 October 2014 10:57, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> My view is that I can't see the above use case happening in real
>> situations, except by infrequent mistake. In most cases, unique
>> indexes represent some form of object identity and those don't change
>> frequently in the real world. So I'd be surprised to see two unique
>> fields changing at the same time without it representing some form of
>> business process error that people would like to see fail anyway.
>
> Are we talking about two different things here?
>
> Unprincipled deadlocks can be seen without updating any constrained
> column in the UPSERT. The test-case that originally highlighted the
> issue only had one unique index, and it was *not* in the update's
> targetlist. See:
>
> https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking

I followed that to a wiki page, then clicked again to an old email. No
test case.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/03/2014 01:05 PM, Peter Geoghegan wrote:
> On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Is there a test case that demonstrates the problem?
>
> Yes. See my e-mail to Heikki here:
>
> http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com
>
> Testcase is attached.

Simon's approach would actually pass that test case just fine. It 
inserts the (promise) index tuple first, and heap tuple only after that. 
It will fail the test case with more than one unique index, however.

- Heikki



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Fri, Oct 3, 2014 at 3:54 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Simon's approach would actually pass that test case just fine. It inserts
> the (promise) index tuple first, and heap tuple only after that. It will
> fail the test case with more than one unique index, however.

Oh, I see. Still, I don't think you need to UPDATE a
uniquely-constrained attribute - even if updating constrained
attributes is rare (dubious), non-HOT updates will have the same
effect, no? I still think that's unacceptable.

In any case, I still don't see what this buys us over the other two
designs. What's the pay-off for giving up on the general avoidance of
unprincipled deadlocks?

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 3 October 2014 11:54, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> Simon's approach would actually pass that test case just fine. It inserts
> the (promise) index tuple first, and heap tuple only after that. It will
> fail the test case with more than one unique index, however.

Please explain what you mean by "fail" here?

My understanding of what you're saying is that if

* we have a table with >1 unique index
* and we update the values of the uniquely indexed columns (e.g. PK update)
* on both of the uniquely indexed column sets
then we get occasional deadlocks, just as we would do using current
UPDATE/INSERT.

Is there a business use case that requires that? (Or exactly what you
meant, if that isn't it?)

My view is if we are going to base the whole design on this point,
then we need to have it very clearly accessible for all to understand.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/06/2014 03:05 PM, Simon Riggs wrote:
> On 3 October 2014 11:54, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> Simon's approach would actually pass that test case just fine. It inserts
>> the (promise) index tuple first, and heap tuple only after that. It will
>> fail the test case with more than one unique index, however.
>
> Please explain what you mean by "fail" here?

I meant that the test case will sometimes deadlock, and some 
transactions will therefore be rolled back.

> My understanding of what you're saying is that if
>
> * we have a table with >1 unique index
> * and we update the values of the uniquely indexed columns (e.g. PK update)
> * on both of the uniquely indexed column sets
> then we get occasional deadlocks, just as we would do using current
> UPDATE/INSERT.

Right. To be precise: you don't need to update both of the columns in 
the same transaction, it's enough that some of the concurrent 
transactions update one column, while other transactions update the 
other column.

> Is there a business use case that requires that?

I don't know. Conceivably any use case where you have two unique 
constraints to begin with.

- Heikki




Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/06/2014 03:21 PM, Heikki Linnakangas wrote:
> On 10/06/2014 03:05 PM, Simon Riggs wrote:
>> My understanding of what you're saying is that if
>>
>> * we have a table with >1 unique index
>> * and we update the values of the uniquely indexed columns (e.g. PK update)
>> * on both of the uniquely indexed column sets
>> then we get occasional deadlocks, just as we would do using current
>> UPDATE/INSERT.
>
> Right. To be precise: you don't need to update both of the columns in
> the same transaction, it's enough that some of the concurrent
> transactions update one column, while other transactions update the
> other column.

Ok, that didn't make much sense. With UPSERT, you have to specify values 
for both columns. But it's sufficient that you have a mix of 
transactions where only some are UPSERTs, and others are regular UPDATEs 
on one of the columns.

- Heikki




Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 6 October 2014 13:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

>> My understanding of what you're saying is that if
>>
>> * we have a table with >1 unique index
>> * and we update the values of the uniquely indexed columns (e.g. PK update)
>> * on both of the uniquely indexed column sets
>> then we get occasional deadlocks, just as we would do using current
>> UPDATE/INSERT.
>
>
> Right. To be precise: you don't need to update both of the columns in the
> same transaction, it's enough that some of the concurrent transactions
> update one column, while other transactions update the other column.

CREATE TABLE foo
(id1    integer not null primary key
,id2    integer not null unique
,val    integer);

Given the table above, which one do we mean?

1. When we mix UPDATE foo SET id2 = X WHERE id1 = Y;  and UPDATE foo
SET id1 = Y WHERE id2 = X; we can deadlock
2. When we mix UPDATE foo SET val = Z WHERE id1 = Y;  and UPDATE foo
SET val = W WHERE id2 = X; we can deadlock

(2) is a common use case, (1) is a very rare use case and most likely
a poor design

If the user wishes to protect against such deadlocks they retain the
option to use row locking. Yes?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/06/2014 04:44 PM, Simon Riggs wrote:
> On 6 October 2014 13:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>>> My understanding of what you're saying is that if
>>>
>>> * we have a table with >1 unique index
>>> * and we update the values of the uniquely indexed columns (e.g. PK update)
>>> * on both of the uniquely indexed column sets
>>> then we get occasional deadlocks, just as we would do using current
>>> UPDATE/INSERT.
>>
>>
>> Right. To be precise: you don't need to update both of the columns in the
>> same transaction, it's enough that some of the concurrent transactions
>> update one column, while other transactions update the other column.
>
> CREATE TABLE foo
> (id1    integer not null primary key
> ,id2    integer not null unique
> ,val    integer);
>
> Given the table above, which one do we mean?
>
> 1. When we mix UPDATE foo SET id2 = X WHERE id1 = Y;  and UPDATE foo
> SET id1 = Y WHERE id2 = X; we can deadlock
> 2. When we mix UPDATE foo SET val = Z WHERE id1 = Y;  and UPDATE foo
> SET val = W WHERE id2 = X; we can deadlock
>
> (2) is a common use case, (1) is a very rare use case and most likely
> a poor design

Well, at least one of the statements has to be an UPSERT, and at least 
one of them has to update a column with a unique constraint on it. This 
pair of transactions could deadlock, for example:

Transaction 1:
INSERT INTO foo VALUES (Y, X, Z) ON CONFLICT IGNORE;
Transaction 2:
UPDATE foo SET id2 = X WHERE id1 = Y;

That's made-up syntax, but the idea is that the first transaction 
attempts to insert a row with values id1=Y, id2=X, val=Z. If that fails 
because a row with id1=Y or id2=X already exists, then it's supposed 
to do nothing.
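
Schematically, one interleaving that deadlocks (assuming a
pre-existing row with id1 = Y and some id2 <> X, and assuming the
UPSERT takes its value lock on the id2 index before checking id1):

-- existing row: (Y, W, ...) where W <> X
-- T2: UPDATE foo SET id2 = X WHERE id1 = Y;
--     locks the heap tuple, then goes to insert the id2 = X index entry
-- T1: INSERT ... ON CONFLICT IGNORE: takes a value lock on id2 = X
--     (e.g. a promise tuple), then finds id1 = Y held by the row that
--     T2 has locked, and waits for T2 to commit or abort
-- T2: blocks on T1's value lock on id2 = X -> deadlock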

> If the user wishes to protect against such deadlocks they retain the
> option to use row locking. Yes?

Sorry, I didn't understand that. Row locking?

In general, this is of course a lot easier to implement if we restrict 
it so that it only works in some limited cases. That may be fine, but 
then we have to be able to document clearly what the limitations are, 
and throw an error if you violate those limitations.

- Heikki




Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 6 October 2014 15:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/06/2014 04:44 PM, Simon Riggs wrote:
>>
>> On 6 October 2014 13:21, Heikki Linnakangas <hlinnakangas@vmware.com>
>> wrote:
>>
>>>> My understanding of what you're saying is that if
>>>>
>>>> * we have a table with >1 unique index
>>>> * and we update the values of the uniquely indexed columns (e.g. PK
>>>> update)
>>>> * on both of the uniquely indexed column sets
>>>> then we get occasional deadlocks, just as we would do using current
>>>> UPDATE/INSERT.
>>>
>>>
>>>
>>> Right. To be precise: you don't need to update both of the columns in the
>>> same transaction, it's enough that some of the concurrent transactions
>>> update one column, while other transactions update the other column.
>>
>>
>> CREATE TABLE foo
>> (id1    integer not null primary key
>> ,id2    integer not null unique
>> ,val    integer);
>>
>> Given the table above, which one do we mean?
>>
>> 1. When we mix UPDATE foo SET id2 = X WHERE id1 = Y;  and UPDATE foo
>> SET id1 = Y WHERE id2 = X; we can deadlock
>> 2. When we mix UPDATE foo SET val = Z WHERE id1 = Y;  and UPDATE foo
>> SET val = W WHERE id2 = X; we can deadlock
>>
>> (2) is a common use case, (1) is a very rare use case and most likely
>> a poor design
>
>
> Well, at least one of the statements has to be an UPSERT, and at least one
> of them has to update a column with a unique constraint on it. This pair of
> transactions could deadlock, for example:
>
> Transaction 1:
> INSERT INTO foo VALUES (Y, X, Z) ON CONFLICT IGNORE;
> Transaction 2:
> UPDATE foo SET id2 = X WHERE id1 = Y;
>
> That's made-up syntax, but the idea is that the first transaction attempts
> to insert a row with values id1=Y, id2=X, val=Z. If that fails because a
> row with id1=Y or id2=X already exists, then it's supposed to do nothing.

Lets look at a real world example

CREATE TABLE citizen
(ssn    integer not null primary key
,email text not null unique
,tax_amount  decimal);

Transaction 1:
INSERT INTO citizen VALUES (555123456, 'simon@2ndQuadrant.com',
1000.00) ON CONFLICT IGNORE;
Transaction 2:
UPDATE citizen SET email = 'simon@2ndQuadrant.com', tax_amount = 1000.00
WHERE ssn = 555123456;

OK, now I understand how a deadlock is possible. Thanks for your help.
Again I note that there is no isolation test that refers to this
situation, nor any documentation, internal or user-facing, that
describes the situation or its workaround.

My feeling is that this is an unlikely situation: two actors
concurrently updating the same data, AND in different ways, from two
different angles.

How likely is it that we would issue those two transactions
concurrently AND we would be concerned because this caused an error?
If the tax_amount was the same, it wouldn't matter that one failed.
If the tax_amount differed, we would want to know about the error,
not accept it in silence.

Are any of those things substantially worse than the current situation
using INSERT/UPDATE loops?

It might be nice if the above never deadlocked. What is the price of
ensuring that in terms of code maintainability and performance? What
would this do to COPY performance?


>> If the user wishes to protect against such deadlocks they retain the
>> option to use row locking. Yes?
>
>
> Sorry, I didn't understand that. Row locking?

I think that thought doesn't apply here.

> In general, this is of course a lot easier to implement if we restrict it so
> that it only works in some limited cases. That may be fine, but then we have
> to be able to document clearly what the limitations are, and throw an error
> if you violate those limitations.

Seems reasonable.

My point here is to establish that...

a) there are multiple ways to implement the UPSERT feature and none
should be thrown away too quickly
b) the current patch does not implement something we all agree on yet
c) not all requirements have been properly documented, understood or
agreed by hackers

If we want to move forwards we need to agree things based upon clarity
and real world usage.

It may be that people on reading this now believe Peter's HW locking
approach is the best. I'm happy to go with consensus.

My feeling is that substantially more work is required on explaining
the details around multiple unique index constraints, trigger
behaviour and various other corner cases.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Mon, Oct 6, 2014 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Lets look at a real world example
>
> CREATE TABLE citizen
> (ssn    integer not null primary key
> ,email text not null unique
> ,tax_amount  decimal);
>
> Transaction 1:
> INSERT INTO citizen VALUES (555123456, 'simon@2ndQuadrant.com',
> 1000.00) ON CONFLICT IGNORE;
> Transaction 2:
> UPDATE citizen SET email = 'simon@2ndQuadrant.com', tax_amount = 1000.00
> WHERE ssn = 555123456;
>
> OK, now I understand how a deadlock is possible. Thanks for your help.
> Again I note that there is no isolation test that refers to this
> situation, nor any documentation, internal or user-facing, that
> describes the situation or its workaround.

This seems like a concern specific to other approaches to value
locking. But fair enough.

> My feeling is that this is an unlikely situation: two actors
> concurrently updating the same data, AND in different ways, from two
> different angles.

Hard to say for sure.

> How likely is it that we would issue those two transactions
> concurrently AND we would be concerned because this caused an error?
> If the tax_amount was the same, it wouldn't matter that one failed.
> If the tax_amount differed, we would want to know about the error,
> not accept it in silence.
>
> Are any of those things substantially worse than the current situation
> using INSERT/UPDATE loops?

Yes, because the new feature is supposed to make it so that you
yourself don't have to put your UPSERT statement in a loop with
subxacts. Taking away the burden of having to think about this stuff
is something I'm striving for here.
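
For reference, a sketch of that pattern (essentially the looping
example from the documentation, with illustrative table and function
names):

CREATE TABLE db (a integer PRIMARY KEY, b text);

CREATE FUNCTION merge_db(key integer, data text) RETURNS void AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert; a concurrent insert of the
        -- same key makes this subxact fail with unique_violation
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop back and retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;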

> It might be nice if the above never deadlocked. What is the price of
> ensuring that in terms of code maintainability and performance?

I am going to reserve judgement on that, at least for a little while.
It seems like the person proposing an alternative ought to have a
better sense of what the price of avoiding this is. I'd understand
what you were getting at more here if it immediately made our lives
easier in some obvious way. I don't see that it does, though I admit
that I may simply not understand where you're coming from. So sure,
let's not be prejudiced about what's important, but at the same time I
don't see that either Heikki or I have actually been inflexible to a
degree that hurts things WRT not giving up on important high-level-ish
goals.

I am not completely inflexible on "never error". I am very close to
totally inflexible, though. I think I could live with an error that
literally no one would ever see. For example, we could error if there
was an excessive number of retries, which I find acceptable because it
will never happen in the real world. I tend to think that what you're
talking about is pretty far from that, though.

> My point here is to establish that...
>
> a) there are multiple ways to implement the UPSERT feature and none
> should be thrown away too quickly
> b) the current patch does not implement something we all agree on yet
> c) not all requirements have been properly documented, understood or
> agreed by hackers
>
> If we want to move forwards we need to agree things based upon clarity
> and real world usage.

I certainly agree with that.

> It may be that people on reading this now believe Peter's HW locking
> approach is the best. I'm happy to go with consensus.

I bet you didn't think that you'd say that a week ago.  :-)

I hope I don't sound smug when I say that. I just mean, as you say,
that we all need to keep an open mind on this. A healthy respect for
the problem is recommended. I think it's still possible that there are
problems with design #1, even on its own terms.

> My feeling is that substantially more work is required on explaining
> the details around multiple unique index constraints, trigger
> behaviour and various other corner cases.

Probably. Ideally, we should do that in a way driven by real-world
prototypes. In that spirit, I attach a new version of my patch, but
now implemented using approach #2 to value locking. I haven't spent
all that much time testing this (at least recently, in this form), but
it does pass all existing tests, including my stress-tests when run
for half an hour.

A lot of those corner cases you mention are big concerns. It's much
easier to identify these issues by breaking real implementations. So
surprisingly, to a certain extent (with something like this) it makes
sense to have requirements driven by actual implementations. If we
cannot do this iteratively, we are likely to fail. That's just how it
is, I think.
--
Peter Geoghegan

Attachment

Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 7 October 2014 03:31, Peter Geoghegan <pg@heroku.com> wrote:

>> It may be that people on reading this now believe Peter's HW locking
>> approach is the best. I'm happy to go with consensus.
>
> I bet you didn't think that you'd say that a week ago.  :-)

You're right, because last week I thought heavyweight locking sucks
and I still think that; I haven't said it is the best.

What we've just discovered is that we're locking 100% of the time, but
it's not needed in 99.9% of cases and is arguable in the 0.1% case -
not "required" at all.

The price of avoiding that rare and possibly erroneous condition seems
high to me.

Is there a way of detecting that we are updating a unique constraint
column and then applying the HW locking only in that case? Or can we
only apply locking when we have multiple unique constraints on a
table?
If so, I would withdraw any objection to the HW locking technique.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Robert Haas
Date:
On Tue, Oct 7, 2014 at 8:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 7 October 2014 03:31, Peter Geoghegan <pg@heroku.com> wrote:
>>> It may be that people on reading this now believe Peter's HW locking
>>> approach is the best. I'm happy to go with consensus.
>>
>> I bet you didn't think that you'd say that a week ago.  :-)
>
> You're right, because last week I thought heavyweight locking sucks
> and I still think that; I haven't said it is the best.
>
> What we've just discovered is that we're locking 100% of the time, but
> it's not needed in 99.9% of cases and is arguable in the 0.1% case -
> not "required" at all.
>
> The price of avoiding that rare and possibly erroneous condition seems
> high to me.
>
> Is there a way of detecting that we are updating a unique constraint
> column and then applying the HW locking only in that case? Or can we
> only apply locking when we have multiple unique constraints on a
> table?
> If so, I would withdraw any objection to the HW locking technique.

I'm not up on the details of what Peter's patch does with heavyweight
locking, but I'd say it this way: if the patch uses heavyweight
locking routinely, that's probably not going to scale well[1]. If
the patch detects possible conflicts and uses heavyweight locking only
in those cases and for the specific purpose of untangling those
conflicts, then that might well be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] As evidence, I offer the fact that removing 1 of 2 places where
heavyweight locking is used in hash indexes resulted in a large
performance gain at high client counts.  See commit
76837c1507cb5a5a0048046233568447729e66dd.



Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 7 October 2014 14:06, Robert Haas <robertmhaas@gmail.com> wrote:

>> Is there a way of detecting that we are updating a unique constraint
>> column and then applying the HW locking only in that case? Or can we
>> only apply locking when we have multiple unique constraints on a
>> table?
>> If so, I would withdraw any objection to the HW locking technique.
>
> I'm not up on the details of what Peter's patch does with heavyweight
> locking, but I'd say it this way: if the patch uses heavyweight
> locking routinely, that's probably not going to scale well[1]. If
> the patch detects possible conflicts and uses heavyweight locking only
> in those cases and for the specific purpose of untangling those
> conflicts, then that might well be OK.

Yes, what I meant, just more clearly phrased. Thanks for the link.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Tue, Oct 7, 2014 at 6:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not up on the details of what Peter's patch does with heavyweight
> locking, but I'd say it this way: if the patch uses heavyweight
> locking routinely, that's probably not going to scale well[1].   If
> the patch detects possible conflicts and uses heavyweight locking only
> in those cases and for the specific purpose of untangling those
> conflicts, then that might well be OK.

The patch opportunistically tries to use shared buffer locks when a
conflict is expected, when we restart (but only on the unique index
where a conflict was detected). So in the event of a lot of
near-conflicts, the hwlock traffic is quite modest. That, combined
with the fact that it uses what I've called "an index scan with an
identity crisis" (could be a near-insertion + hwlock in advance of
insertion proper, or just something akin to a regular index scan)
makes it perform best (at least with one or two unique indexes, which
is what I tested a few months back). It does not have a pre-check that
is always wasted with insertion-heavy workloads.

Now, we're not talking about a huge advantage here (I should re-test
that). And, in case I wasn't clear: I have misgivings about all 3
designs. Like Simon, I think it is appropriate that we figure out our
exact requirements using the two working prototype patches. Although,
right now #1 and #2 (the prototypes) seem quite comparable, that might
just be down to a failure of imagination. It's hard to be completely
confident about something like that.

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Tue, Oct 7, 2014 at 11:25 AM, Peter Geoghegan <pg@heroku.com> wrote:
> Now, we're not talking about a huge advantage here (I should re-test
> that).

I attach raw output when running the bash scripts insert.sh and
update.sh. These are benchmarks that concern performance in terms of
total system throughput (TPS). The scripts are available from my
stress-test suite: https://github.com/petergeoghegan/upsert

These scripts were originally designed to compare UPSERT with an
unsympathetic "gold-standard" for performance: "equivalent" INSERTs
and UPDATEs. I looked at a few runs of 60 seconds, on unlogged tables,
making the most direct comparison between UPSERTs and "equivalent"
INSERTs and UPDATEs that is possible. To be clear, by "equivalent" I
mean UPSERTS where we know we'll only UPDATE (in the case of
update.sh), and UPSERTS where we know we'll only INSERT (in the case
of insert.sh).

Both #1 and #2 do respectably as compared to "equivalent" INSERTs and
UPDATEs. There may be even less sympathetic though more representative
cases, but certainly for these simple cases, performance is solid
across the board. I got these numbers on my laptop, and it may be
necessary to devise a more rigorous benchmark later, but performance
is quite consistent between runs shown here.

Approach #1 wins out with UPDATEs. The heavyweight-lock avoidance
stuff is enough to compensate for the fact that we never INSERT (and
never need B-Tree leaf page heavyweight locks). Median TPS was
19,310.79 for #1, whereas for #2 it was 18,872.63. This is the
case even though the "pre-check" for #2 is always appropriate, while
we still acquire page-level heavyweight locks sometimes with #1.

INSERTs see #2 win, and by a wider margin than #1 beat #2 with
UPDATEs. However, insert.sh is by design very unsympathetic towards
#1. It uses a serial primary key, so every INSERT uselessly obtains a
HW lock on the same leaf page for the duration of heap insertion.
Anyway, the median INSERT TPS number is 17,759.53 for #1, and
20,441.57 TPS for #2. So you're pretty much seeing the full brunt of
page heavyweight locking, and it isn't all that bad. However, Heikki
has said something about being more clever with when and how #2 is
made to pre-check (which is always wasted here); so it's possible to
imagine INSERTs becoming faster for #2, while that probably isn't the
case for #1. I think that if I wanted to, I could get #1 to do much
better on another case where page heavyweight locking is more varied.
My goal here was to do the opposite, though.

--
Peter Geoghegan

Attachment

Re: Promise index tuples for UPSERT

From
Anssi Kääriäinen
Date:
On Tue, 2014-10-07 at 13:33 +0100, Simon Riggs wrote:
> Is there a way of detecting that we are updating a unique constraint
> column and then applying the HW locking only in that case? Or can we
> only apply locking when we have multiple unique constraints on a
> table?

What is the use case of doing an UPSERT into a table with multiple
unique constraints?

Consider table user with unique columns name and email and a non-unique
column age. If it has data
   Jack | jack@example.com | 33
   Tom  | tom@example.com  | 35

And the user does UPSERT values (Jack, tom@example.com, 34). The
proposed specification will pick random unique index (either name or
email index) and do an update of that row.

First, this will cause unique violation, second, doing the UPSERT on
random index seems confusing.
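
A sketch of the scenario (hypothetical DDL; the UPSERT uses the syntax
proposed in this thread):

  CREATE TABLE "user" (
      name  text UNIQUE,
      email text UNIQUE,
      age   integer
  );
  INSERT INTO "user" VALUES ('Jack', 'jack@example.com', 33);
  INSERT INTO "user" VALUES ('Tom',  'tom@example.com',  35);
  -- Conflicts with Jack's row on name and with Tom's row on email.
  -- If the name index arbitrates, Jack's row takes Tom's email and
  -- we get a unique violation; if the email index arbitrates, Tom's
  -- row is updated even though the user probably meant Jack's.
  INSERT INTO "user" VALUES ('Jack', 'tom@example.com', 34)
      ON CONFLICT UPDATE
         SET email = conflicting(email), age = conflicting(age);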

The MySQL documentation says that "you should try to avoid using an ON
DUPLICATE KEY UPDATE clause on tables with multiple unique indexes"[1].
The proposed feature's documentation has the same suggestion[2]. Still,
the feature defaults to this behavior. Why is the default something the
documentation says you shouldn't do?

Going a bit further, I am wondering what the use case is for doing an
UPSERT against multiple unique indexes. If UPSERT against multiple
unique indexes could be dropped, that might allow for faster or
cleaner index locking techniques.
- Anssi

1: http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
2: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html




Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> The MySQL documentation says that "you should try to avoid using an ON
> DUPLICATE KEY UPDATE clause on tables with multiple unique indexes"[1].
> The proposed feature's documentation has the same suggestion[2]. Still,
> the feature defaults to this behavior. Why is the default something the
> documentation says you shouldn't do?

MySQL started saying that when they realized it broke their
statement-based replication. They have their own weird reasons for
disliking it. Now, that's not to minimize the risks.

The reasoning behind making the unique index specification optional is:

We cannot easily cover corner cases with another syntax - unique
indexes must be named directly to cover every case, and make the
user's intent absolutely clear. That's not obviously the case, but
consider partial unique indexes, for example. Or consider uniquely
constrained columns, with an almost equivalent uniquely constrained
expression on those same columns. On the one hand I am not comfortable
failing to support those, but on the other hand it could get very
messy to do it another way.

As we all know, naming a unique index in DML is ugly, and has poor
support in ORMs. It seems likely that we're better off making it
optional - it hasn't been much of a problem with the existing subxact
looping pattern. A lot of the time there will only be a single unique
index anyway, or there will be a trivial serial PK index plus the one
unique index within which we always want would-be conflicts to trigger
the alternative UPDATE/IGNORE path.
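
For instance (hypothetical table, and only a sketch), in that common
case the arbitrating index is unambiguous even with no specification
at all:

  CREATE TABLE account (
      id    serial PRIMARY KEY,  -- trivial serial PK index
      email text UNIQUE,         -- where conflicts are expected
      note  text
  );
  -- id is never supplied, so only the email index can see a
  -- would-be duplicate here:
  INSERT INTO account (email, note) VALUES ('x@example.com', 'hi')
      ON CONFLICT UPDATE SET note = conflicting(note);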

> Going a bit further, I am wondering what the use case is for doing an
> UPSERT against multiple unique indexes. If UPSERT against multiple
> unique indexes could be dropped, that might allow for faster or
> cleaner index locking techniques.

I see no reason to think that. I don't think it buys us much at all.

--
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Heikki Linnakangas
Date:
On 10/08/2014 11:10 AM, Peter Geoghegan wrote:
> The reasoning behind making the unique index specification optional is:
>
> We cannot easily cover corner cases with another syntax - unique
> indexes must be named directly to cover every case, and make the
> user's intent absolutely clear. That's not obviously the case, but
> consider partial unique indexes, for example. Or consider uniquely
> constrained columns, with an almost equivalent uniquely constrained
> expression on those same columns. On the one hand I am not comfortable
> failing to support those, but on the other hand it could get very
> messy to do it another way.
>
> As we all know, naming a unique index in DML is ugly, and has poor
> support in ORMs.

I vehemently object to naming indexes in the UPSERT statement. That 
mixes logical and physical database design, which is a bad idea. This is 
not ISAM.

Instead of naming the index, you should name the columns, and the system 
can look up the index or indexes that match those columns.
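
A sketch of what that might look like (purely hypothetical syntax,
with an assumed table account(email text UNIQUE, note text); the
system would map the named column to a matching unique index):

  INSERT INTO account (email, note) VALUES ('x@example.com', 'hi')
      ON CONFLICT (email) UPDATE SET note = conflicting(note);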

(Remind me again, where did this need to name an index come from in the 
first place?)

- Heikki




Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Instead of naming the index, you should name the columns, and the system can
> look up the index or indexes that match those columns.

It's not totally clear that we need *any* WITHIN clause, BTW. I'm not
dead set on it. It was something I mainly added at Kevin's request. I
do see the risks, though.

> (Remind me again, where did this need to name an index come from in the
> first place?)

I agree that naming indexes is ugly, and best avoided. It's tricky,
though. The first thing you might try is to look up the index during
parse analysis and/or planning. That could work in simple cases (which
are probably the vast majority), but I'm worried about stuff like
before row triggers that change the values being inserted out from
under us, in a way that interferes with partial unique indexes. Maybe
you could choose the wrong one if you didn't leave it until the last
minute. But it's not very convenient to leave it until the last
minute.
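
To make that concrete, a hypothetical sketch of the kind of schema I
worry about - naming columns alone cannot pick between these partial
unique indexes, and a before-row trigger can move a tuple between
their predicates after parse analysis:

  CREATE TABLE item (
      code text,
      live boolean
  );
  CREATE UNIQUE INDEX item_code_live ON item (code) WHERE live;
  CREATE UNIQUE INDEX item_code_dead ON item (code) WHERE NOT live;
  -- "the column code" matches both indexes, and a trigger that
  -- flips "live" changes which index arbitrates the insertion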

If you're willing to live with the command conservatively failing when
there isn't a clean specification (although not in a way that can make
the command fail when the user innocently adds a unique index later),
then I think I can do it. Otherwise, it could be surprisingly hard to
cover all the cases non-surprisingly. I freely admit that putting a
unique index in a DML statement is weird, but it is sort of the most
direct way of expressing what we want. Oracle actually has an
INSERT-IGNORE-like hint that names a unique index (yes, really - see
the UPSERT wiki page). That's really bizarre, but the point is that
they may have felt that there was no reasonable alternative.

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Anssi Kääriäinen
Date:
On Wed, 2014-10-08 at 02:22 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
> > Instead of naming the index, you should name the columns, and the system can
> > look up the index or indexes that match those columns.
> 
> It's not totally clear that we need *any* WITHIN clause, BTW. I'm not
> dead set on it. It was something I mainly added at Kevin's request. I
> do see the risks, though.

To be usable in the Django ORM's .save() method, there must be an
option to use the primary key index, and only the primary key index,
for conflict resolution.

Assume an author table with id SERIAL PRIMARY KEY, email TEXT UNIQUE,
age INTEGER. When saving an object, Django must update the row with
the matching primary key value, or otherwise do an insert. Updating
the row with the matching email column isn't allowed. So, Django
can't use the query:

  INSERT INTO author VALUES (1, 'tom@example.com', 35)
      ON CONFLICT UPDATE
         SET email = conflicting(email), age = conflicting(age);

If the table contains the row (id=2, email='tom@example.com', age=34),
the query would update Tom's age to 35 when it should have resulted
in a unique constraint violation.
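
What is needed is some way to restrict conflict handling to the
primary key - for example, with the proposed WITHIN clause, perhaps
something like this (hypothetical placement and index name):

  INSERT INTO author VALUES (1, 'tom@example.com', 35)
      ON CONFLICT WITHIN author_pkey UPDATE
         SET email = conflicting(email), age = conflicting(age);
  -- a conflict on any other unique index (here, email) would then
  -- raise a unique violation instead of taking the UPDATE path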

Other ORMs have similar save/persist implementations; that is, objects
are persisted on primary key identity alone.
- Anssi




Re: Promise index tuples for UPSERT

From
Anssi Kääriäinen
Date:
On Wed, 2014-10-08 at 01:10 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen
> <anssi.kaariainen@thl.fi> wrote:
> > The MySQL documentation says that "you should try to avoid using an ON
> > DUPLICATE KEY UPDATE clause on tables with multiple unique indexes"[1].
> > The proposed feature's documentation has the same suggestion[2]. Still,
> > the feature defaults to this behavior. Why is the default something the
> > documentation says you shouldn't do?

> As we all know, naming a unique index in DML is ugly, and has poor
> support in ORMs. It seems likely that we're better off making it
> optional - it hasn't been much of a problem with the existing subxact
> looping pattern.

The subxact approach is a bit different from the proposed UPSERT
command. It loops:

  try:
      INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
  except UniqueConstraintViolation:
      UPDATE author SET ... WHERE name = 'Jack'

while the UPSERT command does something like:

  try:
      INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
  except UniqueConstraintViolation:
      UPDATE author SET ... WHERE name = 'Jack'
          OR email = 'tom@example.com' LIMIT 1;
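
For reference, the subxact pattern is typically wrapped in a PL/pgSQL
function along these lines (a sketch only, assuming the author table
above with a unique name column):

  CREATE FUNCTION upsert_author(n text, e text, a integer)
  RETURNS void AS $$
  BEGIN
      LOOP
          -- the BEGIN/EXCEPTION block is the subtransaction
          BEGIN
              INSERT INTO author VALUES (n, e, a);
              RETURN;
          EXCEPTION WHEN unique_violation THEN
              NULL;  -- fall through to the UPDATE
          END;
          UPDATE author SET email = e, age = a WHERE name = n;
          IF FOUND THEN
              RETURN;
          END IF;
          -- the conflicting row vanished before the UPDATE; retry
      END LOOP;
  END;
  $$ LANGUAGE plpgsql;
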
- Anssi




Re: Promise index tuples for UPSERT

From
Simon Riggs
Date:
On 8 October 2014 00:34, Peter Geoghegan <pg@heroku.com> wrote:

> INSERTs see #2 win, and by a wider margin than #1 beat #2 with
> UPDATEs. However, insert.sh is by design very unsympathetic towards
> #1. It uses a serial primary key, so every INSERT uselessly obtains a
> HW lock on the same leaf page for the duration of heap insertion.
> Anyway, the median INSERT TPS numbers are 17,759.53 for #1 and
> 20,441.57 for #2. So you're pretty much seeing the full brunt of
> page heavyweight locking, and it isn't all that bad.

Let's see the results of running a COPY, please.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Promise index tuples for UPSERT

From
Kevin Grittner
Date:
Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:

>> Instead of naming the index, you should name the columns, and
>> the system can look up the index or indexes that match those
>> columns.

+1

That is what I have been consistently requesting instead of index
names, every time I notice it mentioned.

> It's not totally clear that we need *any* WITHIN clause, BTW.
> I'm not dead set on it. It was something I mainly added at
> Kevin's request. I do see the risks, though.

What I said was in response to your assertion that your technique
would *never* generate a duplicate key error.  As others have again
been pointing out, when there is more than one unique index you can
have rows to apply which cannot be applied accurately without
causing such an error.  What I requested was that the behavior in
those cases be clear and documented.  I didn't take a position on
whether you pick an index or ignore the input row (with a
warning?), but we need to decide how it is handled.  I have made
the same point Heikki is making, though -- we have no business
referencing an index name in the statement.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 8, 2014 at 2:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> What I said was in response to your assertion that your technique
> would *never* generate a duplicate key error.

That is strictly true: the INSERT cannot raise a unique violation
error, because to do so would cause it to take the UPDATE path
(assuming there is no WITHIN clause). The UPDATE may get one, though -
and it doesn't even have to for your statement to have surprising
effects.

> As others have again
> been pointing out, when there is more than one unique index you can
> have rows to apply which cannot be applied accurately without
> causing such an error.

It's simpler than that. You want to make sure that the right condition
- the right unique index having a would-be duplicate violation - is
the one taken *in practice*, the condition on which the UPDATE path is
taken. What happens after that is not that interesting (e.g. whether
or not there is an UPDATE-path duplicate violation), because either
way it's broken.

> What I requested was that the behavior in
> those cases be clear and documented.  I didn't take a position on
> whether you pick an index or ignore the input row (with a
> warning?), but we need to decide how it is handled.  I have made
> the same point Heikki is making, though -- we have no business
> referencing an index name in the statement.

I think that's fair enough. That's all fine - but actually doing that
is quite tricky. Look at what I've said in relation to doing that.
Consider the corner-cases I name. They're certainly only a small
minority of cases in practice, but as an implementer I still need to
worry about them (maybe even just as much). Yes, I would like to just
name columns, but it's hard to make that work fully generally.

If you want me to do what you say, fine. But in order to do that, I
need support for having the corner cases make parse analysis throw up
its hands and error. Is that a price you're willing to pay? If so,
then I'll implement it. Or, alternatively, we could do WITHIN PRIMARY
KEY / NOT WITHIN PRIMARY KEY.

-- 
Peter Geoghegan



Re: Promise index tuples for UPSERT

From
Peter Geoghegan
Date:
On Wed, Oct 8, 2014 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Lets see the results of running a COPY please.

Not exactly sure what you mean here. A concurrent COPY?

-- 
Peter Geoghegan