Thread: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
this development justifies a new thread, though.)

This features a new way of organizing the code. Heikki and I are now
in agreement that the best way of incrementally committing the work is
to do ON CONFLICT IGNORE first (perhaps someone else can review ON
CONFLICT UPDATE). This patch series is organized so that the first
commit adds tests, documentation and code that only relates to the
IGNORE variant. RLS support is still split out as a separate commit,
which is not intended to actually be committed separately (from the
main ON CONFLICT UPDATE commit). As before, I've just done that to
highlight that part for the benefit of RLS subject matter experts. The
RTE permissions split patch is also broken out, but I believe that
there is consensus that that could sensibly be committed on its own.

There are some minor changes to the code/documentation itself:

* Ran the code through pg_indent.

* Documented ON CONFLICT UPDATE as a MERGE alternative under
"unsupported SQL features" (sql_features.txt).

* Minor tweaks of comments here and there.

* Logical decoding fixes.

* Call ExecCheckPlanOutput() for an ON CONFLICT UPDATE auxiliary query, too.

I should point out that I'm aware of the following open issues around
the patch series (most of these apply to the base IGNORE commit, so
Heikki should of course look out for them):

* Andres and I are still hashing out details of whether or not certain
assumptions made around handling super deletion within logical
decoding are safe [1]. Consider the decoding stuff suspect for now.

* There is still an unresolved semantics issue with unique index
inference and non-default opclasses. I think it's sufficient that the
available/defined unique indexes dictate our idea of a unique
violation (that necessitates taking the alternative path). Even in a
world where there exists a non-default opclass with an "equals"
operator that does not match that of the default opclass (that is not
really the world we live in, because the only counter-example known is
made that way specifically to *discourage* its use by users), this
seems okay to me. It seems okay to me because surely the relevant
definition of equality is the one actually chosen for the available
unique index. If there exists an ambiguity for some strange reason
(i.e. there are two unique indexes, on the same attribute(s), but with
different "equals" operators), then its a costing issue, so the
behavior given is essentially non-predictable (it could end up being
either...but hey, those are the semantics). I have a very hard time
imagining how that could ever be the case, even when we have (say)
case insensitive opclasses for the text type. A case insensitive
opclass is stricter than a case sensitive opclass.  Why would a user
ever want both on the same attribute(s) of the same table? Is the user
really more or less expecting to never get a unique violation on the
non-arbitrating unique index, despite all this?

If reviewers are absolutely insistent that this theoretical ambiguity
is a problem, we can add an optional CREATE INDEX style opclass
specification (I'm already using the IndexElems representation from
CREATE INDEX for the inference specification, actually, so that would
be easy enough). I really have a hard time believing that the ugliness
is a problem for those hypothetical users that eventually consider
"equals" operator ambiguity among opclasses among available unique
indexes to be a problem. I haven't just gone and implemented this
already because I didn't want to document that an opclass
specification will be accepted. Since there is literally no reason why
anyone would care today, I see no reason to add what IMV would really
just be cruft.

* I'm flying blind with the SEPostgres changes. Unfortunately, it's
not very possible to test SEPostgres on my machine. (Does not apply to
IGNORE variant.)

As always, the jjanes_upsert tool [2] has proven invaluable with
testing this patch (Thanks Jeff!). Reviewers should look to that tool
for ideas on how the patch might be broken. It caught a number of race
conditions with exclusion constraints in the past, for example. We're
in good shape with those now (and have been since V2.3 - see "livelock
insurance" comments within the IGNORE commit).

Thoughts?

[1] http://www.postgresql.org/message-id/20150303111342.GF2579@alap3.anarazel.de
[2] https://github.com/petergeoghegan/jjanes_upsert
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
> this development justifies a new thread, though.)

Bruce Momjian kindly made available a server for stress-testing [1].
I'm using jjanes_upsert for this task (I stopped doing this for a
little while, and was in need of a new server).

At the very highest client count I'm testing (128), I see unprincipled
deadlocks for the exclusion constraint case very infrequently:

"""""
2015-03-05 14:09:36 EST [ 64987901 ]: ERROR:  deadlock detected
2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL:  Process 7044 waits for
ShareLock on promise tuple with token 1 of transaction 64987589;
blocked by process 7200.       Process 7200 waits for ShareLock on transaction 64987901;
blocked by process 7044.       Process 7044: insert into upsert_race_test (index, count)
values ('541','-1') on conflict                     update set count=TARGET.count + EXCLUDED.count
whereTARGET.index = EXCLUDED.index                     returning count       Process 7200: insert into upsert_race_test
(index,count)
 
values ('541','-1') on conflict                     update set count=TARGET.count + EXCLUDED.count
whereTARGET.index = EXCLUDED.index                     returning count
 
2015-03-05 14:09:36 EST [ 64987901 ]: HINT:  See server log for query details.
2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT:  insert into
upsert_race_test (index, count) values ('541','-1') on conflict                     update set count=TARGET.count +
EXCLUDED.count                    where TARGET.index = EXCLUDED.index                     returning count
 

"""""

(Reminder: Exclusion constraints doing UPSERTs, and not just IGNOREs,
are artificial; this has been enabled within the parser solely for the
benefit of this stress-testing. Also, the B-Tree AM does not have nor
require "livelock insurance").

This only happens after just over 30 minutes, while consistently doing
128 client exclusion constraint runs. This is pretty close to the most
stressful thing that you could throw at the implementation, so that's
really not too bad.

I believe that this regression occurred as a direct result of adding
"livelock insurance". Basically, we cannot be 100% sure that the
interleaving of WAL-logged things within and across transactions won't
be such that the "only", "oldest" session that gets to wait in the
second stage (the second possible call to
check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples())
will really be the "oldest" XID. Another *older* xact could just get
in ahead of us, waiting on our promise tuple as we wait on its xact
end (maybe it updates some third tuple that we didn't see in that has
already committed...not 100% sure). Obviously XID assignment order
does not guarantee that things like heap insertion and index tuple
insertion occur in serial XID order, especially with confounding
factors like super deletion. And so, every once in a long while we
deadlock.

Now, the very fact that this happens at all actually demonstrates the
need for "livelock insurance", IMV. The fact that we reliably
terminate is *reassuring*, because livelocks are in general a
terrifying possibility. We cannot *truly* solve the unprincipled
deadlock problem without adding livelocks, I think. But what we have
here is very close to eliminating unprincipled deadlocks, while not
also adding any livelocks, AFAICT. I'd argue that that's good enough.

Of course, when I remove "livelock insurance", the problem ostensibly
goes away (I've waited on such a stress test session for a couple of
hours now, so this conclusion seems very likely to be correct). I
think that we should do nothing about this, other than document it as
possible in our comments on "livelock insurance". Again, it's very
unlikely to be a problem in the real world, especially since B-Trees
are unaffected.

Also, I should point out that one of those waiters doesn't look like
an insertion-related wait at all: "7200 waits for ShareLock on
transaction 64987901; blocked by process 7044". Looks like row locking
is an essential part of this deadlock, and ordinarily that isn't even
possible for exclusion constraints (unless the patch is hacked to make
the parser accept exclusion constraints for an UPSERT, as it has been
here). Not quite sure what exact  XactLockTableWait() call did this,
but don't think it was any of them within
check_exclusion_or_unique_constraint(), based on the lack of details
(TID and so on are not shown).

[1] http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Dean Rasheed
Date:
On 5 March 2015 at 23:44, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
>> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
>> this development justifies a new thread, though.)
>

Hi,

I had a play with this, mainly looking at the interaction with RLS.

(Note there is some bitrot in gram.y that prevents the first patch
from applying cleanly to HEAD)

I tested using the attached script, and one test didn't behave as I
expected. I believe the following should have been a valid upsert
(following the update path) but actually it failed:

INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;

AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2
= 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong
because the "b > 0" check (policy p3) should only be applied to the
post-update tuple.

Possibly I'm missing something though.

Regards,
Dean

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 03/05/2015 03:18 AM, Peter Geoghegan wrote:
> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
> this development justifies a new thread, though.)

I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the
heap tuple seems more straightforward. And we could put the speculative
insertion token in t_ctid, instead of stashing it in the PGPROC array.
That would again seem more straightforward.

A couple of quick random comments:

> /*
>  * plan_speculative_use_index
>  *        Use the planner to decide speculative insertion arbiter index
>  *
>  * Among indexes on target of INSERT ... ON CONFLICT, decide which index to
>  * use to arbitrate taking alternative path.  This should be called
>  * infrequently in practice, because it's unusual for more than one index to
>  * be available that can satisfy a user-specified unique index inference
>  * specification.
>  *
>  * Note: caller had better already hold some type of lock on the table.
>  */
> Oid
> plan_speculative_use_index(PlannerInfo *root, List *indexList)
> {
>     ...
>     /* Locate cheapest IndexOptInfo for the target index */

If I'm reading this correctly, if there are multiple indexes that match
the unique index inference specification, we pick the cheapest one.
Isn't that unstable? Two backends running the same INSERT ON CONFLICT
statement might pick different indexes, and the decision might change
over time as the table is analyzed. I think we should have a more robust
rule. Could we easily just use all matching indexes?


> ... Deferred unique constraints are not supported as
> +   arbiters of whether an alternative <literal>ON CONFLICT</> path
> +   should be taken.

We really need to find a shorter term for "arbiter of whether an
alternative path should be taken". Different variations of that term are
used a lot, and it's tedious to read.

> * There is still an unresolved semantics issue with unique index
> inference and non-default opclasses. I think it's sufficient that the
> available/defined unique indexes dictate our idea of a unique
> violation (that necessitates taking the alternative path). Even in a
> world where there exists a non-default opclass with an "equals"
> operator that does not match that of the default opclass (that is not
> really the world we live in, because the only counter-example known is
> made that way specifically to *discourage* its use by users), this
> seems okay to me. It seems okay to me because surely the relevant
> definition of equality is the one actually chosen for the available
> unique index. If there exists an ambiguity for some strange reason
> (i.e. there are two unique indexes, on the same attribute(s), but with
> different "equals" operators), then its a costing issue, so the
> behavior given is essentially non-predictable (it could end up being
> either...but hey, those are the semantics). I have a very hard time
> imagining how that could ever be the case, even when we have (say)
> case insensitive opclasses for the text type. A case insensitive
> opclass is stricter than a case sensitive opclass.  Why would a user
> ever want both on the same attribute(s) of the same table? Is the user
> really more or less expecting to never get a unique violation on the
> non-arbitrating unique index, despite all this?
>
> If reviewers are absolutely insistent that this theoretical ambiguity
> is a problem, we can add an optional CREATE INDEX style opclass
> specification (I'm already using the IndexElems representation from
> CREATE INDEX for the inference specification, actually, so that would
> be easy enough). I really have a hard time believing that the ugliness
> is a problem for those hypothetical users that eventually consider
> "equals" operator ambiguity among opclasses among available unique
> indexes to be a problem. I haven't just gone and implemented this
> already because I didn't want to document that an opclass
> specification will be accepted. Since there is literally no reason why
> anyone would care today, I see no reason to add what IMV would really
> just be cruft.

I've been thinking that it would be nice to be able to specify a
constraint name. Naming an index directly feels wrong, as in relational
and SQL philosophy, indexes are just an implementation detail, but
naming a constraint is a fair game. It would also be nice to be able to
specify "use the primary key".

Attached patch contains a few more things I saw at a quick read.

- Heikki


Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Mar 5, 2015 at 3:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Bruce Momjian kindly made available a server for stress-testing [1].
> I'm using jjanes_upsert for this task (I stopped doing this for a
> little while, and was in need of a new server).

BTW, this was run for about another week on Bruce's server, and no
further issues arose affecting either the B-Tree or exclusion
constraint implementations. I've stopped with it for now, because it
feels unlikely that I'm going to find any more bugs this way.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm still not sure the way the speculative locking works is the best
> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
> tuple seems more straightforward. And we could put the speculative insertion
> token in t_ctid, instead of stashing it in the PGPROC array. That would
> again seem more straightforward.

I see the appeal of that approach. What concerns me about that
approach is that it makes it the problem of the inserter to confirm
its insertion, even though overwhelmingly, speculative insertions
succeeds (the race window is small due to the pre-check). The current
speculative token is legitimately a very short lived thing, a thing
that I hesitate to write to disk at all. So our optimistic approach to
inserting speculatively loses its optimism, which I don't like, if you
know what I mean.

OTOH, apart from avoiding stashing things in PGPROC, I do see another
advantage to what you outline. Doing things this way (and making the
insertion and confirmation of a speculative insertion a two-phased
thing) unambiguously fixes all problems with logical decoding, without
adding unexpected cross-change-coordination tricks during transaction
reassembly, and really without much further thought. All we do is get
a new case to decode, that ultimately produces a
REORDER_BUFFER_CHANGE_INSERT change, which Andres more or less wanted
to do anyway.

Under this scheme with using t_ctid, heap_insert() would do minimal
WAL logging, necessary for the same reasons that some WAL logging is
required within heap_lock_tuple() (so the new record type is very
similar to the existing, simple xl_heap_lock record type). We'd use
one of the two remaining bits within t_infomask2 for this, so that
waiters will know to interpret t_ctid as a speculative insertion token
(and then wait on it). Then, after speculative insertion succeeded,
we'd WAL log something closer to an UPDATE to confirm that insertion
was in fact successful, which is where the contents of the new tuple
is actually finally WAL-logged (like UPDATEs used to be, but without a
new tuple being WAL-logged at all, since there of course is none).

Is that more or less what you have in mind?

> A couple of quick random comments:
>
>> /*
>>  * plan_speculative_use_index
>>  *              Use the planner to decide speculative insertion arbiter
>> index
>>  *
>>  * Among indexes on target of INSERT ... ON CONFLICT, decide which index
>> to
>>  * use to arbitrate taking alternative path.  This should be called
>>  * infrequently in practice, because it's unusual for more than one index
>> to
>>  * be available that can satisfy a user-specified unique index inference
>>  * specification.
>>  *
>>  * Note: caller had better already hold some type of lock on the table.
>>  */
>> Oid
>> plan_speculative_use_index(PlannerInfo *root, List *indexList)
>> {
>>         ...
>>         /* Locate cheapest IndexOptInfo for the target index */
>
>
> If I'm reading this correctly, if there are multiple indexes that match the
> unique index inference specification, we pick the cheapest one. Isn't that
> unstable? Two backends running the same INSERT ON CONFLICT statement might
> pick different indexes, and the decision might change over time as the table
> is analyzed. I think we should have a more robust rule. Could we easily just
> use all matching indexes?

Robert feel pretty strongly that there should be a costing aspect to
this. Initially I was skeptical of this, but just did what he said all
the same. But I've since come around to his view entirely because we
now support partial unique indexes.

You can add a predicate to a unique index inference specification, and
you're good, even if there is no partial index on those
attributes/expressions exactly matching the DML/inference predicate -
we use predicate_implied_by() for this, which works with an equivalent
non-partial unique index. This is because a non-partial unique index
covers those attributes sufficiently. There may be value in preferring
the cheap partial index, and then verifying that it actually did cover
the final inserted tuple version (which is how things work now). If we
cannot discriminate against one particular unique index, making sure
it covers the inserted heap tuple (by throwing a user-visible
ERRCODE_TRIGGERED_ACTION_EXCEPTION error if it doesn't, as I currently
do within ExecInsertIndexTuples()) is on shaky ground as a
user-visible behavior. I like that behavior, though - seems less
surprising than letting a failure to actually cover a selective
partial unique index predicate (that of the one and only arbiter
index) slide. You can only get this ERRCODE_TRIGGERED_ACTION_EXCEPTION
error when you actually had a predicate in your unique index inference
specification in the first place, so seems likely that you want the
error. But, I also like supporting unique indexes that are non-partial
even in the presence of a predicate in the unique index inference
specification clause, because, as I said, that's still correct - I can
see not doing this breaking queries when a partial unique index is
dropped due to a more general uniqueness requirement.

Besides, having a list of arbiter unique indexes that must all be
equivalent for the user's purpose anyway is very awkward indeed. It
seems very confusing. If they all have equivalent definitions anyway,
as they must, then it must not matter one bit which one you take
(except perhaps on cost grounds). It might be unstable, but it
couldn't possibly matter, so why bother with anything else? If it
*does* matter on semantics grounds, then we have bigger problems.

There is a tension between indexes as an implementation detail, and
indexes as the thing that define the semantics for the purposes of
this feature here (in a way that isn't baked into the DML statement,
but is only known from a cataloged definition of an index or its
physical structure). I have not fully resolved that tension here, but
I think I have a good practical balance between the two extremes. Does
that make sense?

>> ... Deferred unique constraints are not supported as
>> +   arbiters of whether an alternative <literal>ON CONFLICT</> path
>> +   should be taken.
>
>
> We really need to find a shorter term for "arbiter of whether an alternative
> path should be taken". Different variations of that term are used a lot, and
> it's tedious to read.

Okay. I'll think about that.

> I've been thinking that it would be nice to be able to specify a constraint
> name. Naming an index directly feels wrong, as in relational and SQL
> philosophy, indexes are just an implementation detail, but naming a
> constraint is a fair game. It would also be nice to be able to specify "use
> the primary key".

Now that we have partial unique indexes covered, which never have a
constraint (I think - see my remarks above), this is probably okay.
But that doesn't really solve the highly theoretical (non-)issue with
"equals" operator ambiguity, which I called out as an open issue
recently [1]. That's still at thing we need to decide on, even if that
means that you agree with me that it doesn't matter (and since this
naming of constraints in DML is an escape hatch from that problem too,
that seems more likely than before). What do you think?

> Attached patch contains a few more things I saw at a quick read.

This is mostly documentation stuff. I can certainly address this
feedback quickly and easily, and will do so soon.

[1] http://www.postgresql.org/message-id/CAM3SWZS9DTFt1=sQT=WFdp5UFkOac2Qc1i5WE-Od4BXJZ=JsCw@mail.gmail.com
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Mon, Mar 16, 2015 at 8:10 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> (Note there is some bitrot in gram.y that prevents the first patch
> from applying cleanly to HEAD)

That's trivially fixable. I'll have those fixes in the next revision,
once I firm some things up with Heikki.

> I tested using the attached script, and one test didn't behave as I
> expected. I believe the following should have been a valid upsert
> (following the update path) but actually it failed:
>
> INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
>
> AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2
> = 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong
> because the "b > 0" check (policy p3) should only be applied to the
> post-update tuple.
>
> Possibly I'm missing something though.

I think that you may have. Did you read the commit message/docs of the
RLS commit 0004-*? You must consider the second point here, I believe:

""""
The 3 places that RLS policies are enforced are:

* Against row actually inserted, after insertion proceeds successfully (INSERT-applicable policies only).

* Against row in target table that caused conflict.  The implementation is careful not to leak the contents of that row
indiagnostic messages (INSERT-applicable *and* UPDATE-applicable policies).
 

* Against the version of the row added by to the relation after ExecUpdate() is called (INSERT-applicable *and*
UPDATE-applicablepolicies).
 

""""

You're seeing a failure that applies to the target tuple of the UPDATE
(the tuple that we can't leak the contents of). I felt it was best to
check all policies against the target/existing tuple, including both
WITH CHECK OPTIONS and USING quals (which are both enforced).

I can see why you might not like that behavior, but it is the intended
behavior. I thought that this whole intersection of RLS + UPSERT is
complex enough that it would be best to be almost as conservative as
possible in what fails and what succeeds. The one exception is when
the insert path is actually taken, since the statement is an INSERT
statement.
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Dean Rasheed
Date:
On 17 March 2015 at 23:25, Peter Geoghegan <pg@heroku.com> wrote:
>> Possibly I'm missing something though.
>
> I think that you may have. Did you read the commit message/docs of the
> RLS commit 0004-*? You must consider the second point here, I believe:
>
> """"
> The 3 places that RLS policies are enforced are:
>
> * Against row actually inserted, after insertion proceeds successfully
>   (INSERT-applicable policies only).
>
> * Against row in target table that caused conflict.  The implementation
>   is careful not to leak the contents of that row in diagnostic
>   messages (INSERT-applicable *and* UPDATE-applicable policies).
>
> * Against the version of the row added by to the relation after
>   ExecUpdate() is called (INSERT-applicable *and* UPDATE-applicable
>   policies).
>
> """"
>

Yes, I read that, and I agree with the intention to not leak data
according to both the INSERT and UPDATE policies, however...

> You're seeing a failure that applies to the target tuple of the UPDATE
> (the tuple that we can't leak the contents of). I felt it was best to
> check all policies against the target/existing tuple, including both
> WITH CHECK OPTIONS and USING quals (which are both enforced).
>

I think that's an incorrect implementation of the RLS UPDATE policy.
The WITH CHECK quals of a RLS policy are intended to be applied to the
NEW data, not the existing data. This patch is applying the WITH CHECK
quals to both the existing and NEW tuples, which runs counter to the
way RLS polices are normally enforced, and I think that will just lead
to confusion.

> I can see why you might not like that behavior, but it is the intended
> behavior. I thought that this whole intersection of RLS + UPSERT is
> complex enough that it would be best to be almost as conservative as
> possible in what fails and what succeeds. The one exception is when
> the insert path is actually taken, since the statement is an INSERT
> statement.

The problem with that is that the user will see errors saying that the
data violates the RLS WITH CHECK policy, when they might quite
reasonably argue that it doesn't. That's not really being
conservative. I'd argue it's a bug.

Regards,
Dean



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Robert Haas
Date:
On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been thinking that it would be nice to be able to specify a constraint
> name. Naming an index directly feels wrong, as in relational and SQL
> philosophy, indexes are just an implementation detail, but naming a
> constraint is a fair game. It would also be nice to be able to specify "use
> the primary key".

Intuitively, I think you should specify an operator name, not a
constraint name.  That's what we do for, e.g., exclusion constraints,
and it feels right.  People sometimes create and drop indexes (and
thus, perhaps, the constraints that depend on them) for maintenance
reasons where a change in semantics will be unwelcome.  But I don't
accept Peter's argument that it's OK to be indifferent to which
particular equality semantics are being used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Robert Haas
Date:
On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I'm still not sure the way the speculative locking works is the best
>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>> tuple seems more straightforward. And we could put the speculative insertion
>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>> again seem more straightforward.
>
> I see the appeal of that approach. What concerns me about that
> approach is that it makes it the problem of the inserter to confirm
> its insertion, even though overwhelmingly, speculative insertions
> succeeds (the race window is small due to the pre-check). The current
> speculative token is legitimately a very short lived thing, a thing
> that I hesitate to write to disk at all. So our optimistic approach to
> inserting speculatively loses its optimism, which I don't like, if you
> know what I mean.

Modifying a shared buffer is not the same as writing to disk.

>> If I'm reading this correctly, if there are multiple indexes that match the
>> unique index inference specification, we pick the cheapest one. Isn't that
>> unstable? Two backends running the same INSERT ON CONFLICT statement might
>> pick different indexes, and the decision might change over time as the table
>> is analyzed. I think we should have a more robust rule. Could we easily just
>> use all matching indexes?
>
> Robert feel pretty strongly that there should be a costing aspect to
> this. Initially I was skeptical of this, but just did what he said all
> the same. But I've since come around to his view entirely because we
> now support partial unique indexes.

I think Heikki's concern is something different, although I am not
altogether up to speed on this and may be confused.  The issue is:
suppose that process A and process B are both furiously upserting into
the same table with the same key column but, because they have
different costing parameters, process A consistently chooses index X
and process B consistently chooses index Y.  In that situation, will
we deadlock, livelock, error out, bloat, or get any other undesirable
behavior, or is that A-OK?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 03/18/2015 06:23 PM, Robert Haas wrote:
> On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> I'm still not sure the way the speculative locking works is the best
>>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>>> tuple seems more straightforward. And we could put the speculative insertion
>>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>>> again seem more straightforward.
>>
>> I see the appeal of that approach. What concerns me about that
>> approach is that it makes it the problem of the inserter to confirm
>> its insertion, even though overwhelmingly, speculative insertions
>> succeeds (the race window is small due to the pre-check). The current
>> speculative token is legitimately a very short lived thing, a thing
>> that I hesitate to write to disk at all. So our optimistic approach to
>> inserting speculatively loses its optimism, which I don't like, if you
>> know what I mean.
>
> Modifying a shared buffer is not the same as writing to disk.

AFAICS, there is no need to go and clear the tag after the insert has 
completed.

Here's what I had in mind: the inserter tags the tuple with the 
speculative insertion token, by storing the token in the t_ctid field. 
If the inserter needs to super-delete the tuple, it sets xmax like in a 
regular deletion, but also sets another flag to indicate that it was a 
super-deletion.

When another backend inserts, and notices that it has a potential 
conflict with the first tuple, it tries to acquire a hw-lock on the 
token. In most cases, the inserter has long since completed the 
insertion, and the acquisition succeeds immediately but you have to 
check because the token is not cleared on a completed insertion. It 
would be slightly faster for the second backend if the inserter actually 
removed the token after the insertion has completed: it wouldn't need to 
check the lock if the tuple was not tagged in the first place. But 
that'd be just a tiny optimization, for the rare case that there is a 
conflict, and surely not worth it.

Hmm, I just realized a little fly in that ointment: if the inserter 
inserts enough tuples to wrap around the token counter (4 billion?) in a 
single transaction, another backend that inserts could confuse a new 
speculative insertion with one that completed a long time ago. The risk 
seems small enough that we could just accept it. I don't think anything 
particularly bad would happen on a false positive here. Or we could also 
use all 48 bits available in the t_ctid to push it truly in the realm of 
ain't-gonna-happen. Or we could use (relfilenode, block,  token) as the 
lock tag for the SpeculativeInsertionLock.

Regarding the physical layout: We can use a magic OffsetNumber value 
above MaxOffsetNumber to indicate that the t_ctid field stores a token 
rather than a regular ctid value. And another magic t_ctid value to 
indicate that a tuple has been super-deleted. The token and the 
super-deletion flag are quite ephemeral, they are not needed after the 
inserting transaction has completed, so it's nice to not consume the 
valuable infomask bits for these things. Those states are conveniently 
not possible on an updated tuple, when we would need the t_ctid field 
for it's current purpose.

>>> If I'm reading this correctly, if there are multiple indexes that match the
>>> unique index inference specification, we pick the cheapest one. Isn't that
>>> unstable? Two backends running the same INSERT ON CONFLICT statement might
>>> pick different indexes, and the decision might change over time as the table
>>> is analyzed. I think we should have a more robust rule. Could we easily just
>>> use all matching indexes?
>>
>> Robert feel pretty strongly that there should be a costing aspect to
>> this. Initially I was skeptical of this, but just did what he said all
>> the same. But I've since come around to his view entirely because we
>> now support partial unique indexes.
>
> I think Heikki's concern is something different, although I am not
> altogether up to speed on this and may be confused.  The issue is:
> suppose that process A and process B are both furiously upserting into
> the same table with the same key column but, because they have
> different costing parameters, process A consistently chooses index X
> and process B consistently chooses index Y.  In that situation, will
> we deadlock, livelock, error out, bloat, or get any other undesirable
> behavior, or is that A-OK?

Right, that's what I had in mind.

- Heikki



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 18, 2015 at 11:41 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> AFAICS, there is no need to go and clear the tag after the insert has
> completed.
>
> Here's what I had in mind: the inserter tags the tuple with the speculative
> insertion token, by storing the token in the t_ctid field. If the inserter
> needs to super-delete the tuple, it sets xmax like in a regular deletion,
> but also sets another flag to indicate that it was a super-deletion.
>
> When another backend inserts, and notices that it has a potential conflict
> with the first tuple, it tries to acquire a hw-lock on the token. In most
> cases, the inserter has long since completed the insertion, and the
> acquisition succeeds immediately but you have to check because the token is
> not cleared on a completed insertion. It would be slightly faster for the
> second backend if the inserter actually removed the token after the
> insertion has completed: it wouldn't need to check the lock if the tuple was
> not tagged in the first place. But that'd be just a tiny optimization, for
> the rare case that there is a conflict, and surely not worth it.
>
> Hmm, I just realized a little fly in that ointment: if the inserter inserts
> enough tuples to wrap around the token counter (4 billion?) in a single
> transaction, another backend that inserts could confuse a new speculative
> insertion with one that completed a long time ago. The risk seems small
> enough that we could just accept it. I don't think anything particularly bad
> would happen on a false positive here. Or we could also use all 48 bits
> available in the t_ctid to push it truly in the realm of ain't-gonna-happen.
> Or we could use (relfilenode, block,  token) as the lock tag for the
> SpeculativeInsertionLock.

Maybe we'd use all 48-bits anyway, but it's not a major concern either
way. Speculative token locks (by design) are only held for an instant.
I think a spurious wait would be entirely benign, and very unlikely.

> Regarding the physical layout: We can use a magic OffsetNumber value above
> MaxOffsetNumber to indicate that the t_ctid field stores a token rather than
> a regular ctid value. And another magic t_ctid value to indicate that a
> tuple has been super-deleted. The token and the super-deletion flag are
> quite ephemeral, they are not needed after the inserting transaction has
> completed, so it's nice to not consume the valuable infomask bits for these
> things. Those states are conveniently not possible on an updated tuple, when
> we would need the t_ctid field for it's current purpose.

You seem pretty convinced that this is the way to go. I'll try and
produce a revision that works this way shortly. I don't think that it
will be all that hard to come up with something to prove the idea out.

>> I think Heikki's concern is something different, although I am not
>> altogether up to speed on this and may be confused.  The issue is:
>> suppose that process A and process B are both furiously upserting into
>> the same table with the same key column but, because they have
>> different costing parameters, process A consistently chooses index X
>> and process B consistently chooses index Y.  In that situation, will
>> we deadlock, livelock, error out, bloat, or get any other undesirable
>> behavior, or is that A-OK?
>
> Right, that's what I had in mind.

Oh, I see. I totally failed to understand that that was the concern.

I think it'll be fine. The pre-check is going to look for a heap tuple
using one or the other of (say) a pair of equivalent indexes. We might
miss the heap tuple because we picked an index that had yet to have a
physical index tuple inserted, and then hit a conflict on optimistic
insertion (the second phase). But there is no reason to think that
that won't happen anyway. The ordering of operations isn't critical.

The one issue you might still have is a duplicate violation, because
you happened to infer the unique index that does not get to tolerate
unique violations as conflicts that can be recovered from (and there
was a race, which is unlikely). I don't really care about this,
though. You really are inferring one particular unique index, and for
reasons like this I think it would be a mistake to try to pretend that
the unique index is 100% an implementation detail. That's why I called
the new clause a unique index inference specification.

This hypothetical set of unique indexes will always have n-1 redundant
unique indexes - they must closely match. That's something that calls
into question why the user wants things this way to begin with. Also,
note that one unique index will consistently "win", since the
insertion order is stable (the relcache puts them in OID order). So it
will not be all over the map.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 18, 2015 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I've been thinking that it would be nice to be able to specify a constraint
>> name. Naming an index directly feels wrong, as in relational and SQL
>> philosophy, indexes are just an implementation detail, but naming a
>> constraint is a fair game. It would also be nice to be able to specify "use
>> the primary key".
>
> Intuitively, I think you should specify an operator name, not a
> constraint name.  That's what we do for, e.g., exclusion constraints,
> and it feels right.  People sometimes create and drop indexes (and
> thus, perhaps, the constraints that depend on them) for maintenance
> reasons where a change in semantics will be unwelcome.

I think we should use a constraint name. That is the plain reality of
what we're doing, and is less ambiguous than an operator. 99% of the
time you'll be happy with an unspecified, across-the-board IGNORE, or
won't be using exclusion constraints anyway (so we can infer a unique
index).

A constraint name covers all reasonable cases, since partial unique
indexes are otherwise covered (partial unique indexes do not have a
pg_constraint entry). Oracle has a hint for ignoring particular, named
unique indexes (not constraints). I realize that Oracle hints are not
supposed to affect semantics, but this is actually true (Google it).
This is a bit ugly, but less ugly as the hint, since as Heikki points
out we're only naming a constraint. Naming a constraint reflects the
reality of how the feature needs to work, and has a sort of precedent
from other systems.

> But I don't
> accept Peter's argument that it's OK to be indifferent to which
> particular equality semantics are being used.

I am not suggesting actual indifference makes sense. I am leaving it
up to the definition of available indexes. And there are no known
cases where it could matter anyway, unless you consider the ===
operator for tuples to be a counter example. And you need multiple
conflicting unique indexes on the exact same attributes/expressions on
attributes to begin with. Isn't that a highly esoteric thing to have
to worry about? Perhaps to the extent that literally no one will ever
have to care anyway? It's an oddball use-case, if ever I saw one.

Note: the issue of caring about equality semantics across B-Tree
opclasses of the same type, and the issue of naming unique indexes are
separate issues, AFAICT. No one should confuse them. The only
crossover is that the oddball use-case mentioned could use the named
constraint thing as an escape hatch.

As I've said, I think it's misguided to try to make unique indexes
100% an implementation detail. It's going to fall apart in edge cases,
like the one with multiple unique indexes that I mentioned in my last
e-mail. No one thinks of them that way, including users.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Robert Haas
Date:
On Wed, Mar 18, 2015 at 3:40 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> I think Heikki's concern is something different, although I am not
>>> altogether up to speed on this and may be confused.  The issue is:
>>> suppose that process A and process B are both furiously upserting into
>>> the same table with the same key column but, because they have
>>> different costing parameters, process A consistently chooses index X
>>> and process B consistently chooses index Y.  In that situation, will
>>> we deadlock, livelock, error out, bloat, or get any other undesirable
>>> behavior, or is that A-OK?
>>
>> Right, that's what I had in mind.
>
> Oh, I see. I totally failed to understand that that was the concern.
>
> I think it'll be fine. The pre-check is going to look for a heap tuple
> using one or the other of (say) a pair of equivalent indexes. We might
> miss the heap tuple because we picked an index that had yet to have a
> physical index tuple inserted, and then hit a conflict on optimistic
> insertion (the second phase). But there is no reason to think that
> that won't happen anyway. The ordering of operations isn't critical.
>
> The one issue you might still have is a duplicate violation, because
> you happened to infer the unique index that does not get to tolerate
> unique violations as conflicts that can be recovered from (and there
> was a race, which is unlikely). I don't really care about this,
> though. You really are inferring one particular unique index, and for
> reasons like this I think it would be a mistake to try to pretend that
> the unique index is 100% an implementation detail. That's why I called
> the new clause a unique index inference specification.
>
> This hypothetical set of unique indexes will always have n-1 redundant
> unique indexes - they must closely match. That's something that calls
> into question why the user wants things this way to begin with. Also,
> note that one unique index will consistently "win", since the
> insertion order is stable (the relcache puts them in OID order). So it
> will not be all over the map.

I think this is pretty lousy.  The reasons why the user wants things
that way is because they created a UNIQUE index and it got bloated
somehow with lots of dead tuples.  So they made a new UNIQUE index on
the same column and then they're planning to do a DROP INDEX
CONCURRENTLY on the old one, which is maybe even now in progress.  And
now they start getting duplicate key failures, the avoidance of which
was their whole reason for using UPSERT in the first place.  If I were
that user, I'd report that as a bug, and if someone told me that it
was intended behavior, I'd say "oh, so you deliberately designed this
feature to not work some of the time?".

ISTM that we need to (1) decide which operator we're using to compare
and then (2) tolerate conflicts in every index that uses that
operator.  In most cases there will only be one, but if there are
more, so be it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Mar 19, 2015 at 1:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think this is pretty lousy.  The reasons why the user wants things
> that way is because they created a UNIQUE index and it got bloated
> somehow with lots of dead tuples.  So they made a new UNIQUE index on
> the same column and then they're planning to do a DROP INDEX
> CONCURRENTLY on the old one, which is maybe even now in progress.  And
> now they start getting duplicate key failures, the avoidance of which
> was their whole reason for using UPSERT in the first place.  If I were
> that user, I'd report that as a bug, and if someone told me that it
> was intended behavior, I'd say "oh, so you deliberately designed this
> feature to not work some of the time?".
>
> ISTM that we need to (1) decide which operator we're using to compare
> and then (2) tolerate conflicts in every index that uses that
> operator.  In most cases there will only be one, but if there are
> more, so be it.


On reflection, I see your point. I'll try and do something about this too.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 18, 2015 at 2:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Here's what I had in mind: the inserter tags the tuple with the speculative
> insertion token, by storing the token in the t_ctid field. If the inserter
> needs to super-delete the tuple, it sets xmax like in a regular deletion,
> but also sets another flag to indicate that it was a super-deletion.

I was able to quickly hack up a prototype of this in my hotel room at
pgConf.US. It works fine at first blush, passing the jjanes_upsert
stress tests and my own regression tests without a problem. Obviously
it needs more testing and clean-up before posting, but I was pleased
with how easy this was.

> When another backend inserts, and notices that it has a potential conflict
> with the first tuple, it tries to acquire a hw-lock on the token. In most
> cases, the inserter has long since completed the insertion, and the
> acquisition succeeds immediately but you have to check because the token is
> not cleared on a completed insertion.

You don't even have to check/take a ShareLock on the token when the
other xact committed/aborted, because you know that if it is there,
then based on that (and based on the fact that it wasn't super
deleted) the tuple is visible/committed, or (in the event of
other-xact-abort) not visible/aborted. In other words, we continue to
only check for a speculative token when the inserting xact is in
flight - we just take the token from the heap now instead. Not much
needs to change, AFAICT.

> Regarding the physical layout: We can use a magic OffsetNumber value above
> MaxOffsetNumber to indicate that the t_ctid field stores a token rather than
> a regular ctid value. And another magic t_ctid value to indicate that a
> tuple has been super-deleted. The token and the super-deletion flag are
> quite ephemeral, they are not needed after the inserting transaction has
> completed, so it's nice to not consume the valuable infomask bits for these
> things. Those states are conveniently not possible on an updated tuple, when
> we would need the t_ctid field for it's current purpose.

Haven't done anything about this yet. I'm just using an infomask2 bit
for now. Although that was only because I forgot that you suggested
this before having a go at implementing this new t_ctid scheme!

My next revision will have a more polished version of this scheme. I'm
not going to immediately act on Robert's feedback elsewhere (although
I'd like to), owing to time constraints - no reason to deny you the
opportunity to review the entirely unrelated low-level speculative
locking mechanism due to that.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My next revision will have a more polished version of this scheme. I'm
> not going to immediately act on Robert's feedback elsewhere (although
> I'd like to), owing to time constraints - no reason to deny you the
> opportunity to review the entirely unrelated low-level speculative
> locking mechanism due to that.

Attached revision, V3.1, implements this slightly different way of
figuring out if an insertion is speculative, as discussed. We reuse
t_ctid for speculatively inserted tuples. That's where the counter
goes. I think that this is a significant improvement, since there is
no longer any need to touch the proc array for any reason, without
there being any significant disadvantage that I'm aware of. I also
fixed some bitrot, and a bug with index costing (the details aren't
terribly interesting - tuple width wasn't being calculated correctly).

I have worked through Heikki's feedback on documentation and code
tweaks, too (they were mostly just documentation changes).

Notably, I have not:

* Added ON CONFLICT PRIMARY KEY (might get to this later)

* Otherwise altered the inference specification. I have not allowed it
to name a constraint, which Heikki and I favor (Robert wanted to name
an operator directly). Again, just haven't got around to it.

* Done anything new with logical decoding. My handling of conflicts
during transaction reassembly remains questionable. I hope this can be
worked out soon, possibly with input from Andres. He is sitting next
to me at pgConf.US, so maybe we can work something out in person.

* Addressed the concerns held by Heikki and Robert about multiple
equivalent unique indexes. I confirmed through testing that this can
cause unique violations that are arguably spurious. It isn't too bad,
though - with a couple of unique indexes, jjanes_upsert requires high
concurrency (i.e. 128 clients) in order to see one or two or three
such unique violations after a minute or so. But even still, let's fix
it.

As I mentioned, I am at a conference, and obviously there are other
time pressures, so I haven't put as much testing into this revision as
I'd like. I thought that under the circumstances it was preferable to
post what I have sooner rather than later, though.

--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 03/26/2015 08:00 PM, Peter Geoghegan wrote:
> On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> My next revision will have a more polished version of this scheme. I'm
>> not going to immediately act on Robert's feedback elsewhere (although
>> I'd like to), owing to time constraints - no reason to deny you the
>> opportunity to review the entirely unrelated low-level speculative
>> locking mechanism due to that.
>
> Attached revision, V3.1, implements this slightly different way of
> figuring out if an insertion is speculative, as discussed. We reuse
> t_ctid for speculatively inserted tuples. That's where the counter
> goes. I think that this is a significant improvement, since there is
> no longer any need to touch the proc array for any reason, without
> there being any significant disadvantage that I'm aware of. I also
> fixed some bitrot, and a bug with index costing (the details aren't
> terribly interesting - tuple width wasn't being calculated correctly).

Cool. Quickly looking at the patch though - does it actually work as it 
is? RelationPutHeapTuple will overwrite the ctid field when the tuple is 
put on the page, so I don't think the correct token will make it to disk 
as the patch stands. Also, there are a few places where we currently 
check if t_ctid equals the tuple's location, and try to follow t_ctid if 
it doesn't. I think those need to be taught that t_ctid can also be a token.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Attached revision, V3.1, implements this slightly different way of
>> figuring out if an insertion is speculative, as discussed. We reuse
>> t_ctid for speculatively inserted tuples. That's where the counter
>> goes. I think that this is a significant improvement, since there is
>> no longer any need to touch the proc array for any reason, without
>> there being any significant disadvantage that I'm aware of. I also
>> fixed some bitrot, and a bug with index costing (the details aren't
>> terribly interesting - tuple width wasn't being calculated correctly).
>
> Cool. Quickly looking at the patch though - does it actually work as it is?

The test cases pass, including jjanes_upsert, and stress tests that
test for unprincipled deadlocks. But yes, I am entirely willing to
believe that something that was written in such haste could be broken.
My manual testing was pretty minimal.

Sorry for posting a shoddy patch, but I thought it was more important
to show you that this is perfectly workable ASAP.

> RelationPutHeapTuple will overwrite the ctid field when the tuple is put on
> the page, so I don't think the correct token will make it to disk as the
> patch stands.

Oops - You're right. I find it interesting that this didn't result in
breaking my test cases. I guess that not having proc array locking
might have made the difference with unprincipled deadlocks, which I
could not recreate (and row locking saves us from breaking UPSERT, I
think - although if so the token lock would still certainly be needed
for the IGNORE variant). It is interesting that this wasn't obviously
broken for UPSERT, though. I think it at least suggests that when
testing, we need to be more careful with taking a working UPSERT as a
proxy for a working ON CONFLICT IGNORE.

> Also, there are a few places where we currently check if
> t_ctid equals the tuple's location, and try to follow t_ctid if it doesn't.
> I think those need to be taught that t_ctid can also be a token.

I did fix at least some of those. I thought that the choke point for
doing that was fairly small, entirely confined to one or two routines
with heapam.c. But it would surely be better to follow your suggestion
of using an invalid/magic tuple offset value to be sure that it cannot
possibly occur elsewhere. And I'm still using that infomask2 bit,
which is probably not really necessary. So that needs to change too.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Gavin Flower
Date:
On 27/03/15 09:14, Peter Geoghegan wrote:
> On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
[...]
> Oops - You're right. I find it interesting that this didn't result in
> breaking my test cases.
>
[...]

Reminds of the situation where I got an A++ for a COBOL programming 
assignment that successfully handled the test data provided - then I 
found a major bug when 'idly' reviewing my code!  The lecturer (also a 
highly experienced developer) was amused when I pointed it out to her, 
and she said I still deserved the A++!


Cheers,
Gavin



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
Hi,

Just had a longer chat with Peter about this patch.

* Not a fan of the heap flags usage, the reusage seems sketch to me
* Explain should show the arbiter index in text as well
* AddUniqueSpeculative is a bad name, it should refer IndexInfo
* Work on the ExecInsert() comments
* Let's remove the planner choosing the "cheapest" arbiter index; it should do all.
* s/infer_unique_index/infer_arbiter_index/
* Not supporting inheritance properly makes me uncomfortable. I don't think users will think that's a
acceptable/reasonablerestriction.
 
* Let's not use t_ctid directly, but add a wrapper
* The code uses LockTupleExclusive to lock rows. That means the fkey key locking doesn't work; That's annoying. This
meansthat using upsert will potentially cause deadlocks in other sessions :(. I think you'll have to determine what
lockto acquire by fetching the tuple, doing the key comparison, and acquire the appropriate lock. That should be fine.
 
* I think we should decouple the insertion and wal logging more. I think the promise tuple insertion should be
differentfrom the final insertion of the actual tuple. For one it seems cleaner to me, for another it will avoid the
uglynessaround logical decoding. I think also that the separation will make it more realistic to use something like
thisfor a COPY variant that doesn't raise unique violations and such.
 
* We discussed the infererence and that it should just reuse (code, grammar, docs) the column specification from create
index.
* Some more stuff I don't recall.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Sat, Mar 28, 2015 at 6:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Just had a longer chat with Peter about this patch.

It was a very useful chat. Thanks for making yourself available to do it.

> * Not a fan of the heap flags usage, the reusage seems sketch to me
> * Explain should show the arbiter index in text as well
> * AddUniqueSpeculative is a bad name, it should refer IndexInfo
> * Work on the ExecInsert() comments
> * Let's remove the planner choosing the "cheapest" arbiter index; it
>   should do all.
> * s/infer_unique_index/infer_arbiter_index/

OK.

> * Not supporting inheritance properly makes me uncomfortable. I don't
>   think users will think that's a acceptable/reasonable restriction.

I'll look into making the inference specification deduce a child relation index.

> * Let's not use t_ctid directly, but add a wrapper

We talked about a union. This seems quite doable.

> * The code uses LockTupleExclusive to lock rows. That means the fkey key
>   locking doesn't work; That's annoying. This means that using upsert
>   will potentially cause deadlocks in other sessions :(. I think you'll
>   have to determine what lock to acquire by fetching the tuple, doing
>   the key comparison, and acquire the appropriate lock. That should be
>   fine.

Looking into the implementation of this. As we discussed, the
difficulty about avoiding a lock escalation within ExecUpdate() is
that we must fetch the row, run EvalPlanQual() to check if the new row
version generated by updating will require a LockTupleExclusive or
LockTupleNoKeyExclusive, and then lock the row using the right
lockmode, and only then call ExecUpdate(). Right now, UPSERT benefits
from fetching and locking the row together, so going this way imposes
a little additional complexity. It's probably worth it, though.

> * I think we should decouple the insertion and wal logging more. I think
>   the promise tuple insertion should be different from the final
>   insertion of the actual tuple. For one it seems cleaner to me, for
>   another it will avoid the uglyness around logical decoding. I think
>   also that the separation will make it more realistic to use something
>   like this for a COPY variant that doesn't raise unique violations and
>   such.

Your COPY argument swung this for me. I'm looking into the implementation.

> * We discussed the infererence and that it should just reuse (code,
>   grammar, docs) the column specification from create index.

Agreed.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 03/30/2015 07:20 PM, Peter Geoghegan wrote:
>
>> >* I think we should decouple the insertion and wal logging more. I think
>> >   the promise tuple insertion should be different from the final
>> >   insertion of the actual tuple. For one it seems cleaner to me, for
>> >   another it will avoid the uglyness around logical decoding. I think
>> >   also that the separation will make it more realistic to use something
>> >   like this for a COPY variant that doesn't raise unique violations and
>> >   such.
> Your COPY argument swung this for me. I'm looking into the implementation.

I'm pretty sceptical of that. ISTM you'll need to do modify the page 
twice for each insertion, first to insert the promise tuple, and then to 
turn the promise tuple into a real tuple. And WAL-log both updates. 
That's going to hurt performance.

To recover COPY from unique violations, you can just do the same as 
INSERT ON CONFLICT IGNORE does, and super-delete the inserted tuple on 
conflict. To recover from any random error, you'll need to abort the 
(sub)transaction anyway, and I don't see how it helps to separate the 
insertion of the promise tuple and the "finalization" of the insertion.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Mar 31, 2015 at 1:09 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm pretty sceptical of that. ISTM you'll need to do modify the page twice
> for each insertion, first to insert the promise tuple, and then to turn the
> promise tuple into a real tuple. And WAL-log both updates. That's going to
> hurt performance.


Andres' wish to do things that way is at least partially motivated by
having logical decoding just work. The co-ordination I'm currently
doing across changes within transaction reassembly is pretty ugly.
Andres has strongly suggested that it's broken, too, since a snapshot
change could occur between a speculative insertion and its super
deletion within transaction resassembly, thus invalidating the
assumption that the next change not being a super deletion means there
is no such super deletion change (i.e. the insert should be treated as
"real").

Anyway, if we don't do this, we'll need to make sure my changes to
transaction reassembly are sound. Hopefully that's an easy fix.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Mar 31, 2015 at 2:26 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Andres' wish to do things that way is at least partially motivated by
> having logical decoding just work.

I should add that there appears to be some need to terminate the loop
of speculative token waiting. By that I mean that since we're not
looking at the proc array to get a speculative token from
HeapTupleSatisfiesDirty() now, there is a livelock hazard. That goes
away when the speculative inserter cleans up after itself, as Andres
proposed. It would also go away if any speculative waiter cleaned up
after the inserter, which you suggested (that would be kind of
invasive to places like _bt_doinsert(), though). Finally, it would
also work if HeapTupleSatisfiesDirty() tested if the token was still
held directly, before reporting a speculative token, by for example
attempting to briefly acquire a ShareLock on the token (but that would
mean that the extra lock acquisition would be required unless and
until someone updated that originally-speculative tuple, in doing so
finally changing its t_ctid).

I think that we definitely have to do something like this, in any
case. Maybe just have SpeculativeTokenWait deal with the clean up is
cleanest, if we're not going to have inserters clean-up after
themselves immediately per Andres' suggestion.
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Mon, Mar 30, 2015 at 9:20 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> * The code uses LockTupleExclusive to lock rows. That means the fkey key
>>   locking doesn't work; That's annoying. This means that using upsert
>>   will potentially cause deadlocks in other sessions :(. I think you'll
>>   have to determine what lock to acquire by fetching the tuple, doing
>>   the key comparison, and acquire the appropriate lock. That should be
>>   fine.
>
> Looking into the implementation of this. As we discussed, the
> difficulty about avoiding a lock escalation within ExecUpdate() is
> that we must fetch the row, run EvalPlanQual() to check if the new row
> version generated by updating will require a LockTupleExclusive or
> LockTupleNoKeyExclusive, and then lock the row using the right
> lockmode, and only then call ExecUpdate(). Right now, UPSERT benefits
> from fetching and locking the row together, so going this way imposes
> a little additional complexity. It's probably worth it, though.

Why do you think deadlocks will be a particular concern? Sure, it
could make the difference between deadlocking and not deadlocking,
which is bad, but it's not obvious to me that the risk would be any
worse than the risk of deadlocking with FKs in 9.2. Is that really all
that bad?


-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
I attach a revision - V3.2

On Mon, Mar 30, 2015 at 9:20 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> * Not supporting inheritance properly makes me uncomfortable. I don't
>>   think users will think that's a acceptable/reasonable restriction.

Inheritance is now supported to the greatest extent anyone could
reasonably expect. In other words, inference works just fine for child
tables, and now parent tables too (so you can UPSERT both). Of course,
because of the existing restriction on making unique constraints cross
tables participating in inheritance, UPSERT similarly cannot work
across inheritance tables. This should be totally obvious to anyone
that uses inheritance - if you won't get a conflict (duplicate
violation) based on an ordinary insert, then of course UPSERT will not
take the alternative UPDATE path just because someone imagines that a
(would-be) duplicate violation should happen.

>> * Let's not use t_ctid directly, but add a wrapper
>
> We talked about a union. This seems quite doable.

This now uses a union. And it now actually stores a token value!

>> * The code uses LockTupleExclusive to lock rows. That means the fkey key
>>   locking doesn't work; That's annoying. This means that using upsert
>>   will potentially cause deadlocks in other sessions :(. I think you'll
>>   have to determine what lock to acquire by fetching the tuple, doing
>>   the key comparison, and acquire the appropriate lock. That should be
>>   fine.
>
> Looking into the implementation of this.

Not quite sold on this, on second thought (although let's focus on the
WAL logging stuff - the immediate blocker to committing the IGNORE
variant). Perhaps you can explain why you think it's important.

I like that I am able to fully lock the row when the predicate isn't
passed. I think that's a useful feature in some cases (it particularly
makes sense for higher isolation levels that expect to repeat the same
command and not get a serialization failure). It also keeps the
already complicated function ExecLockUpdateTuple() significantly more
simple.

>> * I think we should decouple the insertion and wal logging more. I think
>>   the promise tuple insertion should be different from the final
>>   insertion of the actual tuple. For one it seems cleaner to me, for
>>   another it will avoid the uglyness around logical decoding. I think
>>   also that the separation will make it more realistic to use something
>>   like this for a COPY variant that doesn't raise unique violations and
>>   such.
>
> Your COPY argument swung this for me. I'm looking into the implementation.

I have a prototype implementation of this with V3.2 - it clearly needs
more work, but I thought it was best to post sooner rather than later.
I am reusing in-place update infrastructure for this. This should give
Heikki something to play with, since he wasn't quite sold on this
idea. Certainly, what I have here is not good enough to commit - there
is unnecessary double WAL logging of tuple contents, just for example.
More generally, my recent changes to heapam.c certainly lack polish. I
have something for us to discuss, though, and under the circumstances
I think that's a good thing. Grep for "XXX" and "TODO" comments for
more.

Logical decoding is not handled at all, since I hit a snag with
building a tuple from the in-place update WAL records, and I didn't
want to block on that (especially given the general uncertainty about
if and how to affirm that a speculative insertion succeeded - IWO, if
we should go Andres' way there to avoid making transaction reassembly
for decoding more messy). I have at least reverted the logical
decoding transaction reassembly peek-ahead thing that Andres hated so
much, though. I hope we can reach consensus on what to do on this
point of WAL logging/logical decoding in particular real soon now.

>> * We discussed the infererence and that it should just reuse (code,
>>   grammar, docs) the column specification from create index.

The inference specification now both accepts collation and opclass
specifications along the lines we discussed, and can infer multiple
unique indexes per Heikki/Robert's complaint (so there is no longer a
costing aspect to it at all).

There are lots of tests for the inference of collations and opclasses
- if you want to know how it works, look at those (e.g. to learn about
handling of edge cases with redundant or overlapping cases, perhaps
due only to differing collations). I've come up with something very
flexible/forgiving, I think. Also, new tests were added for the new
inheritance support.

The documentation has been updated to reflect all of this.

There is still no way to specify a named constraint (which is perhaps
useful for exclusion constraints - although perhaps it's good enough
to have the IGNORE variant only work with exclusion constraints in the
common case where there is no inference specification), or PRIMARY KEY
syntax. Still have not made text explain output display arbiter unique
indexes (although multiple arbiter unique indexes are now visible from
the non-text explain output, since as I mentioned multiple specific
unique indexes may now be inferred). This needs some more tweaking. It
was helpful with the new tests for inference (with complex variations
is collations and opclasses, and multiple unique indexes inferred in a
number of cases).

I've been meaning to revisit Dean Rasheed's recent remarks on RLS. But
that hasn't happened yet.

Thanks
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Apr 7, 2015 at 10:59 PM, Peter Geoghegan <pg@heroku.com> wrote:
> The documentation has been updated to reflect all of this.

By the way, for the convenience of reviewers I continue to maintain a
mirror of pre-built documentation as outlined here:

https://wiki.postgresql.org/wiki/UPSERT#Documentation

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Mar 18, 2015 at 2:59 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Yes, I read that, and I agree with the intention to not leak data
> according to both the INSERT and UPDATE policies, however...
>
>> You're seeing a failure that applies to the target tuple of the UPDATE
>> (the tuple that we can't leak the contents of). I felt it was best to
>> check all policies against the target/existing tuple, including both
>> WITH CHECK OPTIONS and USING quals (which are both enforced).
>>
>
> I think that's an incorrect implementation of the RLS UPDATE policy.
> The WITH CHECK quals of a RLS policy are intended to be applied to the
> NEW data, not the existing data. This patch is applying the WITH CHECK
> quals to both the existing and NEW tuples, which runs counter to the
> way RLS polices are normally enforced, and I think that will just lead
> to confusion.

The big idea (the fine details of which Stephen appeared to be in
tentative agreement with [1]) is that an UPSERT is a hybrid between an
INSERT and an UPDATE, and not simply an INSERT and separate UPDATE
tied together. So at the very least the exact behavior of such a
hybrid cannot be assumed to be any particular way just from
generalizing from known behaviors for INSERT and UPDATE (which is a
usability issue, since the fine details of RLS are already complex
enough without UPSERT).

The INSERT policies are only enforced when a tuple is inserted
because, when the alternative path isn't taken then it's really just
an INSERT.

For the UPDATE path, where the stickiness/hybridness begins, we
enforce the target tuple passes both INSERT policies, and UPDATE
policies (including USING quals as WCO). The theory here is that if
you're entitled to INSERT it, you ought to be entitled to INSERT the
existing tuple in order to take the UPDATE path. And we bunch the
UPDATE USING quals + WCO for the sake of (conceptual, not
implementation) simplicity - they're already all WCO at that point.

Finally, the final tuple (generated using the EXCLUDED and TARGET
tuples, from the UPDATE) must pass the UPDATE WCO (including any that
originated as USING quals, a distinction that "no longer exists") as
well as INSERT policies. If you couldn't INSERT the tuple in the first
place (when there wasn't a conflict), then why should you be able to
UPSERT it? This is substantively the "same" row, no? You (the user)
are tacitly asserting that you don't care about whether the INSERT or
UPDATE path is taken anyway, so why should you care? Surely you'd want
this to fail as early as possible, rather than leaving it to chance. I
really do expect that people are only going to do simple
transformations in their UPDATE handler (e.g. "ON CONFLICT UPDATE set
count = TARGET.count + EXCLUDED.count"), so in practice it usually
doesn't matter.

Note that other systems that I looked at don't even support RLS with
SQL MERGE at all. So we have no precedent to consider that I'm aware
of, other than simply not supporting RLS, which would not be
outrageous IMV. I felt, given the ambiguity about how this should
differ from ordinary INSERTs + UPDATEs, that something quite
restrictive but not entirely restrictive (not supporting RLS, just
throwing an error all the time) was safest. In any case I doubt that
this will actually come up all that often.

> The problem with that is that the user will see errors saying that the
> data violates the RLS WITH CHECK policy, when they might quite
> reasonably argue that it doesn't. That's not really being
> conservative. I'd argue it's a bug.

Again, I accept that that's a valid interpretation of it. I have my
own opinion, but I will take the path of least resistance on this
point. What do other people think?

I'd appreciate it if you explicitly outlined what policies you feel
should be enforce at each of the 3 junctures within an UPSERT (post
INSERT, pre-UPDATE, post-UPDATE). I would also like you to be very
explicit about whether or not RLS WITH CHECK policies and USING quals
(presumably enforced as RLS WITH CHECK policies) from both INSERT and
UPDATE policies should be enforced at each point. In particular,
should I ever not treat RLS WCO and USING quals equivalently? (recall
that we don't want to elide an UPDATE silently, which makes much less
sense for UPSERT...I had assumed that whatever else, we'd always treat
WCO and USING quals from UPDATE (and ALL) policies equivalently, but
perhaps not).

Alternatively, perhaps you'd prefer to state your objection in terms
of the exact modifications you'd make to the above outline of the
existing behavior. I don't think I totally follow what you're saying
yet (which is the problem with being cleverer generally!). It is easy
to explain: The insert path is the same as always. Otherwise, both the
before and after tuple have all RLS policies (including USING quals)
enforced as WCOs. I think that it might be substantially easier to
explain that than to explain what you have in mind...let's see.

Thanks

[1] http://www.postgresql.org/message-id/20150109214041.GK3062@tamriel.snowman.net
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Mar 31, 2015 at 7:02 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> Andres' wish to do things that way is at least partially motivated by
>> having logical decoding just work.

Attached revision, V3.3, adds actual support for ON CONFLICT with
logical decoding. I've implemented something along the lines Andres
wanted. There is now coordination across WAL records - the insert WAL
record, and an in-place update record that affirms that the
speculative insertion was successful. This does not involve
redundantly logging either the heap tuple header (which only appears
in the first of two records), or the tuple contents (which only
appears in the second of the two records). I stash the tuple header
metadata within the inserting xact's ReorderBufferTXN. This approach
appears to be robust.

Other changes:

* Significant refactoring of infer_arbiter_indexes() within the
optimizer. More tests, simpler code.

* Reworking of ExecInsert() comments, per feedback in NYC from Andres.
Also made the control flow there a little simpler.

* ON CONFLICT IGNORE is now fully supported on updatable views - an
inference specification can be included. ON CONFLICT UPDATE remains
unsupported (this can be revisited in a later release IMV).

* Dedicated inference element primnode - it wasn't cool that the
CREATE INDEX IndexElem parse node made its way into the optimizer
(where unique index inference occurs). This simplified some code, too.

* Per discussion in NYC with Andres, arbiter indexes now always appear
in explain output (not just for non-text explain output). I've also
made the changes to explain.c a lot simpler (I've further isolated the
kludge used to display quals from the implementation level sequential
scan that can appear in auxiliary UPDATE plans).

* Misc polishing. Obsolete comments were found in a few places. Fixed
build of contrib/pageinspect, that didn't get the memo about the
t_ctid union change.

I talked privately with Stephen about RLS. It seems likely that that
needs some behavioral changes, but nothing too invasive. I haven't got
around to implementing those yet, but I think they're about to make it
to the top of my todo list. In any case, that is still split up into a
separate commit (I anticipated that it would be a good idea to do
that, since the RLS discussion has yet to fully settle down). Nothing
has changed about that commit, though.

With a concerted effort, I think we can get this over the line for 9.5.

Andres: Please take a look at the logical decoding/WAL stuff. Your
input on those aspects would be particularly useful now.

I'm now pushing code to github regularly. The feature branch in
question is: https://github.com/petergeoghegan/postgres/commits/insert_conflict_ignore

Thanks
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/15/2015 07:51 AM, Peter Geoghegan wrote:
> +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict)
> +{
> +    if (!conflict)
> +    {
> +        /*
> +         * Update the tuple in-place, in the common case where no conflict was
> +         * detected during speculative insertion.
> +         *
> +         * When heap_insert is called in respect of a speculative tuple, the
> +         * page will actually have a tuple inserted.  However, during recovery
> +         * replay will add an all-zero tuple to the page instead, which is the
> +         * same length as the original (but the tuple header is still WAL
> +         * logged and will still be restored at that point).  If and when the
> +         * in-place update record corresponding to releasing a value lock is
> +         * replayed, crash recovery takes the final tuple value from there.
> +         * Thus, speculative heap records require two WAL records.
> +         *
> +         * Logical decoding interprets an in-place update associated with a
> +         * speculative insertion as a regular insert change.  In other words,
> +         * the in-place record generated affirms that a speculative insertion
> +         * completed successfully.
> +         */
> +        heap_inplace_update(relation, tuple);
> +    }
> +    else
> +    {

That's a bizarre solution. May I suggest a much simpler one:

Make the heap-insert record the same for normal and speculative 
insertions, except for a flag that's set if it's a speculative one. 
Replay as usual.

When the speculative insertion is finished, write a new kind of a WAL 
record for that. The record only needs to contain the ctid of the tuple. 
Replaying that record will clear the flag on the heap tuple that said 
that it was a speculative insertion.

In logical decoding, decode speculative insertions like any other 
insertion. To decode a super-deletion record, scan the reorder buffer 
for the transaction to find the corresponding speculative insertion 
record for the tuple, and remove it.

BTW, that'd work just as well without the new WAL record to finish a 
speculative insertion. Am I missing something?

> --- a/src/include/storage/off.h
> +++ b/src/include/storage/off.h
> @@ -26,6 +26,7 @@ typedef uint16 OffsetNumber;
>  #define InvalidOffsetNumber        ((OffsetNumber) 0)
>  #define FirstOffsetNumber        ((OffsetNumber) 1)
>  #define MaxOffsetNumber            ((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))
> +#define MagicOffsetNumber        (MaxOffsetNumber + 1)
>  #define OffsetNumberMask        (0xffff)        /* valid uint16 bits */

IMHO it would be nicer if the magic value was more constant, e.g. 0xffff 
or 0xfffe, instead of basing it on MaxOffsetNumber which depends on 
block size. I would rather not include MaxOffsetNumber of anything 
derived from it in the on-disk dormat.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote:
> On 04/15/2015 07:51 AM, Peter Geoghegan wrote:
> >+heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict)
> >+{
> >+    if (!conflict)
> >+    {
> >+        /*
> >+         * Update the tuple in-place, in the common case where no conflict was
> >+         * detected during speculative insertion.
> >+         *
> >+         * When heap_insert is called in respect of a speculative tuple, the
> >+         * page will actually have a tuple inserted.  However, during recovery
> >+         * replay will add an all-zero tuple to the page instead, which is the
> >+         * same length as the original (but the tuple header is still WAL
> >+         * logged and will still be restored at that point).  If and when the
> >+         * in-place update record corresponding to releasing a value lock is
> >+         * replayed, crash recovery takes the final tuple value from there.
> >+         * Thus, speculative heap records require two WAL records.
> >+         *
> >+         * Logical decoding interprets an in-place update associated with a
> >+         * speculative insertion as a regular insert change.  In other words,
> >+         * the in-place record generated affirms that a speculative insertion
> >+         * completed successfully.
> >+         */
> >+        heap_inplace_update(relation, tuple);
> >+    }
> >+    else
> >+    {
> 
> That's a bizarre solution.

I tend to agree, but for different reasons.

> In logical decoding, decode speculative insertions like any other insertion.
> To decode a super-deletion record, scan the reorder buffer for the
> transaction to find the corresponding speculative insertion record for the
> tuple, and remove it.

Not that easy. That buffer is spilled to disk and such. As discussed.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/15/2015 06:01 PM, Andres Freund wrote:
> On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote:
>> On 04/15/2015 07:51 AM, Peter Geoghegan wrote:
>>> +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict)
>>> +{
>>> +    if (!conflict)
>>> +    {
>>> +        /*
>>> +         * Update the tuple in-place, in the common case where no conflict was
>>> +         * detected during speculative insertion.
>>> +         *
>>> +         * When heap_insert is called in respect of a speculative tuple, the
>>> +         * page will actually have a tuple inserted.  However, during recovery
>>> +         * replay will add an all-zero tuple to the page instead, which is the
>>> +         * same length as the original (but the tuple header is still WAL
>>> +         * logged and will still be restored at that point).  If and when the
>>> +         * in-place update record corresponding to releasing a value lock is
>>> +         * replayed, crash recovery takes the final tuple value from there.
>>> +         * Thus, speculative heap records require two WAL records.
>>> +         *
>>> +         * Logical decoding interprets an in-place update associated with a
>>> +         * speculative insertion as a regular insert change.  In other words,
>>> +         * the in-place record generated affirms that a speculative insertion
>>> +         * completed successfully.
>>> +         */
>>> +        heap_inplace_update(relation, tuple);
>>> +    }
>>> +    else
>>> +    {
>>
>> That's a bizarre solution.
>
> I tend to agree, but for different reasons.
>
>> In logical decoding, decode speculative insertions like any other insertion.
>> To decode a super-deletion record, scan the reorder buffer for the
>> transaction to find the corresponding speculative insertion record for the
>> tuple, and remove it.
>
> Not that easy. That buffer is spilled to disk and such. As discussed.

Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical 
decoding" thread now, and I have to say that IMHO it's a lot more sane 
to handle this in ReorderBufferCommit() like Peter first did, than to 
make the main insertion path more complex like this.

Another idea is to never spill the latest record to disk, at least if it 
was a speculative insertion. Then you would be sure that when you see 
the super-deletion record, the speculative insertion it refers to is 
still in memory. That seems simple.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote:
> Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding"
> thread now, and I have to say that IMHO it's a lot more sane to handle this
> in ReorderBufferCommit() like Peter first did, than to make the main
> insertion path more complex like this.

I don't like Peter's way much. For one it's just broken. For another
it's quite annoying to trigger disk reads to figure out what actual type
of record something is.

If we go that way that's the way I think it should be done: Whenever we
encounter a speculative record we 'unlink' it from the changes that will
be reused for spooling from disk and do nothing further. Then we just
continue reading through the records. If the next thing we encounter is
a super-deletion we throw away that record. If it's another type of
change (or even bettter a 'speculative insertion succeeded' record)
insert it. That'll still require some uglyness, but it should not be too
bad.

I earlier thought that'd not be ok because there could be a new catalog
snapshot inbetween, but I was mistaken: The locking on the source
transaction prevents problems.

> Another idea is to never spill the latest record to disk, at least if it was
> a speculative insertion. Then you would be sure that when you see the
> super-deletion record, the speculative insertion it refers to is still in
> memory. That seems simple.

It's not guaranteed to be the last record, there can be records
inbetween (snapshots from other backends at the very least).


Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote:
> When the speculative insertion is finished, write a new kind of a WAL record
> for that. The record only needs to contain the ctid of the tuple. Replaying
> that record will clear the flag on the heap tuple that said that it was a
> speculative insertion.
>
> In logical decoding, decode speculative insertions like any other insertion.
> To decode a super-deletion record, scan the reorder buffer for the
> transaction to find the corresponding speculative insertion record for the
> tuple, and remove it.
>
> BTW, that'd work just as well without the new WAL record to finish a
> speculative insertion. Am I missing something?

I'm, completely independent of logical decoding, of the *VERY* strong
opinion that 'speculative insertions' should never be visible when
looking with normal snapshots. For one it allows to simplify
considerations around wraparound (which has proven to be a very good
idea, c.f. multixacts + vacuum causing data corruption years after it
was thought to be harmless). For another it allows to reclaim/redefine
the bit after a database restart/upgrade. Given how complex this is and
how scarce flags are that seems like a really good property.

And avoiding those flags to be visible to the outside requires a WAL
record, otherwise it won't be correct on the standby.

Andres



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/16/2015 12:18 PM, Andres Freund wrote:
> On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote:
>> Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding"
>> thread now, and I have to say that IMHO it's a lot more sane to handle this
>> in ReorderBufferCommit() like Peter first did, than to make the main
>> insertion path more complex like this.
>
> I don't like Peter's way much. For one it's just broken. For another
> it's quite annoying to trigger disk reads to figure out what actual type
> of record something is.
>
> If we go that way that's the way I think it should be done: Whenever we
> encounter a speculative record we 'unlink' it from the changes that will
> be reused for spooling from disk and do nothing further. Then we just
> continue reading through the records. If the next thing we encounter is
> a super-deletion we throw away that record. If it's another type of
> change (or even bettter a 'speculative insertion succeeded' record)
> insert it. That'll still require some uglyness, but it should not be too
> bad.

Sounds good to me.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <andres@anarazel.de> wrote:
> I'm, completely independent of logical decoding, of the *VERY* strong
> opinion that 'speculative insertions' should never be visible when
> looking with normal snapshots. For one it allows to simplify
> considerations around wraparound (which has proven to be a very good
> idea, c.f. multixacts + vacuum causing data corruption years after it
> was thought to be harmless). For another it allows to reclaim/redefine
> the bit after a database restart/upgrade. Given how complex this is and
> how scarce flags are that seems like a really good property.
>
> And avoiding those flags to be visible to the outside requires a WAL
> record, otherwise it won't be correct on the standby.

I'm a bit distracted here, and not sure exactly what you mean. What's
a normal snapshot?

Do you just mean that you think that speculative insertions should be
explicitly affirmed by a second record (making it not a speculative
tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
no chance of seeing a tuple until it was affirmed by a second in-place
modification, regardless of tuple xmin xact commit status?

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 16, 2015 at 2:18 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote:
>> Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding"
>> thread now, and I have to say that IMHO it's a lot more sane to handle this
>> in ReorderBufferCommit() like Peter first did, than to make the main
>> insertion path more complex like this.
>
> I don't like Peter's way much. For one it's just broken. For another
> it's quite annoying to trigger disk reads to figure out what actual type
> of record something is.
>
> If we go that way that's the way I think it should be done: Whenever we
> encounter a speculative record we 'unlink' it from the changes that will
> be reused for spooling from disk and do nothing further. Then we just
> continue reading through the records.

You mean we continue iterating through *changes* from ReorderBufferCommit()?

> If the next thing we encounter is
> a super-deletion we throw away that record. If it's another type of
> change (or even bettter a 'speculative insertion succeeded' record)
> insert it.

By "insert it", I gather you mean report to the the logical decoding
plugin as an insert change.

> That'll still require some uglyness, but it should not be too
> bad.

So, to be clear, what you have in mind is sort of a hybrid between my
first and second approaches (mostly my first approach).

We'd have coordination between records originally decoded into
changes, maybe "intercepting" them during xact reassembly, like my
first approach. We'd mostly do the same thing as the first approach,
actually. The major difference would be that there'd be explicit
"speculative affirmation" WAL records. But we wouldn't rely on those
affirmation records within ReorderBufferCommit() - we'd rely on the
*absence* of a super deletion WAL record (to report an insert change
to the decoding plugin). To emphasize, like my first approach, it
would be based on an *absence* of a super deletion WAL record.

However, like my second approach, there would be a "speculative
affirmation" WAL record. The new speculative affirmation WAL record
would however be quite different to what my second approach to logical
decoding (the in-place update record thing that was in V3.3) actually
did. In particular, it would be far more simple, and the tuple would
be built from the original insertion record within logical decoding.

Right now, I'm tired, so bear with me. What I think I'm not quite
getting here is how the new type of "affirmation" record affects
visibility (or anything else, actually). Clearly dirty snapshots need
to see the record to set a speculative insertion token for their
caller to wait on (and HeapTupleSatisfiesVacuum() cannot see the
speculatively inserted tuple as reclaimable, of course). They need
this *before* the speculative insertion is affirmed, of course.

Maybe you mean: If the speculative insertion xact is in progress, then
the tuple is visible to several types of snapshots (dirty, vacuum,
self, any). If it is not, then tuples are not visible because they are
only speculative (and not *confirmed*). If they were confirmed, and
the xact was committed, then those tuples are logically and physically
indistinguishable from tuples that were inserted in the ordinary
manner.

Is that it? Basically, the affirmation records have nothing much to do
with logical decoding in particular. But you still need to super
delete, so that several types of snapshots ((dirty, vacuum, self, any)
*stop* seeing the tuple as visible independent of the inserting xact
being in progress.

> I earlier thought that'd not be ok because there could be a new catalog
> snapshot inbetween, but I was mistaken: The locking on the source
> transaction prevents problems.

I thought this was the case.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Dean Rasheed
Date:
On 9 April 2015 at 22:18, Peter Geoghegan <pg@heroku.com> wrote:
> The big idea (the fine details of which Stephen appeared to be in
> tentative agreement with [1]) is that an UPSERT is a hybrid between an
> INSERT and an UPDATE, and not simply an INSERT and separate UPDATE
> tied together. So at the very least the exact behavior of such a
> hybrid cannot be assumed to be any particular way just from
> generalizing from known behaviors for INSERT and UPDATE (which is a
> usability issue, since the fine details of RLS are already complex
> enough without UPSERT).
>

I think that you're over-complicating this. From a usability point of
view, I think that the best approach is to keep this as simple as
possible and make the behaviour consistent with an INSERT and an
UPDATE tied together, as is suggested by the new syntax. The key point
is that, if you are using the RLS INSERT and UPDATE policies for this
new command, the rule should be that the user has permission to
insert/update a new/existing row if and only if they would have had
permission to do so using a stand-alone INSERT/UPDATE command.

On the other hand, if you believe that the behaviour should be
something other than that, then I think that it would need a new
dedicated kind of RLS policy for this command because, as I will
attempt to explain below, merging together the quals from INSERT and
UPDATE policies leads to logical problems.


> The INSERT policies are only enforced when a tuple is inserted
> because, when the alternative path isn't taken then it's really just
> an INSERT.
>

Agreed.


> For the UPDATE path, where the stickiness/hybridness begins...
> <snip>
> I'd appreciate it if you explicitly outlined what policies you feel
> should be enforce at each of the 3 junctures within an UPSERT (post
> INSERT, pre-UPDATE, post-UPDATE). I would also like you to be very
> explicit about whether or not RLS WITH CHECK policies and USING quals
> (presumably enforced as RLS WITH CHECK policies) from both INSERT and
> UPDATE policies should be enforced at each point. In particular,
> should I ever not treat RLS WCO and USING quals equivalently? (recall
> that we don't want to elide an UPDATE silently, which makes much less
> sense for UPSERT...I had assumed that whatever else, we'd always treat
> WCO and USING quals from UPDATE (and ALL) policies equivalently, but
> perhaps not).

OK, I'll try to explicitly outline how I think this ought to work:

For INSERTs and UPDATEs, there are 3 kinds of RLS quals that come into play:

1). insertCheckQuals - the logical OR of the quals from all INSERT
WITH CHECK policy clauses. These give the user permission to insert
into the table, provided that the new rows satisfy at least one of
these quals.

2). updateUsingQuals - the logical OR of the quals from all UPDATE
USING policy clauses. These give the user permission to update
existing rows in the table, where the existing rows satisfy at least
one of these quals.

3). updateCheckQuals - the logical OR of the quals from all UPDATE
WITH CHECK policy clauses. These give the user permission to update
the table, provided that the new rows satisfy at least one of these
quals.

In general these may all differ from one another.

If we go forward with the idea that RLS quals should be checked before
attempting to insert/update any data, as we do for regular permission
checks, then stand-alone INSERT and UPDATE commands would work
conceptually as follows:

INSERT: 1. Check NEW against insertCheckQuals (error if the result is not true) 2. Do the actual insert of NEW

UPDATE: 1. Check OLD against updateUsingQuals (skip rows that don't match) 2. Check NEW against updateCheckQuals (error
ifthe result is not true) 3. Do the actual update (OLD -> NEW)
 

Of course that's an over-simplification. The updateUsingQuals are
actually merged with the user-supplied WHERE clause in a way dependent
on the presence of leaky functions, but conceptually it matches the
above description.

I think that there is universal agreement that an INSERT .. ON
CONFLICT UPDATE that follows the insert path ought to match the pure
INSERT case, and only check the insertCheckQuals. That just leaves the
question of what to do on the update path, where things get more
complicated because there are 3 tuples involved in the process:

1). t1 - the tuple originally proposed for insertion, but which could
not be inserted due to a conflict with an existing row in the table
(aka EXCLUDED).

2). t2 - the existing row in the table that prevented t1 from being
inserted (aka TARGET).

3). t3 - the final new row resulting from the update path. In general,
we allow this to be quite different from t1 and t2.

If we think of INSERT .. ON CONFLICT UPDATE as an INSERT and an UPDATE
tied together, then logically the following would happen if the update
path were taken:

INSERT .. ON CONFLICT UPDATE (update path): 1. Check t1 against insertCheckQuals (error if not true) 2. Detect that t1
conflictswith t2 3. Test user-supplied auxiliary WHERE clause (skip if not true) 4. Check t2 against updateUsingQuals
(errorif not true) 5. Check t3 against updateCheckQuals (error if not true) 6. Do the actual update (t2 -> t3)
 

Step (4) is the only point where this differs from an INSERT and an
UPDATE tied together, and only in the fact that it throws an error
rather than skipping the row to be updated if the user doesn't have
permission to do so. The important point is that this is consistent
with the rule that it gives the user permission to insert/update a
new/existing row if and only if they would have been allowed to do so
using a stand-alone INSERT/UPDATE command.

Note that there are 3 participating tuples, and each is checked
against precisely one of the 3 relevant policies. That is important,
as I will attempt to explain.

As it stands, your patch classifies WCOs by command type and combines
them in a way that results in 4 additional RLS checks being performed
(not really in this order):
 5.1. Check t2 against insertCheckQuals (error if not true) 5.2. Check t2 against updateCheckQuals (error if not true)
5.3.Check t3 against insertCheckQuals (error if not true) 5.4. Check t3 against updateUsingQuals (error if not true)
 

Some of those additional checks really don't make any sense, e.g.,
(5.2) prevents you from updating an existing tuple if you wouldn't
have had permission to make an update that resulted in that tuple,
which just seems completely backwards.

But the real problem is that by applying multiple checks to the same
tuple, you're implicitly ANDing together the quals from different RLS
policy types and the problem with that is that it's possible for the
quals from different policy types to be completely incompatible (their
logical AND may be identically equivalent to false). So by doing these
additional checks it may be impossible for any tuple to ever satisfy
all the checks at the same time, making the command unusable. So the
general rule, as I alluded to above, should be that no 2 different
kinds of RLS quals should ever be applied to the same tuple.

The nub of the problem is that you're classifying WCOs by command
type, which is wrong because updateUsingQuals have different semantics
from updateCheckQuals, even though they both originate from the UPDATE
policy. If you factor in updatable SB views, which we might hope to
one day have support for INSERT .. ON CONFLICT UPDATE, there will be
additional WCOs arising from the view, which will have different
semantics again (the SQL spec says that they should apply after the
insert/update to ensure that the final result is visible in the view).
These are then another separate kind of WCO (that doesn't depend on
the command type). I think this will actually be quite
straightforward, as long as all the different kinds of WCO aren't
lumped together.

If you take a look at my patch to apply RLS checks before
insert/update, you'll see that I've added a WCOKind enum that allows
each of these kinds of WCOs to be treated differently and checked at
different stages in the executor. I anticipated that INSERT .. ON
CONFLICT UPDATE would extend that by adding a new kind of WCO for
updateUsingQuals checked as if they were WCOs, allowing them to be
checked at a specific point in the new code (step (4) above).

In all of this, I think we should try to keep things as simple as
possible, to give the end user a chance to understand it --- although
I'm not sure I've achieved that with my explanation :-)

Regards,
Dean



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-16 10:25:29 -0700, Peter Geoghegan wrote:
> On Thu, Apr 16, 2015 at 2:18 AM, Andres Freund <andres@anarazel.de> wrote:
> > If we go that way that's the way I think it should be done: Whenever we
> > encounter a speculative record we 'unlink' it from the changes that will
> > be reused for spooling from disk and do nothing further. Then we just
> > continue reading through the records.
>
> You mean we continue iterating through *changes* from ReorderBufferCommit()?

changes, records, not much of a difference here.

> > If the next thing we encounter is
> > a super-deletion we throw away that record. If it's another type of
> > change (or even bettter a 'speculative insertion succeeded' record)
> > insert it.
>
> By "insert it", I gather you mean report to the the logical decoding
> plugin as an insert change.

Yes.

> We'd have coordination between records originally decoded into
> changes, maybe "intercepting" them during xact reassembly, like my
> first approach. We'd mostly do the same thing as the first approach,
> actually. The major difference would be that there'd be explicit
> "speculative affirmation" WAL records. But we wouldn't rely on those
> affirmation records within ReorderBufferCommit() - we'd rely on the
> *absence* of a super deletion WAL record (to report an insert change
> to the decoding plugin). To emphasize, like my first approach, it
> would be based on an *absence* of a super deletion WAL record.

> However, like my second approach, there would be a "speculative
> affirmation" WAL record.

I think there should be one, but it's not required for the approach. The
'pending speculative insertion' can just be completed whenever there's a
insert/update/delete that's not a super deletion.

I.e. in the REORDER_BUFFER_CHANGE_INSERT/... case ReorderBufferCommit()
just add something like:

if (pending_insertion != NULL)
{   if (new_record_is_super_deletion)       ReorderBufferReturnTupleBuf(pending_insertion);   else
rb->apply_change(rb,txn, relation, pending_insertion);
 
}
...

You'll have to be careful to store the pending_insertion *after*
ReorderBufferToastReplace(), but that should not be a problem.


> Right now, I'm tired, so bear with me. What I think I'm not quite
> getting here is how the new type of "affirmation" record affects
> visibility (or anything else, actually). Clearly dirty snapshots need
> to see the record to set a speculative insertion token for their
> caller to wait on (and HeapTupleSatisfiesVacuum() cannot see the
> speculatively inserted tuple as reclaimable, of course). They need
> this *before* the speculative insertion is affirmed, of course.
>
> Maybe you mean: If the speculative insertion xact is in progress, then
> the tuple is visible to several types of snapshots (dirty, vacuum,
> self, any). If it is not, then tuples are not visible because they are
> only speculative (and not *confirmed*). If they were confirmed, and
> the xact was committed, then those tuples are logically and physically
> indistinguishable from tuples that were inserted in the ordinary
> manner.
>
> Is that it? Basically, the affirmation records have nothing much to do
> with logical decoding in particular. But you still need to super
> delete, so that several types of snapshots ((dirty, vacuum, self, any)
> *stop* seeing the tuple as visible independent of the inserting xact
> being in progress.

I have no idea what this has to do with the email you responded to?
Maybe it's more meant as a response to my separate email that I want the
HEAP_SPECULATIVE to be cleared?

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-16 09:43:54 -0700, Peter Geoghegan wrote:
> On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <andres@anarazel.de> wrote:
> > I'm, completely independent of logical decoding, of the *VERY* strong
> > opinion that 'speculative insertions' should never be visible when
> > looking with normal snapshots. For one it allows to simplify
> > considerations around wraparound (which has proven to be a very good
> > idea, c.f. multixacts + vacuum causing data corruption years after it
> > was thought to be harmless). For another it allows to reclaim/redefine
> > the bit after a database restart/upgrade. Given how complex this is and
> > how scarce flags are that seems like a really good property.
> >
> > And avoiding those flags to be visible to the outside requires a WAL
> > record, otherwise it won't be correct on the standby.
> 
> I'm a bit distracted here, and not sure exactly what you mean. What's
> a normal snapshot?

Normal visibility semantics, i.e. SnapshotMVCC, not SnapshotDirty.

> Do you just mean that you think that speculative insertions should be
> explicitly affirmed by a second record (making it not a speculative
> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
> no chance of seeing a tuple until it was affirmed by a second in-place
> modification, regardless of tuple xmin xact commit status?

Yes. I think

a) HEAP_SPECULATIVE should never be visible outside in a
committed transaction. That allows us to redefine what exactly the bit
means and such after a simple restart. On IM Heiki said he wants to
replace this by a bit in the item pointer, but I don't think that
changes things substantially.

b) t_ctid should not contain a speculative token in committed
(i.e. visible to other sessions using mvcc semantics) tuple. Right now
(i.e. master) t_ctid will point to itself for non-updated tuples. I
don't think it's good to have something in there that's not an actual
ctid in there. The number of places that look into t_ctid for
in-progress insertions of other sessions is smaller than the ones that
look at tuples in general.

c) Cleaning the flag/ctid after a completed speculative insertion makes
it far less likely that we end up waiting on a other backend when we
wouldn't have to. If a session inserts a large number of tuples
speculatively (surely *not* a unlikely thing in the long run) it gets
rather likely that tokens are reused. Which means if another backend
touches these in-progress records it's quite possible that the currently
acquired token is the same as the one on a tuple that has actually
finished inserting.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Stephen Frost
Date:
Dean,

* Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> In all of this, I think we should try to keep things as simple as
> possible, to give the end user a chance to understand it --- although
> I'm not sure I've achieved that with my explanation :-)

Thanks a lot for this.  It goes along with my thinking also and matches,
I believe, what I had explained to Peter on our call.

Peter, please let me know if you agree.

Dean, I've been working through your patches over the past couple of
days (apologies for the lack of updates, just been busy) and hope to
push them very shortly (ie: by the end of the weekend).

One thing that I was hoping to discuss a bit is that I've gone ahead and
added another set of hooks, so we can have both "permissive" and
"restrictive" policies be provided from the hook.  It's a bit late to
make the grammar and other changes which would be required to add a
"restrictive" policy option to the built-in RLS, but adding the hooks is
relatively low-impact.

I'm also going to be including a test_rls_hooks module into
src/test/modules which will test those hooks and provide an example of
how to use them.

As for the discussion- there was some concern raised about extensions
being able to "override" built-in policies by using the hooks a certain
way.  I don't entirely follow the logic behind that concern as an
extension has the ability to read the files on disk directly from C
code, should it be written to do so, and so not providing a hook to add
"permissive" policies is denying useful functionality for very question
gain, in my view at least.

Thoughts?
Thanks!
    Stephen

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 17, 2015 at 4:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Dean, I've been working through your patches over the past couple of
> days (apologies for the lack of updates, just been busy) and hope to
> push them very shortly (ie: by the end of the weekend).
>
> One thing that I was hoping to discuss a bit is that I've gone ahead and
> added another set of hooks, so we can have both "permissive" and
> "restrictive" policies be provided from the hook.  It's a bit late to
> make the grammar and other changes which would be required to add a
> "restrictive" policy option to the built-in RLS, but adding the hooks is
> relatively low-impact.

I came up with something that is along the lines of what we discussed.
I'll wait for you to push Dean's code, and rebase on top of that
before submitting what I came up with. Maybe this can be rolled into
my next revision if I work through Andres' most recent feedback
without much delay.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 17, 2015 at 1:38 AM, Andres Freund <andres@anarazel.de> wrote:
>> Do you just mean that you think that speculative insertions should be
>> explicitly affirmed by a second record (making it not a speculative
>> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
>> no chance of seeing a tuple until it was affirmed by a second in-place
>> modification, regardless of tuple xmin xact commit status?
>
> Yes. I think

Good.

> a) HEAP_SPECULATIVE should never be visible outside in a
> committed transaction. That allows us to redefine what exactly the bit
> means and such after a simple restart. On IM Heiki said he wants to
> replace this by a bit in the item pointer, but I don't think that
> changes things substantially.

I guess you envisage that HEAP_SPECULATIVE is an infomask2 bit? I
haven't been using one of those for a couple of revisions now,
preferring to use the offset MagicOffsetNumber to indicate a
speculative t_ctid value. I'm slightly surprised that Heikki now wants
to use an infomask2 bit (if that is what you mean), because he wanted
to preserve those by doing the MagicOffsetNumber thing. But I guess
we'd still be preserving the bit under this scheme (since it's always
okay to do something different with the bit after a restart).

Why is it useful to consume an infomask2 bit after all? Why did Heikki
change his mind - due to wanting representational redundancy? Or do I
have it all wrong?

> b) t_ctid should not contain a speculative token in committed
> (i.e. visible to other sessions using mvcc semantics) tuple. Right now
> (i.e. master) t_ctid will point to itself for non-updated tuples. I
> don't think it's good to have something in there that's not an actual
> ctid in there. The number of places that look into t_ctid for
> in-progress insertions of other sessions is smaller than the ones that
> look at tuples in general.

Right. So if a tuple is committed, it should not have set
HEAP_SPECULATIVE/have a t_ctid offset of MagicOffsetNumber. But a
non-ctid t_ctid (a speculative token) remains possible for
non-committed tuples visible to some types of snapshots (in
particular, dirty snapshots).

> c) Cleaning the flag/ctid after a completed speculative insertion makes
> it far less likely that we end up waiting on a other backend when we
> wouldn't have to. If a session inserts a large number of tuples
> speculatively (surely *not* a unlikely thing in the long run) it gets
> rather likely that tokens are reused. Which means if another backend
> touches these in-progress records it's quite possible that the currently
> acquired token is the same as the one on a tuple that has actually
> finished inserting.

It's more important than that, actually. If the inserter fails to
clear its tuple's speculative token, and then releases their token
lmgr lock, it will cause livelock with many upserting sessions.
Coordinating which other session gets to lazily clear the speculative
token (cleaning up after the original inserter) seemed quite hazardous
when I looked into it. Maybe you could fix it by interleaving buffer
locks and lmgr locks, but we can't do that.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/17/2015 09:02 PM, Peter Geoghegan wrote:
> I'm slightly surprised that Heikki now wants
> to use an infomask2 bit (if that is what you mean),

No, I don't.

> because he wanted to preserve those by doing the MagicOffsetNumber
> thing.

Yes, that's the way to go.

Glad we cleared that up :-).

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 17, 2015 at 11:19 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> because he wanted to preserve those by doing the MagicOffsetNumber
>> thing.
>
>
> Yes, that's the way to go.
>
> Glad we cleared that up :-).

Got it, thanks!


-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 17, 2015 at 1:30 AM, Andres Freund <andres@anarazel.de> wrote:
>> However, like my second approach, there would be a "speculative
>> affirmation" WAL record.
>
> I think there should be one, but it's not required for the approach. The
> 'pending speculative insertion' can just be completed whenever there's a
> insert/update/delete that's not a super deletion.
>
> I.e. in the REORDER_BUFFER_CHANGE_INSERT/... case ReorderBufferCommit()
> just add something like:
>
> if (pending_insertion != NULL)
> {
>     if (new_record_is_super_deletion)
>         ReorderBufferReturnTupleBuf(pending_insertion);
>     else
>         rb->apply_change(rb, txn, relation, pending_insertion);
> }
> ...
>
> You'll have to be careful to store the pending_insertion *after*
> ReorderBufferToastReplace(), but that should not be a problem.

Okay. It seems like what you're saying is that I should be prepared to
have to deal with a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change (or
multiple such changes) from within ReorderBufferCommit() between a
speculative insertion and a super deletion, but that I can safely
assume that once some other insert/update/delete is encountered (or
once all changes have been drained from the reorder buffer), I can
then apply the speculative insertion as a regular insertion.

Is that what you're saying, in a nutshell? IOW, when you said this:

"""
I earlier thought that'd not be ok because there could be a new catalog
snapshot inbetween, but I was mistaken: The locking on the source
transaction prevents problems.
"""

What you meant was that you'd decided that this pattern is not broken,
*provided* that the implementation simply account for the fact that a
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change could come before some
other (non-REORDER_BUFFER_CHANGE_INTERNAL_DELETE, A.K.A.
non-superdelete) change came? And that we might need to be more
"patient" about deciding that a speculative insertion succeeded (more
"patient" than considering only one single non-superdelete change,
that can be anything else)?

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Dean Rasheed
Date:
On 17 April 2015 at 12:54, Stephen Frost <sfrost@snowman.net> wrote:
> Dean, I've been working through your patches over the past couple of
> days (apologies for the lack of updates, just been busy) and hope to
> push them very shortly (ie: by the end of the weekend).
>

Cool. Thanks.

> One thing that I was hoping to discuss a bit is that I've gone ahead and
> added another set of hooks, so we can have both "permissive" and
> "restrictive" policies be provided from the hook.  It's a bit late to
> make the grammar and other changes which would be required to add a
> "restrictive" policy option to the built-in RLS, but adding the hooks is
> relatively low-impact.
>

Sounds interesting. Perhaps that discussion should be moved to a new thread.

> I'm also going to be including a test_rls_hooks module into
> src/test/modules which will test those hooks and provide an example of
> how to use them.
>

Good idea. I had been thinking that it would be good to test RLS hooks.

> As for the discussion- there was some concern raised about extensions
> being able to "override" built-in policies by using the hooks a certain
> way.  I don't entirely follow the logic behind that concern as an
> extension has the ability to read the files on disk directly from C
> code, should it be written to do so, and so not providing a hook to add
> "permissive" policies is denying useful functionality for very question
> gain, in my view at least.
>
> Thoughts?
>

Yeah, perhaps that concern is somewhat overblown and shouldn't stand
in the way of allowing a hook to add permissive policies.

Regards,
Dean



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 17, 2015 at 1:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> What you meant was that you'd decided that this pattern is not broken,
> *provided* that the implementation simply account for the fact that a
> REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change could come before some
> other (non-REORDER_BUFFER_CHANGE_INTERNAL_DELETE, A.K.A.
> non-superdelete) change came? And that we might need to be more
> "patient" about deciding that a speculative insertion succeeded (more
> "patient" than considering only one single non-superdelete change,
> that can be anything else)?

Attached patch, V3.4, implements what I believe you and Heikki have in
mind here. There is a minimal, dedicated "affirmation" WAL record now.
Importantly, it is not possible for committed tuples to be speculative
(i.e. to have a magic value in their t_ctid field that indicates being
a speculative tuple - Andres felt very strongly that we ought to be
able to assume that committed tuples are not speculative). There are a
bunch of new assertions to make sure that this actually holds now, all
within tqual.c.

I found Dean's recent argument [1] for mostly following the existing
RLS behaviors convincing. I'm now I'm tracking what Dean called
insertCheckQuals, updateUsingQuals and updateCheckQuals separately
within the executor, while enforcing each (including updateUsingQuals)
in the manner of WCOs (i.e. there are no silent failures when
updateUsingQuals does not pass on a TARGET.* tuple, just as before -
there is a WCO-style error thrown instead). And, as Dean and Stephen
recently suggested, there is one and only one tuple (per ExecInsert()
call) involved in enforcement for each of these three quals (or these
three OR'd list of quals). These three tuples are the post-insert, the
pre-update, and the post-update tuples, for the insertCheckQuals,
updateUsingQuals and updateCheckQuals respectively.

The new implementation has extensive revised tests - the only sane way
to write something like ON CONFLICT UPDATE's RLS support is using
test-driven development, IMV.

There is one intentional difference between what I've done here, and
what I believe Dean wants: I don't bother checking the user-supplied
quals under any circumstances (so I don't "Test user-supplied
auxiliary WHERE clause (skip if not true)", as described in [1]). I
think we should *always* throw an error, and not silently skip the
UPDATE just because the user supplied quals also limits the UPDATE. I
don't want to introduce a complicated further distinction like that,
because it seems like a distinction without a difference. Adding this
behavior will not make something work that would not otherwise work -
it will make a failure to UPDATE silent, and nothing more, AFAICT.
That's much more subtle and much less necessary in the context of an
auxiliary UPDATE (compared to a regular UPDATE).

Other changes:

* Changed magic offset, per Heikki's request.

* RLS documentation updated, in line with new ON CONFLICT UPDATE
behavior (this made it simpler).

* Moved a bit of code from the IGNORE commit to the ON CONFLICT UPDATE
commit (this concerns new possibility of heap_lock_tuple()
HeapTupleInvisible return code introduced by ON CONFLICT UPDATE). It
clearly belongs there.

* RLS error messages advertise what type of command the violation
originated from, and if the WCO originated as a USING security barrier
qual (i.e. whether or not the target tuple that necessitates taking
the UPDATE path was where the violation occurred, or whether it
occurred on the post-update tuple intended to be appended to the
relation). ON CONFLICT UPDATE makes this distinction important, since
it may not be obvious which policy was violated (maybe this should go
as far as naming the policy directly - I'm waiting for Stephen to push
Dean's work, actually, because it will probably bitrot what I have
here). These additional diagnostics were very helpful when writing
those new RLS ON CONFLICT UPDATE tests, and will probably be helpful
generally.

Thoughts?

[1] http://www.postgresql.org/message-id/CAEZATCVDdYRFbF4NMXTF-NKYibbR2VSfNXRWPyyByaCpV1jwEw@mail.gmail.com
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Sun, Apr 19, 2015 at 9:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached patch, V3.4, implements what I believe you and Heikki have in
> mind here.

Any plans to look at this, Svenne? You are signed up as a reviewer on
the commitfest app. A review of the documentation, and interactions
with other features like inheritance, updatable views and postgres_fdw
would be useful at this point. Obviously I've gone to a lot of effort
to document how things fit together at a high level on the UPSERT wiki
page, but these aspects have not been commented on all that much.

Thanks
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-19 21:37:51 -0700, Peter Geoghegan wrote:
> Attached patch, V3.4, implements what I believe you and Heikki have in
> mind here.

I'm not 100% sure Heikki and I am on exactly the same page here :P

I'm looking at git diff $(git merge-base upstream/master HEAD).. where
HEAD is e1a5822d164db0.

* The logical stuff looks much saner.

* Please add tests for the logical decoding stuff. Probably both a plain regression and and isolationtester test in
contrib/test_decoding.Including one that does spooling to disk.
 

* I don't like REORDER_BUFFER_CHANGE_INTERNAL_INSERT/DELETE as names. Why not _SPECINSERT and _SPECDELETE or such?

* Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have that guide the logical decoding code. Seems
slightlycleaner.
 

* Still not a fan of the name 'arbiter' for the OC indexes.

* Gram.y needs a bit more discussion: * Can anybody see a problem with changing the precedence of DISTINCT &   ON to
nonassoc?Right now I don't see a problem given both are   reserved keywords already.   The reason the conflict exists
AFAICSis because something like   INSERT INTO foo SELECT DISTINCT ON CONFLICT IGNORE;   is allowed by the grammar. The
needfor the nonassoc could be   avoided by requiring DISTINCT to be followed by a column. We   currently *do* enforce
that,just not in the parser (c.f.   transformDistinctClause). That requires one more production in   simple_select, and
anonoptional distinct clause.   I've queued up a commit cleaning this up in my repo, feel free to   merge and polish. *
UpdateInsertStmtis a horrible name. OnConflictUpdateStmt maybe? * '(' index_params where_clause ')' is imo rather
strange.The where   clause is inside the parens? That's quite different from the   original index clause.
 
* SPEC_IGNORE,    /* INSERT of "ON CONFLICT IGNORE" */ looks like a wrongly copied comment.
* The indentation in RoerderBufferCommit is clearly getting out of hand, I've queued up a commit cleaning this up in my
repo,feel free to merge.
 
* I don't think we use the term 'ordinary table' in error messages so far.
* I still think it's unacceptable to redefine XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you did.
I'lltry to find something better.
 
* I wonder if we now couldn't avoid changing heap_delete's API - we can always super delete if we find a speculative
insertionnow. It'd be nice not to break out of core callers if not necessary.
 
* breinbaas on IRC just mentioned that it'd be nice to have upsert as a link in the insert. Given that that's the
pervasiveterm that doesn't seem absurd.
 

I think this is getting closer to a commit. Let's get this done.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-21 16:57:45 +0200, Andres Freund wrote:
> * I still think it's unacceptable to redefine
>   XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you
>   did. I'll try to find something better.

I think we should "just" split this into different flag values for
insert/update/delete.

I.e. something like

/* flags for heap insert and multi insert */
#define XLH_INSERT_ALL_VISIBLE_CLEARED
#define XLH_INSERT_LAST_MULTI_INSERT
#define XLH_INSERT_IS_SPECULATIVE
#define XLH_INSERT_CONTAINS_NEW_TUPLE

/* flags for update */
#define XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED
#define XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED
#define XLH_UPDATE_CONTAINS_OLD_TUPLE
#define XLH_UPDATE_CONTAINS_OLD_KEY
#define XLH_UPDATE_CONTAINS_NEW_TUPLE
#define XLH_UPDATE_PREFIX_FROM_OLD
#define XLH_UPDATE_SUFFIX_FROM_OLD

/* flags for delete */
#define XLH_DELETE_ALL_VISIBLE_CLEARED
#define XLH_DELETE_CONTAINS_OLD_TUPLE
#define XLH_DELETE_CONTAINS_OLD_KEY

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Tue, Apr 21, 2015 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote:
> I'm not 100% sure Heikki and I am on exactly the same page here :P
>
> I'm looking at git diff $(git merge-base upstream/master HEAD).. where
> HEAD is e1a5822d164db0.

Merged your stuff into my Github branch. Heikki is pushing changes
there directly now.

> * The logical stuff looks much saner.

Cool.

> * Please add tests for the logical decoding stuff. Probably both a plain
>   regression and and isolationtester test in
>   contrib/test_decoding. Including one that does spooling to disk.

Working on it...hope to push that to Github soon.

> * I don't like REORDER_BUFFER_CHANGE_INTERNAL_INSERT/DELETE as names. Why not
>   _SPECINSERT and _SPECDELETE or such?

Changed that on Github.

> * Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have
>   that guide the logical decoding code. Seems slightly cleaner.

I thought that you didn't think that would always work out? That in
some possible scenario it could break?

> * Still not a fan of the name 'arbiter' for the OC indexes.

What do you prefer? Seems to describe what we're talking about
reasonably well to me.

> * Gram.y needs a bit more discussion:
>   * Can anybody see a problem with changing the precedence of DISTINCT &
>     ON to nonassoc? Right now I don't see a problem given both are
>     reserved keywords already.

Why did you push code that solved this in a roundabout way, but
without actually reverting my original nonassoc changes (which would
now not result in shift/reduce conflicts?). What should we do about
that? Seems your unsure (otherwise you'd have reverted my thing). I
don't like that you seem to have regressed diagnostic output [1].
Surely it's simpler to use the nonassoc approach? I think this works
by giving the relevant keywords an explicit priority lower than '(',
so that a rule with ON CONFLICT '(' will shift rather than reducing a
conflicting rule (CONFLICT is an unreserved keyword). I wrote the code
so long ago that I can't really remember why I thought it was the
right thing, though.

>   * UpdateInsertStmt is a horrible name. OnConflictUpdateStmt maybe?

Done on Github.

>   * '(' index_params where_clause ')' is imo rather strange. The where
>     clause is inside the parens? That's quite different from the
>     original index clause.

I don't know. Maybe I was lazy about fixing shift/reduce conflicts.   :-)

I'll look at this some more.

> * SPEC_IGNORE,  /* INSERT of "ON CONFLICT IGNORE" */ looks like
>   a wrongly copied comment.

Not sure what you mean here. Please clarify.

> * The indentation in RoerderBufferCommit is clearly getting out of hand,
>   I've queued up a commit cleaning this up in my repo, feel free to merge.

Done on Github.

> * I don't think we use the term 'ordinary table' in error messages so
>   far.

Fixed on Github.

> * I still think it's unacceptable to redefine
>   XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you
>   did. I'll try to find something better.

I did what you suggested in your follow-up e-mail (changes are on Github [2]).

> * I wonder if we now couldn't avoid changing heap_delete's API - we can
>   always super delete if we find a speculative insertion now. It'd be
>   nice not to break out of core callers if not necessary.

Maybe, but if there is a useful way to break out only a small subset
of heap_delete(), I'm not seeing it. Most of the callers that need a
new NULL argument are heap_insert() callers, actually. There are now 3
heap_delete() callers, up from 2.

> * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
>   link in the insert. Given that that's the pervasive term that doesn't
>   seem absurd.

Not sure what you mean. Where would the link appear? It is kind of
hard to categorize that text so that we're strictly either talking
about INSERT or UPSERT. Might be possible, though.

> I think this is getting closer to a commit. Let's get this done.

Great!

The blockers to committing the IGNORE patch I see are:

* We need to tweak some of the logical decoding stuff a bit more, I
feel. Firm up on the details of whether we treat a confirmation record
as a logical decoding change that is involved in the new dance during
transaction reassembly.

* We need to sort out those issues with the grammar, since that only
really applies to the inference specification. Maybe the WHERE clause
that the inference specification accepts can be broken out. No ON
CONFLICT UPDATE specific issues left there, AFAICT though.

That really seems totally doable in just a couple of days.

The blockers for committing the ON CONFLICT UPDATE patch are larger, but doable:

* We need to figure out the tuple lock strength details. I think this
is doable, but it is the greatest challenge to committing ON CONFLICT
UPDATE at this point. Andres feels that we should require no greater
lock strength than an equivalent UPDATE. I suggest we get to this
after committing the IGNORE variant. We probably need to discuss this
some more.

* At the very least, I need to rebase my RLS patch onto what Stephen
just pushed (Dean's work). It would be nice to get Stephen and/or Dean
to review those aspects, as subject matter experts.

* It would also be nice to get someone like Tom to take a quick look
at what I'm doing in the optimizer, and in particular the rewriter.
Maybe he'd feel that the normalization that I'm doing (the
ExcludedExpr stuff) would be better suited to the optimizer. This
doesn't have to be a blocker to commit - it's just a suggestion.

Do you agree with my assessment? Did I miss a blocker? I'd like to
hear what Heikki has to say here too, now that he is pushing code to
Github. What needs more work before we can commit 1) ON CONFLICT
IGNORE, and 2) ON CONFLICT UPDATE?

Thanks

[1] https://github.com/petergeoghegan/postgres/commit/75d96c23676fd91568e9ec638470250c8b5e5f1b
[2] https://github.com/petergeoghegan/postgres/commit/0cfde636bf1ffca438418fa0c02043805e99f30f
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Wed, Apr 22, 2015 at 3:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
> * We need to sort out those issues with the grammar, since that only
> really applies to the inference specification. Maybe the WHERE clause
> that the inference specification accepts can be broken out. No ON
> CONFLICT UPDATE specific issues left there, AFAICT though.

I pushed some code that deals with the predicate being within parenthesis:

https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e

(it now follows the attributes/expressions indexes, in the style of
CREATE INDEX).

We still need to reconcile these changes to the grammar with your own,
Andres. I'm going to wait to hear back on what you think about that.
Note that this removal:

-%nonassoc DISTINCT
-%nonassoc ON

was incidental to the commit (this is the code you could have removed
when you modified the grammar, adding a new production, but didn't).
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-22 15:23:16 -0700, Peter Geoghegan wrote:
> On Tue, Apr 21, 2015 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote:
> > * Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have
> >   that guide the logical decoding code. Seems slightly cleaner.
> 
> I thought that you didn't think that would always work out? That in
> some possible scenario it could break?

I don't think there's a real problem. You obviously have to do it right
(i.e. only abort insertion if there's a insert/update/delete or commit).

Speaking of commits: Without having rechecked: I think you're missing
cleanup of th pending insertion on commit.

> > * Gram.y needs a bit more discussion:
> >   * Can anybody see a problem with changing the precedence of DISTINCT &
> >     ON to nonassoc? Right now I don't see a problem given both are
> >     reserved keywords already.
> 
> Why did you push code that solved this in a roundabout way, but
> without actually reverting my original nonassoc changes (which would
> now not result in shift/reduce conflicts?).

I pushed the changes to a separate repo so you could polish them ;)

> What should we do about that?

I'm prety sure we should not do the %nonassoc stuff.

> I don't like that you seem to have regressed
> diagnostic output [1].

Meh^5. This is the same type of errors we get in literally hundreds of
other syntax errors. And in contrast to the old error it'll actually
have a correct error possition.

> Surely it's simpler to use the nonassoc approach?

I think it's much harder to forsee all consequences of that.

> > * SPEC_IGNORE,  /* INSERT of "ON CONFLICT IGNORE" */ looks like
> >   a wrongly copied comment.
> 
> Not sure what you mean here. Please clarify.

I'm not sure either ;)

> > * I wonder if we now couldn't avoid changing heap_delete's API - we can
> >   always super delete if we find a speculative insertion now. It'd be
> >   nice not to break out of core callers if not necessary.
> 
> Maybe, but if there is a useful way to break out only a small subset
> of heap_delete(), I'm not seeing it.

I think you misread my statement: I'm saying we don't need the new
argument anymore, even if we still do the super-deletion in
heap_delete(). Now that the speculative insertion will not be visible
(as in seen on a tuple they could delete) to other backends we can just
do the super deletion if we see that the tuple is a promise one.

> > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
> >   link in the insert. Given that that's the pervasive term that doesn't
> >   seem absurd.
> 
> Not sure what you mean. Where would the link appear?

The index, i.e. it'd just be another indexterm. It seems good to give
people a link.

> * We need to figure out the tuple lock strength details. I think this
> is doable, but it is the greatest challenge to committing ON CONFLICT
> UPDATE at this point. Andres feels that we should require no greater
> lock strength than an equivalent UPDATE. I suggest we get to this
> after committing the IGNORE variant. We probably need to discuss this
> some more.

I want to see a clear way forward before we commit parts. It doesn't
have to be completely iron-clad, but the way forward should be pretty
clear. What's the problem you're seeing right now making this work? A
couple days on jabber you seemed to see a way doing this?

The reason I think this has to use the appropriate lock level is that
it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
how much pain we endured to get fk locks, that seems like a bad deal.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-22 16:40:07 -0700, Peter Geoghegan wrote:
> On Wed, Apr 22, 2015 at 3:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > * We need to sort out those issues with the grammar, since that only
> > really applies to the inference specification. Maybe the WHERE clause
> > that the inference specification accepts can be broken out. No ON
> > CONFLICT UPDATE specific issues left there, AFAICT though.
>
> I pushed some code that deals with the predicate being within parenthesis:
>
> https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e

And the way you've used nonassoc here doesn't look correct. You're
hiding legitimate ambiguities in the grammar. UPDATE is a unreserved
keyword, so for

... ON CONFLICT '(' index_params ')' where_clause OnConflictUpdateStmt

it won't be able to discern whether an UPDATE in the WHERE clause is
part of the where_clause or OnConflictUpdate.

This is legal:
SELECT * FROM (SELECT true as update) f WHERE update;
i.e. 'update' can be the last part of a WHERE clause.

Essentially what you're trying to do with the nonassic is hiding that
UPDATE and IGNORE need to be reserved keywords with the syntax you're
proposing. We can either make them reserved or change the syntax.

One way to avoid making them reserved keywords - which would be somewhat
painful - is to add a 'DO' before the IGNORE/UPDATE. I.e. something like
 ON CONFLICT opt_conflict_expr DO OnConflictUpdateStmt
| ON CONFLICT opt_conflict_expr DO IGNORE

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Geoff Winkless
Date:
<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">Apologies for butting
inbut can I (as a user) express a preference as a user against DO? </div><div class="gmail_default"
style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default"
style="font-family:verdana,sans-serif;font-size:small">Firstly,it looks horrible. And what's to stop me having "SELECT
trueAS do" in the where clause (as per your UPDATE objection)?</div><div class="gmail_default"
style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default"
style="font-family:verdana,sans-serif;font-size:small">Shouldn'tUPDATE be a reserved keyword anyway? AIUI ANSI suggests
so.</div><divclass="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div
class="gmail_default"style="style"><font face="verdana, sans-serif"><a
href="http://developer.mimer.se/validator/sql-reserved-words.tml">http://developer.mimer.se/validator/sql-reserved-words.tml</a></font><br
/></div><divclass="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div
class="gmail_default"style="font-family:verdana,sans-serif;font-size:small">I had always assumed it was; anyone who
produceda query for me that contained update in an unusual context would get slapped heavily.</div><div
class="gmail_default"style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default"
style="font-family:verdana,sans-serif;font-size:small">Geoff</div></div><divclass="gmail_extra"><br /><div
class="gmail_quote">On23 April 2015 at 11:54, Andres Freund <span dir="ltr"><<a href="mailto:andres@anarazel.de"
target="_blank">andres@anarazel.de</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex">On 2015-04-22 16:40:07 -0700, Peter Geoghegan wrote:<br /> > On
Wed,Apr 22, 2015 at 3:23 PM, Peter Geoghegan <<a href="mailto:pg@heroku.com">pg@heroku.com</a>> wrote:<br /> >
>* We need to sort out those issues with the grammar, since that only<br /> > > really applies to the
inferencespecification. Maybe the WHERE clause<br /> > > that the inference specification accepts can be broken
out.No ON<br /> > > CONFLICT UPDATE specific issues left there, AFAICT though.<br /> ><br /> > I pushed
somecode that deals with the predicate being within parenthesis:<br /> ><br /> > <a
href="https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e"
target="_blank">https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e</a><br/><br
/>And the way you've used nonassoc here doesn't look correct. You're<br /> hiding legitimate ambiguities in the
grammar.UPDATE is a unreserved<br /> keyword, so for<br /><br /> ... ON CONFLICT '(' index_params ')' where_clause
OnConflictUpdateStmt<br/><br /> it won't be able to discern whether an UPDATE in the WHERE clause is<br /> part of the
where_clauseor OnConflictUpdate.<br /><br /> This is legal:<br /> SELECT * FROM (SELECT true as update) f WHERE
update;<br/> i.e. 'update' can be the last part of a WHERE clause.<br /><br /> Essentially what you're trying to do
withthe nonassic is hiding that<br /> UPDATE and IGNORE need to be reserved keywords with the syntax you're<br />
proposing.We can either make them reserved or change the syntax.<br /><br /> One way to avoid making them reserved
keywords- which would be somewhat<br /> painful - is to add a 'DO' before the IGNORE/UPDATE. I.e. something like<br
/><br/>   ON CONFLICT opt_conflict_expr DO OnConflictUpdateStmt<br /> | ON CONFLICT opt_conflict_expr DO IGNORE<br
/><br/> Greetings,<br /><br /> Andres Freund<br /><span class="HOEnZb"><font color="#888888"><br /><br /> --<br /> Sent
viapgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To
makechanges to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></font></span></blockquote></div><br /></div> 

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Petr Jelinek
Date:
On 23/04/15 14:34, Geoff Winkless wrote:
> Apologies for butting in but can I (as a user) express a preference as a
> user against DO?
>
> Firstly, it looks horrible. And what's to stop me having "SELECT true AS
> do" in the where clause (as per your UPDATE objection)?
>

DO is already reserved keyword. There is also some precedence for this 
in CREATE RULE. But I agree that it's not ideal syntax.

> Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.
>
> http://developer.mimer.se/validator/sql-reserved-words.tml
>
> I had always assumed it was; anyone who produced a query for me that
> contained update in an unusual context would get slapped heavily.

Postgres currently has UPDATE as unreserved keyword and more importantly 
IGNORE is not keyword at all so making it a new reserved keyword is not 
nice at all.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>Apologies for butting in but can I (as a user) express a preference as
>a
>user against DO?

Sure. If you propose an alternative ;)

>Firstly, it looks horrible. And what's to stop me having "SELECT true
>AS
>do" in the where clause (as per your UPDATE objection)?

A syntax error. DO is a reserved keyword. Update is just unreserved (and thus can be used as a column label). Ignore is
unreservedwith the patch and was unreserved before.  We obviously can make both reserved, but of so we have to do it
forreal, not by hiding the conflicts 
 

>Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.
>
>http://developer.mimer.se/validator/sql-reserved-words.tml

It's not one right now. And ignore isn't a keyword at all atm.

(Please don't top post)

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Geoff Winkless
Date:
On 23 April 2015 at 13:51, Andres Freund <andres@anarazel.de> wrote:
On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
​>​
 And what's to stop me having "SELECT true
AS
>do" in the where clause (as per your UPDATE objection)?

A syntax error. DO is a reserved keyword. Update is just unreserved (and thus can be used as a column label). Ignore is unreserved with the patch and was unreserved before.  We obviously can make both reserved, but of so we have to do it for real, not by hiding the conflicts

Sorry, I misunderstood: so it's not the fact that it can't be used as a column label (because it can) but the fact that it can't then be referenced within a WHERE clause without quoting
. Which is in itself utterly horrible, but that's a separate argument and I can at least now understand your point.​

So I could end up with

INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE update UPDATE update=1

but I would have to do

INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE "do" UPDATE "do"=1

?

>Apologies for butting in but can I (as a user) express a preference as
>a
​ 
user against DO?

Sure. If you propose an alternative ;)

​Maybe I'm misreading it, but isn't index_predicate meant to be inside the brackets?


certainly states that.

It's not one right now. And ignore isn't a keyword at all atm.

​As I said, it's my personal belief that anyone using keywords (of any kind) unquoted deserves what they get, but I see your point.​
 

I think I object to the fact that you're talking about adding extra syntactic sugar to work around a parser-implementation problem, not an actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

(Please don't top post)

Mea culpa. I blame google :)​

FWIW "DO IGNORE" just reads disgustingly. If you do finally go down the DO route, perhaps "DO NOTHING"? :)
 
Geoff

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-23 14:34:02 +0100, Geoff Winkless wrote:
> > A syntax error. DO is a reserved keyword. Update is just unreserved (and
> > thus can be used as a column label). Ignore is unreserved with the patch
> > and was unreserved before.  We obviously can make both reserved, but of so
> > we have to do it for real, not by hiding the conflicts
> >
> 
> Sorry, I misunderstood: so it's not the fact that it can't be used as a
> column label (because it can) but the fact that it can't then be referenced
> within a WHERE clause without quoting

Meh. You can use any keyword in quotes - because then they're not
keywords anymore.


> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> update UPDATE update=1
> 
> but I would have to do
> 
> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> "do" UPDATE "do"=1

Yes.

> ​Maybe I'm misreading it, but isn't index_predicate meant to be inside the
> brackets?
> 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

That has changed since. And for good reason: It's pretty to have the
WHERE clause inside the brackets when that's not the case for CREATE
INDEX. But more importantly with multiple columns for stuff like ON
CONFLICT (a, b WHERE foo) it's unclear where the WHERE is actually
attached to. We have that problem with aggregates and it repeatedly
caused confusion.

> ​As I said, it's my personal belief that anyone using keywords (of any
> kind) unquoted deserves what they get, but I see your point.​

Given that IGNORE isn't even a keyword right now (9.5 master) that
policy isn't particularly meaningful anyway.

> I think I object to the fact that you're talking about adding extra
> syntactic sugar to work around a parser-implementation problem, not an
> actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

I fail to see the point of such an objection. We have an LALR parser
(generated by bison). That implies a certain expressiveness. You're
suggesting that we change to a different kind of parser?

I don't think it's necessarily unambiguous. I'm not particularly
motivated to prove it though - the point is that we rely on bison to
prevent ambiguities. That only works if we're listening. And not if
we're silencing warnings about ambiguities over the whole grammar.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Geoff Winkless
Date:
On 23 April 2015 at 14:50, Andres Freund <andres@anarazel.de> wrote:
> ​Maybe I'm misreading it, but isn't index_predicate meant to be inside the
> brackets?
>
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

That has changed since. 

​Oh, helpful. :)​
 
​I'll shut up. I have a feeling that my objection is really with the very idea of unreserved keywords and I have a feeling that there will be rather more people shouting me down if I go off on that particular rant; meanwhile it's 20 years since I used yacc in earnest and it's too hazy to be able to argue about what it is or isn't capable of.

When I set out I was really only hoping to express a preference as a user; on balance I would really rather not have DO IGNORE, if it were possible to avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just about cope with (and means you don't need to add IGNORE as a keyword, win!), although it still mildly pains me that there's an additional unnecessary word. 

But I certainly don't object enough to hold up you guys doing the actual work for my benefit (among others, obviously!).

G

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> When I set out I was really only hoping to express a preference as a user;
> on balance I would really rather not have DO IGNORE, if it were possible to
> avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> about cope with (and means you don't need to add IGNORE as a keyword,
> win!), although it still mildly pains me that there's an additional
> unnecessary word.

Yea, DO NOTHING is a good alternative. And I do like we're adding one
keyword less (which is also good for the parser's
size/performance).

DO {UPDATE ... | NOTHING | LOCK} doesn't sound too bad to me (yes, LOCK
doesn't exist yet, except by writing UPDATE .. WHERE false ;)).

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Bruce Momjian
Date:
On Thu, Apr 23, 2015 at 05:02:19PM +0200, Andres Freund wrote:
> On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> > When I set out I was really only hoping to express a preference as a user;
> > on balance I would really rather not have DO IGNORE, if it were possible to
> > avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> > about cope with (and means you don't need to add IGNORE as a keyword,
> > win!), although it still mildly pains me that there's an additional
> > unnecessary word.
> 
> Yea, DO NOTHING is a good alternative. And I do like we're adding one
> keyword less (which is also good for the parser's
> size/performance).

No question that DO IGNORE sounds awkward.  DO NOTHING also matches
CREATE RULE --- another plus.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/20/2015 07:37 AM, Peter Geoghegan wrote:
>
>             if (wco->commandType == CMD_INSERT)
>                 command = "INSERT-applicable ";
>             else if (wco->commandType == CMD_UPDATE)
>                 command = "UPDATE-applicable ";
>             else if (wco->commandType == CMD_DELETE)
>                 command = "DELETE-applicable ";
>             else if (wco->commandType == CMD_SELECT)
>                 command = "SELECT-applicable ";
>
>             ereport(ERROR,
>                     (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
>                  errmsg("new row violates %sWITH CHECK OPTION %sfor \"%s\"",
>                         command ? command : "",
>                         wco->secBarrier ? "(originally security barrier) ":"",
>                         wco->viewname),
>                     val_desc ? errdetail("Failing row contains %s.", val_desc) :
>                                0));

That code in ExecWithCheckOptions is not translatable. See style guide: 
http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 9:02 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That code in ExecWithCheckOptions is not translatable. See style guide:
> http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

It's probably going to need to change when I rebase on top of
Dean's/Stephen's work, anyway.


-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
> I think you misread my statement: I'm saying we don't need the new
> argument anymore, even if we still do the super-deletion in
> heap_delete(). Now that the speculative insertion will not be visible
> (as in seen on a tuple they could delete) to other backends we can just
> do the super deletion if we see that the tuple is a promise one.

I disagree. I think it's appropriate that the one and only "super"
heap_delete() caller make a point of indicating that that's what it's
doing. The cost is only that the two other such callers must say that
they're not doing it. Super deletion is a special thing, that logical
decoding knows all about for example, and I think it's appropriate
that callers ask explicitly.

>> > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
>> >   link in the insert. Given that that's the pervasive term that doesn't
>> >   seem absurd.
>>
>> Not sure what you mean. Where would the link appear?
>
> The index, i.e. it'd just be another indexterm. It seems good to give
> people a link.

Oh, okay. Sure. Done on Github.

>> * We need to figure out the tuple lock strength details. I think this
>> is doable, but it is the greatest challenge to committing ON CONFLICT
>> UPDATE at this point. Andres feels that we should require no greater
>> lock strength than an equivalent UPDATE. I suggest we get to this
>> after committing the IGNORE variant. We probably need to discuss this
>> some more.
>
> I want to see a clear way forward before we commit parts. It doesn't
> have to be completely iron-clad, but the way forward should be pretty
> clear. What's the problem you're seeing right now making this work? A
> couple days on jabber you seemed to see a way doing this?

I was really just identifying it as the biggest problem the patch now
faces, and I want to face those issues down ASAP. Of course, that's
good, because as you say it is a small problem in an absolute sense.
The second most significant open item is rebasing on top of the recent
RLS changes, IMV.

I can see yourself and Heikki continuing to change small stylistic
things, which I expect to continue for a little while. That's fine,
but naturally I want to be aggressive about closing off these open
issues that are not just general clean-up, but have some small level
of risk of becoming more significant blockers.

> The reason I think this has to use the appropriate lock level is that
> it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
> how much pain we endured to get fk locks, that seems like a bad deal.

Right.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
> > I think you misread my statement: I'm saying we don't need the new
> > argument anymore, even if we still do the super-deletion in
> > heap_delete(). Now that the speculative insertion will not be visible
> > (as in seen on a tuple they could delete) to other backends we can just
> > do the super deletion if we see that the tuple is a promise one.
> 
> I disagree. I think it's appropriate that the one and only "super"
> heap_delete() caller make a point of indicating that that's what it's
> doing. The cost is only that the two other such callers must say that
> they're not doing it. Super deletion is a special thing, that logical
> decoding knows all about for example, and I think it's appropriate
> that callers ask explicitly.

Unconvinced. Not breaking an API has its worth.

> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.

Not sure I agree. That part seems pretty mechanical to me.

* The docs aren't suitable for endusers. I think this will take a fair amount of work.

* We're not yet sure about the syntax yet. In addition to the keyword issue I'm unconvinced about having two WHERE
clausesin the same statement. I think that'll end up confusing users a fair bit. Might still be the best way forward.
 
* The executor integration still isn't pretty, although Heikki is making strides there
* I don't think anybody seriously has looked at the planner side yet.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote:
> Unconvinced. Not breaking an API has its worth.

Yeah, and I acknowledge that - but it's not something that it's
appropriate to encapsulate IMV.

Let's just leave it to Heikki...I'd say he has the deciding vote,
especially as the reviewer that is more in charge of the executor
stuff than anything else.

>> The second most significant open item is rebasing on top of the recent
>> RLS changes, IMV.
>
> Not sure I agree. That part seems pretty mechanical to me.

Hopefully so.

> * The docs aren't suitable for endusers. I think this will take a fair
>   amount of work.

It's hard to explain why something like the collation field in the
inference specification matters to users...because it's only supported
at all due to forwards compatibility concerns. It's hard to document
certain things like that in an accessible way. In general, much of the
effort of the last year was making the feature play nice, and
considering a bunch of usability edge cases that are unlikely to come
up, but still must be documented.

> * We're not yet sure about the syntax yet. In addition to the keyword
>   issue I'm unconvinced about having two WHERE clauses in the same
>   statement. I think that'll end up confusing users a fair bit. Might
>   still be the best way forward.

My previous approach to allowing an index predicate did at least
clearly show that the predicate belonged to the inference
specification, since it appeared between parenthesis. Maybe we should
do that after all? I don't think it much matters if the inference
specification is not identical to CREATE INDEX. I don't want to give
up inference of partial indexes, and I don't want to give up allowing
the UPDATE to have a limited WHERE clause, and we can't just spell
those differently here. So what alternative does that leave?

Anyone else have an opinion?

> * The executor integration still isn't pretty, although Heikki is making
>   strides there

That's just clean-up, though. I'm not worried about the risk of Heikki
not succeeding at that.

> * I don't think anybody seriously has looked at the planner side yet.

Good point. That certainly needs to be looked at (and I include the
rewriter parts in that). It's really not that much code, but (ideally)
a subject matter expert should look into the whole ExcludedExpr dance
in particular.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/23/2015 10:53 PM, Andres Freund wrote:
> On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
>> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
>>> I think you misread my statement: I'm saying we don't need the new
>>> argument anymore, even if we still do the super-deletion in
>>> heap_delete(). Now that the speculative insertion will not be visible
>>> (as in seen on a tuple they could delete) to other backends we can just
>>> do the super deletion if we see that the tuple is a promise one.
>>
>> I disagree. I think it's appropriate that the one and only "super"
>> heap_delete() caller make a point of indicating that that's what it's
>> doing. The cost is only that the two other such callers must say that
>> they're not doing it. Super deletion is a special thing, that logical
>> decoding knows all about for example, and I think it's appropriate
>> that callers ask explicitly.
>
> Unconvinced. Not breaking an API has its worth.

The heapam API is not that stable, we've added arguments to those 
functions every once in a while, and I don't recall any complaints. So 
I'm with Peter that super-deletion should be requested explicitly by the 
caller.

That said, I'd actually like to see a separate heap_super_delete() 
function for that, rather than piggybacking on heap_delete(). It's a 
quite different operation. There'll be some duplication, but seems 
better than a maze of if-else's in heap_delete.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Andres Freund
Date:
On 2015-04-23 23:08:34 +0300, Heikki Linnakangas wrote:
> The heapam API is not that stable, we've added arguments to those functions
> every once in a while, and I don't recall any complaints.

I heard some, but not too many, that's true. I know that I've written
code that'd be broken/needed even more ifdefs ;)

> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

+many. I've requested that changes a couple times now.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 1:08 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

I found a bug that seems to be down to commit
e3144183562d08e347f832f0b29daefe8bac617b on Github:

"""
commit e3144183562d08e347f832f0b29daefe8bac617b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Thu Apr 23 18:38:11 2015 +0300

    Minor cleanup of check_exclusion_or_unique_constraint.

    To improve readability.

"""

At least, that's what a "git bisect" session showed.

Basically, at high client counts (128 clients only), after a few
iterations of the B-Tree test, the latest version of jjanes_upsert
would see this happen (error originates fromExecOnConflictUpdate(),
with custom instrumentation added to identify invisible tuple):

2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system was shut
down at 2015-04-23 15:10:09 PDT
2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system is ready
to accept connections
2015-04-23 15:12:55 PDT [ txid:  472418 ]: ERROR:  attempted to lock
invisible tuple (140,39)
2015-04-23 15:12:55 PDT [ txid:  472418 ]: STATEMENT:  insert into
upsert_race_test (index, filler, count) values ('3896',
random_characters(), '1') on conflict (index)
             update set count=TARGET.count + EXCLUDED.count, filler =
EXCLUDED.filler
             where TARGET.index = EXCLUDED.index
             returning count

This seemed to only show up when fsync = on on my laptop, whereas in
the past some race conditions that I've found were easier to recreate
with fsync = off.

I attach some notes from my bisect/debugging session, including
pg_xlogdump output (from a psql session - I like to manipulate
pg_xlogdump output using SQL). That's probably not all that
interesting, but I attach it all the same. Hopefully this is something
that Heikki can easily debug, since the commit implicated here only
concerned readability. A simple oversight? If you want, Heikki, I can
try and debug it, but it seems like something we're better off having
you sign off on if possible.

Thanks
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 12:45 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> * We need to figure out the tuple lock strength details. I think this
>>> is doable, but it is the greatest challenge to committing ON CONFLICT
>>> UPDATE at this point. Andres feels that we should require no greater
>>> lock strength than an equivalent UPDATE. I suggest we get to this
>>> after committing the IGNORE variant. We probably need to discuss this
>>> some more.
>>
>> I want to see a clear way forward before we commit parts. It doesn't
>> have to be completely iron-clad, but the way forward should be pretty
>> clear. What's the problem you're seeing right now making this work? A
>> couple days on jabber you seemed to see a way doing this?
>
> I was really just identifying it as the biggest problem the patch now
> faces, and I want to face those issues down ASAP. Of course, that's
> good, because as you say it is a small problem in an absolute sense.
> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.


OK, I pushed out code to do this a while ago. I must admit that I
probably significantly over-estimated the difficulty of doing this.

With Heikki's problematic commit reverted this works fine  (I have not
actually reverted it myself...I'll leave it to Heikki to clean that up
when he gets around to it). The usually jjanes_upsert stress tests
show up no problems.

Curious about what you think about my proposal to go back to putting
the inference specification WHERE clause within the parenthesis, since
this solves several problems, including clarifying to users that the
predicate is part of the inference clause.
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Heikki Linnakangas
Date:
On 04/24/2015 02:52 AM, Peter Geoghegan wrote:
> I found a bug that seems to be down to commit
> e3144183562d08e347f832f0b29daefe8bac617b on Github:
>
> """
> commit e3144183562d08e347f832f0b29daefe8bac617b
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date:   Thu Apr 23 18:38:11 2015 +0300
>
>      Minor cleanup of check_exclusion_or_unique_constraint.
>
>      To improve readability.
>
> """
>
> At least, that's what a "git bisect" session showed.

Ok, I see now that I totally screwed up the conditions on "waitMode" in 
that commit. I just pushed a fix to your github repository.

- Heikki




Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 24, 2015 at 12:31 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Ok, I see now that I totally screwed up the conditions on "waitMode" in that
> commit. I just pushed a fix to your github repository.

I can confirm that the commit you just pushed prevents the
implementation from attempting to lock an invisible tuple, where
previously it would reliably fall given the same testcase.

Thanks
-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Stephen Frost
Date:
Peter,

* Peter Geoghegan (pg@heroku.com) wrote:
> I came up with something that is along the lines of what we discussed.
> I'll wait for you to push Dean's code, and rebase on top of that
> before submitting what I came up with. Maybe this can be rolled into
> my next revision if I work through Andres' most recent feedback
> without much delay.

This is done (finally!).  Please take a look and let me know if you have
any questions or concerns.  Happy to chat again also, of course.
Thanks!
    Stephen

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Fri, Apr 24, 2015 at 5:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Peter Geoghegan (pg@heroku.com) wrote:
>> I came up with something that is along the lines of what we discussed.
>> I'll wait for you to push Dean's code, and rebase on top of that
>> before submitting what I came up with. Maybe this can be rolled into
>> my next revision if I work through Andres' most recent feedback
>> without much delay.
>
> This is done (finally!).  Please take a look and let me know if you have
> any questions or concerns.  Happy to chat again also, of course.

Great, thanks.

I didn't actually wait for you (as earlier indicated) before posting
the new approach to RLS in V3.4. However, I have some decent tests for
the new behaviors (I did test-driven development), so I think that
when I rebase on top of the new RLS stuff tomorrow, I'll find that it
won't be too difficult.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Stephen Frost
Date:
* Peter Geoghegan (pg@heroku.com) wrote:
> On Fri, Apr 24, 2015 at 5:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Peter Geoghegan (pg@heroku.com) wrote:
> >> I came up with something that is along the lines of what we discussed.
> >> I'll wait for you to push Dean's code, and rebase on top of that
> >> before submitting what I came up with. Maybe this can be rolled into
> >> my next revision if I work through Andres' most recent feedback
> >> without much delay.
> >
> > This is done (finally!).  Please take a look and let me know if you have
> > any questions or concerns.  Happy to chat again also, of course.
>
> Great, thanks.
>
> I didn't actually wait for you (as earlier indicated) before posting
> the new approach to RLS in V3.4. However, I have some decent tests for
> the new behaviors (I did test-driven development), so I think that
> when I rebase on top of the new RLS stuff tomorrow, I'll find that it
> won't be too difficult.

Fantastic, glad to hear it!
Thanks!
    Stephen

Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From
Peter Geoghegan
Date:
On Thu, Apr 23, 2015 at 6:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Curious about what you think about my proposal to go back to putting
> the inference specification WHERE clause within the parenthesis, since
> this solves several problems, including clarifying to users that the
> predicate is part of the inference clause.

I've *provisionally* pushed code that goes back to the old way,
Andres: https://github.com/petergeoghegan/postgres/commit/2a5d80b27d2c5832ad26dde4651c64dd2004f401

Perhaps this is the least worst way, after all.
-- 
Peter Geoghegan