Thread: BUG #16714: INSERT ON CONFLICT DO UPDATE fails to infer constraint if it's not at top-level partition

The following bug has been logged on the website:

Bug reference:      16714
Logged by:          Andy S
Email address:      gatekeeper.mail@gmail.com
PostgreSQL version: 11.2
Operating system:   Gentoo Linux
Description:

create table tbl (
    part_key1 int,
    part_key2 int,
    part_key3 int,
    part_key4 int,
    values_columns varchar
) partition by list(part_key1);

create table tbl_p1 partition of tbl for values in (1) partition by
list(part_key2);
create table tbl_p1_p2 partition of tbl_p1 for values in (1) partition by
range (part_key3);
create table tbl_p1_p2_p3 partition of tbl_p1_p2  for values from (0) to
(100) partition by HASH(part_key4);

create table tbl_p1_p2_p3_p4_0 partition of tbl_p1_p2_p3 (constraint
tbl_p1_p2_p3_0_pkey primary key (part_key4)) for values with (modulus 4,
remainder 0);
create table tbl_p1_p2_p3_p4_1 partition of tbl_p1_p2_p3 (constraint
tbl_p1_p2_p3_1_pkey primary key (part_key4)) for values with (modulus 4,
remainder 1);
create table tbl_p1_p2_p3_p4_2 partition of tbl_p1_p2_p3 (constraint
tbl_p1_p2_p3_2_pkey primary key (part_key4)) for values with (modulus 4,
remainder 2);
create table tbl_p1_p2_p3_p4_3 partition of tbl_p1_p2_p3 (constraint
tbl_p1_p2_p3_3_pkey primary key (part_key4)) for values with (modulus 4,
remainder 3);

insert into tbl values (1, 1, 1, 1, 'a');

insert into tbl values (1, 1, 1, 1, 'b');
-- ERROR:  duplicate key value violates unique constraint
"tbl_p1_p2_p3_0_pkey"
-- DETAIL:  Key (part_key4)=(1) already exists.

insert into tbl values (1, 1, 1, 1, 'b') on conflict (part_key4) do update
set values_columns = excluded.values_columns;
-- ERROR:  there is no unique or exclusion constraint matching the ON
CONFLICT specification


PG Bug reporting form <noreply@postgresql.org> writes:
> insert into tbl values (1, 1, 1, 1, 'b') on conflict (part_key4) do update
> set values_columns = excluded.values_columns;
> -- ERROR:  there is no unique or exclusion constraint matching the ON
> CONFLICT specification

I see no bug here.  The partitioned table indeed does not have any
such index.  Moreover, if you had tried to make one, you would have
gotten

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "tbl" lacks column "part_key1" which is part of the partition key.

The short answer here is that uniqueness constraints on the individual
partitions are not a substitute for a constraint on the whole partitioned
table.

            regards, tom lane



That's then either a docs bug. The documentation states:
For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken

The constraint raised, it's specification equals and matches. It is ignored and raised exception is passed through. No, it's a bug;

Nevertheless, suppose I have a top-level partition bound primary key which covers all 4 columns and then my INSERT has all 4 of them in ON CONFLICT specification. This works as intended but: the exact index raised was not the one inferred since the top-level partition itself is not a default partition hence always empty hence it's index is always empty; an empty index cannot raise uniqueness violation exceptions then the one raised must be the leaf partition's (whichever it is). How could it not be a bug? Also: the certain partition to which data is to be inserted is computed at query planning stage the very stage where the planner could also find out if an index matching the given specification could be inferred since it's the only index that matters.

Also:
> The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
That is exactly what this partitioning scheme is made for. Yet ridiculous constraint matching punishes for benefiting from partitioning.
> Updating the partition key of a row might cause it to be moved into a different partition where this row satisfies the partition bounds.
> BEFORE ROW triggers, if necessary, must be defined on individual partitions, not the partitioned table.
What else could witness even better the top level table constraint has no sense since it is never even examined, but instead the leaf partition exact computed definition matters.


On Fri, Nov 13, 2020 at 7:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> insert into tbl values (1, 1, 1, 1, 'b') on conflict (part_key4) do update
> set values_columns = excluded.values_columns;
> -- ERROR:  there is no unique or exclusion constraint matching the ON
> CONFLICT specification

I see no bug here.  The partitioned table indeed does not have any
such index.  Moreover, if you had tried to make one, you would have
gotten

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "tbl" lacks column "part_key1" which is part of the partition key.

The short answer here is that uniqueness constraints on the individual
partitions are not a substitute for a constraint on the whole partitioned
table.

                        regards, tom lane
Andy S <gatekeeper.mail@gmail.com> writes:
> That's then either a docs bug. The documentation states:
>> For each individual row proposed for insertion, either the insertion
>> proceeds, or, if an *arbiter* constraint or index specified by
>> *conflict_target* is violated, the alternative *conflict_action* is taken

Yeah, but you did not specify a valid arbiter constraint.  ON CONFLICT is
not a get-out-of-jail-free card; it will not trap any error whatsoever,
only a detected unique-constraint violation.

What you probably want here is to declare (part_key1, part_key4)
as the primary key of the partitioned table, and use that in the
ON CONFLICT spec.  That will save you from having to make constraints
on the individual partitions, too.

            regards, tom lane



Again that's not true:

insert into tbl values (1, 1, 1, 1, 'b');
-- ERROR:  duplicate key value violates unique constraint
"tbl_p1_p2_p3_0_pkey"
-- DETAIL:  Key (part_key4)=(1) already exists.

The exact primary key violation triggers, this exact primary key spec is then defined to no success. Yes, I insist this arbiter constraint spec equals to the one in error message. In my actual DDL primary key is defined at the level prior to the leaf partition set (so I don't mess with the PK each leaf partition), though the exact PK on columns 3 and 4 is a very common selection criteria within the application so it helps. Adding other columns to PK would add no selectivity (though could poison the index) and rather wastes space within each index instance since a single index instance uniqueness of columns 1 and 2 values is guaranteed by DDL.

Requiring a user to define a wide value range index when it does not help to guarantee uniqueness and/or querying at least violates the description of partitioning I previously cited.

On Fri, Nov 13, 2020 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy S <gatekeeper.mail@gmail.com> writes:
> That's then either a docs bug. The documentation states:
>> For each individual row proposed for insertion, either the insertion
>> proceeds, or, if an *arbiter* constraint or index specified by
>> *conflict_target* is violated, the alternative *conflict_action* is taken

Yeah, but you did not specify a valid arbiter constraint.  ON CONFLICT is
not a get-out-of-jail-free card; it will not trap any error whatsoever,
only a detected unique-constraint violation.

What you probably want here is to declare (part_key1, part_key4)
as the primary key of the partitioned table, and use that in the
ON CONFLICT spec.  That will save you from having to make constraints
on the individual partitions, too.

                        regards, tom lane
Hi,

On 2020-11-13 21:30:39 +0300, Andy S wrote:
> No, it's a bug

At best it is a potentially desirable feature. One that isn't trivial to
implement. We'd have to scan the entire partition tree to ensure there's
a matching constraint on all partitions. And add some more complicated
way to do plan invalidation, because somebody could add a new partition
without a matching constraint - which'd not be detected in a trivial
implementation, because the arbiter determination happens at parse
analysis time.


> Also:
> the certain partition to which data is to be inserted is computed at query
> planning stage the very stage where the planner could also find out if an
> index matching the given specification could be inferred since it's the
> only index that matters.

That is not generally the case. You can have parametrized values that
are inserted, and you can have multi-row inserts. In both cases you
cannot make this decision at plan time.

- Andres



On Fri, Nov 13, 2020 at 11:40 AM Andy S <gatekeeper.mail@gmail.com> wrote:
That's then either a docs bug.
[...] 
Also:
> The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
That is exactly what this partitioning scheme is made for. Yet ridiculous constraint matching punishes for benefiting from partitioning.

Please don't top-post.  The convention here is to inline or bottom-post.

I agree that the overview statement quoted should be qualified.  The existing behavior is that the table listed in the insert is the table upon which the ON CONFLICT arbiter constraint must exist [1].  This effectively negates the benefit of leveraging the partitioning scheme as a form of uniqueness enforcement in lieu of indexed columns.  IIUC, the additional partition constraint is still useful for select queries so its creation still has merit and leverages that particular benefit of partitioning.  But today, you cannot fully benefit from that dynamic if you wish to leverage ON CONFLICT during data insertion.  An improvement there isn't something that could be back-patched.

David J.

[1] From the INSERT documentation: "All table_name unique indexes that, without regard to order, contain exactly the conflict_target-specified columns/expressions are inferred (chosen) as arbiter indexes."

On Fri, Nov 13, 2020 at 11:40 AM Andy S <gatekeeper.mail@gmail.com> wrote:
This works as intended but: the exact index raised was not the one inferred since the top-level partition itself is not a default partition hence always empty hence it's index is always empty; an empty index cannot raise uniqueness violation exceptions then the one raised must be the leaf partition's (whichever it is).

There seems to be a mix-up regarding the uniqueness restriction (a constraint) and the way it is implemented (unique indexes, wherever they may exist).  That might be partly the documentation's fault for using index in improper places but I haven't fully checked that out.  In any case, the unique constraint on the partitioned table can very well be enforced even if that named relation doesn't actually have data nor a backing unique index.  The point of the partitioned table is to abstract that away from the user.

It would be nice if the ON CONFLICT behavior could be abstracted away as well, and you are right in your expectation that it would.  But it is not a bug that our implementation is lacking in that area, though it could maybe be better documented, as evidenced by this thread.

David J.

On Sat, Nov 14, 2020 at 3:28 AM Andres Freund <andres@anarazel.de> wrote:
At best it is a potentially desirable feature. One that isn't trivial to
implement. We'd have to scan the entire partition tree to ensure there's
a matching constraint on all partitions. 
- BEFORE ROW triggers...
So, the engine does exactly what you say for each row to determine if the leaf partition has any BEFORE ROW triggers defined. Why not to use the chance to determine a certain applicable constraint set?

 
That is not generally the case. You can have parametrized values that
are inserted, and you can have multi-row inserts. In both cases you
cannot make this decision at plan time.

OK. Then I've mistaken, sorry for that. Still `uniq constraint violation` triggers even when it's not defined for the very table named in query.
 
Oh! What a gem I mined in sources:

branch: REL_11_STABLE
src/backend/executor/nodeModifyTable.c:
         * BEFORE ROW INSERT Triggers.
         *
         * Note: We fire BEFORE ROW TRIGGERS for every attempted insertion in an
         * INSERT ... ON CONFLICT statement.  We cannot check for constraint
         * violations before firing these triggers, because they can change the
         * values to insert.  Also, they can run arbitrary user-defined code with
         * side-effects that we can't cancel by just not inserting the tuple.
         */

Still not the place where ON CONFLICT could be validated on per-row basis?
Hi,

On 2020-11-19 02:49:20 +0300, Andy S wrote:
> Oh! What a gem I mined in sources:
> 
> branch: REL_11_STABLE
> src/backend/executor/nodeModifyTable.c:
>          * BEFORE ROW INSERT Triggers.
>          *
>          * Note: We fire BEFORE ROW TRIGGERS for every attempted insertion
> in an
>          * INSERT ... ON CONFLICT statement.  We cannot check for constraint
>          * violations before firing these triggers, because they can change
> the
>          * values to insert.  Also, they can run arbitrary user-defined
> code with
>          * side-effects that we can't cancel by just not inserting the
> tuple.
>          */
> 
> Still not the place where ON CONFLICT could be validated on per-row
> basis?

On conflict arbiter determination happens much earlier - and that can't,
as I have explained earlier, really be changed. We need to know which
index etc this applies to during parse analysis, not during execution
time. This happens as part of transformOnConflictClause() called from
transformInsertStmt().

Greetings,

Andres Freund