Thread: BUG #15556: Duplicate key violations even when using ON CONFLICT DOUPDATE

BUG #15556: Duplicate key violations even when using ON CONFLICT DOUPDATE

From

PG Bug reporting form

Date:

17 December 2018, 09:54:05

The following bug has been logged on the website:

Bug reference:      15556
Logged by:          Feike Steenbergen
Email address:      feikesteenbergen@gmail.com
PostgreSQL version: 11.1
Operating system:   CentOS Linux release 7.5.1804 (Core)
Description:

On a table that has a primary key and multiple unique indexes, we have
recently
seen a duplicate key violation, even though we use ON CONFLICT DO UPDATE.

The values for the separate columns are dependent on each other, therefore
a
unique violation on one of the columns should also violate the other
indexes.

Below a self reproducing testcase for this behaviour.

I would have expected the statement to either insert or update, but
receiving
the duplicate key violation is not something that I expected.

Should I expect a duplicate key violation for the given table and
statements?

regards,

Feike


Reproducing testcase



#!/bin/bash
# within 1 second, I have multiple of these errors:
# ERROR:  duplicate key value violates unique constraint "bugtest_pkey"
# DETAIL:  Key (i)=(76) already exists.

PARALLEL=10
TIMEOUT=1

TMPFILE=$(mktemp)
trap 'rm -f ${TMPFILE}' EXIT QUIT INT TERM

psql -AtXq <<__EOT__
CREATE OR REPLACE FUNCTION slow_integer(i integer)
RETURNS integer LANGUAGE sql
AS 'SELECT i FROM pg_sleep(random()/1000000);'
IMMUTABLE;

DROP TABLE IF EXISTS bugtest;
CREATE TABLE bugtest(
    i int primary key,
    j int not null,
    k int not null,
    counter bigint not null default 1
);
CREATE UNIQUE INDEX ON bugtest (slow_integer(j));
CREATE UNIQUE INDEX ON bugtest (slow_integer(k));
__EOT__


cat > "${TMPFILE}" <<__EOT__
INSERT INTO bugtest(i, j, k)
SELECT i, i, i
  FROM CAST(random()*100 AS int) AS sub(i)
ON CONFLICT (slow_integer(k))
DO UPDATE
SET counter=bugtest.counter+1;

\watch 0.0000001
__EOT__

seq ${PARALLEL} | xargs --max-args=1 --max-procs=${PARALLEL} \
    timeout ${TIMEOUT} psql -AtXq -f "${TMPFILE}" --set

Re: BUG #15556: Duplicate key violations even when using ON CONFLICTDO UPDATE

From

Peter Geoghegan

Date:

17 December 2018, 19:08:43

On Mon, Dec 17, 2018 at 1:55 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> Below a self reproducing testcase for this behaviour.
>
> I would have expected the statement to either insert or update, but
> receiving
> the duplicate key violation is not something that I expected.
>
> Should I expect a duplicate key violation for the given table and
> statements?

The problem is that unique index inference isn't sophisticated enough
to recognize that the primary key ought to be inferred alongside the
two other unique indexes, which are expression indexes. This is hardly
surprising -- why would an expression index need to be created that
was exactly equivalent to the primary key? Actually, it's impossible
to make this work in principle, because nothing obligates you to
insert the same thing into column i as column j or k -- you've merely
done it that way this one time.

Inference is clever enough to not differentiate based on irrelevant
factors like column ordering among available, equivalent unique
indexes. It's already very sophisticated for a mechanism that's just
there to handle edge cases. I don't see a need for it to become more
sophisticated.

-- 
Peter Geoghegan

Re: BUG #15556: Duplicate key violations even when using ON CONFLICTDO UPDATE

From

Peter Geoghegan

Date:

17 December 2018, 19:29:14

On Mon, Dec 17, 2018 at 11:08 AM Peter Geoghegan <pg@bowt.ie> wrote:
> The problem is that unique index inference isn't sophisticated enough
> to recognize that the primary key ought to be inferred alongside the
> two other unique indexes, which are expression indexes. This is hardly
> surprising -- why would an expression index need to be created that
> was exactly equivalent to the primary key?

By the way, EXPLAIN ANALYZE INSERT ... ON CONFLICT will actually show
you which unique indexes/constraints have been inferred from the
target columns/expressions that appear in parenthesis. If two unique
indexes use different columns, or are otherwise equivalent based on
convention rather than on the semantics, then they're definitely not
going to be recognized as equivalent by the inference process.

You're not supposed to be able to UPDATE on a conflict on more than
one unique index, really. The general idea with inference is to avoid
unpleasant surprises when there are two indexes that enforce basically
the same constraint, such as when a bloated unique index is replaced
by creating a new index with CREATE UNIQUE INDEX CONCURRENTLY, before
the original is dropped. That's what I meant about this being an edge
case -- this hardly ever happens. So, yes, you can have multiple
unique indexes inferred, but it doesn't matter which one you take the
alternative UPDATE path on, because the rules of inference ensure that
it cannot matter. We can inferred multiple indexes precisely because
they'll all have the same conflicts.

I got asked about multiple inference specifications in one statement
quite a few times back when ON CONFLICT originally went in. That's not
how it's supposed to be used -- what happens when *both* constraints
are violated at once, in different ways? Just use multiple INSERT ...
ON CONFLICT statements instead.

-- 
Peter Geoghegan