Violation of a unique key? - Mailing list pgsql-bugs

From Vladimir Ryabtsev
Subject Violation of a unique key?
Date
Msg-id CAMqTPqk=qQyrPGURHJ3X=iYPhTRk2LpPE__Kg1b+CMUo__uHrw@mail.gmail.com
Whole thread Raw
List pgsql-bugs
Sorry, fixing title...

Hi,

Somehow I got duplicate values in a column covered by a unique key.

CREATE TABLE myschema.vocabulary
(
phrase_id integer NOT NULL DEFAULT
nextval('onto_oper.vocabulary_phraseid_seq'::regclass),
phrase text COLLATE pg_catalog."default" NOT NULL,
is_to_sync boolean NOT NULL DEFAULT true,
last_modified timestamp without time zone NOT NULL DEFAULT
timezone('utc'::text, now()),
CONSTRAINT pk_vocabulary PRIMARY KEY (phrase_id),
CONSTRAINT vocabulary_phrase_key UNIQUE (phrase)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

Today I noticed a strange behaviour in a regularly executed routine. The
problem seemed to be absolutely irrelevant:

ERROR: duplicate key value violates unique constraint
"vocabulary_phrase_key"
DETAIL: Key (phrase)=(Ã…sa) already exists.
CONTEXT: SQL statement "update myschema.vocabulary set is_to_sync = false
where is_to_sync = true"

It was unclear how an update of an independent column could cause such an
error, but I started searching duplicate entries. GROUP BY on phrase column
did not give any results HAVING count(*) > 1. Then I copied the data into
another table, without indices. This gave the answer there is a duplicated
value 'Åsa' (I guess the same could be done by switching off index scan).

From the copy I found the IDs of problem entries: 5850963 and 29883128.
Some funny queries:

select
(select phrase from myschema.vocabulary where phrase_id = 5850963) =
(select phrase from myschema.vocabulary where phrase_id = 29883128)

Gives 'true'.

select phrase_id from myschema.vocabulary
where phrase = (select phrase from myschema.vocabulary where phrase_id =
5850963)

Gives 29883128.

Is it some misconfiguration problem in my database? Since I see some
non-ASCII Unicode characters, I suppose it might be encoding conversion
problem. I inspected both values to the best of my abilities and found them
equal byte-to byte.

My version() is PostgreSQL 10.10 (Ubuntu 10.10-1.pgdg18.04+1) on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0,
64-bit

With best wishes,
Vladimir

pgsql-bugs by date:

Previous
From: Vladimir Ryabtsev
Date:
Subject:
Next
From: Peter Geoghegan
Date:
Subject: Re: