Thread: Inexplicable duplicate rows with unique constraint

Inexplicable duplicate rows with unique constraint

From

Richard van der Hoff

Date:

16 January 2020, 19:50:25

I'm trying to track down the cause of some duplicate rows in a table 
which I would expect to be impossible due to a unique constraint. I'm 
hoping that somebody here will be able to suggest something I might have 
missed.

The problem relates to a bug filed against our application 
(https://github.com/matrix-org/synapse/issues/6696). At first I put this 
down to random data corruption on a single user's postgres instance, but 
I've now seen three separate reports in as many days and am wondering if 
there is more to it.

We have a table whose schema is as follows:

synapse=# \d current_state_events
Table "public.current_state_events"
    Column   | Type | Modifiers
------------+------+-----------
  event_id   | text | not null
  room_id    | text | not null
  type       | text | not null
  state_key  | text | not null
  membership | text |
Indexes:
     "current_state_events_event_id_key" UNIQUE CONSTRAINT, btree (event_id)
     "current_state_events_room_id_type_state_key_key" UNIQUE 
CONSTRAINT, btree (room_id, type, state_key)
     "current_state_events_member_index" btree (state_key) WHERE type = 
'm.room.member'::text

Despite the presence of the 
current_state_events_room_id_type_state_key_key constraint, several 
users have reported seeing errors which suggest that their tables have 
duplicate rows for the same (room_id, type, state_key) triplet and 
indeed querying confirms that to be the case:

synapse=> select count(*), room_id, type, state_key from 
current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2;
  count |              room_id              |     type      |   state_key
-------+-----------------------------------+---------------+-------------------------------------
      3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member | 
@irc_ebi_:darkfasel.net
      3 | !HwocBmCtBcHQhILtYQ:matrix.org    | m.room.member | 
@freenode_AlmarShenwan_:matrix.org
(2 rows)


Further investigation suggests that these are genuinely separate rows 
rather than duplicate entries in an index.

The index appears to consider itself valid:

synapse=> select i.* from pg_class c join pg_index i on 
i.indexrelid=c.oid  where 
relname='current_state_events_room_id_type_state_key_key';
  indexrelid | indrelid | indnatts | indisunique | indisprimary | 
indisexclusion | indimmediate | indisclustered | indisvalid | 
indcheckxmin | indisready | indislive | indisreplident | indkey | 
indcollation |    indclass    | indoption | indexprs | indpred

------------+----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+-----------+----------------+--------+--------------+----------------+-----------+----------+---------
       17023 |    16456 |        3 | t           | f            | f 
         | t            | f              | t          | f            | t 
          | t         | f              | 2 3 4  | 100 100 100  | 3126 
3126 3126 | 0 0 0     |          |
(1 row)

So, question: what could we be doing wrong to get ourselves into this 
situation?

Some other datapoints which may be relevant:
  * this has been reported by one user on postgres 9.6.15 and one on 
10.10, though it's hard to be certain of the version that was running 
when the duplication occurred
  * the constraint is added when the table is first created (before any 
data is added)
  * At least one user reports that he has recently migrated his database 
from one server to another via a `pg_dump -C` and later piping into psql.

Re: Inexplicable duplicate rows with unique constraint

From

Adrian Klaver

Date:

16 January 2020, 20:05:06

On 1/16/20 8:50 AM, Richard van der Hoff wrote:
> I'm trying to track down the cause of some duplicate rows in a table 
> which I would expect to be impossible due to a unique constraint. I'm 
> hoping that somebody here will be able to suggest something I might have 
> missed.
> 
> The problem relates to a bug filed against our application 
> (https://github.com/matrix-org/synapse/issues/6696). At first I put this 
> down to random data corruption on a single user's postgres instance, but 
> I've now seen three separate reports in as many days and am wondering if 
> there is more to it.
> 
> We have a table whose schema is as follows:
> 
> synapse=# \d current_state_events
> Table "public.current_state_events"
>     Column   | Type | Modifiers
> ------------+------+-----------
>   event_id   | text | not null
>   room_id    | text | not null
>   type       | text | not null
>   state_key  | text | not null
>   membership | text |
> Indexes:
>      "current_state_events_event_id_key" UNIQUE CONSTRAINT, btree 
> (event_id)
>      "current_state_events_room_id_type_state_key_key" UNIQUE 
> CONSTRAINT, btree (room_id, type, state_key)
>      "current_state_events_member_index" btree (state_key) WHERE type = 
> 'm.room.member'::text
> 
> Despite the presence of the 
> current_state_events_room_id_type_state_key_key constraint, several 
> users have reported seeing errors which suggest that their tables have 
> duplicate rows for the same (room_id, type, state_key) triplet and 
> indeed querying confirms that to be the case:
> 
> synapse=> select count(*), room_id, type, state_key from 
> current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2;
>   count |              room_id              |     type      |   state_key
> -------+-----------------------------------+---------------+------------------------------------- 
> 
>       3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member | 
> @irc_ebi_:darkfasel.net
>       3 | !HwocBmCtBcHQhILtYQ:matrix.org    | m.room.member | 
> @freenode_AlmarShenwan_:matrix.org
> (2 rows)
> 

I'm assuming the above are obfuscated?

> 
> Further investigation suggests that these are genuinely separate rows 
> rather than duplicate entries in an index.

If you use length() on the values are they the same?

> 
> The index appears to consider itself valid:
> 
> synapse=> select i.* from pg_class c join pg_index i on 
> i.indexrelid=c.oid  where 
> relname='current_state_events_room_id_type_state_key_key';
>   indexrelid | indrelid | indnatts | indisunique | indisprimary | 
> indisexclusion | indimmediate | indisclustered | indisvalid | 
> indcheckxmin | indisready | indislive | indisreplident | indkey | 
> indcollation |    indclass    | indoption | indexprs | indpred
>
------------+----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+-----------+----------------+--------+--------------+----------------+-----------+----------+---------

> 
>        17023 |    16456 |        3 | t           | f            | f 
>          | t            | f              | t          | f            | t 
>           | t         | f              | 2 3 4  | 100 100 100  | 3126 
> 3126 3126 | 0 0 0     |          |
> (1 row)
> 
> So, question: what could we be doing wrong to get ourselves into this 
> situation?
> 
> Some other datapoints which may be relevant:
>   * this has been reported by one user on postgres 9.6.15 and one on 
> 10.10, though it's hard to be certain of the version that was running 
> when the duplication occurred
>   * the constraint is added when the table is first created (before any 
> data is added)
>   * At least one user reports that he has recently migrated his database 
> from one server to another via a `pg_dump -C` and later piping into psql.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Inexplicable duplicate rows with unique constraint

From

Tom Lane

Date:

16 January 2020, 20:08:38

Richard van der Hoff <richard@matrix.org> writes:
> I'm trying to track down the cause of some duplicate rows in a table 
> which I would expect to be impossible due to a unique constraint. I'm 
> hoping that somebody here will be able to suggest something I might have 
> missed.

Since these are text columns, one possibility you should be looking into
is that the indexes have become corrupt due to a change in the operating
system's sorting rules for the underlying locale.  I don't recall details
at the moment, but I do remember that a recent glibc update changed the
sorting rules for some popular locale settings.  If an installation had
applied such an update underneath an existing database, you'd have a
situation where existing entries in an index are not in-order according
to the new behavior of the text comparison operators, leading to havoc
because btree searching relies on the entries being correctly sorted.

Unless you happen to notice searches failing to find rows you know are
there, the first visible symptom is often appearance of "impossible"
duplicate rows, after the search to verify uniqueness of a new entry
fails to find the old entry.

>   * At least one user reports that he has recently migrated his database 
> from one server to another via a `pg_dump -C` and later piping into psql.

Dump-and-restore wouldn't cause this (and, indeed, is one way to clean up
the mess).  But this is suspicious anyway because it suggests there may
have been some general system upgrades going on in the vicinity.

Reindexing all text indexes is the recommended remediation procedure
if you suspect a locale behavior change.  There's some work afoot to
make PG notice the need for this automatically, but it's not done yet.

            regards, tom lane

Re: Inexplicable duplicate rows with unique constraint

From

"Daniel Verite"

Date:

16 January 2020, 20:11:24

    Richard van der Hoff wrote:

> So, question: what could we be doing wrong to get ourselves into this
> situation?

OS/libc upgrades without reindexing come to mind.
See https://wiki.postgresql.org/wiki/Collations

>  * At least one user reports that he has recently migrated his database
> from one server to another via a `pg_dump -C` and later piping into psql.

This kind of migration recreates indexes (since it recreates everything)
so it's not subject to the same hazard as an OS/libc upgrade without
reindexing. In fact it would detect the problem if it existed,
as the creation of the unique constraint in the new db would fail if the
data in the dump did not satisfy it.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Inexplicable duplicate rows with unique constraint

From

Magnus Hagander

Date:

16 January 2020, 20:12:30

On Thu, Jan 16, 2020 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Richard van der Hoff <richard@matrix.org> writes:
> > I'm trying to track down the cause of some duplicate rows in a table
> > which I would expect to be impossible due to a unique constraint. I'm
> > hoping that somebody here will be able to suggest something I might have
> > missed.
>
> Since these are text columns, one possibility you should be looking into
> is that the indexes have become corrupt due to a change in the operating
> system's sorting rules for the underlying locale.  I don't recall details
> at the moment, but I do remember that a recent glibc update changed the
> sorting rules for some popular locale settings.  If an installation had
> applied such an update underneath an existing database, you'd have a
> situation where existing entries in an index are not in-order according
> to the new behavior of the text comparison operators, leading to havoc
> because btree searching relies on the entries being correctly sorted.

See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
which linux distros updated when.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Inexplicable duplicate rows with unique constraint

From

Richard van der Hoff

Date:

16 January 2020, 20:24:02

On 16/01/2020 17:12, Magnus Hagander wrote:
> On Thu, Jan 16, 2020 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Richard van der Hoff <richard@matrix.org> writes:
>>> I'm trying to track down the cause of some duplicate rows in a table
>>> which I would expect to be impossible due to a unique constraint. I'm
>>> hoping that somebody here will be able to suggest something I might have
>>> missed.
>>
>> Since these are text columns, one possibility you should be looking into
>> is that the indexes have become corrupt due to a change in the operating
>> system's sorting rules for the underlying locale.  I don't recall details
>> at the moment, but I do remember that a recent glibc update changed the
>> sorting rules for some popular locale settings.  If an installation had
>> applied such an update underneath an existing database, you'd have a
>> situation where existing entries in an index are not in-order according
>> to the new behavior of the text comparison operators, leading to havoc
>> because btree searching relies on the entries being correctly sorted.
> 
> See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
> which linux distros updated when.
> 
Right, thanks to all who have suggested this.

It seems like a plausible explanation but it's worth noting that all the 
indexed data here is (despite being in text columns), plain ascii. I'm 
surprised that a change in collation rules would change the sorting of 
such strings, and hence that it could lead to this problem. Am I naive?

To answer Adrian's question: the lengths of the values in the indexed 
columns are identical between the duplicated rows.

Re: Inexplicable duplicate rows with unique constraint

From

"Daniel Verite"

Date:

16 January 2020, 20:24:48

    Richard van der Hoff wrote:

> synapse=> select count(*), room_id, type, state_key from
> current_state_events group by 2,3,4 order by count(*) DESC LIMIT 2;
>  count |              room_id              |     type      |   state_key
> -------+-----------------------------------+---------------+-------------------------------------
>      3 | !ueLfVrSWYGMnFnoCbT:darkfasel.net | m.room.member |
> @irc_ebi_:darkfasel.net
>      3 | !HwocBmCtBcHQhILtYQ:matrix.org    | m.room.member |
> @freenode_AlmarShenwan_:matrix.org

Looking at these columns which are of type text but do not
contain words of any particular language, there's probably
no point in using a linguistic-aware collation for them.

If you maintain the database schema, what you could do to avoid
the dependency on the OS collation and stay clear of the particular
upgrade difficulty of collations is to use COLLATE "C" for this kind of
field, as opposed to the default collation of the database.
As a bonus, operations with the "C" collations tend to be faster,
sometimes even much faster.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Inexplicable duplicate rows with unique constraint

From

Adrian Klaver

Date:

16 January 2020, 20:27:25

On 1/16/20 9:24 AM, Richard van der Hoff wrote:
> 
> On 16/01/2020 17:12, Magnus Hagander wrote:
>> On Thu, Jan 16, 2020 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Richard van der Hoff <richard@matrix.org> writes:
>>>> I'm trying to track down the cause of some duplicate rows in a table
>>>> which I would expect to be impossible due to a unique constraint. I'm
>>>> hoping that somebody here will be able to suggest something I might 
>>>> have
>>>> missed.
>>>
>>> Since these are text columns, one possibility you should be looking into
>>> is that the indexes have become corrupt due to a change in the operating
>>> system's sorting rules for the underlying locale.  I don't recall 
>>> details
>>> at the moment, but I do remember that a recent glibc update changed the
>>> sorting rules for some popular locale settings.  If an installation had
>>> applied such an update underneath an existing database, you'd have a
>>> situation where existing entries in an index are not in-order according
>>> to the new behavior of the text comparison operators, leading to havoc
>>> because btree searching relies on the entries being correctly sorted.
>>
>> See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
>> which linux distros updated when.
>>
> Right, thanks to all who have suggested this.
> 
> It seems like a plausible explanation but it's worth noting that all the 
> indexed data here is (despite being in text columns), plain ascii. I'm 
> surprised that a change in collation rules would change the sorting of 
> such strings, and hence that it could lead to this problem. Am I naive?

In psql who does:

\l the_database_name

show?

> 
> To answer Adrian's question: the lengths of the values in the indexed 
> columns are identical between the duplicated rows.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Inexplicable duplicate rows with unique constraint

From

Richard van der Hoff

Date:

16 January 2020, 20:28:46


On 16/01/2020 17:24, Daniel Verite wrote:
> Looking at these columns which are of type text but do not
> contain words of any particular language, there's probably
> no point in using a linguistic-aware collation for them.
> 
> If you maintain the database schema, what you could do to avoid
> the dependency on the OS collation and stay clear of the particular
> upgrade difficulty of collations is to use COLLATE "C" for this kind of
> field, as opposed to the default collation of the database.
> As a bonus, operations with the "C" collations tend to be faster,
> sometimes even much faster.

Good to know, thanks Daniel!

Re: Inexplicable duplicate rows with unique constraint

From

Richard van der Hoff

Date:

16 January 2020, 20:31:38

On 16/01/2020 17:27, Adrian Klaver wrote:
> On 1/16/20 9:24 AM, Richard van der Hoff wrote:
>>
>> It seems like a plausible explanation but it's worth noting that all 
>> the indexed data here is (despite being in text columns), plain ascii. 
>> I'm surprised that a change in collation rules would change the 
>> sorting of such strings, and hence that it could lead to this problem. 
>> Am I naive?
> 
> In psql who does:
> 
> \l the_database_name
> 
> show?

synapse=> \l synapse
                                  List of databases
   Name   |  Owner   | Encoding |   Collate   |    Ctype    |   Access 
privileges
---------+----------+----------+-------------+-------------+-----------------------
  synapse | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
postgres=CTc/postgres+
          |          |          |             |             | 
=Tc/postgres         +
          |          |          |             |             | 
synapse=CTc/postgres
(1 row)

Re: Inexplicable duplicate rows with unique constraint

From

Tom Lane

Date:

16 January 2020, 20:48:06

Richard van der Hoff <richard@matrix.org> writes:
> On 16/01/2020 17:12, Magnus Hagander wrote:
>> See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
>> which linux distros updated when.

> It seems like a plausible explanation but it's worth noting that all the 
> indexed data here is (despite being in text columns), plain ascii. I'm 
> surprised that a change in collation rules would change the sorting of 
> such strings, and hence that it could lead to this problem. Am I naive?

Unfortunately, strings containing punctuation do sort differently
after these changes, even with all-ASCII data.  The example given
on that wiki page demonstrates this.

RHEL6 (old glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
11
1-1

Fedora 30 (new glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
1-1
11

I concur with Daniel's suggestion that maybe "C" locale is
the thing to use for this data.

            regards, tom lane

Re: Inexplicable duplicate rows with unique constraint

From

Richard van der Hoff

Date:

16 January 2020, 21:06:04

On 16/01/2020 17:48, Tom Lane wrote:
> Richard van der Hoff <richard@matrix.org> writes:
>> On 16/01/2020 17:12, Magnus Hagander wrote:
>>> See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
>>> which linux distros updated when.
> 
>> It seems like a plausible explanation but it's worth noting that all the
>> indexed data here is (despite being in text columns), plain ascii. I'm
>> surprised that a change in collation rules would change the sorting of
>> such strings, and hence that it could lead to this problem. Am I naive?
> 
> Unfortunately, strings containing punctuation do sort differently
> after these changes, even with all-ASCII data.  The example given
> on that wiki page demonstrates this.

Thank you Tom. I should learn to read properly.

I think this very much explains the symptoms we're seeing. Thanks very 
much for the detailed and very helpful answers!

Re: Inexplicable duplicate rows with unique constraint

From

Susan Hurst

Date:

16 January 2020, 21:12:15

That's why I created a virtual_string function to squeeze out everything 
but alpha characters and numbers 0-9 from any varchar or text columns 
that I want to use as business key columns.  For example, if I have a 
column named job_name, I will have a companion column named v_job_name.  
The v_ column is to replicate Oracle's virtual column, since postgres 
doesn't have it.  You don't put any values in the v_ column directly. I 
simply have a trigger on insert or update to put the value in the 
v_job_name column using the virtual_string(new.job_name) function.  It's 
the v_job_name column that use in my unique constraint so that I avoid 
any unexpected sorting.  Meanwhile, my job_name column is still human 
readable with whatever characters I want to see, including diacritics.

Here is my function, if you want to try it out:

create or replace function store.virtual_string(string_in text)
returns text as
$body$
    declare
        l_return text;
    begin
        l_return := regexp_replace 
(lower(unaccent(string_in)),'[^0-9a-z]','','g');
        return l_return;
    end;
$body$
   language plpgsql volatile security definer
;

Sue

---
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Susan E Hurst
Principal Consultant
Brookhurst Data LLC
Email: susan.hurst@brookhurstdata.com
Mobile: 314-486-3261

On 2020-01-16 11:48, Tom Lane wrote:
> Richard van der Hoff <richard@matrix.org> writes:
>> On 16/01/2020 17:12, Magnus Hagander wrote:
>>> See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
>>> which linux distros updated when.
> 
>> It seems like a plausible explanation but it's worth noting that all 
>> the
>> indexed data here is (despite being in text columns), plain ascii. I'm
>> surprised that a change in collation rules would change the sorting of
>> such strings, and hence that it could lead to this problem. Am I 
>> naive?
> 
> Unfortunately, strings containing punctuation do sort differently
> after these changes, even with all-ASCII data.  The example given
> on that wiki page demonstrates this.
> 
> RHEL6 (old glibc):
> 
> $ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
> 11
> 1-1
> 
> Fedora 30 (new glibc):
> 
> $ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
> 1-1
> 11
> 
> I concur with Daniel's suggestion that maybe "C" locale is
> the thing to use for this data.
> 
>             regards, tom lane

Re: Inexplicable duplicate rows with unique constraint

From

Laurenz Albe

Date:

16 January 2020, 21:51:24

On Thu, 2020-01-16 at 16:50 +0000, Richard van der Hoff wrote:
> I'm trying to track down the cause of some duplicate rows in a table 
> which I would expect to be impossible due to a unique constraint. I'm 
> hoping that somebody here will be able to suggest something I might have 
> missed.
> 
> The problem relates to a bug filed against our application 
> (https://github.com/matrix-org/synapse/issues/6696). At first I put this 
> down to random data corruption on a single user's postgres instance, but 
> I've now seen three separate reports in as many days and am wondering if 
> there is more to it.

[...]

> So, question: what could we be doing wrong to get ourselves into this 
> situation?
> 
> Some other datapoints which may be relevant:
>   * this has been reported by one user on postgres 9.6.15 and one on 
> 10.10, though it's hard to be certain of the version that was running 
> when the duplication occurred
>   * the constraint is added when the table is first created (before any 
> data is added)
>   * At least one user reports that he has recently migrated his database 
> from one server to another via a `pg_dump -C` and later piping into psql.

I see no hint that this may be the problem, but I have seen corruption
like this because of changes in the collations of the C library (which
PostgreSQL uses).

This only happens with collations other than C, and it cannot be caused
by dump/restore.

It may, however, be caused by the following:

- Upgrading the operating system where PostgreSQL is running to a
  different glibc.

- Streaming replication between machines with different glibc version,
  and failing over to the standby.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com