Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose |
Date | |
Msg-id | CA+hUKG+8sChHBccNdud=z2rCaWzojz5dMBkpHQhKVmvPfuBNcg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-bugs |
On Tue, Aug 13, 2019 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form > <noreply@postgresql.org> wrote: > > "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE > > CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg, > > hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel) > How old is the duplicate data? I guess you are using the Hungarian > locale as the default collation for your database (see "Collate" in > the output of \l, or check for explicit collations on the relevant > columns). I think that is one of the collations that has moved around > a bit in recent years in glibc. For example: > > https://sourceware.org/bugzilla/show_bug.cgi?id=13547 ... and to give a little context about why that is relevant, please see these links: https://lists.debian.org/debian-glibc/2019/03/msg00030.html https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html They're discussing a big change that affects almost everyone upgrading their glibc (and similar things happen with non-GNU OSes' libc too), but in the past there have been upgrades that affected specific individual locales. German and Hungarian are known examples from recent memory, like the commits referenced in that Bugzilla ticket; you could probably figure out which glibc versions those landed in and see if that matches your Ubuntu upgrade history. For example, you can see that they changed their minds about the sort order of "ssz" vs "szsz" (you can probably see that by piping a file containing "kasza" and "kassza" through the sort command on an old enough and new enough Ubuntu release with LANG=hu_HU.UTF-8), and the hypothesis is that some rule change like that caused us to take a wrong turn while descending a btree and then conclude that your duplicated street name wasn't already in the index, when in fact it was. Oops. As Christoph mentioned on the Debian glibc mailing list, we'd like to be able to be able to handle this sort of thing better; we're entirely dependent on the OS vendor maintaining a stable sort order so that we can keep our indexes in the right order, but POSIX offers us no way to know when they've changed (a problem I'm contemplating raising with the Austin Group/POSIX maintainers). One option is to use ICU collations, because we have some limited ability to track when definitions changed, potentially invalidating an index, but we haven't finished adding ICU support yet: for now you can't use an ICU collation as a database default. You can still use it explicitly though. It's an unfortunate problem; we looked into whether we could query the glibc version, but it turned out that some of the distributions back-patch the locale changes without changing the glibc version. About the only easy way you can detect this problem is to run the 'amcheck' index checker, or just assume the worst and REINDEX, after an OS upgrade. -- Thomas Munro https://enterprisedb.com
pgsql-bugs by date: