Re: Strangeness with UNIQUE indexes and UTF-8 - Mailing list pgsql-hackers
From: Omar Kilani
Subject: Re: Strangeness with UNIQUE indexes and UTF-8
Date:
Msg-id: CA+8F9hjw6y_tzjFWFjCy228dC4HyQRDArhB1d86LZhZGY=Z8ig@mail.gmail.com
In response to: Re: Strangeness with UNIQUE indexes and UTF-8 (Omar Kilani <omar.kilani@gmail.com>)
Responses: Re: Strangeness with UNIQUE indexes and UTF-8
List: pgsql-hackers
I was able to restore a snapshot where the database was fully consistent.

2021-06-06 14:52:34.748 UTC [0/48529] LOG:  database system was interrupted while in recovery at log time 2021-06-06 06:57:27 UTC
2021-06-06 14:52:34.748 UTC [0/48529] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-06-06 14:52:40.847 UTC [0/48529] LOG:  database system was not properly shut down; automatic recovery in progress
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  invalid record length at 8849/32000098: wanted 24, got 0
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  redo is not required
2021-06-06 14:52:40.865 UTC [0/48529] LOG:  checkpoint starting: end-of-recovery immediate
2021-06-06 14:52:40.909 UTC [0/48529] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.006 s, sync=0.001 s, total=0.050 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
2021-06-06 14:52:40.964 UTC [0/48527] LOG:  database system is ready to accept connections

I'm running pg_verify_checksums on the cluster, but the database is many TB so it'll be a bit.

I missed one of your questions before -- no, it wasn't created with CREATE INDEX CONCURRENTLY. That index was created by 11.2's pg_restore roughly 2 years ago.

On Sun, Jun 6, 2021 at 6:53 AM Omar Kilani <omar.kilani@gmail.com> wrote:
>
> I just remembered, I have… many… snapshots of the on disk data prior to starting 11.12.
>
> It should be possible to start at a specific point in time with the index in the state it was in prior to the insert.
>
> How do I prove or disprove… hardware issues?
>
> Also… I ran the select on 3 of our standby servers and they all had the same issue. Presumably the index would be “corrupt” in the same way across multiple very different machines from WAL apply?
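While the checksum scan runs, contrib's amcheck can validate the btree ordering invariants directly, and (with the heapallindexed option available in 11) verify that every heap tuple actually has an index entry — which is the check that would flag the missing row here. A sketch, with a placeholder index name:

```sql
-- 'the_unique_idx' is a placeholder; substitute the real index name.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Cheap structural check of btree invariants (AccessShareLock only):
SELECT bt_index_check('the_unique_idx'::regclass);

-- Slower: also confirm every heap tuple is represented in the index.
SELECT bt_index_check('the_unique_idx'::regclass, heapallindexed => true);
```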
> On Sun, Jun 6, 2021 at 6:41 AM Omar Kilani <omar.kilani@gmail.com> wrote:
>>
>> Hey David,
>>
>> Hmmm… it wasn’t init on 11.x.
>>
>> This is a very old database (2004) that has moved forward via pg_upgrade. I think we did a pg_dump and pg_restore every time we hit some major incompatibility like float vs integer date times.
>>
>> The current DB started as a pg_restore into 10.x. Then was pg_upgrade’d to 11.2. Has been minor upgraded a bunch of times since and we upgraded to 11.12… just before this happened.
>>
>> As in, we just restarted our cluster on 11.12. Everything was working fine (and the index was working) and then the INSERT happened.
>>
>> I have checksums on and I did a VACUUM on the table just before the REINDEX.
>>
>> I’m 99.99999% confident the hardware isn’t bad.
>>
>> The only time we’ve seen this is with Unicode input.
>>
>> Regards,
>> Omar
>>
>> On Sun, Jun 6, 2021 at 4:59 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> On Sun, 6 Jun 2021 at 22:55, Omar Kilani <omar.kilani@gmail.com> wrote:
>>> > There seems to be a weird bug in Postgres (last tested 11.12) where it
>>> > allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
>>> > on a TEXT/VARCHAR when there's already a value present in that index,
>>> > but only for UTF-8 input.
>>>
>>> It would be good to know a bit about the history of this instance.
>>> Was the initdb done on 11.12? Or some other 11.x version? Or was this
>>> instance pg_upgraded from some previous major version?
>>>
>>> There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
>>> possibly to miss rows that were inserted by a prepared transaction.
>>> Was this index created with CREATE INDEX CONCURRENTLY?
>>>
>>> > What I sort of don't get is... before we insert anything into these
>>> > tables, we always check to see if a value already exists. And Postgres
>>> > must be returning no results for some reason. So it goes to insert a
>>> > duplicate value which somehow succeeds despite the unique index, but
>>> > then a reindex says it's a duplicate. Pretty weird.
>>>
>>> That does not seem that weird to me. If the index is corrupt and
>>> fails to find the record you're searching for using a scan of that
>>> index, then it seems pretty likely that the record would also not be
>>> found in the index when doing the INSERT.
>>>
>>> The reindex will catch the problem because it uses the heap as the
>>> source of truth to build the new index. It simply sounds like there
>>> are two records in the heap because a subsequent one was added and a
>>> corrupt index didn't find the original record either because it was
>>> either missing from the index or because the index was corrupt in some
>>> way that the record was just not found.
>>>
>>> David
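David's point — that the pre-insert SELECT and the uniqueness check at INSERT fail the same way over a broken ordering — can be seen with a toy model. This is pure illustration, not Postgres internals: a binary search over keys sorted under one comparison order, probed under a different order (standing in for any corruption that violates the ordering the scan assumes), silently misses a key that is physically present.

```python
def lookup(index, key, collate):
    """Binary search that trusts `index` to be sorted under `collate`."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if collate(index[mid]) < collate(key):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == key

# Index built under codepoint order (the "old" comparison rule):
index = sorted(["Banana", "apple", "cherry"])  # ['Banana', 'apple', 'cherry']

# Probing with the same rule finds the key:
assert lookup(index, "apple", collate=lambda s: s)

# Probing with a *different* rule (case-insensitive here, standing in
# for a changed or broken ordering): the search walks the wrong way
# and misses a key that is physically in the structure.
assert not lookup(index, "apple", collate=str.lower)
```

Both the application's existence check and the unique-index probe at insert time traverse the same structure the same way, so both miss together, and only a heap-driven rebuild (REINDEX) notices the duplicate.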