Re: Strangeness with UNIQUE indexes and UTF-8 - Mailing list pgsql-hackers

From Omar Kilani
Subject Re: Strangeness with UNIQUE indexes and UTF-8
Date
Msg-id CA+8F9hjw6y_tzjFWFjCy228dC4HyQRDArhB1d86LZhZGY=Z8ig@mail.gmail.com
Whole thread Raw
In response to Re: Strangeness with UNIQUE indexes and UTF-8  (Omar Kilani <omar.kilani@gmail.com>)
Responses Re: Strangeness with UNIQUE indexes and UTF-8
List pgsql-hackers
I was able to restore a snapshot where the database was fully consistent.

2021-06-06 14:52:34.748 UTC [0/48529] LOG:  database system was
interrupted while in recovery at log time 2021-06-06 06:57:27 UTC
2021-06-06 14:52:34.748 UTC [0/48529] HINT:  If this has occurred more
than once some data might be corrupted and you might need to choose an
earlier recovery target.
2021-06-06 14:52:40.847 UTC [0/48529] LOG:  database system was not
properly shut down; automatic recovery in progress
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  invalid record length at
8849/32000098: wanted 24, got 0
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  redo is not required
2021-06-06 14:52:40.865 UTC [0/48529] LOG:  checkpoint starting:
end-of-recovery immediate
2021-06-06 14:52:40.909 UTC [0/48529] LOG:  checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.006 s, sync=0.001 s, total=0.050 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
2021-06-06 14:52:40.964 UTC [0/48527] LOG:  database system is ready
to accept connections

I'm running pg_verify_checksums on the cluster, but the database is
many TB so it'll be a bit.

I missed one of your questions before -- no, it wasn't created with
CREATE INDEX CONCURRENTLY. That index was created by 11.2's pg_restore
roughly 2 years ago.

On Sun, Jun 6, 2021 at 6:53 AM Omar Kilani <omar.kilani@gmail.com> wrote:
>
> I just remembered, I have… many… snapshots of the on disk data prior to starting 11.12.
>
> It should be possible to start at a specific point in time with the index in the state it was in prior to the insert.
>
> How do I prove or disprove… hardware issues?
>
> Also… I ran the select on 3 of our standby servers and they all had the same issue. Presumably the index would be
“corrupt”in the same way across multiple very different machines from WAL apply? 
>
> On Sun, Jun 6, 2021 at 6:41 AM Omar Kilani <omar.kilani@gmail.com> wrote:
>>
>> Hey David,
>>
>> Hmmm… it wasn’t init on 11.x.
>>
>> This is a very old database (2004) that has moved forward via pg_upgrade. I think we did a pg_dump and pg_restore
everytime we hit some major incompatibility like float vs integer date times. 
>>
>> The current DB started as a pg_restore into 10.x. Then was pg_upgrade’d to 11.2. Has been minor upgraded a bunch of
timessince and we upgraded to 11.12… just before this happened. 
>>
>> As in, we just restarted our cluster on 11.12. Everything was working fine (and the index was working) and then the
INSERThappened. 
>>
>> I have checksums on and I did a VACUUM on the table just before the REINDEX.
>>
>> I’m 99.99999% confident the hardware isn’t bad.
>>
>> The only time we’ve seen this is with Unicode input.
>>
>> Regards,
>> Omar
>>
>> On Sun, Jun 6, 2021 at 4:59 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> On Sun, 6 Jun 2021 at 22:55, Omar Kilani <omar.kilani@gmail.com> wrote:
>>> > There seems to be a weird bug in Postgres (last tested 11.12) where it
>>> > allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
>>> > on a TEXT/VARCHAR when there's already a value present in that index,
>>> > but only for UTF-8 input.
>>>
>>> It would be good to know a bit about the history of this instance.
>>> Was the initdb done on 11.12? Or some other 11.x version?  Or was this
>>> instance pg_upgraded from some previous major version?
>>>
>>> There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
>>> possibly to miss rows that were inserted by a prepared transaction.
>>> Was this index created with CREATE INDEX CONCURRENTLY?
>>>
>>> > What I sort of don't get is... before we insert anything into these
>>> > tables, we always check to see if a value already exists. And Postgres
>>> > must be returning no results for some reason. So it goes to insert a
>>> > duplicate value which somehow succeeds despite the unique index, but
>>> > then a reindex says it's a duplicate. Pretty weird.
>>>
>>> That does not seem that weird to me.  If the index is corrupt and
>>> fails to find the record you're searching for using a scan of that
>>> index, then it seems pretty likely that the record would also not be
>>> found in the index when doing the INSERT.
>>>
>>> The reindex will catch the problem because it uses the heap as the
>>> source of truth to build the new index.  It simply sounds like there
>>> are two records in the heap because a subsequent one was added and a
>>> corrupt index didn't find the original record either because it was
>>> either missing from the index or because the index was corrupt in some
>>> way that the record was just not found.
>>>
>>> David



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Strangeness with UNIQUE indexes and UTF-8
Next
From: Tom Lane
Date:
Subject: Re: Misplaced superuser check in pg_log_backend_memory_contexts()