[PATCH] Improve amcheck to also check UNIQUE constraint in btree index. - Mailing list pgsql-hackers

From Pavel Borisov
Subject [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date
Msg-id CALT9ZEHRn5xAM5boga0qnrCmPV52bScEK2QnQ1HmUZDD301JEg@mail.gmail.com
Whole thread Raw
Responses Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
List pgsql-hackers
Hi, hackers!

It seems that if btree index with a unique constraint is corrupted by duplicates, amcheck now can not catch this. Reindex becomes impossible as it throws an error but otherwise the index will let the user know that it is corrupted, and amcheck will tell that the index is clean. So I'd like to propose a short patch to improve amcheck for checking the unique constraint. It will output tid's of tuples that are duplicated in the index (i.e. more than one tid for the same index key is visible) and the user can easily investigate and delete corresponding table entries.

0001 - is the actual patch, and 
0002 - is a temporary hack for testing. It will allow inserting duplicates in a table even if an index with the exact name "idx" has a unique constraint (generally it is prohibited to insert). Then a new amcheck will tell us about these duplicates. It's pity but testing can not be done automatically, as it needs a core recompile. For testing I'd recommend a protocol similar to the following:

- Apply patch 0002
- Set autovaccum = off in postgresql.conf

create table tbl2 (a varchar(50), b varchar(150), c bytea, d varchar(50));
create unique index idx on tbl2(a,b);
insert into tbl2 select i::text::varchar, i::text::varchar, i::text::bytea, i::text::varchar from generate_series(0,3) as i, generate_series(0,10000) as x;

 
So we'll have a generous amount of duplicates in the table and index. Then:

create extension amcheck;
select bt_index_check('idx', true);

There will be output about each pair of duplicates: index tid's, position in a posting list (if the index item is deduplicated) and table tid's.

WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 218 and posting 219 (point to heap tid=(126,93) and tid=(126,94)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 219 and posting 220 (point to heap tid=(126,94) and tid=(126,95)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and tid=(126,97)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7) posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7) posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page lsn=0/1B3D420.

Things to notice in the test output:
- cross-page duplicates (when some of them on the one index page and the other are on the next). (Under patch 0002 they are marked by an additional message "INFO:  cross page equal keys" to catch them among the other)

- If we delete table entries corresponding to some duplicates in between the other duplicates (vacuum should be off), the message for this entry should disappear but the other duplicates should be detected as previous.
delete from tbl2 where ctid::text='(126,94)';
select bt_index_check('idx', true);

WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 218 and posting 220 (point to heap tid=(126,93) and tid=(126,95)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6) posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and tid=(126,97)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7) posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7) posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page lsn=0/1B3D420.

Caveat: if the first entry on the next index page has a key equal to the key on a previous page AND all heap tid's corresponding to this entry are invisible, currently cross-page check can not detect unique constraint violation between previous index page entry and 2nd, 3d and next current index page entries. In this case, there would be a message that recommends doing VACUUM to remove the invisible entries from the index and repeat the check. (Generally, it is recommended to do vacuum before the check, but for the testing purpose I'd recommend turning it off to check the detection of visible-invisible-visible duplicates scenarios)

Your feedback is very much welcome, as usual.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Next
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?