The following bug has been logged on the website:
Bug reference: 17245
Logged by: Andrew Gierth
Email address: andrew@tao11.riddles.org.uk
PostgreSQL version: 14.0
Operating system: FreeBSD 13.0-RELEASE
Description:
From a report from IRC, from a user running a mediawiki instance on PG:
select ctid, page_title from "page" WHERE page_title = 'Isokaze' and
page_namespace = 0;
returned two rows rather than the expected 1:
(181,53) | Ying_Swei
(722,3) | Isokaze
Disabling index scans restored the expected output of 1 row.
Inspecting the index revealed this:
000114a0: 2f47 616c 6c65 7279 0000 1000 0320 2860 /Gallery..... (`
000114b0: 1149 736f 6b61 7a65 0000 7600 1800 0000 .Isokaze..v.....
000114c0: b500 0900 0000 d202 3800 0000 0000 0000 ........8.......
000114d0: 0000 c501 2d00 1840 1749 7365 2f51 756f ....-..@.Ise/Quo
which appears to me to be a deduplicated index entry pointing to heap rows
(181,9) and (722,56). Inspecting the heap showed that both of those ctids
are just pointer entries, to (181,53) and (722,3) respectively, which
explains the 2 row result. What I don't know is how this could have happened
in the first place.
The definition of the affected index is:
CREATE INDEX page_main_title ON mediawiki.page USING btree (page_title
text_pattern_ops) WHERE (page_namespace = 0);
The real entries for 'Ying_Swei' in the index are not in the same page or
even any nearby page, and it seems unlikely that the page_title was
updated.
The original reporter (who I will CC on a followup message) still has a
snapshot of the corrupt data. (A REINDEX naturally fixed the data).