On Thu, Mar 21, 2024 at 2:16 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Most of these messages look similar, except last one: “cross page item order invariant violated for index”. Indeed,
> indexscans were hanging in a cycle.
> I could not locate problem in WAL yet, because a lot of other stuff is going on. But I have no other ideas, but
> suspect that posting list redo is corrupting index in case of a crash.
Some of these errors seem unrelated to posting lists. For example, this one:
2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066
webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order
invariant violated for index
"webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index
tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25)
(points to heap tid=(73981,1)) page lsn=31B/E522B640.
Notice that there are duplicate heap TIDs here, but no posting list.
This is almost certainly a symptom of heap-related corruption -- often
a problem with recovery. Do the posting lists that are corrupt
(reported on elsewhere) also have duplicate TIDs?
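If it helps to triage the rest of the reports the same way, here is a
rough Python sketch that pulls the two index TIDs and the heap TIDs they
point to out of a DETAIL message and flags the duplicate heap TID case.
The regular expression is derived only from the single message quoted
above, so it is an assumption that other reports (in particular ones
involving posting list tuples) use the same wording. The function names
are made up for illustration.

```python
import re

# Pattern derived from the one DETAIL message quoted above; it is an
# assumption that other reports follow the same wording.
DETAIL_RE = re.compile(
    r"Lower index tid=\((\d+),(\d+)\) \(points to heap tid=\((\d+),(\d+)\)\) "
    r"higher index tid=\((\d+),(\d+)\) \(points to heap tid=\((\d+),(\d+)\)\)"
)


def parse_detail(text):
    """Return the index and heap TIDs from a DETAIL message, or None."""
    # Log lines arrive wrapped, so collapse every whitespace run first.
    flat = " ".join(text.split())
    m = DETAIL_RE.search(flat)
    if m is None:
        return None
    n = [int(g) for g in m.groups()]
    return {
        "lower_index_tid": (n[0], n[1]),
        "lower_heap_tid": (n[2], n[3]),
        "higher_index_tid": (n[4], n[5]),
        "higher_heap_tid": (n[6], n[7]),
    }


def has_duplicate_heap_tid(text):
    """True when both adjacent index tuples point to the same heap TID."""
    d = parse_detail(text)
    return d is not None and d["lower_heap_tid"] == d["higher_heap_tid"]


# The DETAIL portion of the error quoted above, wrapped as it appeared.
SAMPLE = """DETAIL: Lower index
tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25)
(points to heap tid=(73981,1)) page lsn=31B/E522B640."""

if __name__ == "__main__":
    print(parse_detail(SAMPLE))
    print("duplicate heap TID:", has_duplicate_heap_tid(SAMPLE))
```

Running this over each reported message would at least separate the
"two plain tuples, same heap TID" cases from everything else.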
Such problems tend to first get noticed when inserts fail with
"posting list split failed" errors -- but that's just a symptom. It
just so happens that the hardening added to places like
_bt_swap_posting() and _bt_binsrch_insert() is much more likely to
visibly break than anything else, at least in practice.
--
Peter Geoghegan