Orphan page in _bt_split - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Orphan page in _bt_split |
Date | |
Msg-id | 566dacaf-5751-47e4-abc6-73de17a5d42a@garret.ru Whole thread Raw |
Responses |
Re: Orphan page in _bt_split
Re: Orphan page in _bt_split |
List | pgsql-hackers |
Hi hacker, The function _bt_split creates new (right) page to split into. The comment says: /* * Acquire a new right page to split into, now that left page has a new * high key. From here on, it's not okay to throw an error without * zeroing rightpage first. This coding rule ensures that we won't * confuse future VACUUM operations, which might otherwise try to re-find * a downlink to a leftover junk page as the page undergoes deletion. * * It would be reasonable to start the critical section just after the new * rightpage buffer is acquired instead; that would allow us to avoid * leftover junk pages without bothering to zero rightpage. We do it this * way because it avoids an unnecessary PANIC when either origpage or its * existing sibling page are corrupt. */ So we should zero this page in case of reporting error to "not confuse vacuum. It is done for all places where this function reports error before entering critical section and wal-logging this page. But before it there is call to _bt_getbuf: /* * We have to grab the original right sibling (if any) and update its prev * link. We are guaranteed that this is deadlock-free, since we couple * the locks in the standard order: left to right. */ if (!isrightmost) { sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE); _bp_getbuf just reads page in a shared buffer. But ReadBuffer actually can invoke half of Postgres code (locating buffer, evicting victim, writing to SMGR,..., So there are a lot of places where error can be thrown (including even such error as statement timeout). And in this case current transaction will be aborted without zeroing the right page. So vacuum will be confused. It can be quite easily demonstrated by inserting error here: /* * We have to grab the original right sibling (if any) and update its prev * link. We are guaranteed that this is deadlock-free, since we couple * the locks in the standard order: left to right. */ if (!isrightmost) { sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE); elog(ERROR, "@@@ Terminated"); And doing something like this: postgres=# create table t3(pk integer primary key, sk integer, payload integer); CREATE TABLE postgres=# create index on t3(sk); CREATE INDEX postgres=# insert into t3 values (generate_series(1,1000000),random()*1000000,0); ERROR: @@@ Terminated postgres=# vacuum t3; WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. log: 2025-09-01 07:46:37.459 EEST [89290] ERROR: @@@ Terminated 2025-09-01 07:46:37.459 EEST [89290] STATEMENT: insert into t3 values (generate_series(1,1000000),random()*1000000,0); 2025-09-01 07:46:42.391 EEST [89406] LOG: failed to re-find parent key in index "t3_sk_idx" for deletion target page 4 2025-09-01 07:46:42.391 EEST [89406] CONTEXT: while vacuuming index "t3_sk_idx" of relation "public.t3" TRAP: failed Assert("false"), File: "nbtpage.c", Line: 2849, PID: 89406 0 postgres 0x0000000104bf2f64 ExceptionalCondition + 216 1 postgres 0x00000001043ace94 _bt_lock_subtree_parent + 248 2 postgres 0x00000001043aaf98 _bt_mark_page_halfdead + 508 3 postgres 0x00000001043aaaec _bt_pagedel + 1068 4 postgres 0x00000001043b56cc btvacuumpage + 2116 5 postgres 0x00000001043b4dd4 btvacuumscan + 564 This code is not changed for quite long time so I wonder why nobody noticed this error before? Proposed patch is attached. It creates copy of right page in the same way as for left (original) page. And updates the page only in critical section.
Attachment
pgsql-hackers by date: