On Thu, Oct 10, 2019 at 12:48 PM Daniel Wood <hexexpert@comcast.net> wrote:
> Update query stuck in a loop. Looping in _bt_moveright().
You didn't say which PostgreSQL versions were involved, and if the
database was ever upgraded using pg_upgrade. Those details could
matter.
> ExecInsertIndexTuples->btinsert->_bt_doinsert->_bt_search->_bt_moveright
>
> Mid Tree Node downlink path taken by _bt_search points to a BTP_DELETED Leaf.
This should hardly ever happen -- it is barely possible for an index
scan to land on a BTP_DELETED leaf page (or a half-dead page) when
following a downlink in its parent. Recall that nbtree uses Lehman &
Yao's design, so _bt_search() does not "couple" buffer locks on the
way down. It would probably be impossible to observe this happening
without carefully setting breakpoints in multiple sessions.
If this happens reliably for you, which it sounds like, then you can
already assume that the index is corrupt.
> btpo_next is also DELETED but not in the tree.
>
> btpo_next->btpo_next is NOT deleted but in the mid tree as a lesser key value.
>
> Thus creating an endless loop in moveright.
Offhand, these other details sound normal. The side links are still
needed in fully deleted (BTP_DELETED) pages. And, moving right and
finding lesser key values (not greater key values) is normal with
deleted pages, since page deletion makes the keyspace move right, not
left (moving the keyspace left is how the source Lanin & Shasha paper
does it, though).
Actually, I take it back -- the looping part is not normal. The
btpo_next->btpo_next page has no business linking back to the
original/first deleted page you mentioned. That's just odd.
Can you provide me with a dump of the page images? The easiest way of
getting a page dump is described here:
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
If I had to guess, I'd guess that this was due to a generic storage problem.
--
Peter Geoghegan