Re: BTP_DELETED leaf still in tree - Mailing list pgsql-hackers

From Daniel Wood
Subject Re: BTP_DELETED leaf still in tree
Date
Msg-id 1430933023.970983.1570751086396@connect.xfinity.com
Whole thread Raw
In response to Re: BTP_DELETED leaf still in tree  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: BTP_DELETED leaf still in tree
List pgsql-hackers
> On October 10, 2019 at 1:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> 
> 
> On Thu, Oct 10, 2019 at 12:48 PM Daniel Wood <hexexpert@comcast.net> wrote:
> > Update query stuck in a loop.  Looping in _bt_moveright().
> 
> You didn't say which PostgreSQL versions were involved, and if the
> database was ever upgraded using pg_upgrade. Those details could
> matter.

PG_VERSION says 10.  I suspect we are running 10.9.  I have no idea if pg_upgrade was ever done.

> > ExecInsertIndexTuples->btinsert->_bt_doinsert->_bt_search->_bt_moveright
> >
> > Mid Tree Node downlink path taken by _bt_search points to a BTP_DELETED Leaf.
> 
> This should hardly ever happen -- it is barely possible for an index
> scan to land on a BTP_DELETED leaf page (or a half-dead page) when
> following a downlink in its parent. Recall that nbtree uses Lehman &
> Yao's design, so _bt_search() does not "couple" buffer locks on the
> way down. It would probably be impossible to observe this happening
> without carefully setting breakpoints in multiple sessions.
> 
> If this happens reliably for you, which it sounds like, then you can
> already assume that the index is corrupt.
> 
> > btpo_next is also DELETED but not in the tree.
> >
> > btpo_next->btpo_next is NOT deleted but in the mid tree as a lesser key value.
> >
> > Thus creating an endless loop in moveright.
> 
> Offhand, these other details sound normal. The side links are still
> needed in fully deleted (BTP_DELETED) pages. And, moving right and
> finding lesser key values (not greater key values) is normal with
> deleted pages, since page deletion makes the keyspace move right, not
> left (moving the keyspace left is how the source Lanin & Shasha paper
> does it, though).
> 
> Actually, I take it back -- the looping part is not normal. The
> btpo_next->btpo_next page has no business linking back to the
> original/first deleted page you mentioned. That's just odd.

btpo_next->btpo_next does NOT link directly back to the 1st deleted page.  It simply links to some in-use page which is
50or so leaf pages back in the tree.  Eventually we do reach the two deleted pages again.  Only the first one is in the
'tree'.

> Can you provide me with a dump of the page images? The easiest way of
> getting a page dump is described here:

Customer data.  Looks like meaningless customer data (5 digit key values).  But too much paperwork. :-)

The hard part for me to understand isn't just why the DELETED leaf node is still referenced in the mid tree node.
It is that the step which sets BTP_DELETED should have also linked its leaf and right siblings together.  But this
hasn'tbeen done.
 

Could the page have already have been dirty, but because of "target != leafblkno", we didn't stamp a new LSN on it.
Couldthis allow us to write the DELETED dirty page without the XLOG_BTREE_MARK_PAGE_HALFDEAD and XLOG_BTREE_UNLINK_PAGE
beingflushed?  Of course, I don't understand the "target != leafblkno".
 

In any case, thanks.

>
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
> 
> If I had to guess, I'd guess that this was due to a generic storage problem.
> 
> -- 
> Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BRIN index which is much faster never chosen by planner
Next
From: David Rowley
Date:
Subject: Re: BRIN index which is much faster never chosen by planner